The Road to Machine Learning: Python k-Nearest Neighbors Classifier (KNeighborsClassifier) for Iris Species Prediction


 

Learning the API of scikit-learn's k-nearest neighbors classifier, KNeighborsClassifier, in Python.

Feel free to visit my GitHub for the source code: https://github.com/linyi0604/MachineLearning

 

from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
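
'''
k-nearest neighbors classifier
Predicts the class of a new sample from the distribution of the training data,
i.e. from the classes of the training samples closest to it.
It is a non-parametric method: instead of learning a fixed set of parameters,
it keeps the training data itself.
Computation and memory costs are high: the whole training set is stored, and with a
brute-force search every prediction is compared against every stored sample.
'''

'''
1 Prepare the data
'''
# Load the iris dataset
iris = load_iris()
# Check the size of the data
# print(iris.data.shape)    # (150, 4)
# View the dataset description
# print(iris.DESCR)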
'''
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
   - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

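   In total there are 150 samples,
   evenly distributed over the 3 subspecies (50 each);
   each sample has 4 measurements describing the shape of the sepals and petals
   (sepal length, sepal width, petal length and petal width, in cm).
'''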

'''
2 Split into a training set and a test set
'''
# Hold out 25% of the samples as the test set; random_state fixes the shuffle
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)
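
'''
3 k-nearest neighbors classifier: fit the model and predict
'''
# Standardize the training and test data
# (fit the scaler on the training set only, then apply the same transform to the test set)
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Create a k-nearest neighbors model object (defaults to n_neighbors=5)
knc = KNeighborsClassifier()
# Fit the model on the training data
knc.fit(x_train, y_train)
# Predict labels for the test data
y_predict = knc.predict(x_test)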

'''
4 Evaluate the model
'''
print("Accuracy:", knc.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))
'''
Output recorded by the author. Older scikit-learn versions label the last row
"avg / total"; newer versions print "accuracy", "macro avg" and "weighted avg"
rows instead.

Accuracy: 0.8947368421052632
Other metrics:
              precision    recall  f1-score   support

     setosa       1.00      1.00      1.00         8
 versicolor       0.73      1.00      0.85        11
  virginica       1.00      0.79      0.88        19

avg / total       0.92      0.89      0.90        38
'''
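
The docstring at the top of the script says k-NN decides from the distribution of the training data, is non-parametric, and is expensive in computation and memory. A minimal from-scratch sketch makes those points concrete. It assumes plain Euclidean distance, a brute-force search over all stored samples and a simple majority vote; the function name `knn_predict` is hypothetical, and this is not how scikit-learn implements `KNeighborsClassifier` internally (its defaults are `n_neighbors=5` with the Minkowski metric and `p=2`, i.e. Euclidean distance, and `algorithm='auto'` may choose a tree-based search instead of brute force).

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, y_train, x_query, k=5):
    """Hypothetical helper: brute-force k-nearest-neighbours majority vote."""
    predictions = []
    for x in x_query:
        # Euclidean distance from this query point to every stored training sample
        distances = np.linalg.norm(x_train - x, axis=1)
        # Indices of the k closest training samples, nearest first
        nearest = np.argsort(distances)[:k]
        # Majority vote among their labels; on a tie, the label met first in distance order wins
        label = Counter(y_train[nearest]).most_common(1)[0][0]
        predictions.append(label)
    return np.array(predictions)


# Toy data: two well-separated 2-D clusters labelled 0 and 1
x_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_query = np.array([[0.3, 0.3], [4.9, 5.0]])
print(knn_predict(x_train, y_train, x_query, k=3))  # → [0 1]
```

"Fitting" here is nothing more than keeping `x_train` and `y_train` in memory, and every prediction computes a distance to every stored sample, which is where the cost mentioned in the docstring comes from.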

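Step 2 keeps 25% of the 150 samples for testing. When `test_size` is a float, `train_test_split` rounds the test count up, so the test set has ceil(0.25 × 150) = 38 samples and the training set 112; this agrees with the `support` column of the recorded report (8 + 11 + 19 = 38). A quick check, written against the current `sklearn.model_selection` module rather than the removed `sklearn.cross_validation`:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

# A float test_size is rounded up: ceil(0.25 * 150) = ceil(37.5) = 38 test samples
print(x_train.shape, x_test.shape)  # → (112, 4) (38, 4)
# Test-set count per class (0 = setosa, 1 = versicolor, 2 = virginica);
# compare with the support column of the recorded report
print([int((y_test == c).sum()) for c in range(3)])
```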
 
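Standardization matters for k-NN because predictions depend only on distances, so a feature with a larger spread would otherwise dominate them (in the dataset description, petal length ranges from 1.0 to 6.9 cm while petal width ranges from 0.1 to 2.5 cm). `StandardScaler` subtracts each feature's mean and divides by its standard deviation, both estimated in `fit` on the training set only; `transform` then reuses those training statistics for the test set. This sketch checks that behaviour against plain NumPy through the fitted `mean_` and `scale_` attributes; it is an illustration, not part of the original script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, _, _ = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

ss = StandardScaler()
x_train_scaled = ss.fit_transform(x_train)  # learns mean_ and scale_ from x_train
x_test_scaled = ss.transform(x_test)        # reuses the training statistics

# The statistics are the training columns' means and population std (ddof=0)
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
print(np.allclose(ss.mean_, mean), np.allclose(ss.scale_, std))  # → True True

# The test set is shifted and scaled with those same training statistics
print(np.allclose(x_test_scaled, (x_test - mean) / std))  # → True

# The scaled training features have mean 0 and std 1 (up to rounding)
print(np.allclose(x_train_scaled.mean(axis=0), 0),
      np.allclose(x_train_scaled.std(axis=0), 1))  # → True True
```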

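`KNeighborsClassifier()` with no arguments uses `n_neighbors=5`, and `knc.score(x_test, y_test)` returns the mean accuracy on the test set, the same value `accuracy_score(y_test, y_predict)` gives. The following is a variation on the original script, not the author's code: it wraps scaling and the classifier with `make_pipeline` so the scaler is refit on the training part of every cross-validation fold, and, as an illustration only, compares a few values of k with 5-fold cross-validation on the training set (picking k by looking at the test set would make the reported test accuracy optimistic). No particular scores are claimed for the loop.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

# Scaler + classifier as one estimator: the scaler is always fitted on training data only
knc = make_pipeline(StandardScaler(), KNeighborsClassifier())
knc.fit(x_train, y_train)
print(knc.named_steps["kneighborsclassifier"].n_neighbors)  # → 5 (the default k)

# score() is mean accuracy, i.e. the same number accuracy_score gives for the predictions
y_predict = knc.predict(x_test)
print(knc.score(x_test, y_test) == accuracy_score(y_test, y_predict))  # → True

# Illustration only: compare a few k values with 5-fold cross-validation on the training set
for k in (1, 3, 5, 7, 9):
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(pipe, x_train, y_train, cv=5)
    print(k, round(scores.mean(), 3))
```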

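The recorded report is consistent with a single confusion matrix that can be reconstructed by hand from its support, recall and precision columns (the original script does not print it): all 8 setosa and all 11 versicolor test samples are classified correctly, while 15 of the 19 virginica samples are correct and the other 4 are predicted as versicolor. Recomputing the metrics from that inferred matrix reproduces the reported figures: accuracy 34/38 ≈ 0.895, versicolor precision 11/15 ≈ 0.73, virginica recall 15/19 ≈ 0.79, and the "avg / total" row is the support-weighted average of each column.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Labels rebuilt from the confusion matrix INFERRED from the report (not printed by the script):
# 0 = setosa, 1 = versicolor, 2 = virginica
y_true = np.array([0] * 8 + [1] * 11 + [2] * 19)
y_pred = np.array([0] * 8 + [1] * 11 + [1] * 4 + [2] * 15)

print(accuracy_score(y_true, y_pred))  # → 0.8947368421052632 (34 / 38)

# Per-class precision, recall, F1 and support
names = ["setosa", "versicolor", "virginica"]
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
for name, p, r, f, s in zip(names, precision, recall, f1, support):
    print(f"{name:>10} {p:.2f} {r:.2f} {f:.2f} {s}")
# Matches the per-class rows: 1.00/1.00/1.00, 0.73/1.00/0.85, 1.00/0.79/0.88

# "avg / total" in older scikit-learn = support-weighted average of each column
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(f"avg / total {p:.2f} {r:.2f} {f:.2f}")  # → avg / total 0.92 0.89 0.90
```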