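Using the same raw fruit classification data as before, this time we build the fruit classifier with the KNN (k-nearest neighbors) algorithm. We try K = 1, 3, 5, and 7 and compare the predictions.
An excerpt of the output: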
k=1, overall prediction accuracy: 66.67%
Predicted: [0]; actual: 0
Predicted: [3]; actual: 3
Predicted: [2]; actual: 2
...
k=3, overall prediction accuracy: 75.00%
Predicted: [0]; actual: 0
Predicted: [3]; actual: 3
Predicted: [2]; actual: 2
...
k=5, overall prediction accuracy: 58.33%
...
k=7, overall prediction accuracy: 66.67%
...
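A quick way to read these percentages: they are all consistent with simple fractions of a small test set. The sketch below recovers the smallest test-set size that fits all four rounded accuracies. The actual test-set size is not stated in these notes, so the result is only an inference, and the denominator cap of 50 is an arbitrary assumption.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Accuracies from the output excerpt, written as decimal strings
reported = {1: "0.6667", 3: "0.7500", 5: "0.5833", 7: "0.6667"}

# Closest simple fraction to each rounded accuracy
# (denominator capped at 50; this cap is an assumption, not from the source)
as_fractions = {k: Fraction(v).limit_denominator(50) for k, v in reported.items()}


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Smallest test-set size for which every accuracy is a whole number of correct predictions
smallest_n = reduce(lcm, (f.denominator for f in as_fractions.values()))

# Number of correct predictions per k under that assumed size
correct_counts = {k: int(f * smallest_n) for k, f in as_fractions.items()}

print(smallest_n)      # 12
print(correct_counts)  # {1: 8, 3: 9, 5: 7, 7: 8}
```

If the test set really has 12 samples, the gap between the best (k=3) and worst (k=5) setting is only two samples, so these results alone are weak evidence that one k is better than another.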
The complete code is as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data_path = './data/fruit_data.csv'
output_dir = './output/'

# Map fruit names to integer class labels
label_dict = {'apple': 0, 'mandarin': 1, 'lemon': 2, 'orange': 3}
feat_cols = ['mass', 'width', 'height', 'color_score']
k_values = [1, 3, 5, 7]

if __name__ == '__main__':
    data_df = pd.read_csv(data_path)
    data_df['label'] = data_df['fruit_name'].map(label_dict)

    # data_df[feat_cols].values also works for fit/score (same for y),
    # but the per-sample loop below relies on .iloc, so DataFrames are kept here
    X = data_df[feat_cols]
    y = data_df['label']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=19)

    for k in k_values:
        knn_model = KNeighborsClassifier(n_neighbors=k)
        knn_model.fit(X_train, y_train)
        accuracy = knn_model.score(X_test, y_test)
        print('k={}, overall prediction accuracy: {:.2f}%'.format(k, accuracy * 100))

        # Show the prediction for every sample in the test set
        for i in range(X_test.shape[0]):
            # Pass a one-row DataFrame so the feature names match those seen in fit()
            pred_value = knn_model.predict(X_test.iloc[[i]])
            true_value = y_test.iloc[i]
            print('Predicted: {}; actual: {}'.format(pred_value, true_value))

        print('__' * 60)
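To make it clearer why the prediction can change with k, here is a minimal pure-Python sketch of the KNN idea: find the k training points closest to the query and take a majority vote. This is an illustration only, not scikit-learn's implementation. The function name `knn_predict` and the toy 2-D points are hypothetical, and ties are broken in favor of the nearest neighbor, which may differ from what `KNeighborsClassifier` does.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to each training point, paired with that point's label
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    # Most frequent label among the k nearest; on a tie, the label seen first (nearest) wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: three points labeled 0 (apple) and two labeled 2 (lemon)
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9)]
train_y = [0, 0, 0, 2, 2]
query = (2.4, 2.4)

for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, query, k))
# prints:
# 1 2
# 3 2
# 5 0
```

With k=1 and k=3 the two nearby lemon points decide the vote; with k=5 every training point votes and the more numerous apple class wins. The same effect explains why the accuracy in the output above moves up and down as k changes.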