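Using the same raw fruit-classification data as before, this time we build the fruit classifier with the KNN (k-nearest neighbors) algorithm. We try k = 1, 3, 5, and 7 and compare the prediction results.
An excerpt of the prediction output follows: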
k=1: overall prediction accuracy: 66.67%
Predicted: [0]; actual: 0
Predicted: [3]; actual: 3
Predicted: [2]; actual: 2
……
k=3: overall prediction accuracy: 75.00%
Predicted: [0]; actual: 0
Predicted: [3]; actual: 3
Predicted: [2]; actual: 2
……
k=5: overall prediction accuracy: 58.33%
……
k=7: overall prediction accuracy: 66.67%
……
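The accuracy figures above come from `score`, which for a scikit-learn classifier returns the mean accuracy: the fraction of test samples whose predicted label equals the true label. The sketch below shows that calculation on its own. The `accuracy` helper and both label lists are hypothetical, made up for illustration, and are not the real test split; 12 samples were chosen only because percentages such as 66.67%, 75.00%, and 58.33% would be consistent with a test set of that size.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels for 12 test samples (NOT the real fruit test split)
predicted = [0, 3, 2, 1, 0, 3, 2, 0, 3, 1, 2, 0]
actual = [0, 3, 2, 1, 0, 3, 2, 3, 0, 2, 1, 0]

# 8 of the 12 positions agree, so this prints 66.67%
print('{:.2f}%'.format(accuracy(predicted, actual) * 100))
```

With so few test samples, a single changed prediction moves the accuracy by more than 8 percentage points, which may explain part of the fluctuation across k values.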
# Complete script: KNN fruit classifier evaluated for several values of k
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data_path = './data/fruit_data.csv'
output_dir = './output/'

# Map each fruit name to an integer class label
label_dict = {'apple': 0,
              'mandarin': 1,
              'lemon': 2,
              'orange': 3
              }

# Feature columns used for classification
feat_cols = ['mass', 'width', 'height', 'color_score']

# Values of k (number of neighbors) to compare
k_values = [1, 3, 5, 7]

if __name__ == '__main__':
    data_df = pd.read_csv(data_path)
    data_df['label'] = data_df['fruit_name'].map(label_dict)

    # data_df[feat_cols].values would also work for fitting (same for y),
    # but the .iloc calls below assume pandas objects
    X = data_df[feat_cols]
    y = data_df['label']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=19)

    for k in k_values:
        knn_model = KNeighborsClassifier(n_neighbors=k)
        knn_model.fit(X_train, y_train)
        accuracy = knn_model.score(X_test, y_test)
        print('k={}: overall prediction accuracy: {:.2f}%'.format(k, accuracy * 100))

        # Show the prediction for every sample in the test set
        for i in range(X_test.shape[0]):
            # iloc[[i]] keeps a one-row DataFrame, so feature names match those seen in fit
            pred_value = knn_model.predict(X_test.iloc[[i]])
            true_value = y_test.iloc[i]
            print('Predicted: {}; actual: {}'.format(pred_value, true_value))
        print('__' * 60)
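To make clearer what `KNeighborsClassifier` does when it predicts, here is a minimal from-scratch sketch of the KNN idea: measure the distance from the query point to every training sample, take the k closest, and return the most common label among them. This is an illustration under stated assumptions, not scikit-learn's implementation: the `knn_predict` helper is hypothetical, it uses plain Euclidean distance with a simple first-seen tie-break, and the toy data is made up rather than taken from the fruit dataset.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training sample
    distances = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k smallest distances (stable sort keeps input order on ties)
    nearest = np.argsort(distances, kind='stable')[:k]
    # Count the labels of those neighbors and return the most frequent one
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy, made-up data: two features, two classes (NOT the fruit dataset)
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]

for k in (1, 3, 5):
    print('k={}: predicted label {}'.format(k, knn_predict(X_toy, y_toy, [1.1, 1.0], k)))
```

Because the vote depends entirely on distances, features with large numeric ranges (such as mass in grams) can dominate features with small ranges (such as a color score between 0 and 1).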
