sklearn: Cross-Validation for Classification Models

This article is a repost. 2019-07-15 20:48, Machine Learning

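'''
    Cross-validation for classification:
            Splitting the data set involves randomness. If the randomly drawn test samples happen to be atypical,
            the trained model's score on them says little about how far its predictions can be trusted.
            Cross-validation therefore evaluates the model several times: all samples are divided into n folds
            of (nearly) equal size, and in each round the model is trained on n-1 folds and tested on the
            remaining fold, producing one metric score per round.
    sklearn provides a cross-validation API:
            import sklearn.model_selection as ms
            ms.cross_val_score(model, inputs, outputs, cv=number of folds, scoring=metric name) -> array of scores

    Cross-validation metrics:
            1. Accuracy (accuracy): correctly classified samples / total samples
            2. Precision (precision_weighted): for each class, correctly predicted samples / samples predicted as that class
            3. Recall (recall_weighted): for each class, correctly predicted samples / samples that actually belong to that class
            4. F1 score (f1_weighted): 2 x precision x recall / (precision + recall), computed per class
            In each cross-validation round, precision, recall, or F1 is computed for every class, and the per-class
            values are averaged with weights proportional to each class's number of true samples (its support);
            this is what the "_weighted" suffix means. That average is the score for the round, and the scores of
            all rounds are returned to the caller as an array.
'''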
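The fold procedure described above is what cross_val_score runs internally. Below is a minimal sketch of that loop written out by hand, assuming synthetic data from sklearn.datasets.make_classification as a stand-in for the article's data file. For a classifier with an integer cv, scikit-learn splits with StratifiedKFold (no shuffling, so each fold keeps roughly the original class proportions), fits a fresh clone of the estimator on the other folds, and scores it on the held-out fold.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic stand-in data, not the article's ./ml_data/multiple1.txt
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=0)

model = GaussianNB()
skf = StratifiedKFold(n_splits=5)  # what cv=5 resolves to for a classifier (no shuffling)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    fold_model = clone(model)                   # fresh, unfitted copy for this fold
    fold_model.fit(X[train_idx], y[train_idx])  # train on the other 4 folds
    pred = fold_model.predict(X[test_idx])      # predict the held-out fold
    manual_scores.append(np.mean(pred == y[test_idx]))  # accuracy on that fold
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print('manual loop    :', manual_scores)
print('cross_val_score:', auto_scores)
```

Because the split is deterministic when shuffling is off, repeated cross_val_score calls on the same data see the same folds, so per-fold scores for different metrics line up. To get several metrics from one set of fits instead of refitting per metric, sklearn.model_selection.cross_validate accepts a list of scorer names such as ['accuracy', 'precision_weighted'] and returns them under keys test_accuracy, test_precision_weighted, and so on.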
import numpy as np
import matplotlib.pyplot as mp
import sklearn.naive_bayes as nb
import sklearn.model_selection as ms

data = np.loadtxt('./ml_data/multiple1.txt', delimiter=',', unpack=False, dtype='f8')
print(data.shape)
x = np.array(data[:, :-1])
y = np.array(data[:, -1])

# Split into training and test sets: train on the training set, test on the test set, and plot the test samples
train_x, test_x, train_y, test_y = ms.train_test_split(x, y, test_size=0.25, random_state=7)

# Run 5-fold cross-validation on the training set; train the final model only if the scores look good
model = nb.GaussianNB()
# Accuracy
score = ms.cross_val_score(model, train_x, train_y, cv=5, scoring='accuracy')
print('accuracy score=', score)
print('accuracy mean=', score.mean())

# Precision (weighted)
score = ms.cross_val_score(model, train_x, train_y, cv=5, scoring='precision_weighted')
print('precision_weighted score=', score)
print('precision_weighted mean=', score.mean())

# Recall (weighted)
score = ms.cross_val_score(model, train_x, train_y, cv=5, scoring='recall_weighted')
print('recall_weighted score=', score)
print('recall_weighted mean=', score.mean())

# F1 score (weighted)
score = ms.cross_val_score(model, train_x, train_y, cv=5, scoring='f1_weighted')
print('f1_weighted score=', score)
print('f1_weighted mean=', score.mean())

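# Train the NB model on the whole training set and classify the test set
model.fit(train_x, train_y)
pred_test_y = model.predict(test_x)
# Compare predicted and true outputs to get the test accuracy (correct predictions / total test samples)
ac = (test_y == pred_test_y).sum() / test_y.size
print('test accuracy ac=', ac)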

# Compute the classification boundaries on a 500 x 500 grid
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
n = 500
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n), np.linspace(b, t, n))
bg_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
bg_y = model.predict(bg_x)
grid_z = bg_y.reshape(grid_x.shape)

# Plot the boundaries and the test samples
mp.figure('NB Classification', facecolor='lightgray')
mp.title('NB Classification', fontsize=16)
mp.xlabel('X', fontsize=14)
mp.ylabel('Y', fontsize=14)
mp.tick_params(labelsize=10)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(test_x[:, 0], test_x[:, 1], s=80, c=test_y, cmap='jet', label='Samples')

mp.legend()
mp.show()

Output:
(400, 3)
accuracy score= [1.         1.         1.         1.         0.98305085]
accuracy mean= 0.9966101694915255
precision_weighted score= [1.         1.         1.         1.         0.98411017]
precision_weighted mean= 0.996822033898305
recall_weighted score= [1.         1.         1.         1.         0.98305085]
recall_weighted mean= 0.9966101694915255
f1_weighted score= [1.         1.         1.         1.         0.98303199]
f1_weighted mean= 0.9966063988235516
test accuracy ac= 0.99
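The metric definitions in the docstring can be checked by hand. Below is a minimal sketch, assuming the hypothetical toy label arrays y_true and y_pred (not the article's data). It computes per-class precision, recall, and F1 from their definitions, averages them with support weights, and prints them next to sklearn.metrics precision_score, recall_score, and f1_score called with average='weighted', which is the averaging the *_weighted scorer names use.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical toy labels for an imbalanced 3-class problem (not the article's data)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 2, 2, 2])

classes = np.unique(y_true)
support = np.array([np.sum(y_true == c) for c in classes])  # true samples per class
weights = support / support.sum()                            # support-proportional weights

precisions, recalls, f1s = [], [], []
for c in classes:
    tp = np.sum((y_pred == c) & (y_true == c))  # samples correctly predicted as c
    n_pred = np.sum(y_pred == c)                # samples predicted as c
    n_true = np.sum(y_true == c)                # samples that truly belong to c
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_true
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    precisions.append(p)
    recalls.append(r)
    f1s.append(f)

# "_weighted": per-class values averaged with support weights
p_w = np.dot(weights, precisions)
r_w = np.dot(weights, recalls)
f_w = np.dot(weights, f1s)

print('by hand:', p_w, r_w, f_w)
print('sklearn:',
      precision_score(y_true, y_pred, average='weighted'),
      recall_score(y_true, y_pred, average='weighted'),
      f1_score(y_true, y_pred, average='weighted'))
print('unweighted (macro) precision:', np.mean(precisions))
```

Weighted recall always equals accuracy: weighting each class's recall TP_c / n_c by n_c / N leaves sum(TP_c) / N. That is why the accuracy and recall_weighted lines in the output above are identical fold by fold, while precision_weighted and f1_weighted differ slightly. Passing average='macro' instead gives the plain unweighted mean of the per-class values, which differs from the weighted one whenever classes are imbalanced, as in this toy example.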
