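A validation curve is a tool for improving model performance. It closely resembles a learning curve; the difference is that it plots the model's accuracy for different values of one hyperparameter rather than for different training-set sizes, so it is mainly used for hyperparameter tuning. The validation_curve function evaluates the model at each parameter value with k-fold cross-validation (5 folds when cv=None, stratified for classifiers).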
sklearn.model_selection.validation_curve(estimator, X, y, *, param_name, param_range, groups=None, cv=None, scoring=None, n_jobs=None, pre_dispatch='all', verbose=0, error_score=nan)
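Parameters:
param_name: str, the name of the parameter to vary. For example, when the model is an SVC, vary gamma to find the value that scores best.
param_range: the sequence of values to try for that parameter.
Returns:
train_scores: scores on the training folds, an array of shape (n_ticks, n_cv_folds) with one row per parameter value.
test_scores: scores on the held-out (validation) folds, with the same shape.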
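To make the parameters and return values above concrete, here is a minimal sketch of the SVC/gamma case mentioned under param_name. The gamma grid, the 5-fold setting, and the digits dataset are assumptions chosen for illustration, not values from the original text.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Illustrative grid of gamma values (an assumption, not a recommendation)
param_range = np.logspace(-6, -1, 5)

train_scores, test_scores = validation_curve(
    SVC(), X, y,
    param_name="gamma", param_range=param_range,
    cv=5, scoring="accuracy",
)

# One row per gamma value, one column per cross-validation fold
print(train_scores.shape, test_scores.shape)

# Average over folds, then pick the gamma with the best validation score
mean_test = test_scores.mean(axis=1)
best_gamma = param_range[np.argmax(mean_test)]
print("best gamma:", best_gamma)
```

Averaging along axis=1 collapses the folds, leaving one score per parameter value; the best value is then simply the one with the highest mean validation score.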
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve
import numpy as np
import matplotlib.pyplot as plt

X, y = datasets.load_digits(return_X_y=True)
# print(X[:2, :])

# Numbers of trees to evaluate
param_range = [10, 20, 40, 80, 160, 250]
train_score, test_score = validation_curve(
    RandomForestClassifier(), X, y,
    param_name='n_estimators', param_range=param_range,
    cv=10, scoring='accuracy')

# Each row holds the 10 fold scores for one parameter value; average over folds
train_score = np.mean(train_score, axis=1)
test_score = np.mean(test_score, axis=1)

plt.plot(param_range, train_score, 'o-', color='r', label='training')
plt.plot(param_range, test_score, 'o-', color='g', label='testing')
plt.legend(loc='best')
plt.xlabel('number of trees')
plt.ylabel('accuracy')
plt.show()
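The plot above shows only the mean score across folds. As a possible extension, the sketch below also shades one standard deviation around each curve, which gives a sense of how much the folds disagree. The shorter param_range, cv=5, random_state=0, the Agg backend, and the output filename are all assumptions made to keep the example quick and usable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove to open a window
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

X, y = load_digits(return_X_y=True)
param_range = [10, 20, 40, 80]  # shortened grid, for speed only

train_scores, test_scores = validation_curve(
    RandomForestClassifier(random_state=0), X, y,
    param_name="n_estimators", param_range=param_range,
    cv=5, scoring="accuracy",
)

fig, ax = plt.subplots()
for scores, color, label in [
    (train_scores, "r", "training"),
    (test_scores, "g", "cross-validation"),
]:
    mean = scores.mean(axis=1)  # average over folds
    std = scores.std(axis=1)    # spread over folds
    ax.plot(param_range, mean, "o-", color=color, label=label)
    ax.fill_between(param_range, mean - std, mean + std, color=color, alpha=0.2)

ax.legend(loc="best")
ax.set_xlabel("number of trees")
ax.set_ylabel("accuracy")
fig.savefig("validation_curve.png")  # hypothetical output filename
```

A wide band on the cross-validation curve suggests the score depends heavily on which fold is held out, so small differences between neighboring parameter values should not be over-interpreted.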