metrics is the key module in sklearn for model evaluation, providing a wide range of evaluation measures. My notes are organized below.
1. Common usage: predefined values
1.1 The sklearn documentation lists the predefined scoring strings in a table (screenshot omitted; see the scoring parameter section of the docs).
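Since the table itself is not reproduced here, a minimal sketch (my own, assuming the iris dataset and an SVC classifier) of how one of the predefined strings is passed as the scoring argument:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(random_state=0)
# any predefined scoring string from the table can be passed as `scoring`
print(cross_val_score(clf, X, y, scoring='accuracy'))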
1.2 Besides the predefined metrics above, you can also define your own scoring metrics with the sklearn.metrics.make_scorer() factory.
make_scorer has two typical uses:
Use 1: wrapping an existing function from metrics that requires extra parameters, such as fbeta_score. The official docs give the following usage:
from sklearn.metrics import fbeta_score, make_scorer
ftwo_scorer = make_scorer(fbeta_score, beta=2)
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
Use 2: building a completely custom scorer from a user-defined Python function. make_scorer can take the following:
1. the Python function you want to use;
2. whether your function returns a score or a loss (if a score, higher is better; if a loss, lower is better), set via greater_is_better;
3. for classification metrics only: whether your function requires continuous decision certainties (needs_threshold=True); the default is False;
4. any additional parameters, such as beta in fbeta_score or labels in f1_score.
For example, from the official docs:
import numpy as np
from sklearn.metrics import make_scorer

def my_custom_loss_func(ground_truth, predictions):
    diff = np.abs(ground_truth - predictions).max()
    return np.log(1 + diff)

# loss_func will negate the return value of my_custom_loss_func,
# which will be np.log(2), 0.693, given the values for ground_truth
# and predictions defined below.
loss = make_scorer(my_custom_loss_func, greater_is_better=False)
score = make_scorer(my_custom_loss_func, greater_is_better=True)
ground_truth = [[1], [1]]
predictions = [0, 1]
from sklearn.dummy import DummyClassifier
clf = DummyClassifier(strategy='most_frequent', random_state=0)
clf = clf.fit(ground_truth, predictions)
loss(clf, ground_truth, predictions)   # -0.69...
score(clf, ground_truth, predictions)  # 0.69...
1.3 Using multiple metrics
sklearn also accepts multiple metrics in GridSearchCV, RandomizedSearchCV, and cross_validate; there are two ways to specify them:
## Option 1: a list of metric name strings
scoring = ['accuracy', 'precision']

## Option 2: a dict mapping scorer names to scoring functions or strings
from sklearn.metrics import accuracy_score
from sklearn.metrics import make_scorer
scoring = {'accuracy': make_scorer(accuracy_score), 'prec': 'precision'}
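As a quick illustration of how such a specification is consumed (my own sketch, using the iris dataset; the plain 'precision' string is defined for binary targets, so 'f1_macro' is used here instead), each metric produces its own test_<name> entry in the results:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        scoring=['accuracy', 'f1_macro'])
print(scores['test_accuracy'])   # one score per CV fold
print(scores['test_f1_macro'])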
Note: currently the dict form only allows scorer functions that return a single value; metrics that return several values must first be wrapped, as in the official example below that extracts the individual cells of a confusion matrix:
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_validate
from sklearn.metrics import confusion_matrix, make_scorer

# A sample toy binary classification dataset
X, y = datasets.make_classification(n_classes=2, random_state=0)
svm = LinearSVC(random_state=0)

# each scorer extracts one cell of the confusion matrix
# (row = true class, column = predicted class)
def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]

scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
           'fp': make_scorer(fp), 'fn': make_scorer(fn)}
cv_results = cross_validate(svm.fit(X, y), X, y, scoring=scoring)
# Getting the test set true positive counts, one per fold
print(cv_results['test_tp'])
# Getting the test set false negative counts, one per fold
print(cv_results['test_fn'])
2. Classification metrics
For classification, sklearn provides a large number of metrics; some apply only to binary problems while others extend to multiclass and multilabel problems (table screenshot omitted; see the classification metrics section of the sklearn docs).
2.1 From binary to multiclass and multilabel problems:
Metrics such as f1_score and roc_auc_score are mostly defined for binary classification. To extend them to multiclass and multilabel problems, the data is treated as a collection of binary problems, one per class (one-vs-rest). There are then several ways of averaging the binary metrics across classes, selected through the average parameter:
(1) macro: the unweighted mean of the per-class metrics. When infrequent classes are nonetheless important, macro-averaging can highlight their performance; on the other hand, all classes are rarely equally important, so macro-averaging may over-emphasize the typically low performance on infrequent classes.
(2) weighted: the mean of the per-class metrics weighted by each class's support (its number of true instances), which accounts for class imbalance.
(3) micro: gives every sample-class pair an equal contribution to the overall metric; it may be the preferred choice in multilabel settings.
(4) samples: applies only to multilabel problems; it computes the metric for each sample and returns the average over samples.
Extending a multiclass problem to these binary metrics thus always uses the one-vs-rest view; the first three options are illustrated in the sketch below.
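A small sketch (my own, on a toy multiclass problem) of the first three averaging options; 'samples' would additionally require multilabel indicator input:
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average='micro'))     # computed from global tp/fp/fn counts
print(f1_score(y_true, y_pred, average='weighted'))  # mean weighted by class support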
2.2 Accuracy
accuracy_score computes the model's accuracy as

$$\text{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)$$

where $\hat{y}_i$ is the predicted value of the i-th sample and $y_i$ is the corresponding true value.
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)                   # 0.5
accuracy_score(y_true, y_pred, normalize=False)  # 2

### the multilabel case with binary label indicators
accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))  # 0.5
2.3 Confusion matrix
The confusion matrix is a classic tool for evaluating classification; in sklearn it is computed with confusion_matrix:
from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
# array([[2, 0, 0],
#        [0, 0, 1],
#        [1, 0, 2]])
For a multiclass problem, entry R[i, j] of the confusion matrix counts how often samples of true class i are predicted as class j, so the diagonal entries R[i, i] count the correctly classified samples.
For binary problems, the confusion matrix yields the TN, FP, FN, and TP counts:
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tn, fp, fn, tp  # (2, 1, 2, 3)
The official docs provide a confusion matrix visualization example: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
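For reference, a minimal hand-rolled version of such a plot (my own sketch with matplotlib's imshow, not the official example):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion matrix')
plt.colorbar()
ticks = np.arange(cm.shape[0])
plt.xticks(ticks, ticks)
plt.yticks(ticks, ticks)
for i in range(cm.shape[0]):          # write each count into its cell
    for j in range(cm.shape[1]):
        plt.text(j, i, cm[i, j], ha='center', va='center')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()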
2.4 Classification report
classification_report prints the main classification metrics, as in the example below:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
#              precision    recall  f1-score   support
#
#     class 0       0.67      1.00      0.80         2
#     class 1       0.00      0.00      0.00         1
#     class 2       1.00      0.50      0.67         2
#
# avg / total       0.67      0.60      0.59         5
2.5 Hamming loss
Hamming loss measures the Hamming distance between the predicted and the true labels:

$$L_{\text{Hamming}}(y, \hat{y}) = \frac{1}{n_\text{labels}} \sum_{j=0}^{n_\text{labels}-1} 1(\hat{y}_j \ne y_j)$$

Put simply, it is the number of wrongly predicted labels divided by the total number of labels.
Example:
from sklearn.metrics import hamming_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)  # 0.25
## it works for multilabel classification as well
hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2)))  # 0.75
2.6 Jaccard similarity coefficient:
The Jaccard similarity coefficient is widely used in recommender systems. For a sample it is the size of the intersection of the predicted and true label sets divided by the size of their union:

$$J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|}$$

In binary and multiclass classification this reduces to the number of correct predictions divided by the number of samples, so jaccard_similarity_score then gives the same result as accuracy_score.
Example:
import numpy as np
from sklearn.metrics import jaccard_similarity_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
jaccard_similarity_score(y_true, y_pred)                   # 0.5
jaccard_similarity_score(y_true, y_pred, normalize=False)  # 2

## multilabel case
jaccard_similarity_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))  # 0.75
2.7 Precision, recall, and F-measures
(1) Precision: the fraction of samples predicted as positive that are actually positive;
(2) Recall: the fraction of actual positive samples that are predicted as positive;
(3) The F-measures combine precision and recall in a single number. The F1 score is the harmonic mean of precision and recall; its best value is 1 and its worst value is 0.
The formulas are

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$

and the $F_\beta$ score is

$$F_\beta = (1 + \beta^2) \, \frac{\text{precision} \times \text{recall}}{\beta^2 \, \text{precision} + \text{recall}}$$

precision_recall_curve computes the precision-recall curve over successive decision thresholds, and average_precision_score summarizes it as

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the nth threshold.
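As a quick worked check against the doctest below: with y_true = [0, 1, 0, 1] and y_pred = [0, 1, 0, 0] we get tp = 1, fp = 0, fn = 1, so precision = 1/1 = 1.0 and recall = 1/2 = 0.5; therefore $F_1 = 2 \cdot (1.0 \cdot 0.5)/(1.0 + 0.5) \approx 0.667$ and $F_{0.5} = 1.25 \cdot 0.5 / (0.25 + 0.5) \approx 0.833$, matching the f1_score and fbeta_score outputs.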
>>> from sklearn import metrics
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 0, 1]
>>> metrics.precision_score(y_true, y_pred)
1.0
>>> metrics.recall_score(y_true, y_pred)
0.5
>>> metrics.f1_score(y_true, y_pred)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=0.5)
0.83...
>>> metrics.fbeta_score(y_true, y_pred, beta=1)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=2)
0.55...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5)
(array([ 0.66...,  1.        ]), array([ 1. ,  0.5]), array([ 0.71...,  0.83...]), array([2, 2]...))

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> from sklearn.metrics import average_precision_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, threshold = precision_recall_curve(y_true, y_scores)
>>> precision
array([ 0.66...,  0.5       ,  1.        ,  1.        ])
>>> recall
array([ 1. ,  0.5,  0.5,  0. ])
>>> threshold
array([ 0.35,  0.4 ,  0.8 ])
>>> average_precision_score(y_true, y_scores)
0.83...
The same evaluation metrics are also provided for multiclass and multilabel classification, using the average parameter described in section 2.1.
Example:
>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> metrics.precision_score(y_true, y_pred, average='macro')
0.22...
>>> metrics.recall_score(y_true, y_pred, average='micro')
0.33...
>>> metrics.f1_score(y_true, y_pred, average='weighted')
0.26...
>>> metrics.fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5, average=None)
(array([ 0.66...,  0.        ,  0.        ]), array([ 1.,  0.,  0.]), array([ 0.71...,  0.        ,  0.        ]), array([2, 2, 2]...))
2.8 ROC curve
The ROC curve is one of the most familiar ways of presenting modeling results, for example with the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

data = load_iris()
X, y = data.data, data.target
### keep two classes only, so the task is binary
X, y = X[y != 2], y[y != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

est = svm.SVC(probability=True)
model = est.fit(X_train, y_train)
fpr1, tpr1, thresholds1 = roc_curve(y_test, model.predict_proba(X_test)[:, 1], pos_label=1)

lr = LogisticRegression()
lr.fit(X_train, y_train)
fpr2, tpr2, thresholds2 = roc_curve(y_test, lr.predict_proba(X_test)[:, 1], pos_label=1)

# summarize each curve with its AUC
auc_svm = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
auc_lr = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])

plt.plot(fpr1, tpr1, linewidth=1, label='ROC of svm (AUC=%.2f)' % auc_svm)
plt.plot(fpr2, tpr2, linewidth=1, label='ROC of LR (AUC=%.2f)' % auc_lr)
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.plot([0, 1], [0, 1], linestyle='--')
plt.legend(loc=4)
plt.show()

Running the code plots the ROC curves of the SVM and logistic regression models (output figure omitted).
