metrics is the key module in sklearn for model evaluation, providing a wide range of evaluation measures. My notes are organized below.
1. Common usage: predefined values
1.1 The sklearn documentation lists the predefined scoring strings in a table (screenshot omitted; see the scoring parameter section of the docs).
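Since the table itself is not reproduced here, a minimal sketch (my own, assuming the iris dataset and an SVC classifier) of how one of the predefined strings is passed as the scoring argument:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(random_state=0)
# any predefined scoring string from the table can be passed as `scoring`
print(cross_val_score(clf, X, y, scoring='accuracy'))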
1.2 Besides the predefined metrics above, you can also define your own scoring metrics with the sklearn.metrics.make_scorer() factory.
make_scorer has two typical uses:
Use 1: wrapping an existing function from metrics that requires extra parameters, such as fbeta_score. The official docs give the following usage:
from sklearn.metrics import fbeta_score, make_scorer
ftwo_scorer = make_scorer(fbeta_score, beta=2)
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
Use 2: building a completely custom scorer from a user-defined Python function. make_scorer can take the following:
1. the Python function you want to use;
2. whether your function returns a score or a loss (if a score, higher is better; if a loss, lower is better), set via greater_is_better;
3. for classification metrics only: whether your function requires continuous decision certainties (needs_threshold=True); the default is False;
4. any additional parameters, such as beta in fbeta_score or labels in f1_score.
For example, from the official docs:
import numpy as np
from sklearn.metrics import make_scorer

def my_custom_loss_func(ground_truth, predictions):
    diff = np.abs(ground_truth - predictions).max()
    return np.log(1 + diff)

# loss_func will negate the return value of my_custom_loss_func,
# which will be np.log(2), 0.693, given the values for ground_truth
# and predictions defined below.
loss = make_scorer(my_custom_loss_func, greater_is_better=False)
score = make_scorer(my_custom_loss_func, greater_is_better=True)
ground_truth = [[1], [1]]
predictions = [0, 1]
from sklearn.dummy import DummyClassifier
clf = DummyClassifier(strategy='most_frequent', random_state=0)
clf = clf.fit(ground_truth, predictions)
loss(clf, ground_truth, predictions)   # -0.69...
score(clf, ground_truth, predictions)  # 0.69...
1.3 Using multiple metrics
sklearn also accepts multiple metrics in GridSearchCV, RandomizedSearchCV, and cross_validate; there are two ways to specify them:
## Option 1: a list of metric name strings
scoring = ['accuracy', 'precision']

## Option 2: a dict mapping scorer names to scoring functions or strings
from sklearn.metrics import accuracy_score
from sklearn.metrics import make_scorer
scoring = {'accuracy': make_scorer(accuracy_score), 'prec': 'precision'}
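As a quick illustration of how such a specification is consumed (my own sketch, using the iris dataset; the plain 'precision' string is defined for binary targets, so 'f1_macro' is used here instead), each metric produces its own test_<name> entry in the results:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        scoring=['accuracy', 'f1_macro'])
print(scores['test_accuracy'])   # one score per CV fold
print(scores['test_f1_macro'])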
Note: currently the dict form only allows scorer functions that return a single value; metrics that return several values must first be wrapped, as in the official example below that extracts the individual cells of a confusion matrix:
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_validate
from sklearn.metrics import confusion_matrix, make_scorer

# A sample toy binary classification dataset
X, y = datasets.make_classification(n_classes=2, random_state=0)
svm = LinearSVC(random_state=0)

# each scorer extracts one cell of the confusion matrix
# (row = true class, column = predicted class)
def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]

scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
           'fp': make_scorer(fp), 'fn': make_scorer(fn)}
cv_results = cross_validate(svm.fit(X, y), X, y, scoring=scoring)
# Getting the test set true positive counts, one per fold
print(cv_results['test_tp'])
# Getting the test set false negative counts, one per fold
print(cv_results['test_fn'])
2. Classification metrics
For classification, sklearn provides a large number of metrics; some apply only to binary problems while others extend to multiclass and multilabel problems (table screenshot omitted; see the classification metrics section of the sklearn docs).
2.1 From binary to multiclass and multilabel problems:
Metrics such as f1_score and roc_auc_score are mostly defined for binary classification. To extend them to multiclass and multilabel problems, the data is treated as a collection of binary problems, one per class (one-vs-rest). There are then several ways of averaging the binary metrics across classes, selected through the average parameter:
(1) macro: the unweighted mean of the per-class metrics. When infrequent classes are nonetheless important, macro-averaging can highlight their performance; on the other hand, all classes are rarely equally important, so macro-averaging may over-emphasize the typically low performance on infrequent classes.
(2) weighted: the mean of the per-class metrics weighted by each class's support (its number of true instances), which accounts for class imbalance.
(3) micro: gives every sample-class pair an equal contribution to the overall metric; it may be the preferred choice in multilabel settings.
(4) samples: applies only to multilabel problems; it computes the metric for each sample and returns the average over samples.
Extending a multiclass problem to these binary metrics thus always uses the one-vs-rest view; the first three options are illustrated in the sketch below.
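A small sketch (my own, on a toy multiclass problem) of the first three averaging options; 'samples' would additionally require multilabel indicator input:
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average='micro'))     # computed from global tp/fp/fn counts
print(f1_score(y_true, y_pred, average='weighted'))  # mean weighted by class support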
2.2 Accuracy
accuracy_score computes the model's accuracy as

$$\text{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)$$

where $\hat{y}_i$ is the predicted value of the i-th sample and $y_i$ is the corresponding true value.
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)                   # 0.5
accuracy_score(y_true, y_pred, normalize=False)  # 2

### the multilabel case with binary label indicators
accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))  # 0.5
2.3 Confusion matrix
The confusion matrix is a classic tool for evaluating classification; in sklearn it is computed with confusion_matrix:
from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
# array([[2, 0, 0],
#        [0, 0, 1],
#        [1, 0, 2]])
For a multiclass problem, entry R[i, j] of the confusion matrix counts how often samples of true class i are predicted as class j, so the diagonal entries R[i, i] count the correctly classified samples.
For binary problems, the confusion matrix yields the TN, FP, FN, and TP counts:
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tn, fp, fn, tp  # (2, 1, 2, 3)
The official docs provide a confusion matrix visualization example: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
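For reference, a minimal hand-rolled version of such a plot (my own sketch with matplotlib's imshow, not the official example):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion matrix')
plt.colorbar()
ticks = np.arange(cm.shape[0])
plt.xticks(ticks, ticks)
plt.yticks(ticks, ticks)
for i in range(cm.shape[0]):          # write each count into its cell
    for j in range(cm.shape[1]):
        plt.text(j, i, cm[i, j], ha='center', va='center')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()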
2.4 Classification report
classification_report prints the main classification metrics, as in the example below:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
#              precision    recall  f1-score   support
#
#     class 0       0.67      1.00      0.80         2
#     class 1       0.00      0.00      0.00         1
#     class 2       1.00      0.50      0.67         2
#
# avg / total       0.67      0.60      0.59         5
2.5 Hamming loss
Hamming loss measures the Hamming distance between the predicted and the true labels:

$$L_{\text{Hamming}}(y, \hat{y}) = \frac{1}{n_\text{labels}} \sum_{j=0}^{n_\text{labels}-1} 1(\hat{y}_j \ne y_j)$$

Put simply, it is the number of wrongly predicted labels divided by the total number of labels.
Example:
from sklearn.metrics import hamming_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)  # 0.25
## it works for multilabel classification as well
hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2)))  # 0.75
2.6 Jaccard similarity coefficient:
The Jaccard similarity coefficient is widely used in recommender systems. For a sample it is the size of the intersection of the predicted and true label sets divided by the size of their union:

$$J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|}$$

In binary and multiclass classification this reduces to the number of correct predictions divided by the number of samples, so jaccard_similarity_score then gives the same result as accuracy_score.
Example:
import numpy as np
from sklearn.metrics import jaccard_similarity_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
jaccard_similarity_score(y_true, y_pred)                   # 0.5
jaccard_similarity_score(y_true, y_pred, normalize=False)  # 2

## multilabel case
jaccard_similarity_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))  # 0.75
2.7 Precision, recall, and F-measures
(1) Precision: the fraction of samples predicted as positive that are actually positive;
(2) Recall: the fraction of actual positive samples that are predicted as positive;
(3) The F-measures combine precision and recall in a single number. The F1 score is the harmonic mean of precision and recall; its best value is 1 and its worst value is 0.
The formulas are

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$

and the $F_\beta$ score is

$$F_\beta = (1 + \beta^2) \, \frac{\text{precision} \times \text{recall}}{\beta^2 \, \text{precision} + \text{recall}}$$

precision_recall_curve computes the precision-recall curve over successive decision thresholds, and average_precision_score summarizes it as

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the nth threshold.
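As a quick worked check against the doctest below: with y_true = [0, 1, 0, 1] and y_pred = [0, 1, 0, 0] we get tp = 1, fp = 0, fn = 1, so precision = 1/1 = 1.0 and recall = 1/2 = 0.5; therefore $F_1 = 2 \cdot (1.0 \cdot 0.5)/(1.0 + 0.5) \approx 0.667$ and $F_{0.5} = 1.25 \cdot 0.5 / (0.25 + 0.5) \approx 0.833$, matching the f1_score and fbeta_score outputs.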
>>> from sklearn import metrics
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 0, 1]
>>> metrics.precision_score(y_true, y_pred)
1.0
>>> metrics.recall_score(y_true, y_pred)
0.5
>>> metrics.f1_score(y_true, y_pred)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=0.5)
0.83...
>>> metrics.fbeta_score(y_true, y_pred, beta=1)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=2)
0.55...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5)
(array([ 0.66...,  1.        ]), array([ 1. ,  0.5]), array([ 0.71...,  0.83...]), array([2, 2]...))

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> from sklearn.metrics import average_precision_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, threshold = precision_recall_curve(y_true, y_scores)
>>> precision
array([ 0.66...,  0.5       ,  1.        ,  1.        ])
>>> recall
array([ 1. ,  0.5,  0.5,  0. ])
>>> threshold
array([ 0.35,  0.4 ,  0.8 ])
>>> average_precision_score(y_true, y_scores)
0.83...
The same evaluation metrics are also provided for multiclass and multilabel classification, using the average parameter described in section 2.1.
Example:
>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> metrics.precision_score(y_true, y_pred, average='macro')
0.22...
>>> metrics.recall_score(y_true, y_pred, average='micro')
0.33...
>>> metrics.f1_score(y_true, y_pred, average='weighted')
0.26...
>>> metrics.fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5, average=None)
(array([ 0.66...,  0.        ,  0.        ]), array([ 1.,  0.,  0.]), array([ 0.71...,  0.        ,  0.        ]), array([2, 2, 2]...))
2.8 ROC curve
The ROC curve is one of the most familiar ways of presenting modeling results, for example with the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

data = load_iris()
X, y = data.data, data.target
### keep two classes only, so the task is binary
X, y = X[y != 2], y[y != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

est = svm.SVC(probability=True)
model = est.fit(X_train, y_train)
fpr1, tpr1, thresholds1 = roc_curve(y_test, model.predict_proba(X_test)[:, 1], pos_label=1)

lr = LogisticRegression()
lr.fit(X_train, y_train)
fpr2, tpr2, thresholds2 = roc_curve(y_test, lr.predict_proba(X_test)[:, 1], pos_label=1)

# summarize each curve with its AUC
auc_svm = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
auc_lr = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])

plt.plot(fpr1, tpr1, linewidth=1, label='ROC of svm (AUC=%.2f)' % auc_svm)
plt.plot(fpr2, tpr2, linewidth=1, label='ROC of LR (AUC=%.2f)' % auc_lr)
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.plot([0, 1], [0, 1], linestyle='--')
plt.legend(loc=4)
plt.show()

Running the code plots the ROC curves of the SVM and logistic regression models (output figure omitted).
