sklearn.metrics model evaluation metrics

This article is a repost. 2020-07-08 17:06, tag: sklearn

I. Classification metrics

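1.accuracy_score(y_true, y_pred): accuracy

In short, accuracy is the fraction of samples that are classified correctly; higher is better.

It has an obvious weakness, however: when the class proportions are highly imbalanced, the majority class dominates the score, so accuracy can be very high while the AUC is low (the class-imbalance problem).

Parameters:

y_true : 1-D array, or label indicator array / sparse matrix. Ground-truth (correct) labels.
y_pred : 1-D array, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.
normalize : bool, optional (default True). If False, return the number of correctly classified samples; otherwise, return the fraction of correctly classified samples.
sample_weight : array of shape [n_samples], optional. Sample weights.

Returns:
score : float
If normalize is True, the fraction of correctly classified samples (float); otherwise, the number of correctly classified samples (int).
The best performance is 1 when normalize is True, and the number of samples when normalize is False.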

# Example
import numpy as np
from sklearn.metrics import accuracy_score

y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print(accuracy_score(y_true, y_pred))  # 0.5
print(accuracy_score(y_true, y_pred, normalize=False))  # 2

# Multilabel case with binary label indicators
print(accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2))))  # 0.5
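A minimal sketch of both points above: accuracy is simply the mean of the element-wise test `y_true == y_pred`, and on a hypothetical 95/5 imbalanced dataset (made up for illustration) a "model" that always predicts the majority class with a constant score of 0.1 gets high accuracy but only chance-level AUC.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Accuracy is the mean of the element-wise test "prediction == label"
y_true = np.array([0, 1, 2, 3])
y_pred = np.array([0, 2, 1, 3])
manual_acc = np.mean(y_true == y_pred)
print(manual_acc)  # 0.5

# Hypothetical imbalanced data: 95 negatives, 5 positives
y_imb = np.array([0] * 95 + [1] * 5)
# A "model" that always predicts the majority class with the same score
pred_majority = np.zeros_like(y_imb)
constant_score = np.full(y_imb.shape, 0.1)

acc_imb = accuracy_score(y_imb, pred_majority)
auc_imb = roc_auc_score(y_imb, constant_score)
print(acc_imb, auc_imb)  # 0.95 0.5
```

A constant score cannot rank positives above negatives, so the ROC curve is the diagonal and the AUC is 0.5, even though 95% of the hard predictions are correct.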

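2.auc(x, y)

Computes the area under a curve from its points using the trapezoidal rule. Applied to the points of an ROC curve, it gives the AUC. (Older versions also had a reorder parameter, which has since been removed.)

sklearn.metrics.auc(x, y)

Parameters:

x: fpr (x coordinates of the ROC curve; must be monotonic)

y: tpr (y coordinates of the ROC curve)

First compute fpr and tpr with roc_curve, then call metrics.auc(fpr, tpr).

Returns: the AUC value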
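The two-step workflow (roc_curve first, then auc) can be sketched as follows. The labels and scores are borrowed from the roc_auc_score example later in this article, so the two results should agree.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Step 1: ROC points (fpr on the x axis, tpr on the y axis)
fpr, tpr, _ = roc_curve(y_true, y_scores)
# Step 2: trapezoidal area under those points
area = auc(fpr, tpr)
print(area)                              # 0.75
print(roc_auc_score(y_true, y_scores))   # 0.75
```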

3.average_precision_score(y_true, y_score, average='macro', sample_weight=None):

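Computes average precision (AP) from prediction scores:

AP = \sum_n (R_n - R_{n-1}) P_n

where P_n and R_n are the precision and recall at the n-th threshold. For random predictions, AP equals (in expectation) the fraction of positive samples.

The value lies between 0 and 1, and higher is better.

Note: this implementation is restricted to binary classification tasks or multilabel classification tasks.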

# Official example
import numpy as np
from sklearn.metrics import average_precision_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
average_precision_score(y_true, y_scores)   #0.83...
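AP is the sum over thresholds of (R_n - R_{n-1}) * P_n. A sketch that evaluates this sum directly from precision_recall_curve output, using the official example's data. It relies on precision_recall_curve returning recall in decreasing order with a final point (precision=1, recall=0) appended; the last line cross-checks against average_precision_score in case that detail differs.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_scores)
# recall decreases along the arrays, so -diff(recall) gives the recall gains;
# precision[:-1] drops the appended final point (precision=1, recall=0)
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
print(ap_manual)                                  # ≈ 0.8333
print(average_precision_score(y_true, y_scores))  # ≈ 0.8333
```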

4.brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):

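Brier score loss

The Brier score is a proper scoring function that measures the accuracy of probabilistic predictions. It applies to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes.

The function returns the mean squared difference between the actual outcome and the predicted probability of that outcome. The actual outcome must be 1 or 0 (true or false), while the predicted probability can be any value between 0 and 1.

The Brier score loss also lies between 0 and 1; the lower the score (the smaller the mean squared difference), the more accurate the prediction. It can be thought of as a measure of the "calibration" of a set of probabilistic predictions.

BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2

where N is the total number of predictions and f_t is the predicted probability of the actual outcome o_t.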

# Official example
import numpy as np
from sklearn.metrics import brier_score_loss
y_true = np.array([0, 1, 1, 0])
y_true_categorical = np.array(["spam", "ham", "ham", "spam"])
y_prob = np.array([0.1, 0.9, 0.8, 0.4])
brier_score_loss(y_true, y_prob)  #0.055
brier_score_loss(y_true, 1 - y_prob, pos_label=0)  #0.055
brier_score_loss(y_true_categorical, y_prob, pos_label="ham")  #0.055
brier_score_loss(y_true, y_prob > 0.5)  #0.0
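The Brier score is the mean of (f_t - o_t)^2 over all N predictions. A minimal sketch computing it by hand for the official example's data and comparing with brier_score_loss:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])          # actual outcomes o_t
y_prob = np.array([0.1, 0.9, 0.8, 0.4])  # predicted probabilities f_t

# BS = (1/N) * sum over t of (f_t - o_t)^2
bs_manual = np.mean((y_prob - y_true) ** 2)
print(round(bs_manual, 4), round(brier_score_loss(y_true, y_prob), 4))  # 0.055 0.055
```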

5.confusion_matrix(y_true, y_pred, labels=None, sample_weight=None):

Computes the confusion matrix to evaluate the accuracy of a classification, and returns the matrix.

# Official example
from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)  #array([[2, 0, 0],[0, 0, 1],[1, 0, 2]])

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])  #array([[2, 0, 0], [0, 0, 1],[1, 0, 2]])

tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
(tn, fp, fn, tp)  #(0, 2, 1, 1)

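The four values returned are TN (true negatives: actually negative, predicted negative), FP (false positives: actually negative, predicted positive), FN (false negatives: actually positive, predicted negative) and TP (true positives: actually positive, predicted positive).

In general, T/F says whether the prediction was correct (true or false), and P/N says which class was predicted (positive or negative).

The order of the classes can be controlled with the labels parameter; for example, with labels=[2,1,0] the rows (top to bottom) and the columns (left to right) follow that order.

By default the labels are sorted: numeric labels in ascending order, string labels in alphabetical order.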

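The first example reads as follows (rows are actual classes, columns are predicted classes):

	0 (predicted)	1 (predicted)	2 (predicted)
0 (actual)	2	0	0
1 (actual)	0	0	1
2 (actual)	1	0	2

Hence [[2,0,0],[0,0,1],[1,0,2]]: entry [i][j] counts the samples whose actual class is i and whose predicted class is j.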
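Counting the matrix by hand makes the row/column convention explicit. A minimal sketch, assuming integer labels 0..n_classes-1 so they can be used directly as indices (`n_classes` is a name introduced here):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
n_classes = 3

# Row index = actual class, column index = predicted class
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

print(cm.tolist())                                     # [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
print((cm == confusion_matrix(y_true, y_pred)).all())  # True

# For a binary problem the same layout is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
print(tn, fp, fn, tp)  # 0 2 1 1
```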

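6.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None): F1 score

The F1 score is the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). Its best value is 1 and its worst is 0. Precision and recall contribute equally to the F1 score.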

# Example
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f1_score(y_true, y_pred, average='macro'))  # 0.26666666666666666
print(f1_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(f1_score(y_true, y_pred, average='weighted'))  # 0.26666666666666666
print(f1_score(y_true, y_pred, average=None))  # [0.8 0.  0. ]
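A quick check that F1 is the harmonic mean 2PR/(P+R) of precision and recall, computed per class for the example above. Where P + R = 0 the ratio is undefined; this sketch sets it to 0, which matches what f1_score returns for these classes.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

p = precision_score(y_true, y_pred, average=None)  # per-class precision
r = recall_score(y_true, y_pred, average=None)     # per-class recall
# F1 = 2PR / (P + R); use 0 where P + R == 0 to avoid 0/0
f1_manual = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
print(f1_manual)                               # ≈ [0.8, 0, 0]
print(f1_score(y_true, y_pred, average=None))  # ≈ [0.8, 0, 0]
```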

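average : string, one of [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']

Note: for a binary problem, use 'binary'. To account for class imbalance by averaging the per-class scores weighted by their support, use 'weighted'; to take an unweighted mean that ignores class imbalance (the macro average), use 'macro'.

For the example above:

Class 0: TP=2, FP=1, FN=0, precision=2/3, recall=1, F1-score=0.8, weight=1/3

Class 1: TP=0, FP=2, FN=2, precision=0, recall=0, F1-score=0, weight=1/3

Class 2: TP=0, FP=1, FN=2, precision=0, recall=0, F1-score=0, weight=1/3

The macro average is (0.8 + 0 + 0)/3 ≈ 0.267. Because every class has the same support (2 samples), the weighted average is also ≈ 0.267. The micro average pools the counts over all classes (TP=2, FP=4, FN=4), giving 2/6 ≈ 0.333.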
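The example data in this article has equal class supports, so the macro and weighted averages coincide. A hypothetical imbalanced dataset (supports 2, 3 and 1, made up for illustration) shows how they diverge: the per-class F1 scores are 2/3, 1/3 and 0, so the macro average is 1/3 ≈ 0.333 and the support-weighted average is 7/18 ≈ 0.389.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical imbalanced labels: class supports are 2, 3 and 1
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 2, 2, 1])

per_class = f1_score(y_true, y_pred, average=None)  # [2/3, 1/3, 0]
support = np.bincount(y_true)                       # [2, 3, 1]

macro = per_class.mean()                            # unweighted mean
weighted = np.average(per_class, weights=support)   # mean weighted by support
print(round(macro, 3), round(weighted, 3))          # 0.333 0.389
```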

7.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None, labels=None):

Also known as logistic regression loss or cross-entropy loss, it is defined on probability estimates; lower is better. (Newer scikit-learn releases deprecate the eps parameter.)

# Example
from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]
y_pred = [[.9, .1], [.8, .2], [.3, .7], [.01, .99]]
log_loss(y_true, y_pred)   #0.1738...
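Log loss is the negative mean log-probability that the model assigns to each sample's true class. A minimal sketch with the example's data, assuming integer labels 0/1 that index the columns of y_pred:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([[.9, .1], [.8, .2], [.3, .7], [.01, .99]])

# Probability assigned to each sample's true class: [0.9, 0.8, 0.7, 0.99]
p_true_class = y_pred[np.arange(len(y_true)), y_true]
manual = -np.mean(np.log(p_true_class))
print(round(manual, 4), round(log_loss(y_true, y_pred), 4))  # 0.1738 0.1738
```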

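8.precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None): precision

precision = TP/(TP+FP), the fraction of samples predicted as positive that are actually positive; the best value is 1 and the worst is 0.

Parameters
y_true : 1-D array, or label indicator array / sparse matrix. Ground-truth (correct) labels.
y_pred : 1-D array, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.
labels : list. By default, all labels in y_true and y_pred are used in sorted order.
pos_label : str or int, default 1. The class to report if average='binary' and the data is binary. If the data is multiclass or multilabel, this is ignored; setting labels=[pos_label] and average != 'binary' reports scores for that label only.
average : string, one of [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']. Required for multiclass/multilabel targets. If None, the score for each class is returned. Otherwise, it determines the type of averaging performed on the data:
'binary': only report results for the class specified by pos_label. This applies only if the targets (y_{true,pred}) are binary.
'micro': calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': calculate metrics for each label and take their unweighted mean. This does not take label imbalance into account.
'weighted': calculate metrics for each label and take their mean weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
'samples': calculate metrics for each instance and take their mean (only meaningful for multilabel classification, where this differs from accuracy_score).
sample_weight : array of shape [n_samples], optional. Sample weights.

Returns
precision : float (if average is not None) or array of float, shape = [n_unique_labels]
Precision of the positive class in binary classification, or the weighted average of the precision of each class for the multiclass task.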

# Official example
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(precision_score(y_true, y_pred, average='macro'))  # 0.2222222222222222
print(precision_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(precision_score(y_true, y_pred, average='weighted'))  # 0.2222222222222222
print(precision_score(y_true, y_pred, average=None))  # [0.66666667 0.         0.        ]
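The averaging options listed above can be reproduced from the confusion matrix. A sketch for this single-label multiclass example (variable names are chosen here): per-class precision is TP divided by the column sum (the number of predictions of that class), 'macro' is the plain mean, 'weighted' weights by the row sums (support), and 'micro' pools TP over all predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)     # rows = actual, columns = predicted

tp = np.diag(cm)
per_class = tp / cm.sum(axis=0)           # TP / (TP + FP)
support = cm.sum(axis=1)                  # true instances per class

macro = per_class.mean()
weighted = np.average(per_class, weights=support)
micro = tp.sum() / cm.sum()               # pooled TP / all predictions
print(per_class)                          # ≈ [0.667, 0, 0]
print(macro, weighted, micro)             # ≈ 0.222 0.222 0.333
```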

9.recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None): recall

recall = TP/(TP+FN)

Recall is the fraction of actually positive samples that are predicted as positive.

The best value of recall is 1 and the worst is 0.

# Example
from sklearn.metrics import recall_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(recall_score(y_true, y_pred, average='macro'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average='weighted'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average=None))  # [1. 0. 0.]
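A minimal sketch of recall = TP/(TP+FN) on hypothetical binary labels (made up for illustration), counting TP and FN by hand:

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical binary labels (1 = positive)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))   # 3 positives found
fn = np.sum((y_true == 1) & (y_pred == 0))   # 1 positive missed
manual = tp / (tp + fn)
print(manual, recall_score(y_true, y_pred))  # 0.75 0.75
```

The false positive at index 4 does not affect recall; it would lower precision instead.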

10.roc_auc_score(y_true, y_score, average='macro', sample_weight=None):

Computes the area under the ROC curve, i.e. the AUC; larger is better.

# Official example
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)  #0.75
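An equivalent way to read the AUC: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. A small sketch that enumerates all positive/negative pairs of the official example (fine for tiny inputs, since the loop is O(n_pos * n_neg)):

```python
import itertools

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]
# Fraction of (positive, negative) pairs ranked correctly; ties count 1/2
pairs = list(itertools.product(pos, neg))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
auc_pairs = wins / len(pairs)
print(auc_pairs, roc_auc_score(y_true, y_scores))  # 0.75 0.75
```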

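11.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True):

Computes the coordinates of the ROC curve, FPR (x axis) and TPR (y axis), at each threshold:

TPR = TP/(TP+FN) = recall (true positive rate, sensitivity); FPR = FP/(FP+TN) (false positive rate, 1 - specificity)

Often used when plotting KS curves and AUC (ROC) curves.

Parameters:

pos_label: the label of the positive class; for example, if the labels are (0, 1) and 1 denotes the positive class, then pos_label=1.

drop_intermediate: whether to drop some suboptimal thresholds that would not appear on a plotted ROC curve. This is useful for creating lighter ROC curves.

# Official example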
import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
fpr  #array([0. , 0. , 0.5, 0.5, 1. ])
tpr  #array([0. , 0.5, 0.5, 1. , 1. ])
thresholds  #array([inf, 0.8 , 0.4 , 0.35, 0.1 ]); older versions show 1.8 instead of inf as the first threshold
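A sketch that recomputes TPR and FPR at each threshold returned by roc_curve (predicting positive when score >= threshold) and then derives the KS statistic mentioned above. KS is taken here as max(TPR - FPR), the definition commonly used in credit scoring; that definition is an assumption, since the article does not spell it out. Reusing roc_curve's own thresholds keeps the sketch independent of whether the first threshold is inf or max(score) + 1.

```python
import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)

is_pos = y == 2
# A sample is predicted positive when its score >= threshold
tpr_manual = np.array([np.sum(is_pos & (scores >= t)) / is_pos.sum() for t in thresholds])
fpr_manual = np.array([np.sum(~is_pos & (scores >= t)) / (~is_pos).sum() for t in thresholds])
print(np.allclose(tpr, tpr_manual), np.allclose(fpr, fpr_manual))  # True True

# KS statistic: the largest vertical gap between the TPR and FPR curves
ks = np.max(tpr - fpr)
print(ks)  # 0.5
```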

Supplement: a diagram of the confusion matrix (image not preserved).

II. Regression metrics

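1.explained_variance_score(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): explained variance regression score

explained_variance_score measures how well the model accounts for the variation (dispersion) in the dataset. The best possible value is 1 (a perfect model); lower values are worse.

# Example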
from sklearn.metrics import explained_variance_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
explained_variance_score(y_true, y_pred)  #0.957


y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
explained_variance_score(y_true, y_pred, multioutput='uniform_average')  #0.983
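Explained variance is 1 - Var(y - ŷ)/Var(y). A minimal sketch with the first example's data, plus one property worth knowing: because it uses the variance of the residuals, a prediction that is off by a constant (a hypothetical shifted prediction here) still scores 1.0, while r2_score penalizes the offset.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Explained variance = 1 - Var(y_true - y_pred) / Var(y_true)
manual = 1 - np.var(y_true - y_pred) / np.var(y_true)
print(round(manual, 3))  # 0.957

# A prediction off by a constant has zero-variance residuals, so explained
# variance is still 1.0 even though every prediction is wrong
shifted = y_true + 1.0
print(explained_variance_score(y_true, shifted))  # 1.0
print(r2_score(y_true, shifted) < 1.0)            # True
```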

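2.mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): mean absolute error

MAE is the mean of the absolute differences between the true and predicted values.
In general, the smaller the MAE over a set of data points, the better the model fits.
Drawback: MAE is on the scale of the target. If the true value is 10,000 and the prediction is 9,000, MAE = 1000, which looks large; if the true value is 10 and the prediction is 9, MAE = 1. Both predictions are off by 10%, so the MAE value alone cannot tell how good a model's predictions are.

# Official example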
from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)  #0.5

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
mean_absolute_error(y_true, y_pred)  #0.75

mean_absolute_error(y_true, y_pred, multioutput='raw_values') #array([0.5, 1. ])
mean_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7]) #0.85

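Parameter note: multioutput accepts:
'raw_values': return the full set of errors, one per output;
'uniform_average': average the errors of all outputs with equal weight;
an array-like of weights: average the per-output errors with those weights (as in multioutput=[0.3, 0.7] above).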
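A minimal sketch of how the multioutput options combine the per-output MAE values for the 2-D example:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])

# One MAE per output column: what multioutput='raw_values' returns
per_output = np.abs(y_true - y_pred).mean(axis=0)
# 'uniform_average': plain mean of the per-output errors
uniform = per_output.mean()
# Array of weights: weighted mean of the per-output errors
weighted = np.average(per_output, weights=[0.3, 0.7])
print(per_output, uniform, round(weighted, 2))  # [0.5 1. ] 0.75 0.85
```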

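3.mean_squared_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): mean squared error

A measure of how far the predictions are from the true values: the expected value (mean) of the squared difference between the predicted and true values. Because the errors are squared, large errors weigh more than in MAE. The smaller the MSE, the better the model predicts.

# Official example
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_squared_error(y_true, y_pred)  #0.375
np.sqrt(mean_squared_error(y_true, y_pred))  #0.612 (RMSE; older versions use squared=False, newer ones provide root_mean_squared_error)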

y_true = [[0.5, 1],[-1, 1],[7, -6]]
y_pred = [[0, 2],[-1, 2],[8, -5]]
mean_squared_error(y_true, y_pred) #0.708
mean_squared_error(y_true, y_pred, multioutput='raw_values') #array([0.41666667, 1.        ])
mean_squared_error(y_true, y_pred, multioutput=[0.3, 0.7]) #0.825
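A minimal sketch of MSE and its square root (RMSE, which is in the same units as the target), computed by hand for the 1-D example. Taking np.sqrt of the MSE works in every scikit-learn version, whereas the squared=False flag is version-dependent.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mse = np.mean((y_true - y_pred) ** 2)   # mean of the squared errors
rmse = np.sqrt(mse)                     # back in the units of y
print(mse, round(rmse, 3))              # 0.375 0.612
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))  # True
```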

4.median_absolute_error(y_true, y_pred, multioutput='uniform_average'): median absolute error

The median of the absolute differences between the true and predicted values; smaller is better. Because it uses the median, it is robust to outliers.

from sklearn.metrics import median_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
median_absolute_error(y_true, y_pred) #0.5

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
median_absolute_error(y_true, y_pred) #0.75

median_absolute_error(y_true, y_pred, multioutput='raw_values') #array([0.5, 1. ])
median_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7]) #0.85
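The robustness to outliers shows up on hypothetical data (made up for illustration) with four perfect predictions and one large miss: MAE is dominated by the single outlier, while the median absolute error ignores it.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Hypothetical data: four perfect predictions and one huge miss
y_true = np.array([1, 2, 3, 4, 100])
y_pred = np.array([1, 2, 3, 4, 0])

print(mean_absolute_error(y_true, y_pred))    # 20.0, dominated by the outlier
print(median_absolute_error(y_true, y_pred))  # 0.0, the median ignores it
print(np.median(np.abs(y_true - y_pred)))     # 0.0, same value computed by hand
```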

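5.r2_score(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): coefficient of determination, R²

R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

which also equals

R^2 = 1 - \frac{\mathrm{MSE}(y, \hat{y})}{\mathrm{Var}(y)}

where:
Residual sum of squares, SSE: the squared errors between the predicted and true values; it reflects how well the model fits.
Total sum of squares, SST: the squared deviations of the true values from their mean; it reflects how far the data deviate from the mathematical expectation.
R² can be understood as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation; it measures how much of the variation in the dependent variable the independent variables explain. For a least-squares fit with an intercept, evaluated on its own training data, it lies between 0 and 1: the closer to 1, the stronger the explanatory power, and the closer to 0, the weaker. Informally, it uses the error of always predicting the mean as a baseline and checks whether the model's error is smaller or larger than that baseline.
In general, adding independent variables increases the regression sum of squares and decreases the residual sum of squares, so R² increases; conversely, removing independent variables decreases the regression sum of squares and increases the residual sum of squares.
Rule of thumb (domain-dependent): > 0.4 indicates a good fit.

R2_score = 1: the predictions equal the true values exactly, with no error; the independent variables explain the dependent variable perfectly.
R2_score = 0: the numerator equals the denominator, for example when every prediction equals the mean.
R2_score is not the square of the correlation coefficient r, and it can be negative (numerator > denominator): the model is then worse than simply predicting the mean of the target variable.

Drawback: R² depends on the variance of the target in the particular dataset, so comparing R² values of models evaluated on different datasets can be misleading.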

# Official example
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)  #0.948...

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
r2_score(y_true, y_pred, multioutput='variance_weighted')  #0.938...

y_true = [1, 2, 3]
y_pred = [1, 2, 3]
r2_score(y_true, y_pred)  #1.0

y_true = [1, 2, 3]
y_pred = [2, 2, 2]
r2_score(y_true, y_pred)  #0.0

y_true = [1, 2, 3]
y_pred = [3, 2, 1]
r2_score(y_true, y_pred)  #-3.0
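Both forms of R² can be checked by hand: 1 - SSE/SST and 1 - MSE/Var(y), using the population variance. The last lines illustrate that R² is not the squared correlation: predictions [3, 2, 1] for targets [1, 2, 3] are perfectly (negatively) correlated, so r² = 1, yet R² = -3.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

sse = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
sst = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - sse / sst
# Equivalent form: 1 - MSE / Var(y), with the population variance (ddof=0)
r2_alt = 1 - mean_squared_error(y_true, y_pred) / np.var(y_true)
print(round(r2_manual, 3), round(r2_alt, 3))  # 0.949 0.949

# R^2 is not the squared correlation coefficient r^2
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
r = np.corrcoef(a, b)[0, 1]                   # -1.0: perfectly anti-correlated
print(round(r ** 2, 3), r2_score(a, b))       # 1.0 -3.0
```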

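6.mean_squared_log_error(y_true, y_pred): mean squared logarithmic error

This metric is best suited to targets that grow exponentially, such as population counts or the average sales of a commodity over a span of years. Note that it penalizes an under-predicted estimate more than an over-predicted one of the same absolute size.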
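MSLE is the mean of (log(1 + y) - log(1 + ŷ))^2, so it requires non-negative targets and predictions. A sketch with a hand-written helper `msle` (a hypothetical name introduced here) on a hypothetical single target of 100 shows the asymmetry: under-predicting by 50 costs more than over-predicting by 50.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error


def msle(y_true, y_pred):
    """Hypothetical helper: mean((log(1 + y) - log(1 + y_hat))^2); inputs must be >= 0."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)


y_true = np.array([100.0])
under = np.array([50.0])    # under-predicts by 50
over = np.array([150.0])    # over-predicts by 50

print(round(msle(y_true, under), 3), round(msle(y_true, over), 3))  # 0.467 0.162
print(np.isclose(msle(y_true, under), mean_squared_log_error(y_true, under)))  # True
```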

