Let P denote the number of positive samples and N the number of negative samples. The table below gives a fairly complete summary of how the evaluation metrics accuracy, precision, recall, and F1-score are computed:

(The original post shows the summary table as an image here.)
Simplified version:

precision = TP / (TP + FP)        # of the samples predicted positive, the fraction that are actually positive
recall = TP / (TP + FN)           # of the actual positive samples, the fraction predicted positive
accuracy = (TP + TN) / (P + N)
F1-score = 2 / [(1 / precision) + (1 / recall)]
from sklearn.metrics import accuracy_score, precision_score, recall_score

def cul_accuracy_precision_recall(y_true, y_pred, pos_label=1):
    # return accuracy, precision and recall, each rounded to 5 decimal places
    return {
        "accuracy": float("%.5f" % accuracy_score(y_true=y_true, y_pred=y_pred)),
        "precision": float("%.5f" % precision_score(y_true=y_true, y_pred=y_pred, pos_label=pos_label)),
        "recall": float("%.5f" % recall_score(y_true=y_true, y_pred=y_pred, pos_label=pos_label)),
    }
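As a quick sanity check, here is a hypothetical call with hand-made labels (the values are invented purely for illustration):

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
# TP = 3, FP = 1, FN = 1, TN = 3 for pos_label=1, so all three metrics come out to 0.75
print(cul_accuracy_precision_recall(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}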
***********************************************************************************************************************************
(What follows is purely my own derivation; if there are mistakes, corrections are welcome.)

Generally speaking, precision and recall are computed for a specific class, for example:
precision(c1) = TP(c1) / Pred(c1) = TP(c1) / [TP(c1) + FP(c2=>c1) + FP(c3=>c1)]
recall(c1) = TP(c1) / True(c1) = TP(c1) / [TP(c1) + FN(c1=>c2) + FN(c1=>c3)]
where FP(c2=>c1) counts samples of class c2 misclassified as c1, and FN(c1=>c2) counts samples of class c1 misclassified as c2 (false negatives from c1's point of view).
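To make the per-class formulas concrete, here is a minimal sketch assuming a 3-class confusion matrix M, where M[i][j] counts samples of true class i predicted as class j (the counts are made up):

import numpy as np

M = np.array([[50,  3,  2],   # true c1
              [ 5, 40,  5],   # true c2
              [ 4,  6, 45]])  # true c3

for i, name in enumerate(["c1", "c2", "c3"]):
    tp = M[i, i]
    precision_i = tp / M[:, i].sum()  # TP(ci) / Pred(ci): the column sum
    recall_i = tp / M[i, :].sum()     # TP(ci) / True(ci): the row sum
    print(name, round(precision_i, 5), round(recall_i, 5))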
Sometimes we need to measure the overall performance of the model, which gives:
total_precision = sum[TP(ci)] / sum[Pred(ci)] = [TP(c1) + TP(c2) + TP(c3)] / len(Pred)
total_recall = sum[TP(ci)] / sum[True(ci)] = [TP(c1) + TP(c2) + TP(c3)] / len(True)
total_accuracy = sum[TP(ci)] / total_num = [TP(c1) + TP(c2) + TP(c3)] / total_num
where i ranges over [1, 2, ..., n].
At this point it is a little surprising to notice that, taken over the whole dataset, we generally have len(Pred) == len(True) == total_num.
In other words, total_precision == total_recall == total_accuracy, so any one of the three suffices to measure overall model performance.
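This equality is what scikit-learn calls micro-averaging; a small sketch with made-up 3-class labels shows all three numbers coinciding for single-label predictions:

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]  # hypothetical labels
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# every sample gets exactly one prediction, so len(Pred) == len(True) == total_num
print(accuracy_score(y_true, y_pred))                    # 0.75
print(precision_score(y_true, y_pred, average="micro"))  # 0.75
print(recall_score(y_true, y_pred, average="micro"))     # 0.75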
For models that output probabilities, a common practice is to sweep over a ladder of thresholds to obtain the mapping F(threshold) ==> (precision, recall).
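In the binary case, scikit-learn's precision_recall_curve implements exactly this mapping, treating every distinct score as a candidate threshold (the scores below are hypothetical):

from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# each thresholds[k] gives one operating point (precision[k], recall[k])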
When predictions are cut off at a threshold, besides total_precision we can also compute a generalized recall:
generalized_recall = sum[TP(ci)] / [sum[True(ci)] + OutOfThreshold] = [TP(c1) + TP(c2) + TP(c3)] / [len(True) + OutOfThreshold]
where OutOfThreshold is the number of samples filtered out because their score fell below the chosen threshold, and len(True) here counts only the samples that passed it.
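A minimal sketch of this generalized recall, assuming we reject any sample whose top class probability falls below the threshold (all numbers are invented):

import numpy as np

threshold = 0.6
proba = np.array([[0.8, 0.1, 0.1],
                  [0.5, 0.3, 0.2],   # max < threshold -> rejected
                  [0.1, 0.7, 0.2],
                  [0.2, 0.2, 0.6],
                  [0.4, 0.4, 0.2],   # max < threshold -> rejected
                  [0.1, 0.2, 0.7]])
y_true = np.array([0, 0, 1, 2, 1, 2])

keep = proba.max(axis=1) >= threshold      # samples that pass the threshold
y_pred = proba.argmax(axis=1)
tp = (y_pred[keep] == y_true[keep]).sum()  # sum of TP(ci) over the kept samples
out_of_threshold = (~keep).sum()           # the OutOfThreshold count above

total_precision = tp / keep.sum()                          # over retained samples only
generalized_recall = tp / (keep.sum() + out_of_threshold)  # over all samples
print(total_precision, generalized_recall)                 # 1.0 0.6666...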
References:
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
https://www.cnblogs.com/shixiangwan/p/7215926.html
