0. Background
In machine learning, the quality of an algorithm is usually judged with different metrics depending on the scenario. Commonly used metrics include:
- [x] Accuracy;
- [x] PR (Precision / Recall);
- [x] F measure;
- [ ] MCC (Matthews correlation coefficient);
- [ ] BM;
- [ ] MK;
- [ ] Gini coefficient;
- [x] ROC;
- [ ] Z score;
- [x] AUC ;
- [ ] Cost Curve;
- [ ] BLEU;
- [ ] METEOR;
- [ ] Brier score;
- [ ] NIST (metric);
- [ ] ROUGE (metric);
- [ ] Sørensen–Dice coefficient;
- [ ] Uncertainty coefficient, aka Proficiency;
- [ ] Word error rate (WER);
A very important binary-classification confusion matrix, taken from the wiki, is used to explain the content that follows.
Figure 0.1: confusion matrix from the wiki
Figure 0.1 shows the wiki's confusion matrix for binary classification, together with the various metrics derived from it. In it:
- true condition: the columns give the actual class; predicted condition: the rows give the predicted class;
- actual positives = true positive + false negative; actual negatives = false positive + true negative;
- predicted positives = true positive + false positive; predicted negatives = false negative + true negative.
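These relations can be checked directly in code. Below is a minimal sketch (NumPy assumed; the toy arrays y_true and y_pred are made up for illustration):

```python
# Minimal sketch of the binary confusion matrix in Figure 0.1 (toy data).
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])  # actual condition
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])  # predicted condition

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positive
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negative
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positive
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negative

# The row/column sums match the relations listed above.
assert tp + fn == np.sum(y_true == 1)  # actual positives
assert fp + tn == np.sum(y_true == 0)  # actual negatives
assert tp + fp == np.sum(y_pred == 1)  # predicted positives
assert fn + tn == np.sum(y_pred == 0)  # predicted negatives
```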
1. Meaning of the different metrics
1.1 Accuracy & Precision / Recall
As shown in Figure 0.1:
- accuracy (ACC in Figure 0.1): the most commonly used metric, \(\frac{\text{number of correctly predicted samples}}{\text{total number of samples}}\);
- Precision (PPV in Figure 0.1): the fraction of the predicted positives that are predicted correctly, \(\frac{true\, positive}{\text{predicted positives}}\);
- Recall (TPR in Figure 0.1): the fraction of the actual positives that are predicted correctly, \(\frac{true\, positive}{\text{actual positives}}\) (see the sketch after this list).
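A minimal sketch of these three quantities, computed from the four confusion-matrix counts (the numbers below are the toy counts from the confusion-matrix sketch above):

```python
# Minimal sketch of ACC, PPV (precision) and TPR (recall) from the four counts.
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)  # correctly predicted / all samples

def precision(tp, fp):
    return tp / (tp + fp)                   # TP / predicted positives

def recall(tp, fn):
    return tp / (tp + fn)                   # TP / actual positives

print(accuracy(3, 2, 1, 4), precision(3, 2), recall(3, 1))  # 0.7 0.6 0.75
```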
1.2 F measure & G measure
1.2.1 F measure
The traditional F measure (balanced F score, the \(F_1\) score) is the harmonic mean (a particular kind of mean in mathematics) of precision and recall:
\[F_1 = 2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}\]
where:
- an F score of 0 is the worst case: precision or recall (or both) is close to 0, and the model is poor;
- an F score of 1 is the best case: the closer both precision and recall are to 1, the better the model.
ps: the F1 score is also known as the Sørensen–Dice coefficient, or Dice similarity coefficient (DSC).
The expression above can be written in a more general form:
\[F_\beta = (1+\beta^2)\cdot\frac{\text{precision}\cdot\text{recall}}{\beta^2\cdot\text{precision}+\text{recall}}\]
where \(F_2\) and \(F_{0.5}\) are, besides \(F_1\), the two most commonly used F measures:
- with \(\beta=2\), recall has a larger influence than precision;
- with \(\beta=0.5\), precision has a larger influence than recall (see the sketch after this list).
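A minimal sketch of \(F_\beta\), evaluated at the three common \(\beta\) values (the precision/recall values are the toy numbers from section 1.1):

```python
# Minimal sketch of the generalized F measure F_beta.
def f_beta(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0                                    # worst case, F score of 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.6, 0.75
print(f_beta(p, r, beta=1.0))  # F1   ~0.667, the harmonic mean of p and r
print(f_beta(p, r, beta=2.0))  # F2   ~0.714, pulled towards recall (0.75)
print(f_beta(p, r, beta=0.5))  # F0.5 ~0.625, pulled towards precision (0.6)
```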
If the F measure is written in terms of the type I error (FP) and type II error (FN) of Figure 0.1, it becomes:
\[F_\beta = \frac{(1+\beta^2)\cdot TP}{(1+\beta^2)\cdot TP + \beta^2\cdot FN + FP}\]
1.2.2 G measure
Whereas the F measure is a harmonic mean, the G measure is a geometric mean of precision and recall, also known as the Fowlkes–Mallows index:
\[G = \sqrt{\text{precision}\cdot\text{recall}}\]
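A minimal sketch of the G measure, again on the toy precision/recall values; for comparison, the harmonic-mean \(F_1\) on the same values is about 0.667:

```python
# Minimal sketch of the G measure (geometric mean of precision and recall).
import math

def g_measure(precision, recall):
    return math.sqrt(precision * recall)

print(g_measure(0.6, 0.75))  # ~0.671, slightly above F1 (~0.667) on the same values
```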
1.3 PR Curve
For binary classification, the curve obtained by putting Recall on the x-axis and Precision on the y-axis; see Figure 2.1.1.
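A minimal sketch of plotting a PR curve, assuming scikit-learn and matplotlib are available; y_score stands for a classifier's predicted scores on toy data:

```python
# Minimal sketch: PR curve from predicted scores (toy data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)  # Recall on the x-axis, Precision on the y-axis
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("PR curve")
plt.show()
```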
1.4 Cost Curve
1.5 ROC
For binary classification, the curve obtained by putting FPR (Figure 0.1) on the x-axis and TPR (Figure 0.1, i.e. Recall) on the y-axis; see Figure 2.1.1.
AUC: Area under curve, i.e. the area under the (ROC) curve.
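A minimal sketch of plotting a ROC curve and computing its AUC, again assuming scikit-learn and matplotlib and reusing the toy scores from the PR sketch:

```python
# Minimal sketch: ROC curve and AUC from predicted scores (toy data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

fpr, tpr, _ = roc_curve(y_true, y_score)  # FPR on the x-axis, TPR (Recall) on the y-axis
roc_auc = auc(fpr, tpr)                   # area under the ROC curve
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend()
plt.show()
```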
2. Relationships between the metrics
As early as 1998, Provost et al. argued that simply using accuracy to evaluate algorithms is not enough, because accuracy can be high while the algorithm is in fact relatively poor; they recommended using ROC to evaluate algorithms instead.
2.1 Relationship between PRC and ROC
When the class sizes differ greatly, the ROC curve does not describe an algorithm's performance well. Suppose the negative class in a binary problem is much larger than the positive class: even a large change in FP (Figure 0.1) barely shows up in FPR, the ROC's x-axis. Precision, however, compares FP against TP rather than against TN, so it reacts strongly to large changes in FP and can capture how performance is affected when negatives far outnumber positives. For this reason Jesse Davis (following earlier work) used the PRC instead of the ROC to describe algorithm performance. An important difference between the two curves is how they look visually, as shown in Figure 2.1.1.
Figure 2.1.1: PR and ROC curves
Figure 2.1.1 shows the ROC and PRC obtained with the same two algorithms on the same highly imbalanced dataset. In a ROC plot, an algorithm is better the closer its curve bends towards the upper-left corner (tracing lower-left → upper-left → upper-right); in a PRC plot, it is better the closer its curve bends towards the upper-right corner (tracing upper-left → upper-right → lower-right). The PRC gives a much clearer impression that algorithm 2 is better than algorithm 1, whereas in the ROC, although algorithm 2 has the larger AUC, the overall impression is that both algorithms are already good and differ only slightly. The PRC therefore not only magnifies the difference between the two algorithms, it also shows that both still have plenty of room for improvement.
For any dataset (i.e. a fixed number of positive and negative samples) and a given algorithm, the PRC and ROC contain the same set of points, so the two curves are equivalent in that sense; in fact, Davis and Goadrich show that one algorithm dominates another in ROC space if and only if it also dominates it in PR space.
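To make the sensitivity argument above concrete, here is a toy calculation (all numbers made up) with 100 positives and 10,000 negatives: growing FP tenfold barely moves FPR but collapses precision.

```python
# Toy illustration: FPR vs precision sensitivity under heavy class imbalance.
pos, neg, tp = 100, 10_000, 80

for fp in (100, 1_000):
    fpr = fp / neg              # FP compared against all actual negatives
    precision = tp / (tp + fp)  # FP compared against TP, not TN
    print(f"FP={fp:5d}  FPR={fpr:.2f}  precision={precision:.2f}")
# FP=  100  FPR=0.01  precision=0.44
# FP= 1000  FPR=0.10  precision=0.07
```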
2.2 Relationship between ROC and CC (cost curves)
If the class sizes are heavily skewed, the ROC curve may be overly optimistic about an algorithm's performance. Drummond et al. recommend using cost curves (CC) instead of ROC for algorithm evaluation.
2.3 Discussion of AUC
References:
- [ROC plotting] introduction-to-auc-and-roc
- [F1] wiki.F1_score
- [ROC] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982.
- [ROC] Hanley J A, McNeil B J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases[J]. Radiology, 1983, 148(3): 839-843.
- [ROC] McNeil B J, Hanley J A. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves[J]. Medical decision making, 1984, 4(2): 137-150.
- [ROC] DeLong E R, DeLong D M, Clarke-Pearson D L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach[J]. Biometrics, 1988: 837-845.
- [ROC] Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159.
- [ACC&ROC] Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning (pp. 445–453). Morgan Kaufmann, San Francisco, CA.
- [ROC&CC] Chris Drummond and Robert C. Holte, ‘Explicitly representing expected cost: An alternative to roc representation’, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 198–207, (2000).
- [ROC] wiki.Receiver_operating_characteristic
- [AUC] Cortes, C., & Mohri, M. (2003). AUC optimization vs. error rate minimization. Neural Information Processing Systems 15 (NIPS). MIT Press
- [ROC&CC] Drummond, C., & Holte, R. C. (2004). What ROC curves can't do (and cost curves can). ROCAI (pp. 19–26).
- [ROC] Zhang, Jun; Mueller, Shane T. (2005). "A note on ROC analysis and non-parametric estimate of sensitivity". Psychometrika. 70: 203–212.
- [ROC] Fan J, Upadhye S, Worster A. Understanding receiver operating characteristic (ROC) curves[J]. Canadian Journal of Emergency Medicine, 2006, 8(1): 19-20.
- [ROC] Fawcett, Tom (2006). An Introduction to ROC Analysis. Pattern Recognition Letters. 27 (8): 861–874.
- [PR&ROC] The Relationship Between Precision-Recall and ROC Curves, Jesse Davis and Mark Goadrich, ICML 2006.
- [ROC] Brown C D, Davis H T. Receiver operating characteristics curves and related decision measures: A tutorial[J]. Chemometrics and Intelligent Laboratory Systems, 2006, 80(1): 24-38.
- [ROC] Weng C G, Poon J. A new evaluation measure for imbalanced datasets[C]//Proceedings of the 7th Australasian Data Mining Conference-Volume 87. Australian Computer Society, Inc., 2008: 27-32.
- [ROC] Powers, David M W (2011). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation . Journal of Machine Learning Technologies. 2 (1): 37–63.
- [ROC] Flach P A. ROC analysis[M]//Encyclopedia of machine learning. Springer US, 2011: 869-875.
- [ROC] Hernandez-Orallo, J. (2013). "ROC curves for regression". Pattern Recognition. 46 (12): 3395–3411 .
- [ROC] Using the Receiver Operating Characteristic (ROC) curve to analyze a classification model: A final note of historical interest. Department of Mathematics, University of Utah. Retrieved May 25, 2017.
- [CC] Drummond C, Holte R C. Cost curves: An improved method for visualizing classifier performance[J]. Machine learning, 2006, 65(1): 95-130.