ACC, Precision and Recall
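These concepts are defined for a binary classifier. Below, TP, FP, TN and FN denote the counts of true positives, false positives, true negatives and false negatives.
- Accuracy is the fraction of all samples that are classified correctly: (TP + TN) / (TP + FP + TN + FN).
- Precision is the fraction of samples predicted positive that are actually positive: TP / (TP + FP). It is measured relative to the predictions. In Chinese information-retrieval literature it is called 查准率.
- Recall is the fraction of truly positive samples that are classified correctly: TP / (TP + FN). It is measured relative to the samples. In Chinese information-retrieval literature it is called 查全率.
To raise precision, a classifier predicts positives more conservatively, which usually lowers recall. One way to weigh the two together is to plot a curve; another common metric is the F1 score, the harmonic mean of precision and recall, \(F_1 = 2PR/(P+R)\).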
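The definitions above can be sketched as a small Python function. This is a minimal illustration, not a library API: the name `binary_metrics` and the example counts are made up for this note, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much is truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall (0.0 by convention if both are 0).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration.
metrics = binary_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is high (0.93) while recall is only 8/13, which shows why accuracy alone can mislead on an imbalanced set.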
Curves
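- P-R (Precision-Recall) curve. The horizontal axis is recall and the vertical axis is precision. A model typically labels a sample positive when its score exceeds some threshold and negative otherwise, so each point on the curve gives the precision and recall at one threshold.
- ROC (Receiver Operating Characteristic) curve. The horizontal axis is FPR (False Positive Rate, FP / (FP + TN)) and the vertical axis is TPR (True Positive Rate), which is simply recall. The ROC curve can be assumed to lie above \(y=x\): if it falls below, replacing the predicted probability by \(p \leftarrow 1-p\) reflects it above the diagonal.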
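It follows from the definitions that the P-R curve changes substantially across test sets with different degrees of class imbalance, whereas the ROC curve stays comparatively stable.
For example, with TP = FP = TN = FN = 1, precision = 1/2 and recall = 1/2.
If the negative samples are replicated to N times their original number, then TN = FP = N, so precision = 1/(N+1) while recall stays at 1/2. Precision changes drastically, but the ROC curve does not move, because FPR = FP / (FP + TN) = N / (2N) = 1/2 and TPR = 1/2 are both unchanged.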
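The arithmetic in this example can be checked with a short Python sketch. The helper name `rates` is hypothetical; the counts are exactly those of the example above, with the negatives replicated N times.

```python
def rates(tp, fp, tn, fn):
    """Return (precision, recall, fpr) for the given confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # identical to TPR
    fpr = fp / (fp + tn)
    return precision, recall, fpr


# Replicate the negatives N times: FP and TN scale by N, TP and FN stay at 1.
for n in (1, 10, 100):
    precision, recall, fpr = rates(tp=1, fp=n, tn=n, fn=1)
    print(f"N={n:>3}: precision={precision:.4f}  recall={recall:.2f}  FPR={fpr:.2f}")
```

Precision shrinks as 1/(N+1), while the ROC coordinates (FPR, TPR) remain at (1/2, 1/2) for every N.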
- AUC (Area Under Curve). The area under the ROC curve; the larger, the better. A perfect ranking gives 1 and random guessing gives about 0.5.
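As a rough sketch of how the ROC curve and AUC come out of a threshold sweep, the following plain-Python code lowers the threshold one distinct score at a time and integrates with the trapezoidal rule. The function names `roc_points` and `auc` and the toy labels and scores are invented for illustration, and the code assumes at least one positive and one negative sample.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 for positive, 0 for negative. Assumes both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, made up for this example.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(f"AUC = {auc(points):.4f}")
```

On this toy data 8 of the 9 positive-negative pairs are ranked correctly, and the computed area is 8/9, matching the interpretation of AUC as the probability that a random positive is scored above a random negative.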
Misc
The ROC curve was first used during World War II for the analysis of radar signals before it was employed in signal detection theory. Following the attack on Pearl Harbor in 1941, the United States Army began new research to improve the correct detection of Japanese aircraft from their radar signals. For this purpose they measured the ability of a radar receiver operator to make these important distinctions, which was called the Receiver Operating Characteristic.
References
[1] How to explain recall and accuracy? (如何解釋召回率與准確率?) - Zhihu. https://www.zhihu.com/question/19645541
Image sources:
Precision and recall - Wikipedia. https://en.wikipedia.org/wiki/Precision_and_recall
Receiver operating characteristic - Wikipedia. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Precision-recall curves – what are they and how are they used? https://acutecaretesting.org/en/articles/precision-recall-curves-what-are-they-and-how-are-they-used