Object Detection Evaluation Metrics (mAP)

Reposted article. 2018-04-02 21:15. Tags: image processing / deep learning / object detection / VOC

Common metrics

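precision: the fraction of all predicted objects that are correct (true positives / (true positives + false positives)).
recall: the fraction of all ground-truth objects that are correctly localized and recognized (true positives / (true positives + false negatives)).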

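In practice a model is rarely ideal: it has either high precision with low recall, or high recall with low precision. For disease screening or spam filtering, the goal is to raise recall while holding precision at an acceptable level. For search, the goal is to raise precision while keeping recall guaranteed.^[1]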

The figure below illustrates how precision and recall are computed:

(Figure: precision-recall)

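Take pedestrian detection as an example. Precision is the percentage of detected pedestrians that really are pedestrians; recall is the percentage of all pedestrians that are correctly detected, so Recall = 100% means nothing was missed. The two usually pull against each other. Ideally there are neither false alarms nor missed detections, that is, both Precision and Recall are 100%.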
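As a minimal sketch of the two definitions, the function below computes precision and recall from raw counts. The function name and the pedestrian counts are made up for illustration, and returning 0.0 for an empty denominator is an assumption rather than a standard.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts.

    Returning 0.0 when a denominator is zero is a convention chosen here.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical pedestrian detector: 10 pedestrians in the scene,
# 8 detections reported, 6 of which are real pedestrians.
p, r = precision_recall(tp=6, fp=2, fn=4)
print(p, r)  # 0.75 0.6
```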

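F1 score: Precision and recall together are the usual measures of model quality, but having to weigh two numbers against each other slows down decisions. The F1 score combines them into one value (also called F score or F measure; the letter F has no particular meaning):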

\[F_1\ \text{Score} ={2\over {1\over P}+{1\over R}}= 2{PR\over P+R}\in[0,1] \]

The F1 score is the special case \(\lambda=0.5\) of \(F=1/(\lambda{1\over P}+(1-\lambda){1\over R})\).
Larger P and R are both better, so a larger F1 score is better as well.
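The relation between the weighted form and F1 can be checked numerically. This is a small sketch; the function names and the zero-handling are choices made here, not part of the original text.

```python
def f_score(p, r, lam=0.5):
    """Weighted harmonic mean 1 / (lam/P + (1-lam)/R); lam=0.5 gives F1."""
    if p == 0 or r == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return 1.0 / (lam / p + (1.0 - lam) / r)


def f1_score(p, r):
    """Closed form 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


p, r = 0.75, 0.6
print(f_score(p, r), f1_score(p, r))  # both are about 0.6667
```

Lowering `lam` below 0.5 moves the result toward R, and raising it moves the result toward P.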

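AUC: the area under the ROC curve; the larger the area, the better the classifier. The horizontal axis of the ROC curve is the false positive rate (FPR = FP / (FP + TN)) and the vertical axis is the true positive rate (TPR = TP / (TP + FN)). Sweeping the classifier's confidence threshold yields a set of (FPR, TPR) points, which are plotted as the ROC curve, as in the figure below:

(Figure: ROC curve)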

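Meaning of AUC: AUC is a probability. Pick one positive and one negative sample at random; AUC is the probability that the classifier's score ranks the positive sample ahead of the negative one. A larger AUC therefore means the classifier is more likely to rank positives ahead of negatives, i.e. it separates the classes better.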
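The ranking interpretation can be turned directly into code by enumerating every (positive, negative) pair. The sketch below is quadratic in the number of samples and is meant only to illustrate the definition; counting ties as half a win is a common convention assumed here, and the scores are invented.

```python
from itertools import product


def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5  # assumed tie convention
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_by_ranking(scores, labels))  # 8 of 9 pairs are ordered correctly
```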

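Why use the ROC curve?
There are already many evaluation criteria, so why also use ROC and AUC? Because the ROC curve has a useful property: it stays unchanged when the ratio of positive to negative samples in the test set changes. Real datasets are often class-imbalanced, with far more negatives than positives (or the reverse), and the class ratio of the test data may also drift over time.^[2]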
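A small experiment illustrates this property. In the sketch below (invented scores, hypothetical helper name), every negative sample is replicated ten times: TPR and FPR at a fixed threshold do not move, while precision drops sharply.

```python
def rates_at_threshold(scores, labels, thr):
    """Return (TPR, FPR, precision) when score >= thr is predicted positive.

    Assumes the data contains at least one positive and one negative sample.
    """
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= thr:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return tpr, fpr, precision


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

# Same classifier, but every negative sample now appears 10 times.
pos = [(s, y) for s, y in zip(scores, labels) if y == 1]
neg = [(s, y) for s, y in zip(scores, labels) if y == 0]
imbalanced = pos + neg * 10
scores_imb = [s for s, _ in imbalanced]
labels_imb = [y for _, y in imbalanced]

balanced = rates_at_threshold(scores, labels, 0.6)
skewed = rates_at_threshold(scores_imb, labels_imb, 0.6)
print(balanced)  # TPR 1.0, FPR 1/3, precision 0.75
print(skewed)    # TPR 1.0, FPR 1/3, precision 3/13
```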

loss_bbox: the discrepancy between the coordinates of the predicted box and the ground-truth box, computed for example with the smooth L1 loss.
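For reference, here is a sketch of the smooth L1 loss in the form used by Fast R-CNN: quadratic for errors below 1 and linear beyond. The transition point of 1 and the sum reduction are assumptions; frameworks often expose a configurable transition parameter and average instead of summing.

```python
import numpy as np


def smooth_l1(pred, target):
    """Sum of smooth L1 terms: 0.5*d^2 if |d| < 1 else |d| - 0.5."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return float(loss.sum())


# One small error (0.5) and one large error (3.0): 0.125 + 2.5
print(smooth_l1([0.5, 3.0], [0.0, 0.0]))  # 2.625
```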
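mAP: compute the average precision (AP) for each class, then take the mean over all classes. AP takes the whole precision-recall trade-off into account, which removes the limitation of judging by P and R at a single operating point. Like the ROC curve, a PR curve is better the larger the area under it, so AP for one class is defined as the area under its PR curve:

\[\rm{AP} = \int_0^1 P(R)dR \]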

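This integral is an idealization. In practice one can use the Approximated Average Precision, \(\sum_{k=1}^N P(k)\Delta r(k)\): the sum over k of the precision after retrieving k images (or objects) times the change in recall from k-1 to k.
Another measure is the Interpolated Average Precision, which replaces P(k) with the highest precision at rank k or later: \(\sum_{k=1}^N \max_{\tilde{k}\ge k}P(\tilde{k})\Delta r(k)\).
Because \(\max_{\tilde{k}\ge k}P(\tilde{k})\ge P(k)\), the interpolated value is never lower than the approximated value, and is usually noticeably higher.
Much of the literature uses Interpolated Average Precision and simply calls the result Average Precision. The PASCAL Visual Object Classes Challenge has used this measure since 2007, on the grounds that it effectively reduces the wiggles in the precision-recall curve.^[1:1]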
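The two sums can be compared on a toy ranking. The sketch below assumes a list of 0/1 relevance labels already sorted by descending confidence; all function names and the example ranking are illustrative.

```python
def pr_at_each_rank(ranked_labels):
    """Precision and recall after each of the top-k items, k = 1..N."""
    n_pos = sum(ranked_labels)
    precisions, recalls, tp = [], [], 0
    for k, y in enumerate(ranked_labels, start=1):
        tp += y
        precisions.append(tp / k)
        recalls.append(tp / n_pos)
    return precisions, recalls


def approximated_ap(precisions, recalls):
    """sum_k P(k) * delta_r(k)"""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap


def interpolated_ap(precisions, recalls):
    """sum_k max_{k' >= k} P(k') * delta_r(k)"""
    ap, prev_r = 0.0, 0.0
    for k, r in enumerate(recalls):
        ap += max(precisions[k:]) * (r - prev_r)
        prev_r = r
    return ap


# Hypothetical ranking: hit, miss, hit, hit, miss (3 positives in total)
P, R = pr_at_each_rank([1, 0, 1, 1, 0])
print(approximated_ap(P, R))  # 29/36, about 0.806
print(interpolated_ap(P, R))  # 5/6, about 0.833
```

In this example the interpolated value is higher because the precision dip at rank 3 (2/3) is replaced by the later, higher precision at rank 4 (3/4).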

mAP (VOC)

Papers often mention top-5 xxx, mAP@0.5 and so on; this section explains them.

\[AP = {1\over 11}\sum_{r\in \{0,0.1,...,1\}} P_{interp}(r) \\ \text{mAP} = {\sum_{q=1}^Q AP(q)\over Q} \]

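First run the trained model to obtain a confidence score for every test sample, then sort by confidence score and compute precision and recall.
Precision and recall can be computed over only the top-n samples in the sorted order, which gives top-n precision/recall.
In real multi-class tasks, top-5 alone is usually not enough to judge a model; we want the precision and recall at every cutoff from top-1 to top-N (N is the total number of test samples, 20 in this article). As more samples are selected, recall can only stay the same or rise, while precision tends to fall overall. Plotting recall on the horizontal axis and precision on the vertical axis gives the familiar precision-recall curve:

(Figure: precision-recall curve)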

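Next, computing AP, following the PASCAL VOC challenge. First fix a set of recall thresholds [0, 0.1, 0.2, …, 1]. For each threshold, take the maximum precision over all points whose recall is at least that threshold (for example recall >= 0.3). This gives 11 precision values, and AP is their mean. The method is called 11-point interpolated average precision. The procedure is: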

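For class C, sort all predicted boxes of class C output by the algorithm by confidence;
For each k, take the top k predicted boxes and count TP and FP, which gives a (recall, precision) pair for every k;
For each recall threshold 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, take the maximum precision among the pairs whose recall is at least the threshold (0 if there is none);
Average the 11 precision values to get AP;
AP is per class; mAP sums the AP of all classes and takes the mean:
mAP = sum of AP over all classes / number of classes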
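The procedure above can be sketched in a few lines of Python. The function names, the example curve and the per-class AP values are hypothetical; the thresholds are built as `i / 10` so that values such as 0.3 are not perturbed by accumulated floating-point error.

```python
import numpy as np


def voc07_ap(recall, precision):
    """11-point interpolated AP from per-rank recall and precision arrays."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap


def mean_ap(ap_per_class):
    """mAP = sum of per-class AP / number of classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical PR points for one class (ranking: hit, miss, hit, hit, miss)
recall = [1 / 3, 1 / 3, 2 / 3, 1.0, 1.0]
precision = [1.0, 0.5, 2 / 3, 0.75, 0.6]
print(voc07_ap(recall, precision))          # (4 * 1.0 + 7 * 0.75) / 11
print(mean_ap({"person": 0.8, "car": 0.6}))  # about 0.7
```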

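Since 2010 the PASCAL VOC challenge has used a different computation, which is more precise and better at separating methods with low AP. Suppose M of the N samples are positive; this gives M recall values (1/M, 2/M, …, M/M). For each recall value r, take the maximum precision over all points with recall r' >= r, then average these M precision values to get the final AP. See voc2012/devkit_doc.
After this change the curve is monotonically non-increasing:

(Figure: PR-curv2)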

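The new mAP is usually a little higher than the VOC07 value (typically within 5%), though it can also come out lower. Because it is the area under the curve, it is sometimes written ap_auc. The code below may help in understanding it.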
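As a rough Python sketch of the area-based computation (the function name and the toy curve are made up here): pad the curve with sentinel points, take the running maximum of precision from right to left, then sum the rectangles where recall changes.

```python
import numpy as np


def voc_ap_area(recall, precision):
    """Area under the monotone precision envelope of a PR curve."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Sweep right to left so each precision becomes the max of everything after it.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Indices where recall changes; each contributes one rectangle.
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))


# Hypothetical PR points (ranking: hit, miss, hit, hit, miss)
recall = [1 / 3, 1 / 3, 2 / 3, 1.0, 1.0]
precision = [1.0, 0.5, 2 / 3, 0.75, 0.6]
print(voc_ap_area(recall, precision))  # 5/6, about 0.833
```

On this toy curve the 11-point method gives about 0.841, so this is one of the cases where the area-based value is the lower of the two.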

AP measures how good the trained model is on each class; mAP is the mean of AP over all classes and measures average quality across classes.

AP(COCO)^[3]

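In the MS COCO competition, \(AP^{50}\) or AP@0.5 is the average precision at an IoU threshold of 0.5. \(AP^{75}\) is the strict metric (IoU threshold 0.75). \(AP^{\rm small},AP^{\rm medium},AP^{\rm large}\) are measured separately on objects with \(area <32^2\), \(32^2 < area < 96^2\) and \(area > 96^2\) respectively.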

mAP@[.5:.95] is the mean over IoU thresholds from 0.5 to 0.95 in steps of 0.05; it is also written mmAP or simply AP. See cocoeval.py.
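The averaging over IoU thresholds and the size buckets can be sketched as follows. This is an illustration, not the cocoeval.py code: `ap_at_iou` is a hypothetical callable standing in for a full evaluation at one threshold, and assigning areas that fall exactly on 32^2 or 96^2 to the larger bucket is an assumption made here.

```python
import numpy as np

# Ten thresholds 0.50, 0.55, ..., 0.95; linspace avoids drift from repeated addition.
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def coco_style_ap(ap_at_iou):
    """Mean of AP over the ten IoU thresholds.

    ap_at_iou: hypothetical callable mapping an IoU threshold to the AP at it.
    """
    return float(np.mean([ap_at_iou(t) for t in IOU_THRESHOLDS]))


def size_bucket(area):
    """Size category by pixel area; boundary handling is an assumption."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Toy model whose AP falls linearly as the IoU requirement tightens.
print(coco_style_ap(lambda t: 1.0 - t))  # about 0.275
print(size_bucket(100), size_bucket(5000), size_bucket(10000))
```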

mAP reference code

VOC detection evaluation function

function [rec,prec,ap] = VOCevaldet(VOCopts,id,cls,draw)

% npos is the number of ground truth objects; samples marked difficult are ignored in the computation
% sort detections by decreasing confidence  
[sc,si]=sort(-confidence);   % sort by score in descending order  
ids=ids(si);  
BB=BB(:,si);  
  
% assign detections to ground truth objects  
nd=length(confidence);  
tp=zeros(nd,1);  
fp=zeros(nd,1);  
tic;  
for d=1:nd  
    % find ground truth image  
    i=strmatch(ids{d},gtids,'exact');  
    if isempty(i)  
        error('unrecognized image "%s"',ids{d});  
    elseif length(i)>1  
        error('multiple image "%s"',ids{d});  
    end  
  
    % assign detection to ground truth object if any  
    bb=BB(:,d);  
    ovmax=-inf;  
    for j=1:size(gt(i).BB,2)  
        bbgt=gt(i).BB(:,j);  
        bi=[max(bb(1),bbgt(1)) ; max(bb(2),bbgt(2)) ; min(bb(3),bbgt(3)) ; min(bb(4),bbgt(4))];  
        iw=bi(3)-bi(1)+1;  
        ih=bi(4)-bi(2)+1;  
        if iw>0 & ih>0                  
            % compute overlap as area of intersection / area of union  
            ua=(bb(3)-bb(1)+1)*(bb(4)-bb(2)+1) + (bbgt(3)-bbgt(1)+1)*(bbgt(4)-bbgt(2)+1) - iw*ih;
            ov=iw*ih/ua;  
            if ov>ovmax  
                ovmax=ov;  
                jmax=j;  
            end  
        end  
    end  
    % assign detection as true positive/don't care/false positive  
    if ovmax>=VOCopts.minoverlap  
        if ~gt(i).diff(jmax)  
            if ~gt(i).det(jmax)  
                tp(d)=1;            % true positive  
                gt(i).det(jmax)=true;  
            else  
                fp(d)=1;            % false positive (multiple detection): when several boxes match the same gt, the later (lower-score) ones count as FP
            end  
        end  
    else  
        fp(d)=1;                    % false positive  
    end  
end  

% the boxes are sorted by score, so each score threshold gives one P/R pair; cumulative sums compute them all at once
fp=cumsum(fp);  
tp=cumsum(tp);  
rec=tp/npos;  
prec=tp./(fp+tp);  

% VOC2007: compute 11 point average precision  
ap=0;  
for t=0:0.1:1  
    p=max(prec(rec>=t));  
    if isempty(p)  
        p=0;  
    end  
    ap=ap+p/11;  
end  

% VOC2012: 
ap_new = VOCap(rec,prec);

% VOCap is computed as follows:
function ap = VOCap(rec,prec)

mrec=[0 ; rec ; 1]; % pad the recall list with a sentinel value at each end
mpre=[0 ; prec ; 0];
for i=numel(mpre)-1:-1:1
    mpre(i)=max(mpre(i),mpre(i+1)); % make mpre monotonically non-increasing
end
i=find(mrec(2:end)~=mrec(1:end-1))+1; % indices where recall changes
ap=sum((mrec(i)-mrec(i-1)).*mpre(i)); % area under the PR curve
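For readers more comfortable with Python, the matching loop and the cumulative precision/recall step above might look roughly like the sketch below. It is a simplified illustration, not a port of the devkit: the data layout (`dets`, `gts`) and function names are invented, "difficult" objects are not handled, and at least one ground-truth box is assumed.

```python
import numpy as np


def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) in inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((a[2] - a[0] + 1) * (a[3] - a[1] + 1)
             + (b[2] - b[0] + 1) * (b[3] - b[1] + 1) - inter)
    return inter / union


def precision_recall_curve(dets, gts, min_overlap=0.5):
    """dets: list of (image_id, score, box); gts: dict image_id -> list of boxes.

    Returns cumulative (recall, precision) arrays, one entry per detection.
    """
    dets = sorted(dets, key=lambda d: -d[1])  # descending confidence
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    npos = sum(len(boxes) for boxes in gts.values())
    tp = np.zeros(len(dets))
    fp = np.zeros(len(dets))
    for d, (img, _, box) in enumerate(dets):
        best, jmax = -1.0, -1
        for j, g in enumerate(gts.get(img, [])):
            ov = iou(box, g)
            if ov > best:
                best, jmax = ov, j
        if best >= min_overlap and not used[img][jmax]:
            tp[d] = 1
            used[img][jmax] = True  # each gt box can be claimed only once
        else:
            fp[d] = 1  # poor overlap, or a duplicate of an already matched gt
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    return tp / npos, tp / (tp + fp)


gts = {"a": [(0, 0, 9, 9)], "b": [(0, 0, 9, 9)]}
dets = [("a", 0.9, (0, 0, 9, 9)),      # exact match: TP
        ("a", 0.8, (1, 1, 9, 9)),      # duplicate of the same gt: FP
        ("b", 0.7, (20, 20, 29, 29))]  # no overlap: FP
rec, prec = precision_recall_curve(dets, gts)
print(rec, prec)
```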

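Overlap (IoU, Intersection over Union):

An algorithm's output can never match the human annotations exactly, so a measure of bounding-box localization accuracy is needed: IoU. It defines the degree of overlap between two bounding boxes, as shown in the figure below.

(Figure: IoU)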

IoU is the area of overlap between boxes A and B divided by the area of their union (\(A\cap B\over A\cup B\)).
This matches the definition of the Jaccard similarity: \(J(A,B)={|A\cap B| \over |A\cup B|}\)

Detections are commonly categorized as:
• Correct: correct class and IoU > .5
• Localization: correct class, .1 < IoU < .5
• Similar: similar class, IoU > .1
• Other: wrong class, IoU > .1
• Background: IoU < .1 with any object
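The categories above can be written as a small decision function. This is only a sketch: the `class_relation` labels are hypothetical, and because the list leaves IoU values of exactly .1 and .5 unspecified, the boundary handling below is an assumption.

```python
def detection_type(iou, class_relation):
    """Categorize a detection by its best IoU and class agreement.

    class_relation: 'same', 'similar' or 'other' (hypothetical labels).
    """
    if iou < 0.1:
        return "Background"
    if class_relation == "same":
        return "Correct" if iou > 0.5 else "Localization"
    if class_relation == "similar":
        return "Similar"
    return "Other"


print(detection_type(0.7, "same"))     # Correct
print(detection_type(0.3, "same"))     # Localization
print(detection_type(0.3, "similar"))  # Similar
print(detection_type(0.05, "other"))   # Background
```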

Python implementation

# Calculate Intersection over Union between boxes b1 and b2; each box is defined by 2 points,
# box(startX, startY, endX, endY). Other conventions exist, e.g. box(x, y, width, height).
def calc_iou(b1, b2):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(b1[0], b2[0])
    yA = max(b1[1], b2[1])
    xB = min(b1[2], b2[2])
    yB = min(b1[3], b2[3])

    # compute the area of the intersection rectangle; clamp width and height
    # to 0 so that non-overlapping boxes do not produce a spurious area
    inter_w = max(0, xB - xA + 1)
    inter_h = max(0, yB - yA + 1)
    area_intersect = inter_w * inter_h

    # calculate the area of each box (inclusive pixel coordinates, hence +1)
    area_b1 = (b1[2] - b1[0] + 1) * (b1[3] - b1[1] + 1)
    area_b2 = (b2[2] - b2[0] + 1) * (b2[3] - b2[1] + 1)

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = area_intersect / float(area_b1 + area_b2 - area_intersect)

    # return the intersection over union value
    return iou

# IoU implemented with NumPy (continuous coordinates, so no +1 as in calc_iou)
import numpy as np

def calc_iou_np(xy_min1, xy_max1, xy_min2, xy_max2):
    # Get areas; reduce over the last axis so batches of shape (N, 2) also work
    areas_1 = np.multiply.reduce(xy_max1 - xy_min1, axis=-1)
    areas_2 = np.multiply.reduce(xy_max2 - xy_min2, axis=-1)

    # determine the (x, y)-coordinates of the intersection rectangle
    _xy_min = np.maximum(xy_min1, xy_min2)
    _xy_max = np.minimum(xy_max1, xy_max2)
    _wh = np.maximum(_xy_max - _xy_min, 0)

    # compute the area of intersection rectangle
    _areas = np.multiply.reduce(_wh, axis=-1)

    # return the intersection over union value (the floor avoids division by zero)
    return _areas / np.maximum(areas_1 + areas_2 - _areas, 1e-10)

References
