A Guide to Implementing YOLO v3 from Scratch in PyTorch, Part 4: Confidence Thresholding and Non-Maximum Suppression

This article is a repost, 2019-05-21 16:49.

This part is translated from: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-4/

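In the previous part, we implemented the forward pass of the network. In this part, we threshold the detections by objectness confidence and then apply non-maximum suppression (NMS).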

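Prerequisites:

1. Parts 1 to 3 of this tutorial series.

2. Basic working knowledge of PyTorch, including how to build custom architectures with the nn.Module, nn.Sequential and torch.nn.Parameter classes.

3. Basic working knowledge of NumPy.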

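So far, we have built a model that, given a batch of input images, produces an output tensor of shape B x 10647 x 85. B is the number of images in the batch, 10647 is the number of bounding boxes predicted per image, and 85 is the number of attributes per bounding box.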

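However, as described in part 1, we must subject this output to objectness confidence thresholding and non-maximum suppression to obtain the true detections that finally remain. To do that, we will create a function called write_results in the file util.py.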

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):

The function takes as input prediction, confidence (the objectness confidence threshold), num_classes (80 in our case) and nms_conf (the NMS IoU threshold).

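Object confidence thresholding:

Our prediction tensor contains information about B x 10647 bounding boxes. For each bounding box whose objectness confidence is below the threshold, we set every one of its attributes (the entire row representing the box) to zero.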

conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask
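To make the masking step concrete, here is a minimal sketch on a toy tensor. The shapes and values are made up for illustration (3 boxes with 2 class scores instead of 10647 boxes with 80), and the name masked is introduced here only so the original tensor stays visible.

```python
import torch

# Toy prediction: 1 image, 3 boxes, 5 box attributes + 2 class scores.
# Column 4 is the objectness confidence. All values are illustrative.
prediction = torch.tensor([[
    [10., 10., 4., 4., 0.9, 0.1, 0.8],
    [20., 20., 6., 6., 0.2, 0.7, 0.3],
    [30., 30., 8., 8., 0.6, 0.5, 0.5],
]])
confidence = 0.5

# 1.0 where objectness exceeds the threshold, 0.0 elsewhere; shape (1, 3, 1)
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)

# Broadcasting over the last dimension zeroes out whole rows
masked = prediction * conf_mask

kept_rows = int(conf_mask.sum().item())
```

The unsqueeze(2) call is what lets a per-box mask of shape (B, N) broadcast across all attributes of each box.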

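Performing non-maximum suppression:

What we have now are the center coordinates of each bounding box together with its width and height. However, it is easier to compute the IoU of two boxes from the coordinates of a pair of diagonally opposite corners. So we transform the (center x, center y, width, height) attributes of our boxes into (top-left corner x, top-left corner y, bottom-right corner x, bottom-right corner y).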

box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2) 
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]
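As a quick check of the arithmetic, the sketch below converts a single box by hand. The box values are illustrative, and torch.empty_like is used here in place of prediction.new(...) as an assumption about what reads more clearly in current PyTorch.

```python
import torch

# One box: center (50, 40), width 20, height 10 (illustrative values)
boxes = torch.tensor([[[50., 40., 20., 10.]]])

corners = torch.empty_like(boxes)
corners[:, :, 0] = boxes[:, :, 0] - boxes[:, :, 2] / 2  # top-left x
corners[:, :, 1] = boxes[:, :, 1] - boxes[:, :, 3] / 2  # top-left y
corners[:, :, 2] = boxes[:, :, 0] + boxes[:, :, 2] / 2  # bottom-right x
corners[:, :, 3] = boxes[:, :, 1] + boxes[:, :, 3] / 2  # bottom-right y

corner_list = corners[0, 0].tolist()  # [40.0, 35.0, 60.0, 45.0]
```

Half the width is subtracted from and added to the center x, and half the height to the center y, which is why index 2 pairs with x and index 3 with y.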

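The number of true detections may differ from image to image. For example, in a batch of 3 images, images 1, 2 and 3 may have 5, 2 and 4 true detections respectively. Therefore, confidence thresholding and NMS have to be done one image at a time. This means we cannot vectorize these operations and must loop over the first dimension of prediction (which holds the index of each image in the batch).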

batch_size = prediction.size(0)

write = False

for ind in range(batch_size):
      image_pred = prediction[ind]          #image Tensor
      #confidence thresholding
      #NMS

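The write flag indicates whether we have initialized output, a tensor that we will use to collect the true detections across the entire batch.

Inside the loop, we first clean things up a bit. Each bounding box row has 85 attributes, of which 80 are class scores. At this point we only care about the class with the highest score, so we remove the 80 class scores from each row and instead add the score of the highest-scoring class and the index of that class.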

max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
max_conf = max_conf.float().unsqueeze(1)
max_conf_score = max_conf_score.float().unsqueeze(1)
seq = (image_pred[:,:5], max_conf, max_conf_score)
image_pred = torch.cat(seq, 1)
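The reduction from 85 to 7 columns can be tried on a small tensor. This is a sketch with made-up numbers and 3 classes instead of 80. The names best_score and best_class are introduced here for readability: torch.max along a dimension returns (values, indices), so in the snippet above max_conf holds the score and max_conf_score holds the class index.

```python
import torch

num_classes = 3  # toy value; the tutorial uses 80
# Two boxes: 5 box attributes followed by 3 class scores (illustrative)
image_pred = torch.tensor([
    [40., 35., 60., 45., 0.9, 0.1, 0.7, 0.2],
    [10., 10., 20., 20., 0.8, 0.6, 0.3, 0.1],
])

# values = highest class score per row, indices = which class it was
best_score, best_class = torch.max(image_pred[:, 5:5 + num_classes], 1)

reduced = torch.cat(
    (image_pred[:, :5],
     best_score.float().unsqueeze(1),
     best_class.float().unsqueeze(1)),
    1,
)
# Each row is now: x1, y1, x2, y2, objectness, class score, class index
```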

Earlier, we set the rows of bounding boxes whose objectness confidence is below the threshold to zero. Now we get rid of them.

non_zero_ind =  (torch.nonzero(image_pred[:,4]))
try:
      image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7) 
except:
      continue
        
#For PyTorch 0.4 compatibility,
#since the code above will not raise an exception when there are no detections
#(scalars are supported in PyTorch 0.4)
if image_pred_.shape[0] == 0:
      continue

The try-except block handles the case where there are no detections. In that case, we use continue to skip the rest of the loop body for this image.
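As an alternative sketch (not the tutorial's code), the same filtering can be written with a boolean mask, assuming a PyTorch version that supports boolean tensors. Indexing with a mask always returns a 2-D result, so the squeeze/view pair and the exception handling are not needed in this form; the toy values are illustrative.

```python
import torch

image_pred = torch.tensor([
    [40., 35., 60., 45., 0.9, 0.7, 1.],
    [0., 0., 0., 0., 0., 0., 0.],      # zeroed by the confidence threshold
    [10., 10., 20., 20., 0.8, 0.6, 0.],
])

keep = image_pred[:, 4] != 0      # boolean mask over rows, shape (3,)
image_pred_ = image_pred[keep]    # shape (2, 7)

# With nothing kept, the result is an empty (0, 7) tensor, not an error
none_kept = image_pred[torch.zeros(3, dtype=torch.bool)]
no_detections = none_kept.shape[0] == 0
```

A check such as image_pred_.shape[0] == 0 is then the only guard required before continuing to the next image.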

Next, let's get the classes detected in the image.

#Get the various classes detected in the image
img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index

Since there can be multiple true detections of the same class, we use a function called unique to get the classes present in a given image.

def unique(tensor):
    tensor_np = tensor.cpu().numpy()
    unique_np = np.unique(tensor_np)
    unique_tensor = torch.from_numpy(unique_np)
    
    tensor_res = tensor.new(unique_tensor.shape)
    tensor_res.copy_(unique_tensor)
    return tensor_res
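PyTorch itself also provides torch.unique, which could stand in for the NumPy round trip above. The following is a small sketch on an illustrative class-index column, not a change the tutorial makes.

```python
import torch

# Last column of image_pred_ for five detections (illustrative class indices)
class_column = torch.tensor([2., 0., 2., 5., 0.])

# Returns the distinct values, sorted by default, on the same device
img_classes = torch.unique(class_column)
```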

We then perform NMS class by class.

 for cls in img_classes:
        #perform NMS

Once inside this loop, the first thing we do is extract the detections of one particular class (denoted by the variable cls).

#get the detections with one particular class
cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1,7)

#sort the detections such that the entry with the maximum objectness
#confidence is at the top
conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
image_pred_class = image_pred_class[conf_sort_index]
idx = image_pred_class.size(0)   #Number of detections
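The selection and sorting steps can be sketched on toy data as follows. This version picks rows with a boolean mask rather than multiplying and calling nonzero, which is an assumption made for brevity; one practical difference is that it does not depend on the class score column being non-zero. All values are illustrative.

```python
import torch

# Rows: x1, y1, x2, y2, objectness, class score, class index
image_pred_ = torch.tensor([
    [0., 0., 10., 10., 0.6, 0.9, 2.],
    [1., 1., 11., 11., 0.9, 0.8, 2.],
    [5., 5., 9., 9., 0.7, 0.5, 0.],
])
cls = 2.0

# Keep only detections whose class index equals cls
image_pred_class = image_pred_[image_pred_[:, -1] == cls]

# Sort by objectness so the most confident detection comes first
order = torch.sort(image_pred_class[:, 4], descending=True)[1]
image_pred_class = image_pred_class[order]
idx = image_pred_class.size(0)  # number of detections of this class
```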

Now we perform NMS.

for i in range(idx):
    #Get the IOUs of all boxes that come after the one we are looking at 
    #in the loop
    try:
        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
    except ValueError:
        break

    except IndexError:
        break

    #Zero out all the detections that have IoU > threshold
    iou_mask = (ious < nms_conf).float().unsqueeze(1)
    image_pred_class[i+1:] *= iou_mask

    #Keep only the non-zero entries (drops the rows zeroed above)
    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

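Here we use a function called bbox_iou. Its first argument is the bounding box row indexed by the loop variable i, and its second argument is a tensor containing multiple rows of bounding boxes. The output of bbox_iou is a tensor containing the IoU of the box given as the first argument with each of the boxes in the second argument.

Because the bounding boxes with higher objectness confidence come first, if a later box has an IoU with an earlier box that exceeds the threshold, the later (lower-confidence) box is eliminated.

In the body of the loop, the following line computes the IoU.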

ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])

In each iteration, any bounding box with an index greater than i whose IoU with the i-th box exceeds the threshold nms_conf is eliminated.

#Zero out all the detections that have IoU > threshold
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask

#Keep only the non-zero entries (drops the rows zeroed above)
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

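Also note that we put the line that computes the IoU inside a try-except block. This is because the loop is set up to run idx iterations (the number of rows in image_pred_class at the start). As the loop proceeds, however, some bounding boxes may be removed from image_pred_class. Consequently, the loop may reach an index that is out of bounds (triggering an IndexError), or the slice image_pred_class[i+1:] may return an empty tensor (triggering a ValueError). At that point we can be sure that NMS has no further bounding boxes to remove, so we break out of the loop.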
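The greedy procedure described above can also be written without relying on exceptions. The following is a standalone sketch, not the tutorial's implementation: the helper names iou_one_to_many and greedy_nms are hypothetical, the rows carry only 5 columns (corners plus score), and the IoU uses continuous coordinates without the +1 pixel convention that bbox_iou uses.

```python
import torch

def iou_one_to_many(box, others):
    """IoU of one corner-format box (4,) against many boxes (N, 4)."""
    x1 = torch.max(box[0], others[:, 0])
    y1 = torch.max(box[1], others[:, 1])
    x2 = torch.min(box[2], others[:, 2])
    y2 = torch.min(box[3], others[:, 3])
    inter = torch.clamp(x2 - x1, min=0) * torch.clamp(y2 - y1, min=0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(dets, iou_thresh):
    """dets: (N, 5) rows of x1, y1, x2, y2, score. Returns surviving rows."""
    if dets.size(0) == 0:
        return dets
    dets = dets[torch.argsort(dets[:, 4], descending=True)]
    kept = []
    while dets.size(0) > 0:
        kept.append(dets[0])              # most confident remaining box
        if dets.size(0) == 1:
            break
        ious = iou_one_to_many(dets[0, :4], dets[1:, :4])
        dets = dets[1:][ious < iou_thresh]  # drop boxes that overlap too much
    return torch.stack(kept)

dets = torch.tensor([
    [0., 0., 10., 10., 0.9],
    [1., 1., 11., 11., 0.8],    # overlaps the first box heavily
    [50., 50., 60., 60., 0.7],  # far from both
])
survivors = greedy_nms(dets, iou_thresh=0.4)
```

Here the second box has an IoU of 81/119 with the first, which is above 0.4, so it is suppressed, while the third box does not overlap and survives.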

Calculating the IoU:

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes 
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    
    #get the coordinates of the intersection rectangle
    inter_rect_x1 =  torch.max(b1_x1, b2_x1)
    inter_rect_y1 =  torch.max(b1_y1, b2_y1)
    inter_rect_x2 =  torch.min(b1_x2, b2_x2)
    inter_rect_y2 =  torch.min(b1_y2, b2_y2)
    
    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
 
    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
    
    iou = inter_area / (b1_area + b2_area - inter_area)
    
    return iou
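A worked example helps confirm the arithmetic, including the +1 that treats corner coordinates as inclusive pixel indices. The function is restated in compact form so the sketch runs on its own, and the box values are illustrative.

```python
import torch

def bbox_iou(box1, box2):
    # Same arithmetic as the function above, restated so this runs standalone
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
    inter_w = torch.clamp(torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1) + 1, min=0)
    inter_h = torch.clamp(torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1) + 1, min=0)
    inter_area = inter_w * inter_h
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    return inter_area / (b1_area + b2_area - inter_area)

box = torch.tensor([[0., 0., 9., 9.]])            # 10 x 10 pixels, area 100
others = torch.tensor([
    [5., 5., 14., 14.],    # overlaps in a 5 x 5 region, area 25
    [20., 20., 29., 29.],  # disjoint
])
ious = bbox_iou(box, others)
# First IoU: 25 / (100 + 100 - 25) = 1/7; second IoU: 0
```

The single row in box broadcasts against both rows in others, which is how one call yields the IoU of the i-th box with every box that follows it.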

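Writing the predictions:

The write_results function outputs a tensor of shape D x 8, where D is the total number of true detections across all images, each represented by one row. Each detection has 8 attributes: the index of the image in the batch to which the detection belongs, the 4 corner coordinates, the objectness confidence, the score of the class with maximum confidence, and the index of that class.

As before, we do not initialize the output tensor until we have a detection to put into it, and we concatenate subsequent detections onto it afterwards. The write flag indicates whether the tensor has been initialized. At the end of each iteration of the loop over classes, we add the resulting detections to the output tensor.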

 batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      
 #Repeat the batch_id for as many detections of the class cls in the image
 seq = batch_ind, image_pred_class

 if not write:
     output = torch.cat(seq,1)
     write = True
 else:
     out = torch.cat(seq,1)
     output = torch.cat((output,out))

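At the end of the function, we check whether output has been initialized. If it has not, there was not a single detection in any image of the batch, and in that case we return 0.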

 try:
    return output
 except:
    return 0
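The D x 8 layout can be illustrated with a different accumulation pattern: collecting the per-class results in a list and concatenating once at the end. This is a sketch under assumptions, not the tutorial's code; the per_class data is made up, and returning None for "no detections" is a choice made here, whereas the function above returns 0.

```python
import torch

# (batch index, surviving detections of one class) pairs, illustrative values
per_class = [
    (0, torch.tensor([[0., 0., 10., 10., 0.9, 0.8, 3.]])),
    (1, torch.tensor([[5., 5., 9., 9., 0.7, 0.6, 1.],
                      [2., 2., 4., 4., 0.6, 0.9, 1.]])),
]

rows = []
for ind, image_pred_class in per_class:
    # One batch-index entry per detection, prepended as column 0
    batch_ind = torch.full((image_pred_class.size(0), 1), float(ind))
    rows.append(torch.cat((batch_ind, image_pred_class), 1))

# Single concatenation at the end; None stands in for "nothing detected"
output = torch.cat(rows) if rows else None
```

Whichever sentinel is used, the caller has to check for it before indexing into the result.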

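That's it for this part. We finally have a prediction in the form of a tensor that lists each detected bounding box as a row. The only thing left is to create an input pipeline that reads images from disk, computes the predictions, draws the bounding boxes on the images, and then displays or writes these images. That is what we will do in the next part.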
