YOLOv3 object detection network

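YOLOv3 runs fast and detects well, which has made it one of the most widely used object detection networks. Understanding it comes down to three things: first, the network structure and model pipeline; second, how positive and negative samples are assigned (matching anchors to gt_boxes); third, the loss function.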

1.1 YOLOv3 network structure

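The YOLOv3 network has two parts: Darknet53 and Yolo3_blocks (FPN). The overall structure is shown in the figure below. A 416*416 input image goes through Darknet53 for feature extraction, and the features of its last three convolutional stages are taken as feature1, feature2 and feature3. feature3 is fed into Yolo3_block1 to produce predictions. feature2 is combined with an intermediate feature map from Yolo3_block1 and fed into Yolo3_block2 to produce predictions. feature1 is combined with an intermediate feature map from Yolo3_block2 and fed into Yolo3_block3 to produce predictions. Finally, the predictions of the three Yolo3_blocks are merged into the final output.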

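Three things in the YOLOv3 network are worth noting. First, the DBL structure (conv + BatchNorm + Leaky ReLU) is used heavily. Second, borrowing the idea of FPN, features of different scales are fused with a 1*1 convolution and upsampling: the convolution changes the number of channels of the feature map, and the upsampling changes its resolution. Third, the res unit in Darknet53 borrows the residual idea from ResNet; the difference between the two is shown in the figure below.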
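The fusion step can be sketched with plain NumPy arrays. This is only an illustration, not the network's actual layers: the helper names `conv1x1` and `upsample2x`, the random weights, the reduction to 256 channels, and the use of channel concatenation to "combine" the two maps are all assumptions made for the example. The shapes are those of a 416*416 input, where the deepest feature map is 13*13 and the next one is 26*26.

```python
import numpy as np


def conv1x1(x, out_channels, rng):
    """1x1 convolution on an NCHW array: mixes channels, keeps H and W."""
    in_channels = x.shape[1]
    weight = rng.standard_normal((out_channels, in_channels)) * 0.01
    return np.einsum("oc,nchw->nohw", weight, x)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling: doubles H and W, keeps channels."""
    return x.repeat(2, axis=2).repeat(2, axis=3)


rng = np.random.default_rng(0)
deep = rng.standard_normal((1, 512, 13, 13))     # coarse map from the deeper block
shallow = rng.standard_normal((1, 512, 26, 26))  # finer map from the backbone

reduced = conv1x1(deep, 256, rng)                      # channels 512 -> 256 (assumed)
upsampled = upsample2x(reduced)                        # resolution 13 -> 26
fused = np.concatenate([upsampled, shallow], axis=1)   # stack along channels

print(reduced.shape, upsampled.shape, fused.shape)
```

The point of the sketch is the division of labour: the 1*1 convolution only touches the channel axis, and the upsampling only touches the spatial axes, so the two maps line up spatially before they are joined.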

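Model computation flow:

(Using a 1x3x416x416 image as an example; YOLOv3 requires the image size to be a multiple of 32.)

Darknet53 produces three feature maps, Feature1, Feature2 and Feature3, with shapes (1, 256, 52, 52), (1, 512, 26, 26) and (1, 1024, 13, 13).
Feature3 (1, 1024, 13, 13) is fed into block1 of the yolo3 blocks, giving the intermediate feature block1_feature (1, 512, 13, 13) and the prediction output1 (1, 13, 13, 255). Here 255 is 3x(4+1+80): each cell predicts 3 anchors, and each anchor's prediction consists of box coordinates (4), objectness confidence (1) and class confidences (80, since the COCO dataset has 80 classes).
block1_feature (1, 512, 13, 13) from the previous step is upsampled, combined with Feature2 (1, 512, 26, 26) and fed into block2 of the yolo3 blocks, giving the intermediate feature block2_feature (1, 256, 26, 26) and the prediction output2 (1, 26, 26, 255).
block2_feature (1, 256, 26, 26) from the previous step is upsampled, combined with Feature1 (1, 256, 52, 52) and fed into block3 of the yolo3 blocks, giving the prediction output3 (1, 52, 52, 255).
output1, output2 and output3 are combined into the final prediction.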
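The shape arithmetic in these steps can be checked with a few lines of Python. The function name `yolo3_output_shapes` and its parameters are made up for this sketch; the strides 32, 16 and 8 follow from the 13, 26 and 52 grid sizes quoted above.

```python
def yolo3_output_shapes(input_size=416, num_classes=80, anchors_per_cell=3,
                        strides=(32, 16, 8)):
    """Return the three output shapes and the total number of anchors."""
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    # Each anchor predicts 4 box values, 1 objectness value and one score per class.
    channels = anchors_per_cell * (4 + 1 + num_classes)
    shapes = [(1, input_size // s, input_size // s, channels) for s in strides]
    total_anchors = sum(h * w * anchors_per_cell for _, h, w, _ in shapes)
    return shapes, total_anchors


shapes, total_anchors = yolo3_output_shapes()
print(shapes)         # [(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]
print(total_anchors)  # 10647
```

The total, 3 x (13x13 + 26x26 + 52x52) = 10647, is the second dimension of the tensors returned during training.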

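Training stage:

The network returns the objectness confidence all_objectness (1, 10647, 1), the box center offsets all_box_centers (1, 10647, 2), the box scales all_box_scales (1, 10647, 2) and the class confidences all_class_pred (1, 10647, 80). These are combined with gt_box, gt_ids and the anchors to compute the loss.

Test stage:

From the prior anchors and the predicted box offsets and scales, the actual box coordinates and class confidences are computed. After NMS, the final outputs are the class ids (n, 1), the confidence scores (n, 1) and the box coordinates bboxes (n, 4).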
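How a single box is recovered from an anchor and the raw predictions can be sketched as follows. This assumes the decoding commonly used for YOLOv3 (sigmoid on the center offsets, exponential on the scales); the function name `decode_box` and its arguments are hypothetical, and NMS is left out.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, cell_xy, anchor_wh, stride):
    """Turn raw predictions (tx, ty, tw, th) into (xmin, ymin, xmax, ymax) in pixels."""
    tx, ty, tw, th = raw
    cell_x, cell_y = cell_xy
    anchor_w, anchor_h = anchor_wh
    # The center is an offset inside the responsible cell, scaled back by the stride.
    center_x = (sigmoid(tx) + cell_x) * stride
    center_y = (sigmoid(ty) + cell_y) * stride
    # Width and height rescale the anchor size.
    width = anchor_w * math.exp(tw)
    height = anchor_h * math.exp(th)
    return (center_x - width / 2, center_y - height / 2,
            center_x + width / 2, center_y + height / 2)


# All-zero predictions give back the anchor itself, centered in cell (6, 6).
box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(6, 6), anchor_wh=(373, 326), stride=32)
print(box)  # (21.5, 45.0, 394.5, 371.0)
```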

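1.2 Positive and negative sample assignment (anchor and gt_boxes matching strategy)

Anchor settings

For a 416*416 input image, the figure below shows the typical anchor configuration (sizes) for the COCO dataset. Every pixel of each feature map is given 3 anchors, so the three feature maps carry 10647 anchors in total:

For the 52*52 feature map, the three anchors have widths and heights (10, 13), (16, 30) and (33, 23). They are small and are mainly used to predict small objects. Every pixel of the feature map corresponds to three anchors. Take the cell at (0, 0): its center is (0.5, 0.5) and the stride is 8, so its center in the original 416x416 image is (8x0.5, 8x0.5). The three anchors at this pixel are therefore centered at (4, 4), with widths and heights (10, 13), (16, 30) and (33, 23).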
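The mapping from a feature-map cell to anchor boxes in the input image can be written out directly. The helper name `anchor_boxes_for_cell` is made up for this sketch; it returns boxes as (x_center, y_center, w, h).

```python
def anchor_boxes_for_cell(col, row, stride, anchor_sizes):
    """Anchors of one feature-map cell, in input-image pixels."""
    # The cell center (col + 0.5, row + 0.5) is scaled by the stride of the feature map.
    center_x = (col + 0.5) * stride
    center_y = (row + 0.5) * stride
    return [(center_x, center_y, w, h) for w, h in anchor_sizes]


small_anchors = [(10, 13), (16, 30), (33, 23)]  # anchors of the 52*52 feature map
first_cell = anchor_boxes_for_cell(0, 0, stride=8, anchor_sizes=small_anchors)
print(first_cell)
# [(4.0, 4.0, 10, 13), (4.0, 4.0, 16, 30), (4.0, 4.0, 33, 23)]
```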

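Anchor matching rules

Although a large number of anchors are set, a training image may contain only a few gt_boxes, so we need to decide which anchors are responsible for predicting them.

In YOLOv3, anchors are matched with gt_boxes, and positive and negative samples are decided by the following rules:

Step 1: if the largest IoU between an anchor and all gt_boxes is below ignore_thresh, the anchor is a negative sample. (Typically ignore_thresh=0.7.)
Step 2: if the center of a gt_box falls inside a cell, that cell is responsible for detecting the object. The anchor with the largest IoU with the object becomes the positive sample. (Note that ignore_thresh is not used here: the anchor is still a positive sample even if this largest IoU is below ignore_thresh.) Any other anchor with IoU>ignore_thresh that is not the best match is set as an ignored sample.

Under these rules, YOLOv3 anchors fall into three kinds of samples: positive, negative and ignored.

Positive sample: has the largest IoU with a gt_box, whether or not IoU>ignore_thresh; marked with 1.
Negative sample: does not have the largest IoU with a gt_box, and IoU <= ignore_thresh; marked with 0.
Ignored sample: does not have the largest IoU with a gt_box, and IoU > ignore_thresh; marked with -1.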
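The three labels can be sketched as a small function. This is an illustration of the rules above rather than any library's implementation; `label_anchors`, `max_ious` and `positive_indices` are names invented for the example.

```python
def label_anchors(max_ious, positive_indices, ignore_thresh=0.7):
    """Label each anchor as 1 (positive), 0 (negative) or -1 (ignored).

    max_ious[i] is the largest IoU between anchor i and any gt_box.
    positive_indices holds the anchors chosen as the best match of some gt_box.
    """
    labels = []
    for index, iou in enumerate(max_ious):
        if index in positive_indices:
            labels.append(1)   # best match: positive regardless of its IoU
        elif iou > ignore_thresh:
            labels.append(-1)  # good overlap but not the best match: ignored
        else:
            labels.append(0)   # everything else: negative
    return labels


labels = label_anchors([0.45, 0.82, 0.75, 0.10], positive_indices={1})
print(labels)  # [0, 1, -1, 0]
```

A best match stays positive even when its IoU is low: `label_anchors([0.3, 0.1], positive_indices={0})` gives `[1, 0]`.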

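Anchor matching process

Usually we first decide which feature scale predicts the gt_box, and then which cell of that feature map predicts it. The example code below helps illustrate this: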

# coding: utf-8
import numpy as np


def box_to_center(box):
    """Convert (x_center, y_center, w, h) boxes to (xmin, ymin, xmax, ymax)."""
    # Force a float array: zeros_like would otherwise inherit an integer dtype
    # and truncate half-pixel corners (e.g. 13 / 2 = 6.5 would become 6).
    new_box = np.zeros_like(box, dtype=np.float64)
    center = box[:, :2]         # (x_center, y_center)
    half_size = box[:, 2:] / 2  # (w / 2, h / 2)
    new_box[:, :2] = center - half_size
    new_box[:, 2:] = center + half_size
    return new_box


def box_iou(ar1, ar2):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    x1 = max(ar1[0], ar2[0])
    y1 = max(ar1[1], ar2[1])
    x2 = min(ar1[2], ar2[2])
    y2 = min(ar1[3], ar2[3])
    if x1 >= x2 or y1 >= y2:
        return 0.0  # no overlap
    area1 = (ar1[2] - ar1[0]) * (ar1[3] - ar1[1])
    area2 = (ar2[2] - ar2[0]) * (ar2[3] - ar2[1])
    area_iou = (x2 - x1) * (y2 - y1)  # intersection area
    return area_iou / (area1 + area2 - area_iou)


def match(gt_box, anchors, features):
    num_anchors = np.cumsum([len(a) // 2 for a in anchors])  # [3 6 9]

    # 1. Shift the gt_box so that its center is at (0, 0)
    gt_w = gt_box[2] - gt_box[0]
    gt_h = gt_box[3] - gt_box[1]
    shift_gt_box = np.array([-0.5 * gt_w, -0.5 * gt_h, 0.5 * gt_w, 0.5 * gt_h])
    # print(shift_gt_box)

    # 2. Shift the anchor boxes so that their centers are at (0, 0)
    anchors = np.array(anchors).reshape(-1, 2)
    anchors_bbox = np.concatenate([0 * anchors, anchors], axis=1)
    shift_anchor_boxes = box_to_center(anchors_bbox)
    # print(shift_anchor_boxes)

    # 3. Compute the IoUs, pick the best-matching anchor and hence the feature scale
    num_anchor = anchors_bbox.shape[0]
    ious = np.zeros(shape=(1, num_anchor))
    for i in range(num_anchor):
        ious[:, i] = box_iou(shift_gt_box, shift_anchor_boxes[i])
    print(ious)  # approx. [[0.0019, 0.0072, 0.0113, 0.0273, 0.0416, 0.1047, 0.1557, 0.4608, 0.5123]]
    match_index = ious.argmax(axis=1)  # [8]: the 9th anchor is the best match
    feature_index = np.nonzero(num_anchors > match_index)[0][0]  # 2: the 9th anchor belongs to the third feature map
    print(feature_index)

    # 4. Decide which cell of that feature map is responsible for the prediction
    gt_center_x = (gt_box[0] + gt_box[2]) / 2
    gt_center_y = (gt_box[1] + gt_box[3]) / 2
    height, width = features[feature_index]
    loc_x = int(gt_center_x * width / 416)
    loc_y = int(gt_center_y * height / 416)
    print(loc_x, loc_y)  # 6 6: cell (6, 6) of the 13*13 feature map is responsible

    # Combining steps 3 and 4: the anchor of size (373, 326) at cell (6, 6)
    # of the 13*13 feature map predicts this gt_box.


if __name__ == "__main__":
    # yolo3 with a 416*416 input image
    gt_box = [108, 42, 304, 384]  # annotated box [xmin, ymin, xmax, ymax]
    anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119],
               [116, 90, 156, 198, 373, 326]]  # anchor (w, h) pairs for the three feature scales
    features = [(52, 52), (26, 26), (13, 13)]  # feature1, feature2, feature3
    match(gt_box, anchors, features)

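anchor matching

The logic of the code, in brief:

IoU is computed between the gt_box and the 9 anchors. An anchor of feature3 has the largest IoU, so feature3 is chosen to predict the gt_box.
The center of the gt_box is mapped onto feature3, landing at (6, 6), so cell (6, 6) of feature3 predicts the gt_box.
Cell (6, 6) has three anchors. The one with the largest IoU is set as the positive sample, and the other two are set as negative samples (IoU<=0.7) or ignored samples (IoU>0.7).

The figure below summarizes the matching process above: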

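1.3 YOLOv3 loss function

The YOLOv3 loss function has four parts: the objectness confidence obj_loss, the bbox center offset center_loss, the bbox scale scale_loss, and the class confidence cls_loss. The formula is roughly as follows: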
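One commonly quoted form of such a four-part loss is written out below as a sketch. It is an assumption about the general shape, not a transcription of any specific implementation: the weights $\lambda_{coord}$ and $\lambda_{noobj}$, the squared-error form of the center and scale terms, and the symbols themselves are illustrative. $S^2$ is the number of cells, $B$ the number of anchors per cell, hatted symbols are targets, and $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ select positive and negative samples (ignored samples appear in neither).

```latex
\begin{aligned}
L &= L_{obj} + L_{center} + L_{scale} + L_{cls} \\
L_{obj} &= -\sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \log C_{ij}
           \;-\; \lambda_{noobj} \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj} \log\left(1 - C_{ij}\right) \\
L_{center} &= \lambda_{coord} \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}
           \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 \right] \\
L_{scale} &= \lambda_{coord} \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}
           \left[ (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right] \\
L_{cls} &= -\sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \sum_{c}
           \left[ \hat{p}_{ij}(c) \log p_{ij}(c) + \left(1 - \hat{p}_{ij}(c)\right) \log\left(1 - p_{ij}(c)\right) \right]
\end{aligned}
```

Here $C_{ij}$ is the predicted objectness and $p_{ij}(c)$ the predicted probability of class $c$, both after a sigmoid.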

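The formula above gives only a rough picture. In the actual computation and implementation details, different versions of the code differ slightly. Example code follows: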

from mxnet.gluon.loss import Loss, SigmoidBinaryCrossEntropyLoss, L1Loss


class YOLOV3Loss(Loss):
    """Losses of YOLO v3.

    Parameters
    ----------
    batch_axis : int, default 0
        The axis that represents mini-batch.
    weight : float or None
        Global scalar weight for loss.

    """
    def __init__(self, batch_axis=0, weight=None, **kwargs):
        super(YOLOV3Loss, self).__init__(weight, batch_axis, **kwargs)
        self._sigmoid_ce = SigmoidBinaryCrossEntropyLoss(from_sigmoid=False)
        self._l1_loss = L1Loss()

    def hybrid_forward(self, F, objness, box_centers, box_scales, cls_preds,
                       objness_t, center_t, scale_t, weight_t, class_t, class_mask):
        """Compute YOLOv3 losses.

        Parameters
        ----------
        objness : mxnet.nd.NDArray
            Predicted objectness (B, N), range (0, 1).
        box_centers : mxnet.nd.NDArray
            Predicted box centers (x, y) (B, N, 2), range (0, 1).
        box_scales : mxnet.nd.NDArray
            Predicted box scales (width, height) (B, N, 2).
        cls_preds : mxnet.nd.NDArray
            Predicted class predictions (B, N, num_class), range (0, 1).
        objness_t : mxnet.nd.NDArray
            Objectness target, (B, N), 0 for negative 1 for positive, -1 for ignore.
        center_t : mxnet.nd.NDArray
            Center (x, y) targets (B, N, 2).
        scale_t : mxnet.nd.NDArray
            Scale (width, height) targets (B, N, 2).
        weight_t : mxnet.nd.NDArray
            Loss Multipliers for center and scale targets (B, N, 2).
        class_t : mxnet.nd.NDArray
            Class targets (B, N, num_class).
            It's relaxed one-hot vector, i.e., (1, 0, 1, 0, 0).
            It can contain more than one positive class.
        class_mask : mxnet.nd.NDArray
            0 or 1 mask array to mask out ignored samples (B, N, num_class).

        Returns
        -------
        tuple of NDArrays
            obj_loss: sum of objectness logistic loss
            center_loss: sum of box center logistic regression loss
            scale_loss: sum of box scale l1 loss
            cls_loss: sum of per class logistic loss

        """
        # compute some normalization count, except batch-size
        denorm = F.cast(
            F.shape_array(objness_t).slice_axis(axis=0, begin=1, end=None).prod(), 'float32')
        weight_t = F.broadcast_mul(weight_t, objness_t)
        hard_objness_t = F.where(objness_t > 0, F.ones_like(objness_t), objness_t)
        new_objness_mask = F.where(objness_t > 0, objness_t, objness_t >= 0)
        obj_loss = F.broadcast_mul(
            self._sigmoid_ce(objness, hard_objness_t, new_objness_mask), denorm)
        center_loss = F.broadcast_mul(self._sigmoid_ce(box_centers, center_t, weight_t), denorm * 2)
        scale_loss = F.broadcast_mul(self._l1_loss(box_scales, scale_t, weight_t), denorm * 2)
        denorm_class = F.cast(
            F.shape_array(class_t).slice_axis(axis=0, begin=1, end=None).prod(), 'float32')
        class_mask = F.broadcast_mul(class_mask, objness_t)
        cls_loss = F.broadcast_mul(self._sigmoid_ce(cls_preds, class_t, class_mask), denorm_class)
        return obj_loss, center_loss, scale_loss, cls_loss

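yolo3 loss function

In the code above, obj_loss, cls_loss and center_loss all use BCE loss; only scale_loss uses L1 loss (L1Loss in the code). Two points deserve attention. First, center_loss uses BCE loss rather than the MSE loss in the formula above. Second, center_loss and scale_loss are multiplied by a weight coefficient (2.0 - (gtw * gth) / (416*416)), where gtw and gth are the width and height of the gt_box; in this code it arrives through weight_t. The coefficient damps the effect of gt_box size on the loss: large objects get a small weight, and small objects get a large weight.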
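A quick numeric check of the weight coefficient shows how it behaves. The function name `box_loss_weight` is made up for this sketch; the formula is the one quoted above.

```python
def box_loss_weight(gt_w, gt_h, img_w=416, img_h=416):
    """Weight applied to the center and scale losses of one gt_box."""
    return 2.0 - (gt_w * gt_h) / (img_w * img_h)


small = box_loss_weight(20, 20)     # a small object
medium = box_loss_weight(196, 342)  # the gt_box from the matching example
full = box_loss_weight(416, 416)    # an object filling the whole image

print(round(small, 3), round(medium, 3), round(full, 3))  # 1.998 1.613 1.0
```

The weight therefore ranges from just under 2 for tiny boxes down to 1 for a box covering the whole image.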

Reference: https://zhuanlan.zhihu.com/p/142662181?from_voters_page=true
