YOLO v3 Network Structure and Source Code Explained

Reposted article. 2021-03-11 16:50. Tags: Pytorch / Computer Vision (CV) / Deep Learning / Python

0.Abstract

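I have been studying YOLOv3 recently. Reading many blog posts gave me some of the theory, but it was still hard going, and only after reading the source code did I understand it properly. I will not go through the network-definition code here, since it is relatively easy to follow. The rest of this post annotates the remaining yolo3 code in detail. The figure below shows what each function in the source does and how the functions call one another: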

Reference: https://blog.csdn.net/weixin_41943311/article/details/95672137?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

1.model.py

1.1 yolo_head()

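yolo_head() takes feats (one of the three feature maps output by Darknet53; the function is called once per map), anchors, num_classes and input_shape. Its job is to decode the feature map, which is a crucial step. For example, a feature map of shape (13,13,255) is really (13,13,3,85): 13*13 grid cells, 3 anchors per cell, and 85 = 4 box values (x, y, w, h) + 1 confidence + 80 class probabilities. At this point the box xy is still an offset relative to its grid cell, so a series of further transformations is needed, as shown in the figure below: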

from keras import backend as K  # Keras backend used throughout the model.py excerpts

def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)  # num_anchors = 3 for each output layer
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
    # e.g. for the 13x13 layer: anchors[anchor_mask[0]] = anchors[[6,7,8]] = [116,90], [156,198], [373,326]
    # Using arange, reshape and tile, build grid_y (row indices 0..H-1) and grid_x
    # (column indices 0..W-1) from grid_shape, then concatenate them into an HxW grid
    # (13x13, 26x26 or 52x52) that holds the (x, y) index of every cell.
    grid_shape = K.shape(feats)[1:3]  # height, width: 13x13, 26x26 or 52x52
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))
    # K.cast(x, dtype): x is the tensor to convert, dtype is the target type.
    # grid[row, col, 0] holds (x, y) = (col, row): (0,0), (1,0), (2,0) ... (12,12)
    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
    # feats is now (batch_size, 13, 13, 3, 85).
    # At this point the raw xy is the box centre relative to the top-left corner of its own cell.

    # Adjust predictions to each spatial grid point and anchor size.
    # Turn the raw predictions into real values:
    #   xy: cell-relative centre -> centre in the whole image, normalized by the grid size (13/26/52);
    #   wh: anchor size scaled by exp(t), then divided by the input size (416) to normalize.
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))  # i.e. divide by 13, 26 or 52
    # Equivalent to: (K.sigmoid(feats[:, :, :, :, :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    # "..." (Ellipsis) stands for all leading dimensions, so feats[..., :2] slices only the last axis.
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])
    # How Ellipsis replaces the run of leading colons in a slice, see: https://blog.csdn.net/z13653662052/article/details/78010654?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

    if calc_loss:
        return grid, feats, box_xy, box_wh
    # box_xy and box_wh: predicted box centre and size, normalized relative to the whole input image.
    return box_xy, box_wh, box_confidence, box_class_probs
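
The decoding done by yolo_head() can be followed by hand for a single anchor in a single grid cell. The sketch below is a plain-Python illustration, not the Keras code: decode_cell and its parameter names are hypothetical, and it assumes a 13x13 grid, a 416x416 input and the anchor (116, 90).

```python
import math

def sigmoid(t):
    """Logistic function, as applied by K.sigmoid."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_cell(raw, col, row, grid_w, grid_h, anchor_wh, input_wh):
    """Decode raw (tx, ty, tw, th) for one anchor in one grid cell.

    Returns (x, y, w, h), all normalized to the network input image.
    """
    tx, ty, tw, th = raw
    x = (sigmoid(tx) + col) / grid_w            # cell offset + cell index, divided by grid width
    y = (sigmoid(ty) + row) / grid_h
    w = math.exp(tw) * anchor_wh[0] / input_wh[0]  # anchor scaled by exp(t), divided by input width
    h = math.exp(th) * anchor_wh[1] / input_wh[1]
    return x, y, w, h

# All-zero raw outputs in the middle cell (6, 6) of a 13x13 grid:
# the centre lands in the middle of the image and the box equals the anchor.
box = decode_cell((0.0, 0.0, 0.0, 0.0), col=6, row=6, grid_w=13, grid_h=13,
                  anchor_wh=(116, 90), input_wh=(416, 416))
print(box)
```

With zero raw outputs, sigmoid gives 0.5 and exp gives 1, so the box centre sits at the centre of cell (6, 6) and its size is exactly the anchor size divided by 416.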

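1.2 yolo_correct_boxes()

This function takes the output of yolo_head(), i.e. box centres and sizes relative to the whole (letterboxed) input image, removes the effect of the padding, and converts each box to its top-left and bottom-right corner coordinates in the original image.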

def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    '''Get corrected boxes

    Corrects the coordinates predicted by yolo_head for the letterbox padding.
    For example, with image_shape = [600, 800] and input_shape = [300, 500]
    (both as h, w):
        new_shape = [300, 400]
        offset    = [0, 0.1]
        scale     = [1, 1.25]
    '''
    # Convert box_xy, box_wh to real coordinates on the original image. The output
    # boxes hold the top-left and bottom-right corners as (y_min, x_min, y_max, x_max).
    # "..." (Ellipsis) keeps all leading dimensions and slices only the last axis.
    # For a[i:j:s] with s < 0, i defaults to -1 and j to -len(a)-1, so a[::-1]
    # equals a[-1:-len(a)-1:-1]: the elements from last to first, i.e. reversed.
    box_yx = box_xy[..., ::-1]  # swap (x, y) to (y, x)
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    # K.round rounds element-wise to the nearest integer; new_shape is the size of
    # the resized image inside the letterboxed input.
    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale
    # Top-left and bottom-right corners of the predicted box.
    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes  # (y_min, x_min, y_max, x_max) in pixels of the original image
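
To make the offset and scale arithmetic concrete, here is a small plain-Python sketch of the same correction for one box. The helper correct_box is hypothetical and the example sizes (a 208x416 image letterboxed into a 416x416 input) are an assumption chosen so that every intermediate value is exact.

```python
def correct_box(box_yx, box_hw, input_shape, image_shape):
    """Undo letterbox padding for one normalized box; shapes are (h, w)."""
    ratio = min(input_shape[0] / image_shape[0], input_shape[1] / image_shape[1])
    new_shape = [round(image_shape[i] * ratio) for i in range(2)]
    # Padding on each side, as a fraction of the input size.
    offset = [(input_shape[i] - new_shape[i]) / 2.0 / input_shape[i] for i in range(2)]
    scale = [input_shape[i] / new_shape[i] for i in range(2)]
    yx = [(box_yx[i] - offset[i]) * scale[i] for i in range(2)]
    hw = [box_hw[i] * scale[i] for i in range(2)]
    y_min, x_min = yx[0] - hw[0] / 2, yx[1] - hw[1] / 2
    y_max, x_max = yx[0] + hw[0] / 2, yx[1] + hw[1] / 2
    # Back to pixels of the original image.
    return [y_min * image_shape[0], x_min * image_shape[1],
            y_max * image_shape[0], x_max * image_shape[1]]

# Image of height 208 and width 416 inside a 416x416 input:
# the resized image is 208x416, so there are 104 px of padding above and below.
boxes = correct_box(box_yx=(0.5, 0.5), box_hw=(0.25, 0.5),
                    input_shape=(416, 416), image_shape=(208, 416))
print(boxes)  # [52.0, 104.0, 156.0, 312.0]
```

Here offset is [0.25, 0] and scale is [2, 1]: the box keeps its centre but doubles its relative height, because the padded rows no longer count.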

1.3 yolo_boxes_and_scores()

Returns the boxes and their per-class scores for one output layer.

def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    # Flatten the per-cell boxes into a list: (?,13,13,3,4) -> (N,4), N = number of boxes.
    boxes = K.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs
    # Flatten the scores the same way: (N,num_classes), e.g. (N,80) for COCO.
    box_scores = K.reshape(box_scores, [-1, num_classes])
    # Boxes as (y_min, x_min, y_max, x_max) plus their per-class scores.
    return boxes, box_scores
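
The score of a box is class-specific: the objectness confidence multiplied by each class probability. A tiny plain-Python illustration follows; the numbers are made up, and 0.6 is the default score_threshold of yolo_eval.

```python
def box_scores(confidence, class_probs):
    """Per-class score of one box: objectness times class probability."""
    return [confidence * p for p in class_probs]

scores = box_scores(0.9, [0.1, 0.8, 0.05])   # one box, three classes
keep = [s >= 0.6 for s in scores]            # which classes pass the threshold
print(keep)  # [False, True, False]
```

A single box can therefore pass the threshold for one class and fail it for all the others.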

1.4 yolo_eval()

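This function removes redundant boxes and keeps the best ones: it first filters boxes by score, then applies non-maximum suppression (NMS) class by class.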

import tensorflow as tf  # needed for boolean_mask and non_max_suppression

def yolo_eval(yolo_outputs,
              anchors,
              num_classes,
              image_shape,
              max_boxes=20,
              score_threshold=.6,
              iou_threshold=.5):
    """Evaluate YOLO model on given input and return filtered boxes."""
    # yolo_outputs:    model outputs [(?,13,13,255), (?,26,26,255), (?,52,52,255)];
    #                  ?: batch size; 13/26/52: multi-scale predictions; 255 = 3*(80+5)
    # anchors:         [(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]
    # num_classes:     number of classes, 80 for COCO
    # image_shape:     TF placeholder holding the size (h, w) of the original image
    # max_boxes:       at most this many boxes are kept per class per image
    # score_threshold: boxes scoring below it are dropped; lower it to keep more boxes,
    #                  raise it to keep fewer
    # iou_threshold:   same-class boxes overlapping a kept box by more than this IoU are dropped;
    #                  raise it when many objects overlap, lower it when few do
    num_layers = len(yolo_outputs)  # number of output layers; 3 -> 13/26/52
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
    # Each layer gets 3 anchors, e.g. the 13x13 layer gets [6,7,8] = [(116,90), (156,198), (373,326)].
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    # yolo_outputs[0] is (?,13,13,255); its height and width times 32 give 13*32 = 416,
    # so input_shape is (416, 416).
    boxes = []
    box_scores = []
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)  # boxes of all layers joined -> (N,4)
    box_scores = K.concatenate(box_scores, axis=0)  # -> (N,num_classes)

    # There may be many candidate boxes; filter them by (1) score threshold, (2) non-maximum suppression.
    mask = box_scores >= score_threshold  # True where the score reaches the threshold, else False
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')
    boxes_ = []
    scores_ = []
    classes_ = []
    # For each class:
    #   1. take the boxes and scores with score >= score_threshold
    #   2. apply non-maximum suppression to them
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        # Boxes whose score for class c reaches the threshold.
        class_boxes = tf.boolean_mask(boxes, mask[:, c])
        # The class-c scores of those boxes.
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
        # Non-maximum suppression: drop boxes that overlap heavily with a better one.
        #   (1) Start from the highest-scoring box F and check whether the IoU of each
        #       of A..E with F exceeds the threshold.
        #   (2) Say B and D overlap F beyond the threshold: discard B and D, and keep F
        #       as the first retained box.
        #   (3) From the remaining A, C, E pick the highest-scoring one, E; discard any of
        #       A, C whose IoU with E exceeds the threshold and keep E as the second box.
        #   Repeat until every retained box has been found.
        nms_index = tf.image.non_max_suppression(
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        class_boxes = K.gather(class_boxes, nms_index)
        class_box_scores = K.gather(class_box_scores, nms_index)
        classes = K.ones_like(class_box_scores, 'int32') * c  # class id c for every kept box
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)
    # Return the boxes that survive NMS, with their scores and class ids.

    return boxes_, scores_, classes_
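
The greedy procedure described in the NMS comment can be written out in a few lines. This is a minimal plain-Python sketch of the idea, not the implementation behind tf.image.non_max_suppression; the function names iou and nms and the sample boxes are illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, max_boxes, iou_threshold):
    """Greedy NMS: returns the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_boxes:
            break
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, max_boxes=20, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed; the third box does not overlap anything and is kept.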

1.5 preprocess_true_boxes()

import numpy as np

def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Inputs, in short:
    true_boxes: ground-truth boxes, e.g. (16, 20, 5): a batch of 16, at most 20
        boxes per image, 5 values per box (4 corner coordinates and 1 class id);
    input_shape: network input size, e.g. (416, 416);
    anchors: list of anchor boxes;
    num_classes: number of classes.

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are relative value

    '''
    # Sanity check: no class id in the annotation txt may be >= num_classes.
    # true_boxes.shape = (number of images, boxes per image, 5);
    # 5 = top-left and bottom-right corner coordinates plus the class index.
    assert (true_boxes[..., 4]<num_classes).all(), 'class id must be less than num_classes'
    num_layers = len(anchors)//3 # default setting
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')    # [416 416], shape (2,)
    # Rewrite true_boxes in place:
    #   true_boxes: two corners and a class, e.g. [184, 299, 191, 310, 0.0]; shape (16, 20, 5),
    #               16 being the batch size, 20 the maximum number of boxes;
    #   boxes_xy: box centres, shape (16, 20, 2);
    #   boxes_wh: box widths and heights, shape (16, 20, 2);
    #   input_shape: 416x416;
    #   afterwards slots 0-1 hold xy / 416 and slots 2-3 hold wh / 416,
    #   e.g. [0.449, 0.730, 0.016, 0.026, 0.0].
    # Centre = (top-left + bottom-right) / 2.
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    # Box width and height.
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    # Express centre and size as fractions of input_shape.
    true_boxes[..., 0:2] = boxes_xy/input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]
    # m is the batch size, i.e. the number of input images.
    m = true_boxes.shape[0]
    # grid_shapes: [13,13], [26,26], [52,52]
    grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(num_layers)]
    # y_true is a list of all-zero arrays:
    #   m*13*13*3*(5+num_classes)
    #   m*26*26*3*(5+num_classes)
    #   m*52*52*3*(5+num_classes)
    # e.g. [(16,13,13,3,6), (16,26,26,3,6), (16,52,52,3,6)] for m = 16 and a single class.
    y_true = [np.zeros((m,grid_shapes[l][0],grid_shapes[l][1],len(anchor_mask[l]),5+num_classes),
        dtype='float32') for l in range(num_layers)]

    # Expand dim to apply broadcasting: (9,2) -> (1,9,2).
    anchors = np.expand_dims(anchors, 0)
    # Centre every anchor on the origin: its bottom-right corner is wh/2 ...
    anchor_maxes = anchors / 2.
    # ... and its top-left corner is -wh/2.
    anchor_mins = -anchor_maxes
    # Drop invalid rows (zero-width padding boxes).
    valid_mask = boxes_wh[..., 0]>0

    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh)==0: continue
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes
        # The ground-truth boxes are centred on the origin too, so the IoU depends on shape only.

        # IoU between every ground-truth box and every anchor, via broadcasting
        # (T = number of valid boxes in image b):
        #   box_mins is (T,1,2), anchor_mins is (1,9,2), intersect_mins is (T,9,2): all pairs;
        #   intersect_area is (T,9); box_area is (T,1); anchor_area is (1,9);
        #   iou is (T,9): the IoU of each box with each anchor.
        intersect_mins = np.maximum(box_mins, anchor_mins)  # element-wise
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]  # width * height
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
        best_anchor = np.argmax(iou, axis=-1)

        # Fill in y_true:
        #   t is the box index, n the index of its best anchor, l the layer index;
        #   the entry is written only in the layer that owns anchor n, everything else stays 0;
        #   true_boxes[b, t, 0] is x and true_boxes[b, t, 1] is y (both normalized);
        #   multiplying them by the grid width/height gives the cell the centre falls in;
        #   k is the position of the anchor within its layer; c is the class, true_boxes[b, t, 4];
        #   xy and wh go into y_true[..., 0:4], the confidence y_true[..., 4] is set to 1,
        #   and the one-hot class entry y_true[..., 5+c] is set to 1.
        for t, n in enumerate(best_anchor):
            # Each box is already matched to one anchor; find which layer (large, medium
            # or small objects) that anchor belongs to and assign the box to that layer.
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    # grid_shapes is (h, w), so x * grid_shapes[l][1] = x * w is the column
                    # of the cell containing the centre (x was divided by the image width
                    # earlier, hence the multiplication). np.floor rounds down to the
                    # largest integer i with i <= x.
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
                    # Likewise j is the row; (j, i) is the cell the box centre falls in.
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
                    # Position of anchor n within this layer's anchors.
                    k = anchor_mask[l].index(n)
                    # Class id of the box.
                    c = true_boxes[b,t, 4].astype('int32')
                    # Image b, row j, column i, anchor k: x, y, w, h, confidence, class probabilities.
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1  # confidence is 1 because the cell contains an object
                    y_true[l][b, j, i, k, 5+c] = 1  # one-hot class encoding

    return y_true
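
Because boxes and anchors are both centred on the origin, their intersection is simply min(w) * min(h). The sketch below replays the anchor matching for single boxes in plain Python. The anchor list and mask are the ones quoted in the comments above; the helper names wh_iou and assign are hypothetical.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # 13x13, 26x26, 52x52 layers

def wh_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same centre (sizes in pixels)."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union

def assign(box_wh):
    """Return (best anchor index n, layer l, index k of the anchor in that layer)."""
    ious = [wh_iou(box_wh, a) for a in ANCHORS]
    n = max(range(len(ANCHORS)), key=lambda i: ious[i])  # first maximum, like argmax
    for l, mask in enumerate(ANCHOR_MASK):
        if n in mask:
            return n, l, mask.index(n)

print(assign((120, 100)))  # (6, 0, 0): large box -> 13x13 layer
print(assign((12, 14)))    # (0, 2, 0): small box -> 52x52 layer
```

Large boxes end up on the coarse 13x13 layer and small boxes on the fine 52x52 layer, which is how the three scales split the work.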

1.6 yolo_loss()

This function defines the loss, which has three parts: the box coordinate loss (xy and wh), the confidence loss, and the class loss:

def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    '''Return yolo_loss tensor

    Parameters
    ----------
    yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body
    y_true: list of array, the output of preprocess_true_boxes; each array has shape
        (m, grid_h, grid_w, 3, 5+num_classes) and its last axis holds
        [x, y, w, h, confidence, one-hot class], where x, y are the box centre and
        w, h its width and height, all divided by the input size so they lie in [0, 1]
    anchors: array, shape=(N, 2), wh
    num_classes: integer
    ignore_thresh: float, the iou threshold whether to ignore object confidence loss

    Returns
    -------
    loss: tensor, shape=(1,)

    '''
    num_layers = len(anchors)//3 # default setting
    yolo_outputs = args[:num_layers]
    y_true = args[num_layers:]
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    loss = 0
    m = K.shape(yolo_outputs[0])[0] # batch size, tensor
    mf = K.cast(m, K.dtype(yolo_outputs[0]))

    for l in range(num_layers):
        object_mask = y_true[l][..., 4:5]  # confidence: 1 where a cell/anchor holds an object
        true_class_probs = y_true[l][..., 5:]  # one-hot classes

        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
        # This is the inverse of the x, y, w, h decoding done in yolo_head.
        raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid
        raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        # Avoid log(0) = -inf: where object_mask is 0 (no object) use zeros instead.
        # K.switch(condition, then, else): both branches must have the same shape.
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
        # A trick that helps small objects: the regression loss is multiplied by (2 - w*h),
        # where w and h are the normalized ground-truth width and height. Without the w*h
        # term AP reportedly drops noticeably; scaling further, e.g. (2-w*h)*1.5, is said to
        # gain about one more AP point (validation and test), probably because COCO
        # contains so many small objects.

        box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4]

        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')
        # object_mask as a boolean (True/False) mask.

        def loop_body(b, ignore_mask):
            # Ground-truth boxes of image b (entries with confidence > 0), as centre xy and wh.
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0])

            # IoU of every predicted box (xywh decoded by yolo_head) with every true box.
            iou = box_iou(pred_box[b], true_box)
            best_iou = K.max(iou, axis=-1)  # best IoU of each predicted box
            # 1 where the best IoU is below ignore_thresh: only those predictions are
            # penalized as background in the no-object confidence loss.
            ignore_mask = ignore_mask.write(b, K.cast(best_iou<ignore_thresh, K.dtype(true_box)))
            return b+1, ignore_mask
        _, ignore_mask = K.control_flow_ops.while_loop(lambda b,*args: b<m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()  # stack the rank-R entries into one tensor of rank R+1
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[...,0:2], from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh-raw_pred[...,2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
            (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[...,5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        if print_loss:
            loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss
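
The regression targets raw_true_xy and raw_true_wh are the inverse of the decoding in yolo_head(). The plain-Python sketch below computes them for a single ground-truth box, together with the (2 - w*h) scale factor. The helper encode_targets and the example values (13x13 grid, 416 input, anchor (116, 90)) are assumptions for illustration.

```python
import math

def encode_targets(x, y, w, h, col, row, grid_size, anchor_wh, input_size):
    """Targets for one normalized ground-truth box (x, y, w, h in [0, 1])."""
    # Offset of the centre inside its cell: what sigmoid(raw xy) should predict.
    target_xy = (x * grid_size - col, y * grid_size - row)
    # Log ratio between the box size in pixels and the anchor size.
    target_wh = (math.log(w * input_size / anchor_wh[0]),
                 math.log(h * input_size / anchor_wh[1]))
    # Small boxes get a weight close to 2, large boxes a weight close to 1.
    box_loss_scale = 2 - w * h
    return target_xy, target_wh, box_loss_scale

# A box centred in the image whose size equals the anchor (116, 90).
target_xy, target_wh, scale = encode_targets(
    x=0.5, y=0.5, w=116 / 416, h=90 / 416,
    col=6, row=6, grid_size=13, anchor_wh=(116, 90), input_size=416)
print(target_xy, target_wh, scale)
```

For this box the xy target is the cell centre (0.5, 0.5) and the wh target is approximately (0, 0), since the box matches its anchor exactly.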

2.train.py

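Training has two stages. Stage one (epochs 0~50) trains only the final output layers while all earlier layers are frozen; stage two (epochs 50~100) unfreezes every layer and fine-tunes the whole network.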

# Imports as in the upstream keras-yolo3 train.py (module paths may differ in your copy).
import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data


def _main():
    annotation_path = '2007_train.txt'
    log_dir = 'logs/000/'
    classes_path = 'model_data/voc_classes.txt'
    anchors_path = 'model_data/yolo_anchors.txt'
    class_names = get_classes(classes_path)
    num_classes = len(class_names)
    anchors = get_anchors(anchors_path)

    input_shape = (416,416) # multiple of 32, hw

    is_tiny_version = len(anchors)==6 # default setting
    if is_tiny_version:
        model = create_tiny_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='model_data/tiny_yolo_weights.h5')
    else:
        model = create_model(input_shape, anchors, num_classes,
            freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze

    logging = TensorBoard(log_dir=log_dir)
    checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
        monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    # ReduceLROnPlateau arguments:
    #   monitor:  quantity to be monitored
    #   factor:   factor by which the learning rate is reduced, lr = lr*factor
    #   patience: number of epochs without improvement after which the reduction is triggered
    #   mode:     one of 'auto', 'min', 'max'; in min mode the rate is reduced when the monitored
    #             quantity stops decreasing, in max mode when it stops increasing
    #   epsilon:  threshold used to decide whether the quantity has reached a plateau
    #   cooldown: number of epochs to wait after a reduction before resuming normal operation
    #   min_lr:   lower bound on the learning rate
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)
    # EarlyStopping arguments:
    #   monitor:   quantity to be monitored
    #   min_delta: minimum change that counts as an improvement; an absolute change smaller
    #              than min_delta counts as no improvement
    #   patience:  number of epochs without improvement after which training stops
    #   verbose:   verbosity mode
    #   mode:      one of {auto, min, max}; in min mode training stops when the quantity stops
    #              decreasing, in max mode when it stops increasing, in auto mode the direction
    #              is inferred from the name of the monitored quantity
    #   baseline:  baseline value for the monitored quantity; training stops if the model
    #              shows no improvement over it
    #   restore_best_weights: whether to restore the weights from the epoch with the best
    #              monitored value; if False, the weights from the last training step are used

    val_split = 0.1
    with open(annotation_path) as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    np.random.seed(None)
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val

    # Train with frozen layers first, to get a stable loss.
    # Adjust num epochs to your dataset. This step is enough to obtain a not bad model.
    if True:
        model.compile(optimizer=Adam(lr=1e-3), loss={
            # use custom yolo_loss Lambda layer.
            'yolo_loss': lambda y_true, y_pred: y_pred})
        # Explanation: compile receives a custom loss, but since the loss is computed by a
        # layer inside the model, y_pred already is the loss. Keras requires a custom loss
        # function to take (y_true, y_pred) as its arguments.

        batch_size = 32
        print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
        model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                steps_per_epoch=max(1, num_train//batch_size),
                validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
                validation_steps=max(1, num_val//batch_size),
                epochs=50,
                initial_epoch=0,
                callbacks=[logging, checkpoint])
        model.save_weights(log_dir + 'trained_weights_stage_1.h5')

    # Unfreeze and continue training, to fine-tune.
    # Train longer if the result is not good.
    if True:
        for i in range(len(model.layers)):
            model.layers[i].trainable = True
        model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
        print('Unfreeze all of the layers.')

        batch_size = 32 # note that more GPU memory is required after unfreezing the body
        print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
        model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
            steps_per_epoch=max(1, num_train//batch_size),
            validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
            validation_steps=max(1, num_val//batch_size),
            epochs=100,
            initial_epoch=50,
            callbacks=[logging, checkpoint, reduce_lr, early_stopping])
        model.save_weights(log_dir + 'trained_weights_final.h5')

    # Further training if needed.


def get_classes(classes_path):
    '''loads the classes'''
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    '''loads the anchors from a file'''
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)


def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
            weights_path='model_data/yolo_weights.h5'):
    '''create the training model'''
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)

    y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \
        num_anchors//3, num_classes+5)) for l in range(3)]

    model_body = yolo_body(image_input, num_anchors//3, num_classes)
    print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        # Load the pretrained weights from weights_path, matching layers by name
        # (by_name) and skipping layers whose shapes do not match (skip_mismatch).
        #
        # Freeze mode: mode 1 freezes the first 185 layers (the darknet53 body);
        # mode 2 freezes everything except the last 3 layers. The whole model has
        # 252 layers; frozen layers are set to trainable=False.
        if freeze_body in [1, 2]:
            # Freeze darknet53 body or freeze all but 3 output layers.
            num = (185, len(model_body.layers)-3)[freeze_body-1]
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))
    # Lambda is the Keras layer for custom expressions. Its inputs are model_body.output
    # and y_true; its output_shape is (1,), i.e. a single loss value.
    # The layer is named yolo_loss.
    # Its arguments are the anchor list anchors, the class count num_classes and the IoU
    # threshold ignore_thresh: predictions whose best IoU with a ground-truth box exceeds
    # ignore_thresh are excluded from the no-object confidence loss.
    # yolo_loss holds the core logic of the loss.
    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
        [*model_body.output, *y_true])
    # The loss is written as a layer and used as the final output, so when building the
    # model the output is simply the loss. At compile time the loss function just returns
    # y_pred (the model output already is the loss) and ignores y_true, so during training
    # any array of the right shape can be passed as the target.
    # keras.layers.Lambda wraps an arbitrary expression as a Layer object:
    #   keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None)
    #   function: the function to wrap; it receives the input tensor as its first argument.
    #   output_shape: expected output shape of the function, a tuple or a function; a tuple
    #                 gives the shape without the batch dimension.
    #   arguments: optional keyword arguments passed on to function.

    model = Model([model_body.input, *y_true], model_loss)
    # The resulting model takes the image data and the labels (y_true) as inputs
    # and outputs the loss (model_loss) as y_pred.

    return model

def create_tiny_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
            weights_path='model_data/tiny_yolo_weights.h5'):
    '''create the training model, for Tiny YOLOv3'''
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)

    y_true = [Input(shape=(h//{0:32, 1:16}[l], w//{0:32, 1:16}[l], \
        num_anchors//2, num_classes+5)) for l in range(2)]

    model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes)
    print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        if freeze_body in [1, 2]:
            # Freeze the darknet body or freeze all but 2 output layers.
            num = (20, len(model_body.layers)-2)[freeze_body-1]
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))

    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})(
        [*model_body.output, *y_true])
    model = Model([model_body.input, *y_true], model_loss)

    return model

def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator

    annotation_lines: annotation lines; each line holds an image path followed by
        the boxes (position and class)
    batch_size: number of images per batch
    input_shape: network input size
    anchors: anchor sizes
    num_classes: number of classes
    '''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i==0:
                np.random.shuffle(annotation_lines)
            # Parse one annotation line into the (augmented) sample image and its box labels.
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)

def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    if n==0 or batch_size<=0: return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)

if __name__ == '__main__':
    _main()
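
The batching pattern of data_generator() (an endless loop with a cyclic index that reshuffles whenever the index wraps around to 0) can be isolated from the image loading. The following sketch yields sample indices only; batch_indices and its rng parameter are hypothetical names, and a seeded random.Random is used just to keep the run repeatable.

```python
import random

def batch_indices(n, batch_size, rng):
    """Endlessly yield batches of sample indices, reshuffling at each new pass."""
    order = list(range(n))
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if i == 0:
                rng.shuffle(order)   # start of a new pass over the data
            batch.append(order[i])
            i = (i + 1) % n          # wrap around at the end of the data
        yield batch

gen = batch_indices(n=6, batch_size=3, rng=random.Random(0))
first, second = next(gen), next(gen)
print(first, second)
```

With 6 samples and a batch size of 3, two consecutive batches cover every sample exactly once, and the third batch starts a freshly shuffled pass. When n is not a multiple of batch_size, a batch can straddle two passes.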

3.utils.py

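3.1 letterbox_image()

This function resizes the input image to fit the target size while preserving its aspect ratio, and fills the remaining area with gray.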

from PIL import Image

def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size  # original image size, arbitrary; take (1000, 500) as an example
    w, h = size  # size the model expects, (416, 416)
    scale = min(w/iw, h/ih)  # min(416/1000, 416/500) = min(0.416, 0.832) = 0.416
    nw = int(iw*scale)  # 1000*0.416 = 416
    nh = int(ih*scale)  # 500*0.416 = 208

    image = image.resize((nw,nh), Image.BICUBIC)
    # Image.new creates an image with the given mode and size. If the color argument is
    # omitted the image is filled with black; if it is None the image is left
    # uninitialized. Here it is filled with gray (128,128,128).
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))  # (w-nw)//2 = 0, (h-nh)//2 = (416-208)//2 = 104
    return new_image
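
The geometry of the letterbox can be checked without loading an image. This sketch repeats only the arithmetic of letterbox_image(); letterbox_geometry is a hypothetical helper, and the 832x416 example image is an assumption chosen so that the scale is exactly 0.5.

```python
def letterbox_geometry(image_size, target_size):
    """Return (scale, resized (w, h), paste offset (x, y)) for a letterbox resize."""
    iw, ih = image_size
    w, h = target_size
    scale = min(w / iw, h / ih)          # the tighter of the two fits
    nw, nh = int(iw * scale), int(ih * scale)
    offset = ((w - nw) // 2, (h - nh) // 2)  # centre the resized image
    return scale, (nw, nh), offset

scale, new_size, offset = letterbox_geometry((832, 416), (416, 416))
print(scale, new_size, offset)  # 0.5 (416, 208) (0, 104)
```

The image is twice as wide as it is tall, so it fills the full width and leaves two gray bands of 104 pixels above and below.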

Its effect is shown below:

3.2 get_random_data()

This function performs data augmentation and preprocesses the input image (in the same way as letterbox_image).

def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation
    annotation_line: one line of the annotation file, containing the image path followed by
        the position and class of each box
    return: image_data is the sample image resized to (416, 416) and padded with gray;
            box_data holds the labels (ground-truth boxes) marked in the image, with shape
            (max_boxes, 5): at most max_boxes boxes per image, each with 5 values, the
            left, top, right, bottom coordinates and one class index. Stacked over a batch
            of 16 with max_boxes=20 this becomes (16, 20, 5).'''
    line = annotation_line.split()  # split on whitespace
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape  # (416, 416)
    box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])

    if not random:
        # resize image
        # Scale the image to fit 416x416 while keeping its aspect ratio, pad the rest with
        # gray (128, 128, 128), and bring the color values into 0~1 by dividing by 255.
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data=0
        if proc_img:
            image = image.resize((nw,nh), Image.BICUBIC)
            new_image = Image.new('RGB', (w,h), (128,128,128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255.
            # Same as letterbox_image above, plus rescaling the RGB range to 0-1.

        # correct boxes
        # Scale the boxes by the same factor, then add the padding offsets dx and dy,
        # because the gray padding shifts the coordinate system of the boxes.
        # At most max_boxes boxes are kept, i.e. 20.
        box_data = np.zeros((max_boxes,5))  # shape -> (20, 5)
        if len(box)>0:
            np.random.shuffle(box)
            if len(box)>max_boxes: box = box[:max_boxes]
            box[:, [0,2]] = box[:, [0,2]]*scale + dx
            box[:, [1,3]] = box[:, [1,3]]*scale + dy
            box_data[:len(box)] = box

        return image_data, box_data

    # resize image
    # Using the jitter parameter, draw a random new_ar and scale and derive a new nh and nw;
    # the original image is resized to nw x nh, i.e. without preserving its aspect ratio.
    # This is the data augmentation step.
    # rand(a, b) is the helper in utils.py that returns a uniform random number between a and b.
    new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

    # place image
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    # Mirror the image left-right (FLIP_LEFT_RIGHT) at random, according to the flag flip.
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image
    # Change the image colors in HSV space: hue is shifted by addition, sat and val are
    # scaled by multiplication. Convert RGB to HSV and then HSV back to RGB, wrapping hue
    # and clamping all values so that they stay within [0, 1].
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
    val = rand(1, val) if rand()<.5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes
    # Apply every image transform to the boxes as well, clip coordinates that end up
    # outside the image, and discard the boxes that become invalid.
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box

    return image_data, box_data
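
The box handling in get_random_data (parse the annotation line, scale and shift the boxes, mirror them on a flip, clip them to the canvas, drop the ones that collapse) can be sketched in plain Python. This is an illustration under assumptions, not the function itself: parse_annotation and transform_boxes are hypothetical names, the annotation line is made up, and coordinates stay as floats here whereas the listing writes them back into an integer array.

```python
def parse_annotation(line):
    """Split 'path x1,y1,x2,y2,cls ...' into the image path and a list of int boxes."""
    parts = line.split()
    boxes = [[int(v) for v in item.split(',')] for item in parts[1:]]
    return parts[0], boxes

def transform_boxes(boxes, sx, sy, dx, dy, flip, w, h):
    """Scale, shift, optionally mirror, clip to the w x h canvas, drop degenerate boxes."""
    out = []
    for x1, y1, x2, y2, cls in boxes:
        x1, x2 = x1 * sx + dx, x2 * sx + dx
        y1, y2 = y1 * sy + dy, y2 * sy + dy
        if flip:
            x1, x2 = w - x2, w - x1        # mirroring swaps the left and right edges
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        if x2 - x1 > 1 and y2 - y1 > 1:    # keep only boxes wider and taller than 1 px
            out.append([x1, y1, x2, y2, cls])
    return out

# Hypothetical sample: an 832x416 image letterboxed into 416x416 (scale 0.5, dy = 104).
path, boxes = parse_annotation("img.jpg 100,50,300,250,0")
plain = transform_boxes(boxes, 0.5, 0.5, 0, 104, flip=False, w=416, h=416)
flipped = transform_boxes(boxes, 0.5, 0.5, 0, 104, flip=True, w=416, h=416)
dropped = transform_boxes(boxes, 0.5, 0.5, -300, 104, flip=False, w=416, h=416)
print(plain, flipped, dropped)
```

The last call shifts the image far to the left, as a random placement can do when the resized image is larger than the canvas; the box ends up entirely outside and is discarded.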

4.yolo.py

This file is mainly used to run detection on images or video.
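
Two small computations inside generate(), listed next, can be tried in isolation: the check that the model's output depth matches the anchors and classes, and the per-class drawing colors. The sketch below is only an approximation under assumptions: expected_output_depth and class_colors are hypothetical names, and the shuffle uses the standard random module, so the color order differs from what np.random.seed(10101) produces in the listing.

```python
import colorsys
import random

def expected_output_depth(num_anchors, num_outputs, num_classes):
    """Channels per output map: anchors per scale * (4 box values + 1 confidence + classes)."""
    return num_anchors // num_outputs * (num_classes + 5)

def class_colors(num_classes, seed=10101):
    """One RGB color per class: evenly spaced hues at full saturation and value, shuffled."""
    hsv = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    rgb = [tuple(int(c * 255) for c in colorsys.hsv_to_rgb(*t)) for t in hsv]
    random.Random(seed).shuffle(rgb)   # fixed seed: the same colors on every run
    return rgb

depth = expected_output_depth(9, 3, 80)       # full model on COCO: 9 anchors, 3 scales
tiny_depth = expected_output_depth(6, 2, 80)  # tiny model: 6 anchors, 2 scales
colors = class_colors(80)
print(depth, tiny_depth, len(colors))         # 255 255 80
```

Shuffling keeps neighboring class indices from receiving nearly identical hues, which is the stated reason for the shuffle in the listing.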

    def generate(self):
        """① Load the weights file and produce the detection boxes, scores and their classes.

          ② Use yolo_eval from model.py to produce the detection boxes, scores and classes.

          ③ generate is called at initialization to produce the detection boxes, scores and classes (self.boxes, self.scores, self.classes)."""
        model_path = os.path.expanduser(self.model_path)
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

        # Load model, or construct model and load weights.
        num_anchors = len(self.anchors)
        num_classes = len(self.class_names)
        is_tiny_version = num_anchors==6 # default setting
        try:
            self.yolo_model = load_model(model_path, compile=False)
        except:
            self.yolo_model = tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, num_classes) \
                if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
            self.yolo_model.load_weights(self.model_path) # make sure model, anchors and classes match
        else:
            # layers[-1]: the last layer of the network. output_shape[-1]: the last dimension of its output -> (?,13,13,255)
            # 255 = 9/3*(80+5). 9/3: each feature map has 3 anchor boxes; 80: 80 classes; 5 = 4+1: four box values + 1 confidence

            assert self.yolo_model.layers[-1].output_shape[-1] == \
                num_anchors/len(self.yolo_model.output) * (num_classes + 5), \
                'Mismatch between model and given anchor and class sizes'
            # Python's assert evaluates an expression and raises an exception when it is false.
            # It lets the program fail immediately when a required condition is not met,
            # instead of crashing later at run time.

        print('{} model, anchors, and classes loaded.'.format(model_path))

        # Generate colors for drawing bounding boxes.
        # h (hue): x/len(self.class_names)  s (saturation): 1.0  v (value): 1.0
        # For the 80 COCO classes, pick one drawing color per class: convert (x/80, 1.0, 1.0)
        # from HSV to RGB, then shuffle the colors so that they are easy to tell apart by eye.
        # The first 1.0 is the saturation and the second 1.0 is the brightness.

        hsv_tuples = [(x / len(self.class_names), 1., 1.)
                      for x in range(len(self.class_names))]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))  # convert HSV to RGB
        # hsv_to_rgb returns values in [0, 1] while RGB values are in [0, 255], so multiply by 255
        self.colors = list(
            map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
                self.colors))
        np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        np.random.shuffle(self.colors)  # Shuffle colors to decorrelate adjacent classes.
        np.random.seed(None)  # Reset seed to default.

        # Generate output tensor targets for filtered bounding boxes.
        self.input_image_shape = K.placeholder(shape=(2, ))
        if self.gpu_num>=2:
            self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num)
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                len(self.class_names), self.input_image_shape,
                score_threshold=self.score, iou_threshold=self.iou)
        return boxes, scores, classes

    def detect_image(self, image):
        """Start the timer -> ① Call letterbox_image: create a new 416x416 image filled with "absolute gray" R128-G128-B128, then paste in the proportionally scaled input image (resampling: BICUBIC); the area not covered stays gray. ② The width and height defined by model_image_size must be multiples of 32; if model_image_size is not defined, the input size is rounded down to a multiple of 32 and letterbox_image is called to scale the image. ③ Divide the scaled image values by 255 to normalize them. ④ Reshape the (416,416,3) array to (1,416,416,3) to match the tensor format of the network input: image_data.

        -> ① Run self.sess.run() with the inputs: the 416x416 input image and the learning phase (0 test / 1 train):
        self.yolo_model.input: image_data, self.input_image_shape: [image.size[1], image.size[0]],
        K.learning_phase(): 0. ② self.generate() reads the model path, anchor boxes and COCO classes and loads the model yolo.h5; for the 80 COCO classes it picks one drawing color per class by converting (x/80, 1.0, 1.0) to RGB and shuffling the colors so that they are easy to tell apart by eye, where the first 1.0 is the saturation and the second 1.0 is the brightness. ③ If gpu_num >= 2, multi_gpu_model() is called.

         -> ① yolo_eval(self.yolo_model.output), max_boxes=20: at most 20 boxes are detected per class per image.
         ② The anchor boxes are split into 3 groups, one for each scale of feature map output by yolo_model.
         ③ The smaller the feature map, the larger the receptive field and the more sensitive it is to large objects, so it gets the large anchor boxes ->
         out_boxes, out_scores, out_classes are computed over the three feature maps, returning boxes, scores, classes.
         """
        start = timer()
        # Call letterbox_image(): create a new 416x416 image filled with "absolute gray" R128-G128-B128,
        # then paste in the proportionally scaled input image (resampling: BICUBIC); the uncovered area stays gray.

        if self.model_image_size != (None, None):  # a fixed network input size has been configured
            assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
            # model_image_size holds (height, width); both must be integer multiples of 32
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
            # letterbox_image resizes the image to the input size, given as (w, h)
        else:
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            boxed_image = letterbox_image(image, new_image_size)
        image_data = np.array(boxed_image, dtype='float32')

        print(image_data.shape)  # (416, 416, 3)
        image_data /= 255.  # normalize: divide the scaled image values by 255
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.
        # the result is (1, 416, 416, 3), matching the network input format (batch, h, w, c)

        out_boxes, out_scores, out_classes = self.sess.run(
            [self.boxes, self.scores, self.classes],
            feed_dict={
                self.yolo_model.input: image_data,  # image data
                self.input_image_shape: [image.size[1], image.size[0]],  # original image size as (height, width)
                K.learning_phase(): 0  # learning phase, 0: test, 1: train
            })  # computes boxes, scores, classes; how they are computed is defined in generate()

        print('Found {} boxes for {}'.format(len(out_boxes), 'img'))
        # Draw the boxes with an automatically chosen line width, plus the class labels,
        # using the Pillow drawing library (PIL, imported at the top of the file).
        # Set the font.
        font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                    size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        # set the line width of the boxes
        thickness = (image.size[0] + image.size[1]) // 300  # line thickness
        # for each detected box i with class c, draw with Pillow

        for i, c in reversed(list(enumerate(out_classes))):
            predicted_class = self.class_names[c]  # name of the predicted class
            box = out_boxes[i]  # box
            score = out_scores[i]  # confidence score

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)  # create an object that can draw on the given image
            # size of the label text in pixels, as (width, height)
            # (ImageDraw.textsize was removed in Pillow 10; newer versions provide textbbox)
            label_size = draw.textsize(label, font)
            top, left, bottom, right = box
            # round the top and left coordinates to the nearest integer
            # and keep the box inside the image
            top = max(0, np.floor(top + 0.5).astype('int32'))

            left = max(0, np.floor(left + 0.5).astype('int32'))
            # round the bottom and right coordinates to the nearest integer and take the
            # minimum with the image size, so the box does not extend past the image
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))
            # choose the top-left corner of the label: above the box if there is room,
            # otherwise just inside its top edge
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            # My kingdom for a good redistributable image drawing library.
            # draw the box with a line width of thickness
            for i in range(thickness):  # draw the box
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[c])
            # draw the label background
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            # write the label text
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw

        end = timer()
        print(end - start)
        return image

    def close_session(self):
        self.sess.close()
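
The integer bookkeeping in detect_image (rounding the input size down to a multiple of 32, rounding and clamping a predicted box, choosing where the label goes) is easy to get wrong by one pixel, so here is a small stand-alone sketch of it. The three function names are hypothetical, and math.floor replaces np.floor so that the example needs only the standard library.

```python
import math

def round_down_to_multiple(size, base=32):
    """Shrink (width, height) to the nearest lower multiples of base."""
    w, h = size
    return w - w % base, h - h % base

def clamp_box(box, image_size):
    """Round a (top, left, bottom, right) box to integers and keep it inside the image."""
    top, left, bottom, right = box
    width, height = image_size
    top = max(0, math.floor(top + 0.5))
    left = max(0, math.floor(left + 0.5))
    bottom = min(height, math.floor(bottom + 0.5))
    right = min(width, math.floor(right + 0.5))
    return top, left, bottom, right

def label_origin(left, top, label_height):
    """Put the label above the box when it fits, otherwise just inside the top edge."""
    if top - label_height >= 0:
        return left, top - label_height
    return left, top + 1

net_size = round_down_to_multiple((1000, 500))
box = clamp_box((-3.2, 10.6, 510.4, 999.5), (1000, 500))
origin = label_origin(box[1], box[0], label_height=15)
print(net_size, box, origin)  # (992, 480) (0, 11, 500, 1000) (11, 1)
```

Note the coordinate order: boxes come back as (top, left, bottom, right), while Pillow's image.size is (width, height).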

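The above covers the main parts of yolo3; the model is tested next.

5.Testing

With the theory and the code above understood, we can now run a test (you can also run the test directly without understanding the source).

(1) First download yolov3.weights from:

  https://pjreddie.com/media/files/yolov3.weights
(2) In the PyCharm terminal, run python convert.py yolov3.cfg yolov3.weights model_data/yolo_weights.h5
  This converts the yolov3.weights file into a .h5 weights file that Keras can handle.
(3) Download any image from the web to test with; the author used a photo of an airplane.
(4) yolo.py cannot be run directly from the source, because it contains no if __name__ == '__main__': block,
so you need to add one yourself: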

if __name__ == '__main__':
    # test on an image
    yolo = YOLO()
    path = r'F:\chorme_download\keras-yolo3-master\微信圖片_20200313132254.jpg'
    try:
        image = Image.open(path)
    except:
        print('Open Error! Try again!')
    else:
        r_image = yolo.detect_image(image)
        r_image.show()

    yolo.close_session()
    # test on video: passing 0 as the path to detect_video uses your own computer's camera
    yolo=YOLO()
    detect_video(yolo,0)

6.Results

This is an original article and took effort to produce; please credit the source when reposting. Thank you!
Original link: https://www.cnblogs.com/hujinzhou/p/guobao_2020_3_13.html
