(5) YOLOv3 Source Code

Reposted article, 2020-07-04 16:51, 陳飛宇學YOLO

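YOLOv3 uses an FPN-style structure, so I wanted to see how it constructs its training samples. The source code is from https://github.com/wizyoung/YOLOv3_TensorFlow. First, a look at the project structure:

Next, the training script train.py:

Instead of defining placeholders for image and gt_box as we usually do, the code here sets up an iterator and uses its outputs as the inputs: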

##################
# tf.data pipeline
##################
# train_file is a text file that lists the training data; build a dataset from it
train_dataset = tf.data.TextLineDataset(args.train_file)
# shuffle the training set
train_dataset = train_dataset.shuffle(args.train_img_cnt)
# set the batch size
train_dataset = train_dataset.batch(args.batch_size)
# process each batch of lines with get_batch_data through map
train_dataset = train_dataset.map(
    lambda x: tf.py_func(get_batch_data,
                         inp=[x, args.class_num, args.img_size, args.anchors, 'train',
                              args.multi_scale_train, args.use_mix_up, args.letterbox_resize],
                         Tout=[tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
    num_parallel_calls=args.num_threads
)
# set the size of the prefetch buffer
train_dataset = train_dataset.prefetch(args.prefetech_buffer)
# build an iterator from the dataset's types and shapes
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
# op that initializes the iterator with train_dataset
train_init_op = iterator.make_initializer(train_dataset)
# fetch one batch of data from the iterator
image_ids, image, y_true_13, y_true_26, y_true_52 = iterator.get_next()

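I covered this approach in one of my earlier notes. It feeds data more efficiently because it uses a multi-threaded map together with prefetch; see that note for details.

First, let's look at how the training data is recorded in train_file:

The repository does not use a pascal_voc class here. Instead, the script parse_voc_xml.py converts the annotations into the format above: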
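To make the record format concrete, here is a minimal sketch of parsing one line of train_file. It assumes each line has the form `line_idx image_path img_width img_height` followed by one or more groups of `label x_min y_min x_max y_max`, all separated by spaces. That layout is an assumption for illustration, and the name `parse_line_sketch` is hypothetical; the repository's own parse_line may differ in details.

```python
def parse_line_sketch(line):
    """Split one annotation line into its fields.

    Assumed layout (not confirmed by this post):
    line_idx image_path img_width img_height [label x_min y_min x_max y_max]...
    """
    fields = line.strip().split()
    if len(fields) < 4:
        raise ValueError('expected at least: line_idx image_path img_width img_height')
    line_idx = int(fields[0])
    pic_path = fields[1]
    img_width = int(fields[2])
    img_height = int(fields[3])

    box_fields = fields[4:]
    if len(box_fields) % 5 != 0:
        raise ValueError('each box needs 5 values: label x_min y_min x_max y_max')

    labels, boxes = [], []
    for i in range(0, len(box_fields), 5):
        labels.append(int(box_fields[i]))
        boxes.append([float(v) for v in box_fields[i + 1:i + 5]])
    return line_idx, pic_path, boxes, labels, img_width, img_height


if __name__ == '__main__':
    # Made-up example line with two boxes.
    example = '0 data/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268'
    print(parse_line_sketch(example))
```

Splitting on whitespace keeps the sketch short, but it would break on image paths that contain spaces.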

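Most importantly, let's look at the get_batch_data operation used inside map:

parse_line parses one line of text and returns the individual fields. The part to focus on is mix_up, which blends two images:

After blending, the image looks like this:

The final img and boxes look like this: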
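The blending step can be sketched in plain Python as follows. This is a minimal illustration of the general mixup idea, not the repository's code: it assumes single-channel images stored as lists of rows, pads the smaller image with zeros, and takes the blend factor `lam` as an argument. The function name `mix_up_sketch` is hypothetical. Each box receives a trailing weight, which matches the fifth column (`mixup_weight`) that process_box expects.

```python
def mix_up_sketch(img1, img2, boxes1, boxes2, lam):
    """Blend two grayscale images and merge their boxes.

    img1, img2: lists of rows of pixel values (sizes may differ).
    boxes1, boxes2: lists of [x_min, y_min, x_max, y_max].
    lam: weight of img1 in [0, 1]; img2 gets 1 - lam.
    Returns the blended image and boxes with the weight appended.
    """
    height = max(len(img1), len(img2))
    width = max(len(img1[0]), len(img2[0]))
    # Start from a zero canvas big enough to hold both images.
    mixed = [[0.0] * width for _ in range(height)]
    for img, weight in ((img1, lam), (img2, 1.0 - lam)):
        for r, row in enumerate(img):
            for c, pixel in enumerate(row):
                mixed[r][c] += weight * pixel
    # Boxes are kept unchanged; only the mixup weight is appended.
    boxes = [list(b) + [lam] for b in boxes1]
    boxes += [list(b) + [1.0 - lam] for b in boxes2]
    return mixed, boxes


if __name__ == '__main__':
    img_a = [[100, 100], [100, 100]]
    img_b = [[0, 0, 200]]
    mixed, boxes = mix_up_sketch(img_a, img_b, [[0, 0, 2, 2]], [[1, 0, 3, 1]], lam=0.5)
    print(mixed)
    print(boxes)
```

Here `lam` is passed in explicitly so the result is reproducible; in practice it is usually sampled at random for each pair of images.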

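Now let's look at how data augmentation is applied to the image:

The data augmentation part is quite detailed, so I plan to write a separate note about it. Here, let's look at process_box, the function that produces the training labels: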

import numpy as np


def process_box(boxes, labels, img_size, class_num, anchors):
    '''
    Generate the y_true label, i.e. the ground truth feature_maps in 3 different scales.
    params:
        boxes: [N, 5] shape, float32 dtype. `x_min, y_min, x_max, y_max, mixup_weight`.
        labels: [N] shape, int64 dtype.
        class_num: int64 num.
        anchors: [9, 2] shape, float32 dtype.
    '''
    # Anchor indices used by each feature map. feature_map[0] (13*13) uses
    # anchors [6, 7, 8], which are the largest anchors.
    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

    # convert boxes form:
    # shape: [N, 2]
    # (x_center, y_center) of each gt_box
    box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
    # (width, height) of each gt_box
    box_sizes = boxes[:, 2:4] - boxes[:, 0:2]

    # [13, 13, 3, 5+num_class+1] `5` means coords and labels. `1` means mix up weight.
    # feature_map[0], ratio = 32
    y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 6 + class_num), np.float32)
    # feature_map[1], ratio = 16
    y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 6 + class_num), np.float32)
    # feature_map[2], ratio = 8
    y_true_52 = np.zeros((img_size[1] // 8, img_size[0] // 8, 3, 6 + class_num), np.float32)

    # mix up weight default to 1.
    y_true_13[..., -1] = 1.
    y_true_26[..., -1] = 1.
    y_true_52[..., -1] = 1.

    # together these three arrays form the final label
    y_true = [y_true_13, y_true_26, y_true_52]

    # The block below computes the IoU between each gt_box and the 9 anchors
    # from their widths and heights only, then picks the anchor index (0-8)
    # with the largest IoU.
    # [N, 1, 2]
    box_sizes = np.expand_dims(box_sizes, 1)
    # broadcast tricks
    # [N, 1, 2] & [9, 2] ==> [N, 9, 2]
    mins = np.maximum(- box_sizes / 2, - anchors / 2)
    maxs = np.minimum(box_sizes / 2, anchors / 2)
    # [N, 9, 2]
    whs = maxs - mins

    # [N, 9]
    iou = (whs[:, :, 0] * whs[:, :, 1]) / (
        box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1]
        - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
    # [N], index (0-8) of the best-matching anchor for each gt_box
    best_match_idx = np.argmax(iou, axis=1)

    # downsampling ratio of each feature map
    ratio_dict = {1.: 8., 2.: 16., 3.: 32.}
    for i, idx in enumerate(best_match_idx):
        # idx: 0,1,2 ==> 2; 3,4,5 ==> 1; 6,7,8 ==> 0
        # map the anchor index to the feature map index
        feature_map_group = 2 - idx // 3
        # scale ratio: 0,1,2 ==> 8; 3,4,5 ==> 16; 6,7,8 ==> 32
        ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
        # cell (x, y) that contains the gt_box center on this feature map
        x = int(np.floor(box_centers[i, 0] / ratio))
        y = int(np.floor(box_centers[i, 1] / ratio))
        # position (0-2) of this anchor within the feature map's anchor group
        k = anchors_mask[feature_map_group].index(idx)
        # class of this gt_box
        c = labels[i]
        # print(feature_map_group, '|', y,x,k,c)

        # write this gt_box into the matching feature map
        y_true[feature_map_group][y, x, k, :2] = box_centers[i]
        y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
        y_true[feature_map_group][y, x, k, 4] = 1.
        y_true[feature_map_group][y, x, k, 5 + c] = 1.
        y_true[feature_map_group][y, x, k, -1] = boxes[i, -1]

    return y_true_13, y_true_26, y_true_52
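To see the matching logic on concrete numbers, here is a small pure-Python sketch of what process_box does for a single box: pick the anchor with the highest width/height IoU, then derive the feature map, grid cell, and anchor slot. The anchor values are the commonly used YOLOv3 COCO anchors, assumed here only for illustration (the post does not list them), and the names `wh_iou` and `assign_box` are hypothetical helpers, not part of the repository.

```python
# Assumed anchors (width, height), sorted from smallest to largest.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def wh_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same center, so only sizes matter."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / (union + 1e-10)


def assign_box(box, anchors=ANCHORS):
    """Return (feature_map_group, grid_y, grid_x, anchor_slot) for one box.

    box: [x_min, y_min, x_max, y_max] in input-image pixels.
    """
    x_min, y_min, x_max, y_max = box
    center_x, center_y = (x_min + x_max) / 2, (y_min + y_max) / 2
    size = (x_max - x_min, y_max - y_min)

    ious = [wh_iou(size, a) for a in anchors]
    idx = max(range(len(anchors)), key=lambda i: ious[i])

    # idx 0,1,2 -> group 2 (stride 8); 3,4,5 -> group 1 (stride 16);
    # 6,7,8 -> group 0 (stride 32), mirroring the mapping in process_box.
    feature_map_group = 2 - idx // 3
    ratio = (8, 16, 32)[idx // 3]
    grid_x = int(center_x // ratio)
    grid_y = int(center_y // ratio)
    anchor_slot = idx % 3
    return feature_map_group, grid_y, grid_x, anchor_slot


if __name__ == '__main__':
    # A 120x90 box centered at (160, 165) best matches anchor (116, 90).
    print(assign_box([100, 120, 220, 210]))  # (0, 5, 5, 0)
    # A 12x14 box centered at (16, 17) best matches anchor (10, 13).
    print(assign_box([10, 10, 22, 24]))      # (2, 2, 2, 0)
```

Because the IoU ignores box position, a large box always lands on the coarse 13*13 map and a small box on the fine 52*52 map, regardless of where it sits in the image.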