(5) YOLOv3 Source Code

Reposted article. 2020-07-04 16:51, 陈飞宇学YOLO

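YOLOv3 uses an FPN structure, so I was keen to see how it constructs its training samples. The source code is from https://github.com/wizyoung/YOLOv3_TensorFlow. First, the project structure:

Now the training script, train.py:

Unlike what we usually do, image and gt_box are not fed through placeholders here. Instead the script builds an iterator that takes the place of the placeholders: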

##################
# tf.data pipeline
##################
# train_file is a text file that lists the training data; build a dataset from its lines
train_dataset = tf.data.TextLineDataset(args.train_file)
# shuffle the training set
train_dataset = train_dataset.shuffle(args.train_img_cnt)
# group the lines into batches of batch_size
train_dataset = train_dataset.batch(args.batch_size)
# process each batch with get_batch_data through map
train_dataset = train_dataset.map(
    lambda x: tf.py_func(get_batch_data,
                         inp=[x, args.class_num, args.img_size, args.anchors, 'train',
                              args.multi_scale_train, args.use_mix_up, args.letterbox_resize],
                         Tout=[tf.int64, tf.float32, tf.float32, tf.float32, tf.float32]),
    num_parallel_calls=args.num_threads
)
# size of the prefetch buffer
train_dataset = train_dataset.prefetch(args.prefetech_buffer)
# build an iterator from the dataset's types and shapes
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
# op that initializes the iterator from train_dataset
train_init_op = iterator.make_initializer(train_dataset)
# fetch one batch of data from the iterator
image_ids, image, y_true_13, y_true_26, y_true_52 = iterator.get_next()

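I covered this approach in one of my earlier notes. It moves data more efficiently because map runs in multiple threads and prefetch prepares upcoming batches ahead of time; see that note for details.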
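To make the prefetch idea concrete, here is a rough pure-Python analogy, not TensorFlow's actual implementation: a background thread fills a bounded queue while the consumer reads from it, so producing the next batch overlaps with consuming the current one. The helper name `prefetch` and its `buffer_size` argument are made up for this illustration.

```python
import queue
import threading


def prefetch(iterable, buffer_size):
    """Yield items from `iterable`, produced ahead of time by a background thread."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the data

    def producer():
        for item in iterable:
            q.put(item)   # blocks while the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()    # blocks while the buffer is empty
        if item is sentinel:
            return
        yield item


batches = list(prefetch(range(5), buffer_size=2))
```

The order of the items is unchanged; only the timing of their production differs.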

First, let's see how the training data is recorded in train_file:
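The screenshot of train_file is not reproduced here. As a rough illustration, the sketch below assumes the one-line-per-image layout described in the repository's README: `image_index image_path img_width img_height`, followed by one `label x_min y_min x_max y_max` group per box. The sample line and the helper name `parse_annotation_line` are made up for this illustration; this is not the repo's own parse_line.

```python
def parse_annotation_line(line):
    """Split one annotation line into its fields.

    Assumed layout (an assumption, check it against your own data):
    index path width height [label x_min y_min x_max y_max] ...
    """
    fields = line.strip().split(' ')
    line_idx = int(fields[0])
    pic_path = fields[1]
    img_width, img_height = int(fields[2]), int(fields[3])
    rest = fields[4:]
    if len(rest) % 5 != 0:
        raise ValueError('each box needs 5 values: label x_min y_min x_max y_max')
    labels, boxes = [], []
    for i in range(0, len(rest), 5):
        labels.append(int(rest[i]))
        boxes.append([float(v) for v in rest[i + 1:i + 5]])
    return line_idx, pic_path, boxes, labels, img_width, img_height


# Hypothetical sample line with two boxes.
sample = '0 data/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268'
parsed = parse_annotation_line(sample)
```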

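The pascal_voc class is no longer used here. Instead, the script parse_voc_xml.py converts the annotations into the format shown above:

Most importantly, let's look at get_batch_data, the function applied inside map:

parse_line parses one line of text and returns its fields. The main thing to look at is mix_up, which blends two images:

After blending, the image looks like this: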

The final img and boxes look like this:
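Since the screenshots are not reproduced here, the following is a minimal sketch of the mix-up idea: the two images are blended with weights `lam` and `1 - lam`, and every box gets the weight of the image it came from as a fifth column, which is consistent with the [N, 5] boxes (with mixup_weight) that process_box expects. The name `mix_up_sketch` and the explicit `lam` argument are assumptions made for illustration; the repo's own mix_up may sample the weight and handle the images differently.

```python
import numpy as np


def mix_up_sketch(img1, img2, boxes1, boxes2, lam):
    """Blend two HxWx3 uint8 images; boxes are [N, 4] arrays (x_min, y_min, x_max, y_max)."""
    height = max(img1.shape[0], img2.shape[0])
    width = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((height, width, 3), dtype=np.float32)
    # Both images are placed at the top-left corner, so box coordinates stay valid.
    mixed[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * lam
    mixed[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1.0 - lam)
    # Append the blend weight of the source image as a 5th column.
    w1 = np.full((boxes1.shape[0], 1), lam, dtype=np.float32)
    w2 = np.full((boxes2.shape[0], 1), 1.0 - lam, dtype=np.float32)
    boxes = np.concatenate([np.hstack([boxes1, w1]), np.hstack([boxes2, w2])], axis=0)
    return mixed.astype(np.uint8), boxes


img_a = np.full((4, 6, 3), 200, dtype=np.uint8)
img_b = np.full((6, 4, 3), 100, dtype=np.uint8)
boxes_a = np.array([[0, 0, 5, 3]], dtype=np.float32)
boxes_b = np.array([[1, 1, 3, 5]], dtype=np.float32)
mixed_img, mixed_boxes = mix_up_sketch(img_a, img_b, boxes_a, boxes_b, lam=0.5)
```

The mixed image is as large as the larger extent of the two inputs, and regions covered by only one image keep just that image's weighted pixels.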

Now the data augmentation applied to the image:
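The augmentation screenshot is not reproduced here. As one hedged example of the kind of step such a pipeline contains, the sketch below flips an image horizontally and mirrors its boxes to match. The helper name `flip_horizontal` is hypothetical, and the repo's own augmentation functions may be implemented differently.

```python
import numpy as np


def flip_horizontal(img, boxes):
    """Mirror an HxWxC image left-right and update [N, 4] boxes (x_min, y_min, x_max, y_max)."""
    width = img.shape[1]
    flipped_img = img[:, ::-1, :]
    flipped_boxes = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    flipped_boxes[:, 0] = width - boxes[:, 2]
    flipped_boxes[:, 2] = width - boxes[:, 0]
    return flipped_img, flipped_boxes


img = np.arange(2 * 10 * 3).reshape(2, 10, 3)
boxes = np.array([[1., 0., 4., 2.]])
new_img, new_boxes = flip_horizontal(img, boxes)
```

The point to notice is that any geometric change to the image has to be applied to the boxes as well, otherwise the labels no longer match the pixels.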

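The data augmentation part has a lot of detail, and I plan to write a separate note on it. Here we look at process_box, the function that produces the training labels: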

import numpy as np

def process_box(boxes, labels, img_size, class_num, anchors):
    '''
    Generate the y_true label, i.e. the ground truth feature_maps in 3 different scales.
    params:
        boxes: [N, 5] shape, float32 dtype. `x_min, y_min, x_max, y_max, mixup_weight`.
        labels: [N] shape, int64 dtype.
        class_num: int64 num.
        anchors: [9, 2] shape, float32 dtype.
    '''
    # which of the 9 anchors belong to which feature_map;
    # feature_map[0] (13*13) uses anchors [6, 7, 8], the largest ones
    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

    # convert boxes form:
    # shape: [N, 2]
    # (x_center, y_center)
    box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2  # center of each gt_box (center_x, center_y)
    # (width, height)
    box_sizes = boxes[:, 2:4] - boxes[:, 0:2]  # size of each gt_box (w, h)

    # [13, 13, 3, 5+num_class+1] `5` means coords and labels. `1` means mix up weight.
    y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 6 + class_num), np.float32)  # feature_map[0], ratio=32
    y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 6 + class_num), np.float32)  # feature_map[1], ratio=16
    y_true_52 = np.zeros((img_size[1] // 8, img_size[0] // 8, 3, 6 + class_num), np.float32)  # feature_map[2], ratio=8

    # mix up weight default to 1.
    y_true_13[..., -1] = 1.
    y_true_26[..., -1] = 1.
    y_true_52[..., -1] = 1.

    # together these three arrays form the final label
    y_true = [y_true_13, y_true_26, y_true_52]

    # The block below computes the IoU between each gt_box (w, h) and the 9 anchors (w, h)
    # and picks the index (0-8) of the anchor with the largest IoU.
    # [N, 1, 2]
    box_sizes = np.expand_dims(box_sizes, 1)
    # broadcast tricks
    # [N, 1, 2] & [9, 2] ==> [N, 9, 2]
    mins = np.maximum(- box_sizes / 2, - anchors / 2)
    maxs = np.minimum(box_sizes / 2, anchors / 2)
    # [N, 9, 2]
    whs = maxs - mins

    # [N, 9]
    iou = (whs[:, :, 0] * whs[:, :, 1]) / (
        box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1]
        - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
    # [N]
    best_match_idx = np.argmax(iou, axis=1)  # best anchor index (0-8) for each gt_box

    ratio_dict = {1.: 8., 2.: 16., 3.: 32.}  # downsampling ratio of each feature_map
    for i, idx in enumerate(best_match_idx):
        # idx: 0,1,2 ==> 2; 3,4,5 ==> 1; 6,7,8 ==> 0
        feature_map_group = 2 - idx // 3  # map the anchor index to the feature_map index
        # scale ratio: 0,1,2 ==> 8; 3,4,5 ==> 16; 6,7,8 ==> 32
        ratio = ratio_dict[np.ceil((idx + 1) / 3.)]  # downsampling ratio of that feature_map
        # cell (x, y) of the gt_box center on that feature_map
        x = int(np.floor(box_centers[i, 0] / ratio))
        y = int(np.floor(box_centers[i, 1] / ratio))
        k = anchors_mask[feature_map_group].index(idx)  # position (0-2) of this anchor within its feature_map
        c = labels[i]  # class of this gt_box
        # print(feature_map_group, '|', y,x,k,c)

        # write this gt_box into its feature_map
        y_true[feature_map_group][y, x, k, :2] = box_centers[i]
        y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
        y_true[feature_map_group][y, x, k, 4] = 1.
        y_true[feature_map_group][y, x, k, 5 + c] = 1.
        y_true[feature_map_group][y, x, k, -1] = boxes[i, -1]

    return y_true_13, y_true_26, y_true_52
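To check the matching logic by hand, here is a small worked sketch for a single box. It assumes the commonly used YOLOv3 COCO anchors and a 416x416 input; the repo reads its anchors from a file, so these values and the helper name `match_anchor` are assumptions for illustration only. Because box and anchor are treated as sharing one center, the overlap reduces to min(w) * min(h), which is what the mins/maxs broadcast in process_box computes.

```python
import numpy as np

# Commonly used YOLOv3 COCO anchors (width, height). Assumed values, not read from the repo.
ANCHORS = np.array([[10, 13], [16, 30], [33, 23],
                    [30, 61], [62, 45], [59, 119],
                    [116, 90], [156, 198], [373, 326]], dtype=np.float32)
ANCHORS_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def match_anchor(box, anchors=ANCHORS):
    """Return (anchor index, feature-map group, stride, cell x, cell y, slot k) for one box."""
    x_min, y_min, x_max, y_max = box
    size = np.array([x_max - x_min, y_max - y_min], dtype=np.float32)
    # Shared-center overlap: min(w) * min(h) for each of the 9 anchors.
    inter = np.minimum(size, anchors).prod(axis=1)
    union = size.prod() + anchors.prod(axis=1) - inter
    idx = int(np.argmax(inter / (union + 1e-10)))
    group = 2 - idx // 3              # 0 is the coarsest map, 2 the finest
    stride = 8 * 2 ** (idx // 3)      # same values as ratio_dict: 8, 16, 32
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    k = ANCHORS_MASK[group].index(idx)
    return idx, group, stride, int(cx // stride), int(cy // stride), k


large = match_anchor([158, 158, 258, 258])  # 100 x 100 box centered at (208, 208)
small = match_anchor([40, 40, 60, 65])      # 20 x 25 box centered at (50, 52.5)
```

With these assumed anchors, the large box is matched to anchor 6 and lands on the stride-32 map, while the small box is matched to anchor 1 and lands on the stride-8 map. Each gt_box is therefore written into exactly one of the three label arrays.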