Reading the SSD-Tensorflow Source Code


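My earlier study notes on the SSD paper were lost because I never saved them, and I don't feel like rewriting them, so let's go straight to the next step. SSD continues the line of thinking of the YOLO family and borrows the anchor concept from Faster R-CNN: it samples from several feature layers and places multiple anchors at each location. The source code read here is https://github.com/balancap/SSD-Tensorflow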

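ssd_vgg_300.py is the main file, and its ssd_net function defines the network structure. First, a brief explanation of how SSD extracts feature maps. As the network diagram shows, it uses VGG-16 as the backbone and extracts features at multiple scales from different convolutional layers. For the 300 x 300 network there are six such layers: conv4 ==> 38 x 38, conv7 ==> 19 x 19, conv8 ==> 10 x 10, conv9 ==> 5 x 5, conv10 ==> 3 x 3, conv11 ==> 1 x 1. The 512 x 512 variant uses seven: conv4 ==> 64 x 64, conv7 ==> 32 x 32, conv8 ==> 16 x 16, conv9 ==> 8 x 8, conv10 ==> 4 x 4, conv11 ==> 2 x 2, conv12 ==> 1 x 1.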

 

 

# Define the network structure and store the output of each block in end_points.
# This part uses the slim module (tf.contrib.slim), which is similar in spirit to Keras.
end_points = {}
with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
    # Original VGG-16 blocks.
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    end_points['block1'] = net
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    # Block 2.
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    end_points['block2'] = net
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    # Block 3.
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    end_points['block3'] = net
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    # Block 4.
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    end_points['block4'] = net
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    # Block 5.
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    end_points['block5'] = net
    net = slim.max_pool2d(net, [3, 3], stride=1, scope='pool5')

    # Additional SSD blocks.
    # Block 6: let's dilate the hell out of it!
    net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
    end_points['block6'] = net
    # Note: tf.layers.dropout expects the drop rate, yet the source passes dropout_keep_prob.
    net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)
    # Block 7: 1x1 conv. Because the fuck.
    net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
    end_points['block7'] = net
    net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)

    # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
    end_point = 'block8'
    with tf.variable_scope(end_point):
        net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
        net = custom_layers.pad2d(net, pad=(1, 1))
        net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
    end_points[end_point] = net
    end_point = 'block9'
    with tf.variable_scope(end_point):
        net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
        net = custom_layers.pad2d(net, pad=(1, 1))
        net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
    end_points[end_point] = net
    end_point = 'block10'
    with tf.variable_scope(end_point):
        net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
        net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
    end_points[end_point] = net
    end_point = 'block11'
    with tf.variable_scope(end_point):
        net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
        net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
    end_points[end_point] = net
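
The spatial size of each end point can be traced by hand from the layer definitions. The sketch below is not part of the repository. It assumes a 300 x 300 input and that the VGG convolutions and pools use 'SAME' padding (an assumption about the repository's arg scope, which the excerpt does not show); the explicitly padded stride-2 convolutions and the 'VALID' 3x3 convolutions follow the listing. The helper names are made up for illustration.

```python
import math


def same_out(size, stride):
    """Output size of a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return math.ceil(size / stride)


def valid_out(size, kernel, stride=1, pad=0):
    """Output size of a 'VALID' conv after `pad` pixels of explicit padding per side."""
    return (size + 2 * pad - kernel) // stride + 1


def ssd300_feature_sizes(input_size=300):
    """Trace the side length of the six feature maps used for prediction."""
    sizes = {}
    s = input_size
    s = same_out(s, 2)    # pool1
    s = same_out(s, 2)    # pool2
    s = same_out(s, 2)    # pool3
    sizes['block4'] = s   # conv4 output is taken before pool4
    s = same_out(s, 2)    # pool4
    s = same_out(s, 1)    # pool5 has stride 1, so the size is unchanged
    sizes['block7'] = s   # conv6 (dilated) and conv7 keep the size (assumed 'SAME')
    s = valid_out(s, 3, stride=2, pad=1)
    sizes['block8'] = s
    s = valid_out(s, 3, stride=2, pad=1)
    sizes['block9'] = s
    s = valid_out(s, 3)
    sizes['block10'] = s
    s = valid_out(s, 3)
    sizes['block11'] = s
    return sizes


print(ssd300_feature_sizes())
```

Under these assumptions the six prediction layers come out as 38, 19, 10, 5, 3 and 1 cells per side.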

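Next, the ssd_multibox_layer function produces the predictions for the anchors on each feature map ('block4', 'block7', 'block8', 'block9', 'block10', 'block11'). The way the source generates anchors differs somewhat from the paper.

In the paper, anchors of different sizes are placed on the different feature maps, and the base scale is computed as

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps and k indexes the feature layer (k = 1 for conv4), with s_max = 0.9 and s_min = 0.2. On each feature map, 4 to 6 anchors with different aspect ratios are generated from the base scale. The ratios are {1, 2, 3, 1/2, 1/3}, and for ratio 1 there is an additional anchor of scale sqrt(s_k * s_{k+1}). Take a 300 x 300 input and the conv4 feature map as an example: s_1 = 0.2 * 300 = 60, and the chosen ratios are {1, 2, 1/2, 1'}, where 1' denotes the additional ratio-1 anchor. The anchor widths are then {60, 60 * 1.41, 60 * 0.71, sqrt(60 * 300 * s_2)}; with m = 6 the formula gives s_2 = 0.34, so the last width is sqrt(60 * 102), roughly 78.2.

The actual function does not compute things this way, though. The source directly specifies the sizes and ratios for each layer; that calculation is analyzed further on.

The job of ssd_multibox_layer is to turn an extracted feature map into location and class predictions, so it determines how the feature-map data flows onward. The function has two branches: after an optional normalization (L2 normalization in the code shown), each branch applies one convolution, one producing the class predictions (21 * num_anchors * w * h values) and the other the location predictions. (Should there really be three branches?) The code is as follows: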

def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    """Construct a multibox layer, return a class and localization predictions.
    """
    net = inputs
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)
    # Number of anchors per feature-map cell (4 to 6): the two entries of
    # `sizes` give the 1:1 anchors, `ratios` gives the other aspect ratios.
    num_anchors = len(sizes) + len(ratios)

    # Location: predict the box offsets.
    num_loc_pred = num_anchors * 4
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)
    loc_pred = tf.reshape(loc_pred,
                          tensor_shape(loc_pred, 4)[:-1]+[num_anchors, 4])
    # Class prediction: predict the class scores.
    num_cls_pred = num_anchors * num_classes
    cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
                           scope='conv_cls')
    cls_pred = custom_layers.channel_to_last(cls_pred)
    cls_pred = tf.reshape(cls_pred,
                          tensor_shape(cls_pred, 4)[:-1]+[num_anchors, num_classes])
    # Predictions for every anchor of every cell of this feature map.
    return cls_pred, loc_pred
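
To get a feel for what the two reshapes produce, the sketch below computes the output shapes per layer and counts the anchors. It is only an illustration: the feature-map sizes, the per-layer sizes and ratios, and the 21 classes are assumptions based on what I understand the repository's SSD-300 defaults to be, and multibox_output_shapes is a hypothetical helper, not a function from the source.

```python
# Assumed SSD-300 settings, for illustration only (verify against ssd_vgg_300.py).
FEAT_SHAPES = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
ANCHOR_SIZES = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
ANCHOR_RATIOS = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                 [2, .5, 3, 1. / 3], [2, .5], [2, .5]]
NUM_CLASSES = 21  # assumed: 20 object classes plus background


def multibox_output_shapes(feat_shape, sizes, ratios, num_classes, batch=1):
    """Shapes of (cls_pred, loc_pred) after the reshape in ssd_multibox_layer."""
    num_anchors = len(sizes) + len(ratios)
    h, w = feat_shape
    cls_shape = (batch, h, w, num_anchors, num_classes)
    loc_shape = (batch, h, w, num_anchors, 4)
    return cls_shape, loc_shape


total_anchors = 0
anchors_per_cell = []
for shape, sizes, ratios in zip(FEAT_SHAPES, ANCHOR_SIZES, ANCHOR_RATIOS):
    cls_shape, loc_shape = multibox_output_shapes(shape, sizes, ratios, NUM_CLASSES)
    anchors_per_cell.append(cls_shape[3])
    total_anchors += shape[0] * shape[1] * cls_shape[3]
    print(shape, cls_shape, loc_shape)

print('anchors per cell:', anchors_per_cell)
print('total anchors:', total_anchors)
```

With these assumed settings each image gets 4 or 6 anchors per cell, which adds up to 8732 anchors over the six layers.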

Next come the functions that generate the default anchors.

import math

import numpy as np


def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    # Purpose: for one feature map, compute the center coordinates of every
    # cell and the w, h of its anchors, and return them.
    # Cell centers: multiplying by step / img_shape gives the position
    # relative to the original image.
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]

    # Expand dims to support easy broadcasting.
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    # Every cell of a feature map has 4 to 6 anchors with different ratios,
    # e.g. {1, 2, 3, 1/2, 1/3}. All cells of the same feature map share the
    # same anchor w and h.
    num_anchors = len(sizes) + len(ratios)  # number of anchors
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1: w, h of the 1:1 anchor.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:  # w, h of the second 1:1 anchor
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):  # w, h for the other ratios, e.g. {2, 3, 1/2, 1/3}
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w


def ssd_anchors_all_layers(img_shape,
                           layers_shape,
                           anchor_sizes,
                           anchor_ratios,
                           anchor_steps,
                           offset=0.5,
                           dtype=np.float32):
    """Compute anchor boxes for all feature layers.
    Generates the anchors of every feature-map layer and returns them.
    """
    layers_anchors = []
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset, dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors
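
For comparison with the sizes the source takes as given, the paper's scale rule described earlier (scales spaced linearly between s_min and s_max, plus one extra ratio-1 box per layer) can be worked out in a few lines. This is a minimal sketch assuming m = 6 layers, s_min = 0.2, s_max = 0.9 and a 300 x 300 input; paper_scales and paper_anchor_sizes are hypothetical names, not functions from the repository.

```python
import math


def paper_scales(m=6, s_min=0.2, s_max=0.9):
    """Linearly spaced scales s_1..s_m following the paper's rule."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def paper_anchor_sizes(s_k, s_next, ratios, img_size=300):
    """(width, height) in pixels for each aspect ratio, plus the extra ratio-1 box."""
    boxes = [(img_size * s_k * math.sqrt(r), img_size * s_k / math.sqrt(r))
             for r in ratios]
    extra = img_size * math.sqrt(s_k * s_next)  # scale sqrt(s_k * s_{k+1})
    boxes.append((extra, extra))
    return boxes


scales = paper_scales()
conv4_boxes = paper_anchor_sizes(scales[0], scales[1], ratios=[1, 2, 0.5])
print([round(s, 2) for s in scales])
for w, h in conv4_boxes:
    print(round(w, 1), round(h, 1))
```

The source's ssd_anchor_one_layer applies the same sqrt(r) scaling to w and h, but starts from the per-layer sizes passed in rather than from this formula.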

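With the network predictions and the default anchors in place, the next step is processing the ground truth. The main function is tf_ssd_bboxes_encode_layer. For each feature-map layer it matches the anchors against the ground-truth boxes: anchors that do not satisfy the matching criterion keep their initial values (label and score 0), while each anchor that does is assigned the ground-truth box it corresponds to.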

def tf_ssd_bboxes_encode_layer(labels,
                               bboxes,
                               anchors_layer,
                               num_classes,
                               no_annotation_label,
                               ignore_threshold=0.5,
                               prior_scaling=[0.1, 0.1, 0.2, 0.2],
                               dtype=tf.float32):
    """Encode  groundtruth labels and bounding boxes using SSD anchors from
    one layer.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.

    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    yref, xref, href, wref = anchors_layer  # fixed default anchors: centers and w, h
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor corners and areas

    # Initialize tensors...
    shape = (yref.shape[0], yref.shape[1], href.size)  # S x S x (4 to 6)
    feat_labels = tf.zeros(shape, dtype=tf.int64)  # label of each anchor
    feat_scores = tf.zeros(shape, dtype=dtype)  # score of each anchor
    # Corner coordinates of the box matched to each anchor.
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    # IoU between the anchors and a ground-truth box; bbox holds the
    # ground-truth coordinates.
    def jaccard_with_anchors(bbox):
        """Compute jaccard score between a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        jaccard = tf.div(inter_vol, union_vol)
        return jaccard

    # Here the score is the intersection divided by the anchor area.
    def intersection_with_anchors(bbox):
        """Compute intersection between score a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        inter_vol = h * w
        scores = tf.div(inter_vol, vol_anchors)
        return scores

    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index.
        """
        r = tf.less(i, tf.shape(labels))
        return r[0]

    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]
        bbox = bboxes[i]
        jaccard = jaccard_with_anchors(bbox)
        # Mask: check threshold + scores + no annotations + num_classes.
        # Selects the anchors whose IoU with this box beats their best score
        # so far (the explicit threshold check below is commented out).
        mask = tf.greater(jaccard, feat_scores)
        # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
        mask = tf.logical_and(mask, feat_scores > -0.5)
        mask = tf.logical_and(mask, label < num_classes)
        imask = tf.cast(mask, tf.int64)
        fmask = tf.cast(mask, dtype)
        # Update values using mask: selected anchors take the class, corner
        # coordinates and score of this ground-truth box; the others keep
        # their previous values.
        feat_labels = imask * label + (1 - imask) * feat_labels
        feat_scores = tf.where(mask, jaccard, feat_scores)

        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax

        # Check no annotation label: ignore these anchors...
        # interscts = intersection_with_anchors(bbox)
        # mask = tf.logical_and(interscts > ignore_threshold,
        #                       label == no_annotation_label)
        # # Replace scores by -1.
        # feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)

        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    # Main loop definition.
    i = 0
    [i, feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.
    # What is returned are encoded offsets relative to the anchors, not
    # absolute coordinates.
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    return feat_labels, feat_localizations, feat_scores
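
The two key computations in this function, the Jaccard (IoU) score and the offset encoding, are easier to follow on a single anchor and a single ground-truth box. The sketch below redoes them in plain Python for scalars; it mirrors the formulas in the listing but is not the repository's code, and the helper names and example boxes are made up.

```python
import math


def anchor_corners(anchor):
    """Convert an anchor (cy, cx, h, w) into (ymin, xmin, ymax, xmax)."""
    cy, cx, h, w = anchor
    return (cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.)


def jaccard(box_a, box_b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    h = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.)
    w = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.)
    inter = h * w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.


def encode(bbox, anchor, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Encode a ground-truth box relative to an anchor.

    Returns (cx, cy, w, h) offsets, the same ordering as the listing.
    """
    yref, xref, href, wref = anchor
    cy = (bbox[0] + bbox[2]) / 2.
    cx = (bbox[1] + bbox[3]) / 2.
    h = bbox[2] - bbox[0]
    w = bbox[3] - bbox[1]
    return ((cx - xref) / wref / prior_scaling[1],
            (cy - yref) / href / prior_scaling[0],
            math.log(w / wref) / prior_scaling[3],
            math.log(h / href) / prior_scaling[2])


anchor = (0.5, 0.5, 0.2, 0.2)        # center (0.5, 0.5), 0.2 x 0.2
gt_box = (0.45, 0.4, 0.65, 0.6)      # same size, shifted down by 0.05
iou = jaccard(anchor_corners(anchor), gt_box)
offsets = encode(gt_box, anchor)
print(round(iou, 3), [round(v, 3) for v in offsets])
```

A shift of a quarter of the anchor height leaves an IoU of about 0.6, and the encoded y offset is 0.05 / 0.2 / 0.1 = 2.5, which shows how prior_scaling magnifies small displacements.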

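Next comes the most important part: reading the loss function source. The paper defines the loss as

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of matched default boxes. The loss consists of a class confidence term and a localization (coordinate offset) term. At this point we have both the values produced by the network and the targets obtained from the ground truth, so the two are combined to compute the loss. The main function is ssd_losses.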

def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]

        # Flatten out all vectors! Reshape the predictions and the
        # ground-truth targets layer by layer, then concatenate them.
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)):
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # And concat the crap!
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype

        # Compute positive matching mask...
        # Select the anchors whose IoU is above 0.5.
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        n_positives = tf.reduce_sum(fpmask)

        # Hard negative mining...
        # Anchors with IoU below 0.5 are negatives, i.e. background;
        # the background prediction is class 0.
        no_classes = tf.cast(pmask, tf.int32)
        predictions = slim.softmax(logits)
        nmask = tf.logical_and(tf.logical_not(pmask),
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        nvalues = tf.where(nmask,
                           predictions[:, 0],
                           1. - fnmask)
        nvalues_flat = tf.reshape(nvalues, [-1])
        # Number of negative entries to select.
        # Negatives are capped at about 3 times the number of positives.
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)

        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # Final negative mask.
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)

        # Add cross-entropy loss. Positives and negatives are computed
        # separately, mainly because their labels differ.
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=gclasses)
            loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=no_classes)
            loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):  # box regression loss
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value')
            # Added to the losses collection together with the two terms above.
            tf.losses.add_loss(loss)
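
Two pieces of ssd_losses are worth isolating: the smooth L1 penalty used for localization and the hard negative mining that keeps roughly three negatives per positive. The sketch below is a simplified plain-Python version. I am assuming custom_layers.abs_smooth computes the standard smooth L1, and select_hard_negatives picks exactly n_neg negatives by sorting, whereas the source thresholds on the n_neg-th hardest value; both function names are hypothetical.

```python
def smooth_l1(x):
    """Standard smooth L1: quadratic near zero, linear for |x| >= 1."""
    a = abs(x)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def select_hard_negatives(gscores, bg_probs, match_threshold=0.5,
                          negative_ratio=3.0, batch_size=1):
    """Return (positive indices, selected hard-negative indices).

    gscores:  best IoU of each anchor with a ground-truth box.
    bg_probs: predicted probability of the background class for each anchor.
    Hard negatives are the unmatched anchors the network is least sure are
    background, i.e. those with the lowest background probability.
    """
    pos = [i for i, s in enumerate(gscores) if s > match_threshold]
    neg = [i for i, s in enumerate(gscores) if -0.5 < s <= match_threshold]
    n_neg = min(int(negative_ratio * len(pos)) + batch_size, len(neg))
    hardest = sorted(neg, key=lambda i: bg_probs[i])[:n_neg]
    return pos, sorted(hardest)


gscores = [0.7, 0.1, 0.0, 0.3, 0.2, 0.0, 0.4, 0.0]
bg_probs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.95, 0.5, 0.6]
pos, hard_neg = select_hard_negatives(gscores, bg_probs)
print(pos, hard_neg)
print(smooth_l1(0.5), smooth_l1(-2.0))
```

Only the positives contribute to the localization term, while the classification term sums the positives and the selected negatives, which keeps the many easy background anchors from dominating the loss.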

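The last part covers the image processing before the network and the post-processing after prediction. ssd_vgg_preprocessing.py preprocesses the images for training or inference, that is, data augmentation and similar work.

In ssd_common.py, the tf_ssd_bboxes_decode_layer function converts the predicted offsets back into box coordinates, so the predicted boxes can be drawn on the image. np_methods.py mostly filters the predicted boxes (NMS and so on) to find the most suitable ones.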

 

def tf_ssd_bboxes_decode_layer(feat_localizations,
                               anchors_layer,
                               prior_scaling=[0.1, 0.1, 0.2, 0.2]):
    """Compute the relative bounding boxes from the layer features and
    reference anchor bounding boxes.

    Arguments:
      feat_localizations: Tensor containing localization features.
      anchors: List of numpy array containing anchor boxes.

    Return:
      Tensor Nx4: ymin, xmin, ymax, xmax
    """
    yref, xref, href, wref = anchors_layer

    # Compute center, height and width. This is essentially the inverse of the
    # encoding step: anchors_layer holds the anchor coordinates and
    # feat_localizations the predicted offsets, from which the predicted box
    # coordinates are recovered.
    cx = feat_localizations[:, :, :, :, 0] * wref * prior_scaling[0] + xref
    cy = feat_localizations[:, :, :, :, 1] * href * prior_scaling[1] + yref
    w = wref * tf.exp(feat_localizations[:, :, :, :, 2] * prior_scaling[2])
    h = href * tf.exp(feat_localizations[:, :, :, :, 3] * prior_scaling[3])
    # Boxes coordinates.
    ymin = cy - h / 2.
    xmin = cx - w / 2.
    ymax = cy + h / 2.
    xmax = cx + w / 2.
    bboxes = tf.stack([ymin, xmin, ymax, xmax], axis=-1)
    return bboxes
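
The decoding can be checked on a single anchor with a hand-picked offset vector. The sketch below mirrors the formulas of tf_ssd_bboxes_decode_layer for scalars in plain Python; decode_box is a hypothetical helper and the numbers are illustrative.

```python
import math


def decode_box(loc, anchor, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Turn (cx, cy, w, h) offsets into (ymin, xmin, ymax, xmax).

    anchor is (cy, cx, h, w), the same ordering as anchors_layer.
    """
    yref, xref, href, wref = anchor
    cx = loc[0] * wref * prior_scaling[0] + xref
    cy = loc[1] * href * prior_scaling[1] + yref
    w = wref * math.exp(loc[2] * prior_scaling[2])
    h = href * math.exp(loc[3] * prior_scaling[3])
    return (cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.)


anchor = (0.5, 0.5, 0.2, 0.2)       # center (0.5, 0.5), 0.2 x 0.2
loc = (0.0, 2.5, 0.0, 0.0)          # shift in y only, no change in size
box = decode_box(loc, anchor)
print([round(v, 3) for v in box])
```

A y offset of 2.5 moves the center by 2.5 * 0.2 * 0.1 = 0.05, so the anchor box (0.4, 0.4, 0.6, 0.6) becomes approximately (0.45, 0.4, 0.65, 0.6). Zero offsets return the anchor itself, and a w or h offset of zero keeps the anchor's size because exp(0) = 1.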

 

