(Original) Understanding the TensorFlow code of Faster R-CNN

Reposted article; originally published 2018-11-30 19:29. Category: DeepLearning / algorithm

Please credit the source when reposting:

https://www.cnblogs.com/darkknightzh/p/10043864.html

References:

Third-party TensorFlow implementation of Faster R-CNN: https://github.com/endernewton/tf-faster-rcnn

IoU: https://www.cnblogs.com/darkknightzh/p/9043395.html

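Faster R-CNN consists of two main parts: the RPN network and the R-CNN network. The RPN keeps the anchors that lie inside the image and labels each of them as a positive sample, a negative sample, or "don't care". After NMS, at most 2000 anchors are kept during training and 300 during testing. The RPN also hands 256 anchors to the R-CNN network, which classifies them and regresses their coordinates.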

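Network is the base class and vgg16 is the derived class, which overrides _image_to_head and _head_to_tail of Network.

Only vgg16 is analyzed below.

[Figure: overall structure of the Faster R-CNN network]

1. Training stage:

SolverWrapper uses construct_graph to create the network, train_op, and so on.

construct_graph creates the network through create_architecture of Network.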

1.1 create_architecture

create_architecture calls _build_network to build the network model, the losses, and the other related operations, and obtains rois, cls_prob, and bbox_pred. It is defined as follows:

def create_architecture(self, mode, num_classes, tag=None, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    self._image = tf.placeholder(tf.float32, shape=[1, None, None, 3])   # image width and height vary, so the second and third dimensions are None
    self._im_info = tf.placeholder(tf.float32, shape=[3])        # image info: height, width, and the scale factor (the smaller of the ratios that bring the width to 600 or the height to 1000)
    self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])   # ground-truth boxes: the first four values are the position, the last is the class of the box (see roi_data_layer/minibatch.py/get_minibatch)
    self._tag = tag

    self._num_classes = num_classes
    self._mode = mode
    self._anchor_scales = anchor_scales
    self._num_scales = len(anchor_scales)

    self._anchor_ratios = anchor_ratios
    self._num_ratios = len(anchor_ratios)

    self._num_anchors = self._num_scales * self._num_ratios    # self._num_anchors=9

    training = mode == 'TRAIN'
    testing = mode == 'TEST'

    weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)  # handle most of the regularizers here
    if cfg.TRAIN.BIAS_DECAY:
        biases_regularizer = weights_regularizer
    else:
        biases_regularizer = tf.no_regularizer

    # list as many types of layers as possible, even if they are not used now
    with arg_scope([slim.conv2d, slim.conv2d_in_plane, slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected],
                   weights_regularizer=weights_regularizer, biases_regularizer=biases_regularizer, biases_initializer=tf.constant_initializer(0.0)):
        # rois: class of the 256 anchors (the class of each anchor during training, all zeros during testing)
        # cls_prob: per-class probability for each of the 256 anchors
        # bbox_pred: predicted position offsets
        rois, cls_prob, bbox_pred = self._build_network(training)

    layers_to_output = {'rois': rois}

    for var in tf.trainable_variables():
        self._train_summaries.append(var)

    if testing:
        stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (self._num_classes))
        means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (self._num_classes))
        self._predictions["bbox_pred"] *= stds   # during training the offsets are normalized (mean subtracted, divided by the std), so testing applies the inverse
        self._predictions["bbox_pred"] += means
    else:
        self._add_losses()
        layers_to_output.update(self._losses)

        val_summaries = []
        with tf.device("/cpu:0"):
            val_summaries.append(self._add_gt_image_summary())
            for key, var in self._event_summaries.items():
                val_summaries.append(tf.summary.scalar(key, var))
            for key, var in self._score_summaries.items():
                self._add_score_summary(key, var)
            for var in self._act_summaries:
                self._add_act_summary(var)
            for var in self._train_summaries:
                self._add_train_summary(var)

        self._summary_op = tf.summary.merge_all()
        self._summary_op_val = tf.summary.merge(val_summaries)

    layers_to_output.update(self._predictions)

    return layers_to_output
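
The test-time branch of create_architecture undoes the normalization of the regression targets. The following is a minimal NumPy sketch of that step only. The constants are illustrative stand-ins for cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS (assumed values, not read from the configuration), and denormalize_bbox_pred is a hypothetical helper name.

```python
import numpy as np

# Assumed stand-ins for cfg.TRAIN.BBOX_NORMALIZE_MEANS / BBOX_NORMALIZE_STDS.
BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
num_classes = 21


def denormalize_bbox_pred(bbox_pred, num_classes):
    """Undo (target - mean) / std for a (num_rois, 4 * num_classes) prediction."""
    stds = np.tile(np.array(BBOX_NORMALIZE_STDS), num_classes)    # shape (4 * num_classes,)
    means = np.tile(np.array(BBOX_NORMALIZE_MEANS), num_classes)  # shape (4 * num_classes,)
    return bbox_pred * stds + means  # broadcasts over the roi dimension


normalized = np.ones((300, 4 * num_classes), dtype=np.float32)
restored = denormalize_bbox_pred(normalized, num_classes)
```

Each group of four columns belongs to one class, which is why the 4-element constants are tiled num_classes times before the element-wise multiply and add.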


1.2 _build_network

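_build_network builds the network.

_build_network = _image_to_head + // extract the features of the input image

_anchor_component + // coordinates in the original image of all possible anchors (which may extend beyond the image boundary) and the number of anchors

_region_proposal + // process the input features, ending with 2000 anchors (training) or 300 anchors (testing)

_crop_pool_layer + // crop out the 256 anchors and resize them to a fixed 7*7 size to get their features

_head_to_tail + // add fc layers and dropout on the features of the 256 anchors to get 4096-dimensional features

_region_classification // add fc and dropout layers for R-CNN classification and regression

Overall flow: the network obtains the feature map net_conv through vgg conv1-5 and feeds it to the RPN to get candidate anchors. Anchors that extend beyond the image boundary are removed, and 2000 anchors are selected for training the RPN (300 for testing). From these, 256 anchors are further selected (for R-CNN classification). The features of these 256 anchors are cropped and resized according to rois and then pooled, giving pool5 features of the same 7*7 size. pool5 passes through two fc layers to give the 4096-dimensional feature fc7, and fc7 is fed to _region_classification (two parallel fc layers), which produces the 21-dimensional cls_score and the 21*4-dimensional bbox_pred.

_build_network is defined as follows: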

def _build_network(self, is_training=True):
    if cfg.TRAIN.TRUNCATED:  # select initializers
        initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
        initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
    else:
        initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
        initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)

    net_conv = self._image_to_head(is_training)   # conv5_3 of vgg16
    with tf.variable_scope(self._scope, self._scope):
        self._anchor_component()  # from the feature map and its downscaling factor _feat_stride, get the start and end coordinates of all anchors
        rois = self._region_proposal(net_conv, is_training, initializer)  # through the RPN, get the class of the 256 anchors (per-anchor class during training, all zeros during testing) and their positions (last four columns)
        pool5 = self._crop_pool_layer(net_conv, rois, "pool5")  # crop the candidate regions from the feature map using rois, resize them to a fixed 14*14 size, then pool to 7*7

    fc7 = self._head_to_tail(pool5, is_training)  # add fc layers and dropout on the fixed-size rois to get 4096-dimensional features for classification and regression
    with tf.variable_scope(self._scope, self._scope):
        cls_prob, bbox_pred = self._region_classification(fc7, is_training, initializer, initializer_bbox)  # classify the rois (object detection) and regress the predicted coordinates

    self._score_summaries.update(self._predictions)

    # rois: class of the 256 anchors (the class of each anchor during training, all zeros during testing)
    # cls_prob: per-class probability for each of the 256 anchors
    # bbox_pred: predicted position offsets
    return rois, cls_prob, bbox_pred


1.3 _image_to_head

_image_to_head extracts the features of the input image.

The function is located in vgg16.py and is defined as follows:

def _image_to_head(self, is_training, reuse=None):
    with tf.variable_scope(self._scope, self._scope, reuse=reuse):
        net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')

    self._act_summaries.append(net)
    self._layers['head'] = net

    return net


1.4 _anchor_component

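_anchor_component computes the coordinates in the original image of all possible anchors (which may extend beyond the image boundary) and the number of anchors (feature-map width * feature-map height * 9). The function uses self._im_info, a 3-element vector: [0] is the image height, [1] is the image width (thanks to carrot359 for pointing out that width and height were previously swapped), and [2] is the image scale factor (the smaller of the ratios that scale the width to 600 or the height to 1000, giving for example 600*900 or 850*1000). The function calls generate_anchors_pre_tf, which in turn calls generate_anchors, to obtain the coordinates in the original image of all possible anchors and their count (since image sizes differ, the final number of anchors differs as well).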
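
The anchor count can be checked with plain arithmetic. This is a small sketch, assuming a feature stride of 16 and 3 scales * 3 ratios as in the default arguments shown in this article; anchor_count is a hypothetical helper name, not a function of the repository.

```python
import math


def anchor_count(im_height, im_width, feat_stride=16, num_scales=3, num_ratios=3):
    """Feature-map size and anchor count, mirroring the ceil() used in _anchor_component."""
    feat_h = math.ceil(im_height / feat_stride)
    feat_w = math.ceil(im_width / feat_stride)
    return feat_h, feat_w, feat_h * feat_w * num_scales * num_ratios


feat_h, feat_w, total = anchor_count(600, 900)
```

For a 600*900 image this gives a 38*57 feature map and 38 * 57 * 9 = 19494 anchors.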

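The steps of generate_anchors_pre_tf are as follows:

1. _ratio_enum starts from the reference window (0, 0, 15, 15) and first derives anchors for ratio=[0.5,1,2]. The ratio is the height-to-width ratio applied while keeping the pixel count (width*height) roughly constant; it is not a factor applied to the width or the height alone. This yields three anchors, each given by its top-left and bottom-right coordinates: (-3.5, 2, 18.5, 13), (0, 0, 15, 15), and (2.5, -3, 12.5, 18).

2. Then scales=(8, 16, 32) is applied to get the enlarged anchors. Each of the three anchors above is enlarged by each scale factor around its center, which gives 9 anchors in total (each given by its top-left and bottom-right coordinates). Since this only enlarges the three anchors above, no figure is given for it here.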
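
The two steps above can be reproduced with a few lines of NumPy. This sketch re-implements the ratio and scale enumeration in a compact form (whctrs and mkanchors are shortened stand-ins for the helper functions listed later in this section) so that the intermediate values can be inspected.

```python
import numpy as np


def whctrs(anchor):
    """Width, height and center of an (x1, y1, x2, y2) window."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)


def mkanchors(ws, hs, x_ctr, y_ctr):
    """Windows of the given widths and heights around one center."""
    ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))


base = np.array([0, 0, 15, 15], dtype=np.float64)
ratios = np.array([0.5, 1, 2])
scales = np.array([8, 16, 32])

# Step 1: keep the area roughly constant and vary height / width.
w, h, x_ctr, y_ctr = whctrs(base)
ws = np.round(np.sqrt(w * h / ratios))
hs = np.round(ws * ratios)
ratio_anchors = mkanchors(ws, hs, x_ctr, y_ctr)   # 3 x 4

# Step 2: enlarge every ratio anchor by every scale around its center.
anchors = np.vstack([
    mkanchors(rw * scales, rh * scales, rx, ry)
    for rw, rh, rx, ry in (whctrs(ra) for ra in ratio_anchors)
])                                                # 9 x 4
```

The first ratio anchor is (-3.5, 2, 18.5, 13), and at scale 8 it becomes (-84, -40, 99, 55), the first row of the 9*4 result.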

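Next, tf.add(anchor_constant, shifts) gives, for each point of the downscaled feature map, the rectangles of its 9 anchors in the original image. anchor_constant has shape 1*9*4 and shifts has shape N*1*4, where N is the number of pixels in the feature map. The broadcast sum has shape N*9*4 and is reshaped to (N*9)*4, which gives the anchors in the original image for every feature-map point.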
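
The broadcast addition can be illustrated in NumPy. This is a sketch on a tiny 2*3 feature map; base_anchors is a placeholder that repeats the reference window (0, 0, 15, 15) nine times instead of the real 9 base anchors, so that the shifted values are easy to read.

```python
import numpy as np

feat_h, feat_w, feat_stride = 2, 3, 16
A = 9
# Placeholder base anchors (assumption): the real ones come from generate_anchors.
base_anchors = np.tile(np.array([[0.0, 0.0, 15.0, 15.0]]), (A, 1))

shift_x, shift_y = np.meshgrid(np.arange(feat_w) * feat_stride,
                               np.arange(feat_h) * feat_stride)
sx, sy = shift_x.ravel(), shift_y.ravel()
shifts = np.stack([sx, sy, sx, sy], axis=1)          # (K, 4), K = feat_h * feat_w
K = shifts.shape[0]

# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4); flatten to (K * A, 4).
all_anchors = (base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)).reshape(K * A, 4)
```

Rows 0 to 8 belong to feature-map position (0, 0), rows 9 to 17 to the next position along x, and so on row by row.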

_anchor_component is as follows:

def _anchor_component(self):
    with tf.variable_scope('ANCHOR_' + self._tag) as scope:
        height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))  # height and width of the feature map produced by vgg16
        width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))
        if cfg.USE_E2E_TF:
            # From the feature-map size, _feat_stride (how much the feature map is downscaled relative to the original image),
            # _anchor_scales and _anchor_ratios, get all possible anchors on the original image (coordinates may exceed the image boundary) and the number of anchors
            anchors, anchor_length = generate_anchors_pre_tf(height, width, self._feat_stride, self._anchor_scales, self._anchor_ratios)
        else:
            anchors, anchor_length = tf.py_func(generate_anchors_pre,
                [height, width, self._feat_stride, self._anchor_scales, self._anchor_ratios], [tf.float32, tf.int32], name="generate_anchors")
        anchors.set_shape([None, 4])   # start and end coordinates, 4 values in total
        anchor_length.set_shape([])
        self._anchors = anchors
        self._anchor_length = anchor_length

def generate_anchors_pre_tf(height, width, feat_stride=16, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    shift_x = tf.range(width) * feat_stride  # starting x coordinates of all anchors in the original image: (0, feat_stride, 2*feat_stride, ...)
    shift_y = tf.range(height) * feat_stride  # starting y coordinates of all anchors in the original image: (0, feat_stride, 2*feat_stride, ...)
    shift_x, shift_y = tf.meshgrid(shift_x, shift_y) # shift_x: height rows of (0, feat_stride, 2*feat_stride, ...); shift_y: row i holds i*feat_stride repeated width times
    sx = tf.reshape(shift_x, shape=(-1,)) # 0,feat_stride,2*feat_stride...0,feat_stride,2*feat_stride...0,feat_stride,2*feat_stride...
    sy = tf.reshape(shift_y, shape=(-1,)) # 0,0,0...feat_stride,feat_stride,feat_stride...2*feat_stride,2*feat_stride,2*feat_stride..
    shifts = tf.transpose(tf.stack([sx, sy, sx, sy])) # (width*height)*4 matrix
    K = tf.multiply(width, height)  # total number of feature-map pixels
    shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2)) # add a dimension to get 1*(width*height)*4, then permute to (width*height)*1*4

    anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))  # four coordinates of the 9 base anchors in the original image (base_size defaults to 16)
    A = anchors.shape[0]   # A=9
    anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32) # reshape anchors to 1*9*4

    length = K * A  # total number of anchors (A=9 anchors per point, K=height*width points)
    # broadcast-add the 1*9*4 base anchors and the (width*height)*1*4 shift matrix to get (width*height)*9*4, then reshape to (width*height*9)*4: the four coordinates of all anchors
    anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))

    return tf.cast(anchors_tf, dtype=tf.float32), length

def generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2 ** np.arange(3, 6)):
    """Generate anchor (reference) windows by enumerating aspect ratios X scales wrt a reference (0, 0, 15, 15) window."""
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # four coordinates of the base anchor
    ratio_anchors = _ratio_enum(base_anchor, ratios)  # coordinates of the 3 ratio anchors (3*4 matrix)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) for i in range(ratio_anchors.shape[0])]) # 3*4 matrix becomes a 9*4 matrix: coordinates of the 9 anchors
    return anchors


def _whctrs(anchor):
    """ Return width, height, x center, and y center for an anchor (window). """
    w = anchor[2] - anchor[0] + 1  # width
    h = anchor[3] - anchor[1] + 1  # height
    x_ctr = anchor[0] + 0.5 * (w - 1)  # center x
    y_ctr = anchor[1] + 0.5 * (h - 1)  # center y
    return w, h, x_ctr, y_ctr


def _mkanchors(ws, hs, x_ctr, y_ctr):
    """ Given a vector of widths (ws) and heights (hs) around a center (x_ctr, y_ctr), output a set of anchors (windows)."""
    ws = ws[:, np.newaxis]  # 3-element vector becomes a 3*1 matrix
    hs = hs[:, np.newaxis]  # 3-element vector becomes a 3*1 matrix
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1), x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))  # 3*4 matrix
    return anchors


def _ratio_enum(anchor, ratios):  # ratio is height/width with the pixel count kept roughly constant, not a factor on width or height alone
    """ Enumerate a set of anchors for each aspect ratio wrt an anchor. """
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # center and width/height
    size = w * h    # total number of pixels
    size_ratios = size / ratios  # pixel count divided by each ratio
    ws = np.round(np.sqrt(size_ratios))  # new widths, 3-element vector (decreasing)
    hs = np.round(ws * ratios)     # new heights, element-wise product of two 3-element vectors (increasing)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # four coordinates of the 3 anchors from center and width/height
    return anchors


def _scale_enum(anchor, scales):
    """ Enumerate a set of anchors for each scale wrt an anchor. """
    w, h, x_ctr, y_ctr = _whctrs(anchor)    # center and width/height
    ws = w * scales    # scaled widths
    hs = h * scales    # scaled heights
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # four coordinates of the 3 anchors from center and width/height
    return anchors


1.5 _region_proposal

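_region_proposal passes the vgg16 conv5 features through a 3*3 sliding window (convolution) to get the RPN features, which then go through two parallel branches: the cls network and the reg network. The cls network uses a 1*1 convolution to decide whether each anchor is a positive or a negative sample (because there are so many anchors, some are "don't care"; only positive and negative samples are used), producing the binary classification scores rpn_cls_score. The reg network uses a 1*1 convolution to regress the coordinate offsets of the anchors, rpn_bbox_pred. Both networks share the 3*3 conv (rpn). Since each position has k anchors, each position has 2k scores and 4k coordinates.

cls (reduces the 512-dimensional input to 2k dimensions): 3*3 conv + 1*1 conv (2k scores, where k is the number of anchors per position, e.g. 9)

The first call to _reshape_layer receives bottom with shape 1*?*?*2k. It first converts to the Caffe data order (TF uses batchsize*height*width*channels, Caffe uses batchsize*channels*height*width), giving to_caffe with shape 1*2k*?*?. It then reshapes this to reshaped with shape 1*2*(k*?)*?, and finally converts back to the TF order, giving to_tf with shape 1*(k*?)*?*2, which is rpn_cls_score_reshape. _softmax_layer then computes the softmax over the last dimension only (of size 2), giving the probabilities rpn_cls_prob_reshape (the predicted class rpn_cls_pred is the position of the maximum along that dimension). A second _reshape_layer call gives rpn_cls_prob with the original shape 1*?*?*2k.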
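
The reshape, softmax, reshape sequence is easier to follow on concrete shapes. Below is a NumPy sketch of the same index manipulation; reshape_layer and softmax_last are illustrative analogues written for this example, not the TensorFlow functions themselves, and the feature-map size 4*5 is arbitrary.

```python
import numpy as np


def reshape_layer(bottom, num_dim):
    """NumPy analogue of _reshape_layer for a (1, H, W, C) array."""
    width = bottom.shape[2]
    to_caffe = bottom.transpose(0, 3, 1, 2)             # NHWC -> NCHW
    reshaped = to_caffe.reshape(1, num_dim, -1, width)  # regroup the channel axis
    return reshaped.transpose(0, 2, 3, 1)               # NCHW -> NHWC


def softmax_last(x):
    """Softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


H, W, A = 4, 5, 9
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, 2 * A))

score_reshape = reshape_layer(rpn_cls_score, 2)     # (1, A*H, W, 2)
prob_reshape = softmax_last(score_reshape)          # softmax over the 2 classes
rpn_cls_prob = reshape_layer(prob_reshape, 2 * A)   # back to (1, H, W, 2*A)
```

With this layout, channel a and channel a + A of rpn_cls_prob form one softmax pair, so the first A channels and the last A channels sum to 1 element-wise. This is consistent with the later slicing rpn_cls_prob[:, :, :, num_anchors:] that takes the last 9 channels as one of the two classes.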

reg (reduces the 512-dimensional input to 4k dimensions): 3*3 conv + 1*1 conv (4k coordinates, where k is the number of anchors per position, e.g. 9).

_region_proposal is defined as follows:

def _region_proposal(self, net_conv, is_training, initializer):  # process the input feature map
    rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")  # 3*3 conv, serves as the RPN
    self._act_summaries.append(rpn)
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer,  # _num_anchors is 9
                                padding='VALID', activation_fn=None, scope='rpn_cls_score')    # 1*1 conv: classification scores of the 9 anchors at each position, 1*?*?*(9*2) (binary), deciding whether each anchor is positive or negative
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape') # 1*?*?*18==>1*(?*9)*?*2
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")  # softmax over the last dimension, giving probabilities with shape 1*(?*9)*?*2
    rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")  # predicted class of the 9 anchors at each position, a vector of length 1*?*9*?
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")  # reshape back to the original layout 1*(?*9)*?*2==>1*?*?*(9*2)
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_bbox_pred')    # 1*1 conv: regressed position offsets of the 9 anchors at each position, 1*?*?*(9*4)
    if is_training:
        # from the class probabilities and the regressed offsets of the 9 anchors at each position, get the positions of post_nms_topN=2000 anchors (including the all-zero batch_inds) and their probability of being 1
        rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
        rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")   # rpn_labels: whether each feature-map position is positive, negative, or don't care
        with tf.control_dependencies([rpn_labels]):  # Try to have a deterministic order for the computing graph, for reproducibility
            rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")  # from the post_nms_topN anchor positions and their probability of being 1 (positive), get 256 rois (the all-zero first column is updated to each anchor's class) and related info
    else:
        if cfg.TEST.MODE == 'nms':
            # from the class probabilities and the regressed offsets of the 9 anchors at each position, get the positions of post_nms_topN=300 anchors (including the all-zero batch_inds) and their probability of being 1
            rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
        elif cfg.TEST.MODE == 'top':
            rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
        else:
            raise NotImplementedError

    self._predictions["rpn_cls_score"] = rpn_cls_score  # whether the 9 anchors at each position are positive or negative
    self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape  # whether each anchor is positive or negative
    self._predictions["rpn_cls_prob"] = rpn_cls_prob   # probability of positive and negative for the 9 anchors at each position
    self._predictions["rpn_cls_pred"] = rpn_cls_pred   # predicted class of the 9 anchors at each position, a vector of length 1*?*9*?
    self._predictions["rpn_bbox_pred"] = rpn_bbox_pred  # regressed position offsets of the 9 anchors at each position
    self._predictions["rois"] = rois   # class (first column) and position (last four columns) of the 256 anchors

    return rois  # class (first column; per-anchor class during training, all zeros during testing) and position (last four columns) of the 256 anchors

def _reshape_layer(self, bottom, num_dim, name):
    input_shape = tf.shape(bottom)
    with tf.variable_scope(name) as scope:
        to_caffe = tf.transpose(bottom, [0, 3, 1, 2])  # NHWC (TF data format) to NCHW (Caffe format)
        reshaped = tf.reshape(to_caffe, tf.concat(axis=0, values=[[1, num_dim, -1], [input_shape[2]]]))  # 1*(num_dim*9)*?*?==>1*num_dim*(9*?)*?  or  1*num_dim*(9*?)*?==>1*(num_dim*9)*?*?
        to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
        return to_tf


def _softmax_layer(self, bottom, name):
    if name.startswith('rpn_cls_prob_reshape'):    # bottom: 1*(?*9)*?*2
        input_shape = tf.shape(bottom)
        bottom_reshaped = tf.reshape(bottom, [-1, input_shape[-1]])   # keep only the last dimension for the softmax and merge all others: 1*(?*9)*?*2==>(1*?*9*?)*2
        reshaped_score = tf.nn.softmax(bottom_reshaped, name=name)  # probabilities of all entries
        return tf.reshape(reshaped_score, input_shape)   # (1*?*9*?)*2==>1*(?*9)*?*2
    return tf.nn.softmax(bottom, name=name)


1.6 _proposal_layer

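_proposal_layer calls proposal_layer_tf. Starting from the (N*9)*4 anchors, it computes the predicted coordinates (bbox_transform_inv_tf), clips them to the image (clip_boxes_tf), and applies non-maximum suppression (tf.image.non_max_suppression, which returns the indices of the boxes that are kept). The outputs are rois, the surviving anchors, and rpn_scores, their probability of being positive. rois has shape m*5 and rpn_scores has shape m*1, where m is the number of candidate regions left after NMS (at most 2000 in training and 300 in testing). The first column of rois is batch_inds, which is all zeros; the last 4 columns are the coordinates (top-left and bottom-right).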
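
To make the NMS step concrete, here is a plain NumPy sketch of greedy non-maximum suppression. It is an illustration of the idea only, not the implementation behind tf.image.non_max_suppression; the function name nms and the toy boxes are assumptions made for this example.

```python
import numpy as np


def nms(boxes, scores, iou_threshold, max_output_size):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns the indices of the kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping the kept one too much
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
keep = nms(boxes, scores, iou_threshold=0.5, max_output_size=2000)
```

The first two boxes have an IoU of 81/119 (about 0.68), so with a threshold of 0.5 the lower-scoring one is suppressed, while with the threshold of 0.7 used for the RPN it would survive.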

_proposal_layer is as follows:

def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):  # from the class probabilities and regressed offsets of the 9 anchors at each position, get the positions of post_nms_topN anchors and their probability of being 1
    with tf.variable_scope(name) as scope:
        if cfg.USE_E2E_TF:  # rois: post_nms_topN*5 (first column is the all-zero batch_inds, last 4 columns are coordinates); rpn_scores: post_nms_topN*1 probabilities of being 1
            rois, rpn_scores = proposal_layer_tf(rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode, self._feat_stride, self._anchors, self._num_anchors)
        else:
            rois, rpn_scores = tf.py_func(proposal_layer, [rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
                self._feat_stride, self._anchors, self._num_anchors], [tf.float32, tf.float32], name="proposal")

        rois.set_shape([None, 5])
        rpn_scores.set_shape([None, 1])

    return rois, rpn_scores

def proposal_layer_tf(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):  # class probabilities and regressed offsets of the 9 anchors at each position
    if type(cfg_key) == bytes:
        cfg_key = cfg_key.decode('utf-8')
    pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
    post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N  # 2000 for training, 300 for testing
    nms_thresh = cfg[cfg_key].RPN_NMS_THRESH   # NMS threshold, 0.7

    scores = rpn_cls_prob[:, :, :, num_anchors:]    # take the last 9 channels of 1*?*?*(9*2): 1*?*?*9. Presumably the first 9 are the background probabilities of the 9 anchors and the last 9 the foreground probabilities (binary: background and foreground only)
    scores = tf.reshape(scores, shape=(-1,))        # probability of being 1 for all anchors
    rpn_bbox_pred = tf.reshape(rpn_bbox_pred, shape=(-1, 4))     # four offsets for all anchors

    proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)   # predicted coordinates from anchors and offsets
    proposals = clip_boxes_tf(proposals, im_info[:2])    # restrict the predicted coordinates to the original image

    indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)    # NMS: indices of the post_nms_topN highest-scoring boxes

    boxes = tf.gather(proposals, indices)   # the post_nms_topN corresponding coordinates
    boxes = tf.to_float(boxes)
    scores = tf.gather(scores, indices)    # the post_nms_topN corresponding probabilities of being 1
    scores = tf.reshape(scores, shape=(-1, 1))

    batch_inds = tf.zeros((tf.shape(indices)[0], 1), dtype=tf.float32)    # Only support single image as input
    blob = tf.concat([batch_inds, boxes], 1)  # concat post_nms_topN*1 batch_inds with post_nms_topN*4 coordinates to get the post_nms_topN*5 blob

    return blob, scores

def bbox_transform_inv_tf(boxes, deltas):    # predicted coordinates from anchors and offsets
    boxes = tf.cast(boxes, deltas.dtype)
    widths = tf.subtract(boxes[:, 2], boxes[:, 0]) + 1.0     # width
    heights = tf.subtract(boxes[:, 3], boxes[:, 1]) + 1.0     # height
    ctr_x = tf.add(boxes[:, 0], widths * 0.5)             # center x
    ctr_y = tf.add(boxes[:, 1], heights * 0.5)            # center y

    dx = deltas[:, 0]      # predicted dx
    dy = deltas[:, 1]      # predicted dy
    dw = deltas[:, 2]      # predicted dw
    dh = deltas[:, 3]      # predicted dh

    pred_ctr_x = tf.add(tf.multiply(dx, widths), ctr_x)      # formula 2 inverted: from xa, wa, tx recover the predicted center x
    pred_ctr_y = tf.add(tf.multiply(dy, heights), ctr_y)     # formula 2 inverted: from ya, ha, ty recover the predicted center y
    pred_w = tf.multiply(tf.exp(dw), widths)         # formula 2 inverted: from wa, tw recover the predicted w
    pred_h = tf.multiply(tf.exp(dh), heights)        # formula 2 inverted: from ha, th recover the predicted h

    pred_boxes0 = tf.subtract(pred_ctr_x, pred_w * 0.5)  # start and end coordinates of the predicted box
    pred_boxes1 = tf.subtract(pred_ctr_y, pred_h * 0.5)
    pred_boxes2 = tf.add(pred_ctr_x, pred_w * 0.5)
    pred_boxes3 = tf.add(pred_ctr_y, pred_h * 0.5)

    return tf.stack([pred_boxes0, pred_boxes1, pred_boxes2, pred_boxes3], axis=1)


def clip_boxes_tf(boxes, im_info):   # restrict the predicted coordinates to the original image
    b0 = tf.maximum(tf.minimum(boxes[:, 0], im_info[1] - 1), 0)
    b1 = tf.maximum(tf.minimum(boxes[:, 1], im_info[0] - 1), 0)
    b2 = tf.maximum(tf.minimum(boxes[:, 2], im_info[1] - 1), 0)
    b3 = tf.maximum(tf.minimum(boxes[:, 3], im_info[0] - 1), 0)
    return tf.stack([b0, b1, b2, b3], axis=1)
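
The comments in the listing refer to "formula 2", which is not reproduced in this copy of the article. It presumably denotes the box parameterization of the Faster R-CNN paper, which in that paper's notation reads as follows (this reconstruction is an assumption based on the variable names xa, wa, tx used in the comments):

```latex
\begin{aligned}
t_x &= (x - x_a)/w_a, & t_y &= (y - y_a)/h_a, & t_w &= \log(w/w_a), & t_h &= \log(h/h_a),\\
t_x^* &= (x^* - x_a)/w_a, & t_y^* &= (y^* - y_a)/h_a, & t_w^* &= \log(w^*/w_a), & t_h^* &= \log(h^*/h_a).
\end{aligned}
```

Here x, y, w, h are the center and size of the predicted box, the subscript a marks the anchor, and the star marks the ground-truth box. bbox_transform_inv_tf inverts the first row, for example x = t_x * w_a + x_a and w = w_a * exp(t_w), while the "last four" expressions are the targets in the second row.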


1.7 _anchor_target_layer

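_anchor_target_layer first removes the anchors whose boundary extends beyond the image. It then uses bbox_overlaps to compute the overlaps (N*M) between the anchors (N*4) and gt_boxes (M*4), and obtains max_overlaps (1*N), the largest overlap of each anchor with any ground-truth box, as well as gt_max_overlaps (1*M), the largest overlap of each ground-truth box with any anchor, and gt_argmax_overlaps, the positions of the anchors that attain it. Next, _compute_targets computes bbox_targets, the transformed coordinates between each anchor and its best-overlapping gt box (see the last four expressions of formula 2). Finally, _unmap maps the results back to the size of the original anchor set, giving rpn_labels (whether each anchor is positive, negative, or don't care), rpn_bbox_targets, rpn_bbox_inside_weights, and rpn_bbox_outside_weights.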
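
The labeling rule can be tried on a small overlap matrix. This is a sketch under stated assumptions: the thresholds 0.3 and 0.7 stand in for cfg.TRAIN.RPN_NEGATIVE_OVERLAP and cfg.TRAIN.RPN_POSITIVE_OVERLAP, only the branch where positives are not clobbered is shown, and label_anchors is a hypothetical helper name.

```python
import numpy as np


def label_anchors(overlaps, neg_thresh=0.3, pos_thresh=0.7):
    """overlaps: (N, M) IoU matrix. Returns one label per anchor: 1, 0 or -1."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.float32)  # start as don't care
    max_overlaps = overlaps.max(axis=1)                    # best IoU of each anchor
    gt_max_overlaps = overlaps.max(axis=0)                 # best IoU of each gt box
    gt_argmax = np.where(overlaps == gt_max_overlaps)[0]   # anchors attaining it
    labels[max_overlaps < neg_thresh] = 0                  # background first
    labels[gt_argmax] = 1                                  # best anchor(s) of each gt box
    labels[max_overlaps >= pos_thresh] = 1                 # anchors with high IoU
    return labels


overlaps = np.array([[0.8, 0.1],
                     [0.2, 0.4],
                     [0.1, 0.0],
                     [0.5, 0.3]])
labels = label_anchors(overlaps)
```

Anchor 0 is positive through the threshold, anchor 1 is positive only because it is the best match for the second ground-truth box, anchor 2 is negative, and anchor 3 falls between the thresholds and stays "don't care".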

_anchor_target_layer is defined as follows:

def _anchor_target_layer(self, rpn_cls_score, name):  # rpn_cls_score: classification scores of the 9 anchors at each position, 1*?*?*(9*2)
    with tf.variable_scope(name) as scope:
        # rpn_labels: whether each feature-map position is positive, negative, or don't care (anchors crossing the image boundary are excluded)
        # rpn_bbox_targets: coordinate offsets between each feature-map position and its matched ground-truth box (many are 0)
        # rpn_bbox_inside_weights: 1 for positive samples (0 for negative and don't-care samples)
        # rpn_bbox_outside_weights: normalized weights of positive and negative samples (don't-care samples excluded)
        rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
            anchor_target_layer, [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
            [tf.float32, tf.float32, tf.float32, tf.float32], name="anchor_target")

        rpn_labels.set_shape([1, 1, None, None])
        rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])

        rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
        self._anchor_targets['rpn_labels'] = rpn_labels  # whether each feature-map position is positive, negative, or don't care (anchors crossing the image boundary are excluded)
        self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets  # coordinate offsets between each feature-map position and its matched ground-truth box (many are 0)
        self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights  # 1 for positive samples (0 for negative and don't-care samples)
        self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights  # normalized weights of positive and negative samples (don't-care samples excluded)

        self._score_summaries.update(self._anchor_targets)

    return rpn_labels

def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, all_anchors, num_anchors):# 1*?*?*(9*2); ?*5; 3; [16], ?*4; [9]
    """Same as the anchor target layer in original Fast/er RCNN """
    A = num_anchors   # [9]
    total_anchors = all_anchors.shape[0]   # number of all anchors: 9 * feature-map width * feature-map height
    K = total_anchors / num_anchors

    _allowed_border = 0  # allow boxes to sit over the edge by a small amount
    height, width = rpn_cls_score.shape[1:3]  # height and width of the RPN feature map

    inds_inside = np.where(  # anchors may cross the image boundary; take the indices of the anchors inside the image
        (all_anchors[:, 0] >= -_allowed_border) & (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_info[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_info[0] + _allowed_border)  # height
        )[0]

    anchors = all_anchors[inds_inside, :]   # coordinates of the anchors inside the image

    labels = np.empty((len(inds_inside),), dtype=np.float32)  # label: 1 positive, 0 negative, -1 don't care
    labels.fill(-1)

    # n*m matrix of overlap ratios between the anchors (n*4) and the ground-truth boxes gt_boxes (m*4)
    overlaps = bbox_overlaps(np.ascontiguousarray(anchors, dtype=np.float), np.ascontiguousarray(gt_boxes, dtype=np.float))
    argmax_overlaps = overlaps.argmax(axis=1)  # position of the maximum in each row, i.e. the best-matching ground-truth box of each anchor; n-element vector
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # overlap of each anchor with its best-matching ground-truth box; n-element vector
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # position of the maximum in each column, i.e. the best-matching anchor of each ground-truth box; m-element vector
    gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]  # overlap of each ground-truth box with its best-matching anchor; m-element vector
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # indices of all anchors attaining a per-gt maximum, in ascending order

    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:   # assign bg labels first so that positive labels can clobber them first set the negatives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose best overlap is below the threshold are set to 0

    labels[gt_argmax_overlaps] = 1  # fg label: for each gt, anchor with highest overlap
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1 # fg label: above threshold IOU

    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

    # if there are too many positive samples, randomly keep only num_fg=0.5*256=128 of them
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # subsample positive labels if we have too many
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1   # surplus positive samples become don't care

    # if there are too many negative samples, randomly keep only num_bg = 256 - (number of positives) of them
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # subsample negative labels if we have too many
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1   # surplus negative samples become don't care

    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # coordinate offsets between each anchor and its best-matching ground-truth box

    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # weights of the four coordinates of positive samples are set to 1

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)   # total number of positive and negative samples (don't-care samples excluded)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples   # normalized weight
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples   # normalized weight
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) & (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT / np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) / np.sum(labels == 0))
    bbox_outside_weights[labels == 1, :] = positive_weights     # normalized weight
    bbox_outside_weights[labels == 0, :] = negative_weights     # normalized weight

    # inds_inside was used above, so map labels, bbox_targets, bbox_inside_weights and bbox_outside_weights back to the full anchor set
    # (including the anchors that extend beyond the image boundary), filling the unused entries with the value of fill
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)  # over all anchors, the four coordinate weights are 1 for positive samples and 0 otherwise
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)

    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)   # (1*?*?)*9==>1*?*?*9==>1*9*?*?
    labels = labels.reshape((1, 1, A * height, width))  # 1*9*?*?==>1*1*(9*?)*?
    rpn_labels = labels  # whether each feature-map position is positive, negative, or don't care (anchors crossing the image boundary are excluded)

    bbox_targets = bbox_targets.reshape((1, height, width, A * 4))  # 1*(9*?)*?*4==>1*?*?*(9*4)

    rpn_bbox_targets = bbox_targets  # coordinate offsets between each feature-map position and its matched ground-truth box (many are 0)
    bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4))  # 1*(9*?)*?*4==>1*?*?*(9*4)
    rpn_bbox_inside_weights = bbox_inside_weights
    bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4))  # 1*(9*?)*?*4==>1*?*?*(9*4)
    rpn_bbox_outside_weights = bbox_outside_weights    # normalized weights
    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights


def _unmap(data, count, inds, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of size count) """
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)   # 1-D array
        ret.fill(fill)   # fill with the default value
        ret[inds] = data   # write the actual data at the valid positions
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)  # array with the matching number of dimensions
        ret.fill(fill)    # fill with the default value
        ret[inds, :] = data   # write the actual data at the valid positions
    return ret


def _compute_targets(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""
    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5

    # use the last four expressions of formula 2 to compute the offsets from each anchor and its matched ground-truth box
    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)  # gt_rois has 5 columns; drop the last one (the class label)

def bbox_transform(ex_rois, gt_rois):
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0  # anchor width
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0  # anchor height
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths  # anchor center x
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights  # anchor center y

    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0  # ground-truth w
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0   # ground-truth h
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths      # ground-truth center x
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights     # ground-truth center y

    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths    # dx from x*, xa, wa (last four expressions of formula 2)
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights   # dy from y*, ya, ha (last four expressions of formula 2)
    targets_dw = np.log(gt_widths / ex_widths)        # dw from w*, wa (last four expressions of formula 2)
    targets_dh = np.log(gt_heights / ex_heights)      # dh from h*, ha (last four expressions of formula 2)

    targets = np.vstack((targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
    return targets

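To make the encoding in bbox_transform concrete, here is a small NumPy sketch on one hand-picked anchor and ground-truth box. The helper decode_boxes is a hypothetical exact inverse written only for this check; it is not the repository's own decoding function.

```python
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    """Encode ground-truth boxes relative to anchors as (dx, dy, dw, dh)."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    return np.stack([(gt_cx - ex_cx) / ex_w, (gt_cy - ex_cy) / ex_h,
                     np.log(gt_w / ex_w), np.log(gt_h / ex_h)], axis=1)

def decode_boxes(ex_rois, deltas):
    """Hypothetical exact inverse of bbox_transform (illustration only)."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    w = np.exp(deltas[:, 2]) * ex_w
    h = np.exp(deltas[:, 3]) * ex_h
    x1 = deltas[:, 0] * ex_w + ex_cx - 0.5 * w
    y1 = deltas[:, 1] * ex_h + ex_cy - 0.5 * h
    return np.stack([x1, y1, x1 + w - 1.0, y1 + h - 1.0], axis=1)

anchor = np.array([[0.0, 0.0, 15.0, 15.0]])   # 16 x 16, center (8, 8)
gt = np.array([[4.0, 4.0, 19.0, 35.0]])       # 16 x 32, center (12, 20)
targets = bbox_transform(anchor, gt)          # dx=0.25, dy=0.75, dw=0, dh=log(2)
recovered = decode_boxes(anchor, targets)     # should reproduce gt
```

The center shift is measured in units of the anchor size and the size change in log space, so the same targets describe the same relative change for small and large anchors.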

1.8 bbox_overlaps

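bbox_overlaps computes the overlap ratio (IoU, intersection over union) between each anchor and each ground-truth box. See the reference https://www.cnblogs.com/darkknightzh/p/9043395.html for details. The code in the program is as follows: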

def bbox_overlaps(
        np.ndarray[DTYPE_t, ndim=2] boxes,
        np.ndarray[DTYPE_t, ndim=2] query_boxes):
    """
    Parameters
    ----------
    boxes: (N, 4) ndarray of float
    query_boxes: (K, 4) ndarray of float
    Returns
    -------
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
    cdef DTYPE_t iw, ih, box_area
    cdef DTYPE_t ua
    cdef unsigned int k, n
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    ua = float(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps

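For readers without a Cython toolchain, the same quantity can be sketched in plain NumPy with broadcasting. The function name bbox_overlaps_np is made up for this illustration; it assumes the same inclusive pixel coordinates (the "+ 1" terms) as the listing above.

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """IoU between every box (N, 4) and every query box (K, 4)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    q_area = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
              (query_boxes[:, 3] - query_boxes[:, 1] + 1))
    # intersection width and height, shape (N, K); negative means no overlap
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    union = area[:, None] + q_area[None, :] - inter
    return inter / union

boxes = np.array([[0, 0, 9, 9]])                 # 10 x 10 box
queries = np.array([[0, 0, 9, 9],                # identical box
                    [5, 5, 14, 14],              # 5 x 5 overlap
                    [20, 20, 29, 29]])           # disjoint box
overlaps = bbox_overlaps_np(boxes, queries)      # shape (1, 3)
```

For the second query the intersection is 25 pixels and the union is 100 + 100 - 25 = 175, so the IoU is 25 / 175.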

1.9 _proposal_target_layer

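_proposal_target_layer calls proposal_target_layer, which in turn calls _sample_rois to pick 256 anchors out of the 2000 anchors selected earlier by _proposal_layer. _sample_rois caps the number of positive samples at 64 (when there are fewer, the remainder is filled with negative samples), encodes the coordinates as offsets following formula 2 and normalizes them, and obtains bbox_targets through _get_bbox_regression_labels. The result is used for R-CNN classification and regression. This layer is used only during training; at test time the 300 anchors are used directly and the layer is not needed.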
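The sampling rule can be sketched in a few lines of NumPy. This is a simplified illustration that only covers the case where both positive and negative candidates exist; the function name sample_rois and the threshold values (0.5, 0.5, 0.0) are assumptions standing in for cfg.TRAIN.FG_THRESH, cfg.TRAIN.BG_THRESH_HI and cfg.TRAIN.BG_THRESH_LO.

```python
import numpy as np

def sample_rois(max_overlaps, batch_size=256, fg_fraction=0.25,
                fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0, rng=None):
    """Pick at most fg_fraction * batch_size positives, fill the rest with negatives."""
    rng = np.random.default_rng(0) if rng is None else rng
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_hi) & (max_overlaps >= bg_lo))[0]
    num_fg = min(int(round(fg_fraction * batch_size)), fg_inds.size)
    fg_inds = rng.choice(fg_inds, size=num_fg, replace=False)
    num_bg = batch_size - num_fg
    # sample with replacement only when there are too few negatives
    bg_inds = rng.choice(bg_inds, size=num_bg, replace=bg_inds.size < num_bg)
    return fg_inds, bg_inds

# 10 candidates overlap a ground-truth box well, 500 do not
overlaps = np.concatenate([np.full(10, 0.8), np.full(500, 0.2)])
fg, bg = sample_rois(overlaps)   # 10 positives, 246 negatives
```

Because only 10 positives are available here, the batch is topped up with 246 negatives so that the total stays at 256.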

=============================================================

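Update 190901:

Note: thanks to @pytf for the explanation (see comments 19 and 20). A comment here is wrong, namely the comment on line 146:

# rois: 256 anchors selected from the post_nms_topN anchors (the all-zero first column is updated to the class of each anchor)

The explanation of the first column of rois is wrong. Since only one image is fed in at a time, the first column of rois is all 0, and it is not updated to the class of each anchor here.

Also, line 139 actually updates the first column of bbox_target_data to the class of each anchor. The explanation on that line is not very clear.

End of update 190901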

=============================================================

_proposal_target_layer is defined as follows:

  1 def _proposal_target_layer(self, rois, roi_scores, name):  # positions of the post_nms_topN anchors and their probability of being 1 (positive sample)
  2     # this layer is used only during training; it selects 256 anchors from the post_nms_topN anchors
  3     with tf.variable_scope(name) as scope:
  4         # labels: ground-truth classes of the positive and negative samples
  5         # rois: 256 anchors selected from the post_nms_topN anchors (the all-zero first column is updated to the class of each anchor)
  6         # roi_scores: probability of being a positive sample for each of the 256 anchors
  7         # bbox_targets: 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
  8         # bbox_inside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
  9         # bbox_outside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
 10         rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = tf.py_func(
 11             proposal_target_layer, [rois, roi_scores, self._gt_boxes, self._num_classes],
 12             [tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32], name="proposal_target")
 13 
 14         rois.set_shape([cfg.TRAIN.BATCH_SIZE, 5])
 15         roi_scores.set_shape([cfg.TRAIN.BATCH_SIZE])
 16         labels.set_shape([cfg.TRAIN.BATCH_SIZE, 1])
 17         bbox_targets.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
 18         bbox_inside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
 19         bbox_outside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
 20 
 21         self._proposal_targets['rois'] = rois
 22         self._proposal_targets['labels'] = tf.to_int32(labels, name="to_int32")
 23         self._proposal_targets['bbox_targets'] = bbox_targets
 24         self._proposal_targets['bbox_inside_weights'] = bbox_inside_weights
 25         self._proposal_targets['bbox_outside_weights'] = bbox_outside_weights
 26 
 27         self._score_summaries.update(self._proposal_targets)
 28 
 29         return rois, roi_scores
 30  
 31 def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
 32     """Assign object detection proposals to ground-truth targets. Produces proposal classification labels and bounding-box regression targets."""
 33     # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN (i.e., rpn.proposal_layer.ProposalLayer), or any other source
 34     all_rois = rpn_rois  # rpn_rois is a post_nms_topN*5 matrix
 35     all_scores = rpn_scores  # rpn_scores has post_nms_topN entries: the probability that each anchor is a positive sample
 36 
 37     if cfg.TRAIN.USE_GT:    # Include ground-truth boxes in the set of candidate rois;  USE_GT=False, so this branch is not used
 38         zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
 39         all_rois = np.vstack((all_rois, np.hstack((zeros, gt_boxes[:, :-1]))))
 40         all_scores = np.vstack((all_scores, zeros))   # not sure if it a wise appending, but anyway i am not using it
 41 
 42     num_images = 1  # this program can only process one image at a time
 43     rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # number of rois finally selected per image
 44     fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)   # number of positive samples: 0.25*rois_per_image
 45 
 46     # Sample rois with classification labels and bounding box regression targets
 47     # labels: ground-truth classes of the positive and negative samples
 48     # rois: 256 anchors selected from the post_nms_topN anchors (the all-zero first column is updated to the class of each anchor)
 49     # roi_scores: probability of being a positive sample for each of the 256 anchors
 50     # bbox_targets: 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
 51     # bbox_inside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
 52     labels, rois, roi_scores, bbox_targets, bbox_inside_weights = _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, _num_classes) # select 256 anchors
 53 
 54     rois = rois.reshape(-1, 5)
 55     roi_scores = roi_scores.reshape(-1)
 56     labels = labels.reshape(-1, 1)
 57     bbox_targets = bbox_targets.reshape(-1, _num_classes * 4)
 58     bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes * 4)
 59     bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32) # 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
 60 
 61     return rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
 62 
 63 
 64 def _get_bbox_regression_labels(bbox_target_data, num_classes):
 65     """Bounding-box regression targets (bbox_target_data) are stored in a compact form N x (class, tx, ty, tw, th)
 66     This function expands those targets into the 4-of-4*K representation used by the network (i.e. only one class has non-zero targets).
 67     Returns:
 68         bbox_target (ndarray): N x 4K blob of regression targets
 69         bbox_inside_weights (ndarray): N x 4K blob of loss weights
 70     """
 71     clss = bbox_target_data[:, 0]  # first column: the class
 72     bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)   # 256*(4*21) matrix
 73     bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
 74     inds = np.where(clss > 0)[0]   # indices of the positive samples
 75     for ind in inds:
 76         cls = clss[ind]  # class of the positive sample
 77         start = int(4 * cls)  # first column of this sample's coordinates
 78         end = start + 4       # end column of this sample's coordinates (4 coordinates per class)
 79         bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]   # assign the coordinate offsets to the columns of the matching class
 80         bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS   # assign the weights (1.0, 1.0, 1.0, 1.0) to the columns of the matching class
 81 
 82     # bbox_targets: 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
 83     # bbox_inside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
 84     return bbox_targets, bbox_inside_weights
 85 
 86 
 87 def _compute_targets(ex_rois, gt_rois, labels):
 88     """Compute bounding-box regression targets for an image."""
 89     assert ex_rois.shape[0] == gt_rois.shape[0]
 90     assert ex_rois.shape[1] == 4
 91     assert gt_rois.shape[1] == 4
 92 
 93     targets = bbox_transform(ex_rois, gt_rois)  # use the last four equations of formula 2 to compute the coordinate offsets from the 256 anchors and their matched ground-truth boxes
 94     if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:  # Optionally normalize targets by a precomputed mean and stdev
 95         targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS)) / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))   # subtract the mean and divide by the standard deviation to normalize
 96     return np.hstack((labels[:, np.newaxis], targets)).astype(np.float32, copy=False)  # the first column of the earlier bbox was all 0; here the first column is the class
 97 
 98 
 99 def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):  # all_rois: first column all 0, last 4 columns are coordinates; gt_boxes: first 4 columns are coordinates, last column is the class
100     """Generate a random sample of RoIs comprising foreground and background examples."""
101     # compute the overlap ratio (IoU) between the anchors and gt_boxes
102     overlaps = bbox_overlaps(np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float), np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float)) # overlaps: (rois x gt_boxes)
103     gt_assignment = overlaps.argmax(axis=1)  # index of the gt_box matched to each anchor
104     max_overlaps = overlaps.max(axis=1)   # overlap of each anchor with its matched gt_box
105     labels = gt_boxes[gt_assignment, 4]   # class of the gt_box matched to each anchor
106 
107     # anchors whose overlap with the matched gt_box is at least the threshold are positive samples; get their indices
108     fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]  # Select foreground RoIs as those with >= FG_THRESH overlap
109     # Guard against the case when an image has fewer than fg_rois_per_image. Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
110     # anchors whose overlap with the matched gt_box lies within the given range are negative samples; get their indices
111     bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) & (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
112 
113     # Small modification to the original version where we ensure a fixed number of regions are sampled
114     # finally select 256 anchors
115     if fg_inds.size > 0 and bg_inds.size > 0: # both positive and negative samples exist: pick at most fg_rois_per_image positives and fill the rest with negatives
116         fg_rois_per_image = min(fg_rois_per_image, fg_inds.size)
117         fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_image), replace=False)
118         bg_rois_per_image = rois_per_image - fg_rois_per_image
119         to_replace = bg_inds.size < bg_rois_per_image
120         bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_image), replace=to_replace)
121     elif fg_inds.size > 0:  # only positive samples: pick rois_per_image positives
122         to_replace = fg_inds.size < rois_per_image
123         fg_inds = npr.choice(fg_inds, size=int(rois_per_image), replace=to_replace)
124         fg_rois_per_image = rois_per_image
125     elif bg_inds.size > 0: # only negative samples: pick rois_per_image negatives
126         to_replace = bg_inds.size < rois_per_image
127         bg_inds = npr.choice(bg_inds, size=int(rois_per_image), replace=to_replace)
128         fg_rois_per_image = 0
129     else:
130         import pdb
131         pdb.set_trace()
132 
133     keep_inds = np.append(fg_inds, bg_inds)  # indices of the positive and negative samples
134     labels = labels[keep_inds]  # ground-truth classes of the positive and negative samples
135     labels[int(fg_rois_per_image):] = 0  # set the class of the negative samples to 0
136     rois = all_rois[keep_inds]    # select 256 anchors from the post_nms_topN anchors
137     roi_scores = all_scores[keep_inds]  # probability of being a positive sample for each of the 256 anchors
138 
139     # compute the coordinate offsets from the coordinates of the 256 anchors, the coordinates of their matched gt_boxes and the true classes of these anchors (updating the all-zero first column of rois to the class of each anchor)
140     bbox_target_data = _compute_targets(rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
141     # bbox_targets: 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
142     # bbox_inside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
143     bbox_targets, bbox_inside_weights = _get_bbox_regression_labels(bbox_target_data, num_classes)
144 
145     # labels: ground-truth classes of the positive and negative samples
146     # rois: 256 anchors selected from the post_nms_topN anchors (the all-zero first column is updated to the class of each anchor)
147     # roi_scores: probability of being a positive sample for each of the 256 anchors
148     # bbox_targets: 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
149     # bbox_inside_weights: 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
150     return labels, rois, roi_scores, bbox_targets, bbox_inside_weights

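The expansion from the compact N x (class, tx, ty, tw, th) form to the N x 4K form is easy to see on a toy input. The sketch below uses 3 classes instead of 21 and assumes inside weights of (1.0, 1.0, 1.0, 1.0); expand_targets is a name invented for this example.

```python
import numpy as np

def expand_targets(compact, num_classes):
    """Expand N x (class, tx, ty, tw, th) into N x 4K targets plus loss weights."""
    n = compact.shape[0]
    targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    inside = np.zeros_like(targets)
    for i in np.where(compact[:, 0] > 0)[0]:   # positive samples only
        start = 4 * int(compact[i, 0])         # first column of this class
        targets[i, start:start + 4] = compact[i, 1:]
        inside[i, start:start + 4] = 1.0       # assumed BBOX_INSIDE_WEIGHTS
    outside = (inside > 0).astype(np.float32)
    return targets, inside, outside

compact = np.array([[2, 0.1, 0.2, 0.3, 0.4],    # positive sample of class 2
                    [0, 0.0, 0.0, 0.0, 0.0]],   # background sample
                   dtype=np.float32)
targets, inside, outside = expand_targets(compact, num_classes=3)
# row 0 carries its offsets in columns 8..11; row 1 stays all zero
```

Only the four columns of the true class are non-zero, so the regression loss of a sample is driven solely by the box predicted for its own class, and background samples contribute nothing.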

1.10 _crop_pool_layer

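_crop_pool_layer crops the 256 anchors out of the feature map, resizes each crop to 14*14, and then max-pools it to a fixed size of 7*7. The resulting features are what the R-CNN network uses for classification and box regression.

The function first computes the width and height of the original image that the feature map corresponds to, then normalizes the rois, which are given in original-image coordinates. It uses tf.image.crop_and_resize (which expects normalized coordinates) to resize each crop to [cfg.POOLING_SIZE * 2, cfg.POOLING_SIZE * 2], and finally pools with slim.max_pool2d, so the output always has the same size (256*7*7*512).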

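tf.slice(rois, [0, 0], [-1, 1]) slices its input. The second argument is the starting coordinate and the third is the size of the slice. Note that for a 2-D input both of these arguments are in (y, x) order, and for a 3-D input they are in (z, y, x) order. A size of -1 means taking everything from the start index to the end of that dimension. The call above therefore takes the first column of rois starting at (0, 0): the y size of -1 selects all rows and the x size of 1 selects one column.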
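The begin/size convention may be easier to follow next to ordinary NumPy indexing. The helper slice_like_tf below is a hypothetical NumPy analogue written for this note; it is not a TensorFlow function.

```python
import numpy as np

def slice_like_tf(x, begin, size):
    """NumPy analogue of the begin/size convention; size -1 means 'to the end of the axis'."""
    index = tuple(
        slice(b, None if s == -1 else b + s)
        for b, s in zip(begin, size)
    )
    return x[index]

rois = np.array([[0, 10, 20, 30, 40],
                 [0, 50, 60, 70, 80]], dtype=np.float32)

first_col = slice_like_tf(rois, [0, 0], [-1, 1])   # same as rois[0:, 0:1], shape (2, 1)
batch_ids = np.squeeze(first_col, axis=1)           # shape (2,)
x1 = slice_like_tf(rois, [0, 1], [-1, 1])           # second column, shape (2, 1)
```

The slice keeps the column dimension, which is why the original code squeezes it afterwards to get a 1-D vector of batch indices.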

_crop_pool_layer is defined as follows:

def _crop_pool_layer(self, bottom, rois, name):
    with tf.variable_scope(name) as scope:
        batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])   # first column: the batch index (all 0, since only one image is processed at a time)
        bottom_shape = tf.shape(bottom)  # Get the normalized coordinates of bounding boxes
        height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
        width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
        x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width  # crop_and_resize expects boxes in the range 0-1, so normalize the coordinates
        y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
        x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
        y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
        bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))  # Won't be back-propagated to rois anyway, but to save time
        pre_pool_size = cfg.POOLING_SIZE * 2

        # crop 256 features according to bboxes and resize them to 14*14 (same number of channels as bottom); the batch size is 256
        crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")

    return slim.max_pool2d(crops, [2, 2], padding='SAME') # after max pooling the features are 7*7

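The coordinate normalization can be checked with plain NumPy. The sketch assumes a feature stride of 16 (standing in for self._feat_stride[0] with VGG16) and a 24*38 feature map; normalize_rois is a made-up helper name.

```python
import numpy as np

def normalize_rois(rois, feat_h, feat_w, feat_stride=16):
    """Turn (batch, x1, y1, x2, y2) rois into normalized [y1, x1, y2, x2] boxes."""
    height = (feat_h - 1.0) * feat_stride   # image extent covered by the feature map
    width = (feat_w - 1.0) * feat_stride
    x1, y1, x2, y2 = (rois[:, i] for i in range(1, 5))
    return np.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)

rois = np.array([[0, 0, 0, 592, 368],        # covers the whole mapped area
                 [0, 148, 92, 296, 184]],    # a box in the upper-left region
                dtype=np.float32)
boxes = normalize_rois(rois, feat_h=24, feat_w=38)
# height = 23 * 16 = 368, width = 37 * 16 = 592
```

Note the column order changes from (x1, y1, x2, y2) to (y1, x1, y2, x2), which is the order the listing above passes to tf.image.crop_and_resize.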

1.11 _head_to_tail

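_head_to_tail passes the features of the 256 anchors obtained above through two fc layers (with ReLU) and two dropout layers (present during training, absent at test time), reducing them to 4096 dimensions for classification and regression in _region_classification.

_head_to_tail is located in vgg16.py and is defined as follows: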

def _head_to_tail(self, pool5, is_training, reuse=None):
    with tf.variable_scope(self._scope, self._scope, reuse=reuse):
        pool5_flat = slim.flatten(pool5, scope='flatten')
        fc6 = slim.fully_connected(pool5_flat, 4096, scope='fc6')
        if is_training:
            fc6 = slim.dropout(fc6, keep_prob=0.5, is_training=True, scope='dropout6')
        fc7 = slim.fully_connected(fc6, 4096, scope='fc7')
        if is_training:
            fc7 = slim.dropout(fc7, keep_prob=0.5, is_training=True, scope='dropout7')

    return fc7

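A scaled-down NumPy sketch of the same data flow is given below. It uses toy sizes (8 channels, 32 hidden units) instead of 512 and 4096, random weights, and assumes the usual inverted-dropout scaling by 1 / keep_prob; all names are illustrative.

```python
import numpy as np

def fc_relu(x, w, b):
    """Fully connected layer followed by ReLU."""
    return np.maximum(x @ w + b, 0.0)

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero units with probability 1 - keep_prob, scale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

def head_to_tail(pool5, params, is_training, rng):
    x = pool5.reshape(pool5.shape[0], -1)   # flatten to (N, 7*7*C)
    x = fc_relu(x, *params["fc6"])
    if is_training:
        x = dropout(x, 0.5, rng)
    x = fc_relu(x, *params["fc7"])
    if is_training:
        x = dropout(x, 0.5, rng)
    return x

rng = np.random.default_rng(0)
C, hidden = 8, 32   # toy sizes; the real network uses 512 channels and 4096 units
params = {
    "fc6": (rng.normal(size=(7 * 7 * C, hidden)) * 0.01, np.zeros(hidden)),
    "fc7": (rng.normal(size=(hidden, hidden)) * 0.01, np.zeros(hidden)),
}
pool5 = rng.normal(size=(4, 7, 7, C))          # 4 rois instead of 256
fc7_test = head_to_tail(pool5, params, is_training=False, rng=rng)
fc7_train = head_to_tail(pool5, params, is_training=True, rng=rng)
```

With the real sizes the flattened input has 7*7*512 = 25088 values per roi, which the first fc layer maps to 4096.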

1.12 _region_classification

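fc7 is classified and regressed by _region_classification. On one branch, fc7 goes through an fc layer (no ReLU) that reduces it to 21 dimensions (the number of classes), giving cls_score, from which the probability cls_prob and the predicted class cls_pred are obtained (used for R-CNN classification). On the other branch, fc7 goes through an fc layer (no ReLU) that reduces it to 21*4 dimensions, giving bbox_pred (used for R-CNN regression).

_region_classification is defined as follows: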

def _region_classification(self, fc7, is_training, initializer, initializer_bbox):
    # add an fc layer whose output size is the total number of classes, for classification
    cls_score = slim.fully_connected(fc7, self._num_classes, weights_initializer=initializer, trainable=is_training, activation_fn=None, scope='cls_score')
    cls_prob = self._softmax_layer(cls_score, "cls_prob")  # probability of each class
    cls_pred = tf.argmax(cls_score, axis=1, name="cls_pred")  # predicted class
    # add an fc layer that predicts the position offsets
    bbox_pred = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')

    self._predictions["cls_score"] = cls_score   # class scores of the 256 anchors used for R-CNN classification
    self._predictions["cls_pred"] = cls_pred
    self._predictions["cls_prob"] = cls_prob
    self._predictions["bbox_pred"] = bbox_pred

    return cls_prob, bbox_pred


With the steps above, the network has been built: rois, cls_prob, bbox_pred = self._build_network(training).

rois: 256*5

cls_prob: 256*21 (number of classes)

bbox_pred: 256*84 (number of classes*4)
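To see how these shapes fit together, the following NumPy sketch turns random class scores into probabilities and picks, for each roi, the four offsets that belong to its predicted class. The random inputs and the variable names per_class and best_deltas are illustrative only.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
num_rois, num_classes = 256, 21
cls_score = rng.normal(size=(num_rois, num_classes))        # stands in for the fc output
bbox_pred = rng.normal(size=(num_rois, num_classes * 4))    # 4 offsets per class

cls_prob = softmax(cls_score)            # (256, 21), each row sums to 1
cls_pred = cls_score.argmax(axis=1)      # (256,)

# view bbox_pred as (roi, class, 4) and take the offsets of the predicted class
per_class = bbox_pred.reshape(num_rois, num_classes, 4)
best_deltas = per_class[np.arange(num_rois), cls_pred]      # (256, 4)
```

Since softmax is monotonic, taking the argmax of cls_score gives the same class as taking the argmax of cls_prob.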

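2. Loss function _add_losses

Faster R-CNN has two losses: the RPN loss plus the R-CNN loss. Each of them consists of a classification loss and a regression loss. The classification loss is cross entropy and the regression loss is the smooth L1 loss.

The program adds these losses in _add_losses. rpn_cross_entropy and rpn_loss_box are the two RPN losses; cross_entropy and loss_box, computed from cls_score and bbox_pred, are the two R-CNN losses. The first two train the RPN, whose classifier decides whether an anchor is a positive sample, i.e. matches a ground-truth box (binary classification); the last two have a batch size of 256.

Take the indices where rpn_label(1,?,?,2) is not -1, gather the entries of rpn_cls_score(1,?,?,2) and rpn_label at those indices, and compute sparse_softmax_cross_entropy_with_logits on them to obtain rpn_cross_entropy.

Compute the _smooth_l1_loss of rpn_bbox_pred(1,?,?,36) and rpn_bbox_targets(1,?,?,36) to obtain rpn_loss_box.

Compute sparse_softmax_cross_entropy_with_logits of cls_score (256*21) and label (256): cross_entropy.

Compute the _smooth_l1_loss of bbox_pred (256*84) and bbox_targets (256*84): loss_box.

Finally the four losses above are summed to give the total loss (regularization_loss also has to be added).

This completes the construction of the losses.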
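The RPN classification loss, including the step that drops anchors labeled -1, can be sketched in NumPy as follows. The function names mirror the TensorFlow ops only loosely and the four toy anchors are made up.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-sample cross entropy from raw logits and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.size), labels]

def rpn_cross_entropy(rpn_cls_score, rpn_label):
    scores = rpn_cls_score.reshape(-1, 2)
    labels = rpn_label.reshape(-1)
    keep = np.where(labels != -1)[0]          # anchors that are not ignored
    return sparse_softmax_cross_entropy(scores[keep], labels[keep]).mean()

scores = np.array([[2.0, 0.0],    # confident negative, label 0
                   [0.0, 2.0],    # confident positive, label 1
                   [5.0, -5.0],   # ignored anchor, label -1
                   [0.0, 0.0]])   # undecided, label 1
labels = np.array([0, 1, -1, 1])
loss = rpn_cross_entropy(scores, labels)
```

The third anchor has no influence on the result, whatever its scores are, because it is removed before the cross entropy is averaged.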

The program adds the losses through _add_losses:

def _add_losses(self, sigma_rpn=3.0):
    with tf.variable_scope('LOSS_' + self._tag) as scope:
        rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])  # scores of each anchor being a positive or a negative sample
        rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])  # whether each position of the feature map is a positive sample, a negative sample or ignored (anchors crossing the image boundary have been removed)
        rpn_select = tf.where(tf.not_equal(rpn_label, -1))    # indices of the anchors that are not ignored (label != -1)
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])    # drop the ignored anchors
        rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])        # drop the ignored labels
        rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))  # RPN binary classification loss

        rpn_bbox_pred = self._predictions['rpn_bbox_pred']  # position offsets regressed for the 9 anchors at each position
        rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']   # coordinate offsets between each position of the feature map and its matched positive sample (many are 0)
        rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']  # weight 1 for positive samples (0 for negative and ignored samples)
        rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']   # normalized weights of the positive and negative samples (ignored samples excluded)
        rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])

        cls_score = self._predictions["cls_score"]  # class scores of the 256 anchors used for R-CNN classification
        label = tf.reshape(self._proposal_targets["labels"], [-1])   # ground-truth classes of the positive and negative samples
        cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=cls_score, labels=label))   # R-CNN classification loss

        bbox_pred = self._predictions['bbox_pred']   # RCNN, bbox loss
        bbox_targets = self._proposal_targets['bbox_targets']    # 256*(4*21) matrix; only for positive samples are the coordinates of the matching class non-zero, all other classes are 0
        bbox_inside_weights = self._proposal_targets['bbox_inside_weights']  # 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
        bbox_outside_weights = self._proposal_targets['bbox_outside_weights']   # 256*(4*21) matrix; for positive samples the four coordinate weights of the matching class are 1, everything else is 0
        loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)

        self._losses['cross_entropy'] = cross_entropy
        self._losses['loss_box'] = loss_box
        self._losses['rpn_cross_entropy'] = rpn_cross_entropy
        self._losses['rpn_loss_box'] = rpn_loss_box

        loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box  # total loss
        regularization_loss = tf.add_n(tf.losses.get_regularization_losses(), 'regu')
        self._losses['total_loss'] = loss + regularization_loss

        self._event_summaries.update(self._losses)

    return loss


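The smooth L1 loss is defined as follows (see the Fast R-CNN paper):

$L_{loc}(t^{u},v)=\sum\limits_{i\in \{x,y,w,h\}}{\text{smooth}_{L_1}(t_{i}^{u}-v_{i})}\text{ (2)}$

in which

$\text{smooth}_{L_1}(x)=\begin{cases}0.5x^{2} & \text{if } |x|<1 \\ |x|-0.5 & \text{otherwise}\end{cases}\text{ (3)}$

The program first computes the difference box_diff between pred and target, then the difference for the positive samples, in_box_diff (multiplying by the weights bbox_inside_weights sets the negative samples to 0), and its absolute value abs_in_box_diff. It then computes smoothL1_sign, the flag that selects the branch of equation (3), obtains the smooth L1 loss in_loss_box, multiplies it by the weights bbox_outside_weights, and produces the final loss loss_box. The code generalizes equation (3) with a parameter sigma: the quadratic branch 0.5*sigma^2*x^2 applies when |x| < 1/sigma^2, otherwise the loss is |x| - 0.5/sigma^2; sigma = 1 recovers equation (3).

_smooth_l1_loss is defined as follows: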

def _smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0, dim=[1]):
    sigma_2 = sigma ** 2
    box_diff = bbox_pred - bbox_targets   # prediction minus target
    in_box_diff = bbox_inside_weights * box_diff  # multiply by the positive-sample weight 1 (RPN: zeroes negative and ignored samples; R-CNN: zeroes negative samples)
    abs_in_box_diff = tf.abs(in_box_diff)  # absolute value
    smoothL1_sign = tf.stop_gradient(tf.to_float(tf.less(abs_in_box_diff, 1. / sigma_2)))   # flag marking the entries below the threshold (quadratic branch)
    in_loss_box = tf.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)   # smooth L1 loss
    out_loss_box = bbox_outside_weights * in_loss_box   # RPN: divide by the number of valid samples (ignored samples not counted) to normalize; R-CNN: weight 1 for the four coordinates of positive samples, 0 for negative samples
    loss_box = tf.reduce_mean(tf.reduce_sum(out_loss_box, axis=dim))
    return loss_box

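The same computation can be written in NumPy to check a few values by hand. This is a sketch that mirrors the listing above with np.where in place of the sign flag; the function names are chosen for this example.

```python
import numpy as np

def smooth_l1(x, sigma=1.0):
    """Element-wise smooth L1 with the sigma parameterization used above."""
    sigma_2 = sigma ** 2
    abs_x = np.abs(x)
    quadratic = abs_x < 1.0 / sigma_2
    return np.where(quadratic, 0.5 * sigma_2 * x ** 2, abs_x - 0.5 / sigma_2)

def smooth_l1_loss(pred, targets, inside_w, outside_w, sigma=1.0):
    in_diff = inside_w * (pred - targets)             # zero out unwanted entries
    per_elem = outside_w * smooth_l1(in_diff, sigma)  # weight each coordinate
    return per_elem.sum(axis=1).mean()                # sum per sample, mean over samples

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
values = smooth_l1(x)                 # sigma = 1: [1.5, 0.125, 0.0, 0.125, 1.5]

pred = np.array([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]])
weights = np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])  # second row is a negative sample
loss = smooth_l1_loss(pred, np.zeros_like(pred), weights, weights)
```

With sigma = 3, as used for the RPN, the switch between the two branches moves from |x| = 1 to |x| = 1/9, and both branches give 1/18 at that point, so the function stays continuous.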

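3. Test phase:

At test time, the predicted bbox_pred has to be multiplied by (0.1, 0.1, 0.2, 0.2) (and then (0.0, 0.0, 0.0, 0.0) is added). In create_architecture:

if testing:
    stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (self._num_classes))
    means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (self._num_classes))
    self._predictions["bbox_pred"] *= stds   # during training the regression targets are normalized in _compute_targets (mean subtracted, divided by the standard deviation), so the operation has to be reversed at test time
    self._predictions["bbox_pred"] += means

For details, see the function demo in demo.py (which calls im_detect in test.py). When that function is called directly in Python, no extra multiply-then-add step is needed. After the model was frozen, however, the value obtained for self._predictions["bbox_pred"] was wrong; debugging showed that the results agree once the multiply-then-add step is applied.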
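The round trip between normalized and raw offsets can be verified with NumPy. The means and standard deviations are the values quoted above; the toy targets and variable names are made up for the check.

```python
import numpy as np

BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
num_classes = 21

# repeat the 4 values once per class so they line up with the 84 columns of bbox_pred
stds = np.tile(np.array(BBOX_NORMALIZE_STDS), num_classes)     # shape (84,)
means = np.tile(np.array(BBOX_NORMALIZE_MEANS), num_classes)   # shape (84,)

# toy raw offsets (dx, dy, dw, dh), repeated for every class and for 2 rois
raw_targets = np.tile(np.array([0.25, 0.75, 0.0, 0.6931]), (2, num_classes))  # (2, 84)

normalized = (raw_targets - means) / stds   # what the network is trained to predict
restored = normalized * stds + means        # what the test-time code recovers
```

If the multiply-then-add step is skipped, the decoded boxes use offsets that are 5 to 10 times too large, which matches the wrong results described above for the frozen model.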
