Faster RCNN Code Walkthrough

Reposted article, 2018-10-31 17:22. Category: deep learning / object detection

1. faster_rcnn_end2end training

1.1 Training entry point and configuration

def train():
    cfg.GPU_ID = 0
    cfg_file = "../experiments/cfgs/faster_rcnn_end2end.yml"
    cfg_from_file(cfg_file)
    if not False:
        # fix the random seeds (numpy and caffe) for reproducibility
        np.random.seed(cfg.RNG_SEED)
        caffe.set_random_seed(cfg.RNG_SEED)

    # set up caffe
    caffe.set_mode_gpu()
    caffe.set_device(0)

    imdb_name = "voc_2007_trainval"
    imdb, roidb = combined_roidb(imdb_name)
    output_dir = get_output_dir(imdb)
    max_iters = 10000
    pretrained_model = "../data/imagenet_models/ZF.v2.caffemodel"
    solver = "../models/pascal_voc/ZF/faster_rcnn_end2end/solver.prototxt"
    train_net(solver, roidb, output_dir,
              pretrained_model=pretrained_model,
              max_iters=max_iters)

if __name__ == '__main__':
    train()

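1.2 Data preparation

Start at train_net.py:combined_roidb(imdb_name), which returns the ground-truth (gt) dataset.

Input: "voc_2007_trainval"; output: imdb, roidb. imdb is an instance of the datasets.pascal_voc.pascal_voc class and contains 5011 training images. roidb has length 10022, twice the image count, because a horizontally flipped copy of every image is appended when cfg.TRAIN.USE_FLIPPED is enabled.

1.3 Training

Jump to line 156 of lib/fast_rcnn/train.py.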
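The roidb length of 10022 is exactly twice the 5011 images, which is consistent with a horizontally flipped copy being appended for each image. The sketch below illustrates that doubling on a toy entry; the helper names flip_boxes and append_flipped and the dictionary keys are hypothetical and are not the py-faster-rcnn API.

```python
import numpy as np

def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally inside an image of the given width."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1  # new x1 comes from the old x2
    flipped[:, 2] = width - boxes[:, 0] - 1  # new x2 comes from the old x1
    return flipped

def append_flipped(roidb):
    """Return the roidb followed by a flipped copy of every entry (hypothetical helper)."""
    extra = [{"boxes": flip_boxes(e["boxes"], e["width"]),
              "width": e["width"], "flipped": True} for e in roidb]
    return roidb + extra

toy = [{"boxes": np.array([[10, 20, 99, 199]]), "width": 500, "flipped": False}]
doubled = append_flipped(toy)
print(len(doubled), doubled[1]["boxes"])
```

The y coordinates are untouched and each flipped box keeps its original width.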

roidb = filter_roidb(roidb)

Because the gt dataset is used here, no entries are filtered out.

sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model)

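Training uses BBOX_REG and BBOX_NORMALIZE_TARGETS, so the means and standard deviations of the bounding-box targets have to be precomputed.

Line 40 computes the bounding-box regression targets: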

self.bbox_means, self.bbox_stds = \
                    rdl_roidb.add_bbox_regression_targets(roidb)

Jump to line 46 of roidb.py, where 'bbox_targets' is computed for every element of roidb. Inputs: rois: roidb[im_i]['boxes'], max_overlaps: [1,1,...], max_classes: [9,7,...]

roidb[im_i]['bbox_targets'] = \
                _compute_targets(rois, max_overlaps, max_classes)

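_compute_targets() returns the label and the regression targets of every proposal in the image (t_i, see the previous article http://www.cnblogs.com/benbencoding798/archive/2018/10/26/9856617.html). Because the input here contains only ground-truth boxes, each box is matched with itself and all regression targets come out as 0. The next step normalizes the regression targets. Since the targets of the gt boxes are all 0, the normalized targets are 0 as well.

We are now at line 44 of lib/fast_rcnn/train.py, which builds the network.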
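A small check makes the "all targets are 0" observation concrete. This is a sketch, not the repository code: bbox_transform is re-implemented here in compact form, and the means (0, 0, 0, 0) and standard deviations (0.1, 0.1, 0.2, 0.2) are assumed normalization constants chosen for illustration.

```python
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    """Regression targets (dx, dy, dw, dh) that move ex_rois onto gt_rois."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    dx = ((gt_rois[:, 0] + 0.5 * gt_w) - (ex_rois[:, 0] + 0.5 * ex_w)) / ex_w
    dy = ((gt_rois[:, 1] + 0.5 * gt_h) - (ex_rois[:, 1] + 0.5 * ex_h)) / ex_h
    return np.stack([dx, dy, np.log(gt_w / ex_w), np.log(gt_h / ex_h)], axis=1)

# Two made-up ground-truth boxes, each matched with itself.
gt = np.array([[48., 240., 195., 371.], [8., 12., 352., 498.]])
targets = bbox_transform(gt, gt)

# Assumed normalization constants (illustrative values).
means = np.zeros(4)
stds = np.array([0.1, 0.1, 0.2, 0.2])
normalized = (targets - means) / stds
print(targets, normalized)
```

Identical boxes give a zero center offset and log(1) = 0 for the size terms, so normalization with zero means leaves everything at 0.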

self.solver = caffe.SGDSolver(solver_prototxt)

models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt contains 4 Python layers; the sections below step through them one by one.

1.4 Python layer setup

1.4.1 input-data layer

The data input layer

layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'  # (height, width, im_scale)
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

setup() executes:
top[0].reshape(1, 3, 600, 1000)
self._name_to_top_map['data'] = 0
top[1].reshape(1, 3)
self._name_to_top_map['im_info'] = 1
top[2].reshape(1, 4)
self._name_to_top_map['gt_boxes'] = 2

After setup() in lib/roi_data_layer/layer.py has been called, reshape(self, bottom, top) is called.

Next, rpn/__init__.py is invoked.

1.4.2 rpn-data layer

Assign anchors to ground-truth targets. Produces anchor classification
labels and bounding-box regression targets.

layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}

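Shapes: bottom["rpn_cls_score"]: (img_number, 18, height, width); bottom["gt_boxes"]: (gt_boxes_number, 4) as declared in setup(), although at run time each row also carries the class label, giving (gt_boxes_number, 5); bottom["im_info"]: (img_number, 3); bottom["data"]: (img_number, 3, height, width).

Next, rpn/anchor_target_layer.py: setup() runs. Look at self._anchors first: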

self._anchors = generate_anchors(scales=np.array(anchor_scales))

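Jump to rpn/generate_anchors.py:generate_anchors(). It generates 9 anchors for every position of the rpn/output feature map. As I understand it, once the image size is known, the position of every anchor can be computed in advance, because the base convolutional layers shrink the image to 1/16 of its original size; the computation does not strictly have to live in rpn.anchor_target_layer. Now let us see how the anchors are computed. According to the paper, the three base anchor sizes are [(16*8)*(16*8), (16*16)*(16*16), (16*32)*(16*32)]. In the code, 16 is the base size and [8, 16, 32] are the scales applied to the square box. In other words, the anchors are first computed at the base size and then multiplied by the scales to obtain boxes in the original image.

rpn/generate_anchors.py is the file that computes the anchors.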

ratio_anchors = _ratio_enum(base_anchor, ratios)

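base_anchor is [0, 0, 15, 15]. This code computes the anchors for the three aspect ratios at base_size=16. The result is [[-3.5, 2, 18.5, 13], [0, 0, 15, 15], [2.5, -3, 12.5, 18]]. Computing the areas directly (with width = x2 - x1 + 1), the first and third come out as 23 x 12 = 276 and 11 x 22 = 242 rather than 256, which looks like a calculation error. The difference comes only from np.round. Note also that a single point on the feature map, which seems to have no extent, actually stands for a 16-pixel-wide cell of the original image.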

def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h  # 256
    size_ratios = size / ratios   # [512, 256, 128] 
    ws = np.round(np.sqrt(size_ratios))  # [sqrt(512), sqrt(256), sqrt(128)]
    hs = np.round(ws * ratios)  # [sqrt(512)/2, sqrt(256), 2*sqrt(128)]
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

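If the square roots are kept unrounded, the computation is correct: every width-height product is 256, and the aspect ratios satisfy the conditions in the paper.

Regarding the final 9 anchors, note that the center coordinates must not be multiplied by the scale. Suppose we compute the anchors at point (0,0) of the rpn/output feature map. That point stands for the rectangle [0,0,15,15] in the original image, whose center is (7.5,7.5). The 9 anchors must be expressed in original-image coordinates, so the center stays fixed and only the width and height change with the scale.

Then the output shapes are set. As an example, take an image whose size at the rpn/output layer is (39, 64) (height, width).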
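To make the anchor computation concrete, here is a compact re-implementation that follows the logic of _ratio_enum quoted above and then scales each ratio anchor around its unchanged center. It is a sketch for illustration: the helper names whctrs, mkanchors, ratio_enum and scale_enum are simplified stand-ins, not the original functions.

```python
import numpy as np

def whctrs(anchor):
    """Width, height and center of an anchor given as inclusive pixel coordinates."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Build [x1, y1, x2, y2] anchors of the given sizes around one center."""
    ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

def ratio_enum(anchor, ratios):
    w, h, x_ctr, y_ctr = whctrs(anchor)
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    return mkanchors(ws, hs, x_ctr, y_ctr)

def scale_enum(anchor, scales):
    w, h, x_ctr, y_ctr = whctrs(anchor)
    return mkanchors(w * scales, h * scales, x_ctr, y_ctr)  # center is not scaled

def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    ratios = np.array(ratios, dtype=float)
    scales = np.array(scales, dtype=float)
    base = np.array([0, 0, base_size - 1, base_size - 1], dtype=float)
    return np.vstack([scale_enum(a, scales) for a in ratio_enum(base, ratios)])

anchors = generate_anchors()
centers_x = (anchors[:, 0] + anchors[:, 2]) / 2
centers_y = (anchors[:, 1] + anchors[:, 3]) / 2
print(anchors)
```

Every one of the 9 rows keeps the center (7.5, 7.5); only the width and height differ between rows.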

        # labels (1, 1, 9*39, 64)
        top[0].reshape(1, 1, A * height, width)
        # bbox_targets (1, 9*4, 39, 64)
        top[1].reshape(1, A * 4, height, width)
        # bbox_inside_weights (1, 9*4, 39, 64)
        top[2].reshape(1, A * 4, height, width)
        # bbox_outside_weights (1, 9*4, 39, 64)
        top[3].reshape(1, A * 4, height, width)

1.4.3 proposal layer

Outputs object detection proposals by applying estimated bounding-box
transformations to a set of regular boxes (called "anchors").

layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

ProposalLayer applies the predicted regression corrections to the anchors to produce the output detection boxes. Like AnchorTargetLayer, this layer also computes

self._anchors = generate_anchors(scales=np.array(anchor_scales))

# rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        top[0].reshape(1, 5)

1.4.4 roi-data layer

Assign object detection proposals to ground-truth targets. Produces proposal
classification labels and bounding-box regression targets.

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}

shape

# sampled rois (0, x1, y1, x2, y2)
        top[0].reshape(1, 5)
        # labels
        top[1].reshape(1, 1)
        # bbox_targets
        top[2].reshape(1, self._num_classes * 4)
        # bbox_inside_weights
        top[3].reshape(1, self._num_classes * 4)
        # bbox_outside_weights
        top[4].reshape(1, self._num_classes * 4)

1.5 Forward pass

1.5.1 lib/roi_data_layer/layer.py

The first step is to fetch the data for the current batch. The current setting is __C.TRAIN.IMS_PER_BATCH = 1.

blobs = self._get_next_minibatch()

[Figure: contents of the resulting minibatch_db]

[Figure: contents of the resulting blobs]

The input of this layer is roidb; its outputs are 'data', 'gt_boxes' and 'im_info'.
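The scale stored in 'im_info' comes from resizing each image before it is packed into 'data'. The sketch below shows one common rule that is consistent with the (1, 3, 600, 1000) reshape seen earlier: scale the shorter side to 600 pixels unless that would push the longer side past 1000. The function name compute_im_scale and its default arguments are assumptions made for illustration.

```python
def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale factor that brings the short side to target_size, capped by max_size."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    # Fall back to the long-side limit when the short-side rule overshoots it.
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale

# A 375 x 500 image (a typical PASCAL VOC size) is governed by the short side.
scale = compute_im_scale(375, 500)
im_info = (round(375 * scale), round(500 * scale), scale)  # (height, width, scale)

# A very wide image is limited by the long side instead.
wide_scale = compute_im_scale(300, 1000)
print(im_info, wide_scale)
```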

1.5.2 lib/rpn/anchor_target_layer.py

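Inputs: 'rpn_cls_score' (bottom_0), 'gt_boxes', 'im_info', 'data' (bottom_3).

[Figure: the input blobs]

all_anchors holds the original-image position of every anchor on the convolutional feature map (total_anchors = K * A is their count). anchors keeps only the anchors that lie entirely inside the image boundary. From here on, "anchors" always means this filtered set, with boundary-crossing anchors removed.

The program then relates the kept anchors to the ground-truth boxes:
1. Compute the IoU between every anchor and every ground-truth box.
2. For every anchor, find the ground-truth box with the highest IoU.
3. For every ground-truth box, find the indices of the anchors with the highest IoU (one ground-truth box may have several anchors tied for the maximum).
4. Assign positive and negative labels by threshold. An anchor whose highest IoU over all ground-truth boxes is below cfg.TRAIN.RPN_NEGATIVE_OVERLAP=0.3 gets a negative label. An anchor that has the highest IoU for some ground-truth box, or whose IoU reaches the 0.7 threshold, gets a positive label.
5. Sample positives and negatives. At most 128 positives are kept, chosen at random if there are more; the number of negatives is 256 minus the number of positives; all remaining anchors are set to "don't care" (label -1).
6. Compute the regression targets.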
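Enumerating every anchor and then dropping those that cross the image boundary can be sketched as below. The two base anchors, the 39 x 64 feature map and the 600 x 1000 image size are toy values for illustration (the real layer uses 9 base anchors), and the function names are hypothetical.

```python
import numpy as np

def shift_anchors(base_anchors, height, width, feat_stride=16):
    """Place every base anchor at every feature-map cell, in image coordinates."""
    shift_x = np.arange(width) * feat_stride
    shift_y = np.arange(height) * feat_stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.vstack((sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel())).transpose()
    A = base_anchors.shape[0]   # anchors per cell
    K = shifts.shape[0]         # number of cells
    all_anchors = base_anchors.reshape((1, A, 4)) + shifts.reshape((K, 1, 4))
    return all_anchors.reshape((K * A, 4))

def inside_image(anchors, im_height, im_width, allowed_border=0):
    """Indices of anchors that lie completely inside the image."""
    return np.where((anchors[:, 0] >= -allowed_border) &
                    (anchors[:, 1] >= -allowed_border) &
                    (anchors[:, 2] < im_width + allowed_border) &
                    (anchors[:, 3] < im_height + allowed_border))[0]

# Two toy base anchors: a 16 x 16 box and a 128 x 128 box, both centered on (7.5, 7.5).
base = np.array([[0., 0., 15., 15.], [-56., -56., 71., 71.]])
all_anchors = shift_anchors(base, height=39, width=64)
inds_inside = inside_image(all_anchors, im_height=600, im_width=1000)
anchors = all_anchors[inds_inside]
print(all_anchors.shape, anchors.shape)
```

Large anchors near the border are the first to be discarded, which is why the filtered set is much smaller than K * A.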

bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])

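gt_boxes[argmax_overlaps, :] has the same length as anchors; row i is the ground-truth box with the highest IoU for anchor i. This call computes t* (http://www.cnblogs.com/benbencoding798/archive/2018/10/26/9856617.html).

# ex_rois are the anchors; gt_rois are the ground-truth boxes matched to each anchor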
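The overlap computation and label assignment described for this layer can be sketched in plain NumPy as follows. The function names are hypothetical, and the thresholds 0.3 and 0.7 are taken from the text; this is an illustration of the rule, not the layer's actual code.

```python
import numpy as np

def iou_matrix(boxes, gt):
    """IoU between every box (N, 4) and every gt box (K, 4), inclusive coordinates."""
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gt[:, 2] - gt[:, 0] + 1) * (gt[:, 3] - gt[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gt[None, :, 2])
          - np.maximum(boxes[:, None, 0], gt[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gt[None, :, 3])
          - np.maximum(boxes[:, None, 1], gt[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_g[None, :] - inter)

def assign_labels(anchors, gt, neg_thresh=0.3, pos_thresh=0.7):
    """Label anchors 1 (object), 0 (background) or -1 (don't care)."""
    overlaps = iou_matrix(anchors, gt)
    argmax_overlaps = overlaps.argmax(axis=1)        # best gt box per anchor
    max_overlaps = overlaps.max(axis=1)
    gt_max_overlaps = overlaps.max(axis=0)           # best IoU per gt box
    gt_argmax = np.where(overlaps == gt_max_overlaps)[0]  # includes ties
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_overlaps < neg_thresh] = 0
    labels[gt_argmax] = 1
    labels[max_overlaps >= pos_thresh] = 1
    return labels, argmax_overlaps

gt = np.array([[0., 0., 99., 99.]])
anchors = np.array([[0., 0., 99., 99.],        # IoU 1.0
                    [0., 0., 99., 49.],        # IoU 0.5
                    [200., 200., 299., 299.],  # IoU 0.0
                    [10., 10., 109., 109.]])   # IoU about 0.68
labels, argmax_overlaps = assign_labels(anchors, gt)
print(labels)
```

Anchors whose best IoU falls between the two thresholds stay at -1 and contribute nothing to the loss.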
def bbox_transform(ex_rois, gt_rois):
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights

    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)

    targets = np.vstack(
        (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
    return targets

[Figure: the resulting bbox_targets]

7. Set the weights. bbox_inside_weights and bbox_outside_weights both have shape (len(anchors), 4).

__C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# inside weights are all 1 in the rows of positive anchors

bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)

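# num_examples = np.sum(labels >= 0); normally 256, the number of training samples per image
positive_weights = np.ones((1, 4)) * 1.0 / num_examples
negative_weights = np.ones((1, 4)) * 1.0 / num_examples
# outside weights: the same uniform value for every sampled anchor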

bbox_outside_weights[labels == 1, :] = positive_weights
bbox_outside_weights[labels == 0, :] = negative_weights

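8. Map from the current set back to the original set (that is, from anchors back to the full set of total_anchors anchors). labels, bbox_targets, bbox_inside_weights and bbox_outside_weights are all mapped.
9. Reshape the top blobs.

With this layer, the anchors and the ground-truth boxes yield the regression target of every anchor, which is t* in the paper. The layer also fills in two weight matrices, so that the smooth_l1 loss for box regression can be computed in the prototxt.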
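The mapping in step 8 scatters values computed for the kept anchors back into arrays that cover every anchor. A minimal sketch follows; the function name unmap and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def unmap(data, count, inds, fill=0):
    """Scatter data (one row per kept anchor) into an array of length count."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data
    return ret

total_anchors = 6                     # all anchors on the feature map
inds_inside = np.array([1, 3, 4])     # anchors that survived the boundary filter
labels = np.array([1, 0, -1], dtype=np.float32)
bbox_targets = np.arange(12, dtype=np.float32).reshape(3, 4)

# Anchors outside the image become "don't care" and get zero targets.
full_labels = unmap(labels, total_anchors, inds_inside, fill=-1)
full_targets = unmap(bbox_targets, total_anchors, inds_inside, fill=0)
print(full_labels, full_targets.shape)
```

After this step every output has one row per anchor position, so it can be reshaped to the (1, A * 4, height, width) layout declared in setup().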

1.5.3 lib/rpn/proposal_layer.py

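The purpose of this layer is to take the objectness probabilities and regression values that the RPN predicts for the anchors and compute the final proposals. This amounts to knowing x_a and t_x and solving for x (see https://www.cnblogs.com/benbencoding798/p/9856617.html).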
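Solving for x from x_a and t_x means inverting the parameterization t_x = (x - x_a) / w_a, t_w = log(w / w_a). The sketch below does this for whole boxes; apply_deltas is a hypothetical name, and the corner convention (center plus or minus half the size) is one possible choice.

```python
import numpy as np

def apply_deltas(anchors, deltas):
    """Turn anchors plus predicted (dx, dy, dw, dh) into boxes [x1, y1, x2, y2]."""
    widths = anchors[:, 2] - anchors[:, 0] + 1.0
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    pred_ctr_x = deltas[:, 0] * widths + ctr_x     # x = t_x * w_a + x_a
    pred_ctr_y = deltas[:, 1] * heights + ctr_y
    pred_w = np.exp(deltas[:, 2]) * widths         # w = exp(t_w) * w_a
    pred_h = np.exp(deltas[:, 3]) * heights

    return np.stack([pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], axis=1)

anchors = np.array([[0., 0., 15., 15.]])
# Shift the center right by half the anchor width and double the width.
deltas = np.array([[0.5, 0.0, np.log(2.0), 0.0]])
pred = apply_deltas(anchors, deltas)
print(pred)
```

The anchor center (8, 8) moves to (16, 8) and the width grows from 16 to 32, while the height is unchanged.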

# Algorithm:
        #
        # for each (H, W) location i
        #   generate A anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the A anchors
        # clip predicted boxes to image
        # remove predicted boxes with either height or width < threshold
        # sort all (proposal, score) pairs by score from highest to lowest
        # take top pre_nms_topN proposals before NMS
        # apply NMS with threshold 0.7 to remaining proposals
        # take after_nms_topN proposals after NMS
        # return the top proposals (-> RoIs top, scores top)

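1. Get the foreground score of every anchor on the feature map.
2. Compute the anchor coordinates, as in the previous section.
3. Normalize the shapes: anchors: (w*h*9, 4), bbox_deltas: (w*h*9, 4), scores: (w*h*9, 1).
4. Compute the predicted boxes from the anchors and the bbox_deltas regressed by the network.
5. For proposals that extend past the image border, set the offending coordinates to the border coordinates.
6. Remove proposals whose width or height is too small.
7. Sort the proposals by foreground score and keep the first pre_nms_topN.
8. Apply NMS (non-maximum suppression) and keep the first post_nms_topN.
9. Write the result to the top blob rpn_rois. The first column of every proposal is the index of the image within the batch. Because the code only implements single-image training, this column is always 0; the other four columns are the proposal coordinates.

Shape of the output rpn_rois: (2000, 5)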
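The clipping, size filtering, sorting and NMS steps of the proposal layer can be sketched in pure NumPy as below. The function names and the default values for min_size, pre_nms_top_n and post_nms_top_n are assumptions made for illustration, and the NMS here is a simple CPU version rather than the compiled routine a real run would use.

```python
import numpy as np

def clip_boxes(boxes, im_h, im_w):
    """Clamp box coordinates to the image."""
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)
    return boxes

def nms(boxes, scores, thresh):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        iou = w * h / (areas[i] + areas[rest] - w * h)
        order = rest[iou <= thresh]
    return keep

def select_proposals(boxes, scores, im_h, im_w, min_size=16,
                     pre_nms_top_n=12000, post_nms_top_n=2000, nms_thresh=0.7):
    boxes = clip_boxes(boxes, im_h, im_w)
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    big = np.where((ws >= min_size) & (hs >= min_size))[0]
    boxes, scores = boxes[big], scores[big]
    order = scores.argsort()[::-1][:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    # Column 0 is the image index within the batch (always 0 for one image).
    rois = np.hstack((np.zeros((len(keep), 1)), boxes[keep]))
    return rois, scores[keep]

boxes = np.array([[0., 0., 99., 99.], [5., 5., 104., 104.],
                  [200., 200., 299., 299.], [-20., 10., 5., 300.],
                  [50., 50., 53., 53.]])
scores = np.array([0.9, 0.8, 0.7, 0.95, 0.99])
rois, kept_scores = select_proposals(boxes, scores, im_h=600, im_w=1000)
print(rois, kept_scores)
```

In this toy input the two highest-scoring boxes are removed for being too small, and the second large box is suppressed because its IoU with the first exceeds 0.7.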

1.5.4 rpn/proposal_target_layer.py

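This layer plays the role of the data input layer in Fast R-CNN, because the proposals are already available at this point. It computes the IoU between the proposals and the ground-truth boxes to obtain the concrete class labels (21 classes), and it computes the offsets between the proposals and the ground-truth boxes.

# Sample rois with classification labels and bounding box regression targets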
        labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
            all_rois, gt_boxes, fg_rois_per_image,
            rois_per_image, self._num_classes)

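This is the main computation of the forward pass. all_rois is the union of rois and gt; fg_rois_per_image = 128 * 1/4 = 32 and rois_per_image = 128.

1. First compute the IoU between all_rois and gt. At this point all_rois has shape (2002, 5) and gt_boxes has shape (2, 5), so the resulting overlaps has shape (2002, 2).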

# overlaps: (rois x gt_boxes)
    overlaps = bbox_overlaps(
        np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
        np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))

2. Derive the labels of all_rois from overlaps.

labels = gt_boxes[gt_assignment, 4]

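3. Select the foreground and background rois and their labels according to the thresholds.

4. Compute the offsets between the rois and their matched gt boxes.

5. Expand the targets into 84 columns (4 coordinates for each of the 21 classes). Only foreground targets are filled in; background rows stay all 0. The weights in the columns of the matching class are 1. bbox_targets shape: (128, 84)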
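The sampling and expansion steps above can be sketched as follows. The function names are hypothetical, and the foreground and background thresholds (0.5, 0.5 and 0.0) are assumed values for illustration; the counts 32 and 128 come from the text.

```python
import numpy as np

def sample_rois(max_overlaps, fg_per_image=32, rois_per_image=128,
                fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0, rng=None):
    """Pick foreground and background roi indices from their best IoU values."""
    if rng is None:
        rng = np.random.default_rng(0)
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    n_fg = min(fg_per_image, fg_inds.size)
    if fg_inds.size > 0:
        fg_inds = rng.choice(fg_inds, size=n_fg, replace=False)
    bg_inds = np.where((max_overlaps < bg_hi) & (max_overlaps >= bg_lo))[0]
    n_bg = min(rois_per_image - n_fg, bg_inds.size)
    if bg_inds.size > 0:
        bg_inds = rng.choice(bg_inds, size=n_bg, replace=False)
    return fg_inds, bg_inds

def expand_targets(labels, targets, num_classes=21):
    """Place each 4-vector of targets in the 4 columns of its own class."""
    n = labels.shape[0]
    bbox_targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    inside_weights = np.zeros_like(bbox_targets)
    for i in np.where(labels > 0)[0]:       # background (label 0) stays all zero
        start = 4 * int(labels[i])
        bbox_targets[i, start:start + 4] = targets[i]
        inside_weights[i, start:start + 4] = 1.0
    return bbox_targets, inside_weights

# Toy input: 40 rois overlap a gt box of class 7 well, 200 rois barely overlap.
max_overlaps = np.concatenate([np.full(40, 0.8), np.full(200, 0.2)])
fg_inds, bg_inds = sample_rois(max_overlaps)
keep = np.append(fg_inds, bg_inds)
labels = np.full(keep.size, 7)
labels[fg_inds.size:] = 0                   # background rois get label 0
targets = np.ones((keep.size, 4), dtype=np.float32)
bbox_targets, inside_weights = expand_targets(labels, targets)
print(fg_inds.size, bg_inds.size, bbox_targets.shape)
```

Each foreground row has exactly four non-zero target columns, those of its own class, which is what lets a single 84-wide regression output serve all 21 classes.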
