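faster-rcnn is without question a milestone algorithm in object detection. This post is mainly a record of my reading of the Python version of the faster-rcnn code. It also touches on how the algorithm works, but to get an overall picture of the algorithm and to follow this post more easily, you should first read the faster-rcnn paper and consult some explanatory blog posts (such as "一文讀懂Faster RCNN"). The official py-faster-rcnn repository is no longer maintained; I use a lightly modified copy of the code (the changes mainly fix errors caused by incompatible numpy versions), which you can find here.
faster-rcnn can be trained in 2 ways: the two-stage method and the end-to-end method. This post covers the end-to-end method and reads the code in the order in which the training code runs.
1. Data Preparation
The program starts from faster-rcnn/tools/train_net.py, as follows: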
     84 if __name__ == '__main__':
     85     args = parse_args()
     86
     87     print('Called with args:')
     88     print(args)
     89
     90     if args.cfg_file is not None:
     91         cfg_from_file(args.cfg_file)
     92     if args.set_cfgs is not None:
     93         cfg_from_list(args.set_cfgs)
     94
     95     cfg.GPU_ID = args.gpu_id
     96
     97     print('Using config:')
     98     pprint.pprint(cfg)
     99
    100     if not args.randomize:
    101         # fix the random seeds (numpy and caffe) for reproducibility
    102         np.random.seed(cfg.RNG_SEED)
    103         caffe.set_random_seed(cfg.RNG_SEED)
    104
    105     # set up caffe
    106     caffe.set_mode_gpu()
    107     caffe.set_device(args.gpu_id)
    108
    109     imdb, roidb = combined_roidb(args.imdb_name)
    110     print '{:d} roidb entries'.format(len(roidb))
    111
    112     output_dir = get_output_dir(imdb)
    113     print 'Output will be saved to `{:s}`'.format(output_dir)
    114
    115     train_net(args.solver, roidb, output_dir,
    116               pretrained_model=args.pretrained_model,
    117               max_iters=args.max_iters)
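In this part, cfg_from_file(args.cfg_file) calls the cfg_from_file function in faster-rcnn/lib/fast_rcnn/config.py to load the configuration used for end-to-end training from faster-rcnn/experiments/cfgs/faster_rcnn_end2end.yml. This call overrides the values of some parameters in config.py. The contents of faster_rcnn_end2end.yml are: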
    EXP_DIR: faster_rcnn_end2end
    TRAIN:
      HAS_RPN: True
      IMS_PER_BATCH: 1
      BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
      RPN_POSITIVE_OVERLAP: 0.7
      RPN_BATCHSIZE: 256
      PROPOSAL_METHOD: gt
      BG_THRESH_LO: 0.0
    TEST:
      HAS_RPN: True
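To make the override step concrete, here is a minimal Python 3 sketch of how values from the YAML file could be merged onto the defaults. It is not the repository's cfg_from_file: merge_into, DEFAULTS and OVERRIDES are hypothetical names, plain dicts stand in for the config object, and the default values are assumptions (only BG_THRESH_LO = 0.1 is quoted later in this post).

```python
import copy

# Assumed defaults standing in for config.py (illustrative values only).
DEFAULTS = {
    "EXP_DIR": "default",
    "TRAIN": {"HAS_RPN": False, "IMS_PER_BATCH": 2, "BG_THRESH_LO": 0.1},
    "TEST": {"HAS_RPN": False},
}

# A subset of faster_rcnn_end2end.yml, written as a dict instead of parsed YAML.
OVERRIDES = {
    "EXP_DIR": "faster_rcnn_end2end",
    "TRAIN": {"HAS_RPN": True, "IMS_PER_BATCH": 1, "BG_THRESH_LO": 0.0},
    "TEST": {"HAS_RPN": True},
}


def merge_into(overrides, defaults):
    """Recursively copy override values into defaults, rejecting unknown keys."""
    for key, value in overrides.items():
        if key not in defaults:
            raise KeyError("{} is not a valid config key".format(key))
        if isinstance(value, dict) and isinstance(defaults[key], dict):
            merge_into(value, defaults[key])
        else:
            defaults[key] = value


cfg = copy.deepcopy(DEFAULTS)
merge_into(OVERRIDES, cfg)
```

Rejecting unknown keys is a design choice of this sketch; it catches typos in an experiment file early instead of silently ignoring them.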
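Line 109 of train_net.py, imdb, roidb = combined_roidb(args.imdb_name), is the core of data preparation. The returned imdb is an instance of the pascal_voc class; only a few of its paths are used later, so it plays a minor role. roidb, on the other hand, contains all the information needed to train the network. Its construction is shown below: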
    def combined_roidb(imdb_names):
        def get_roidb(imdb_name):
            imdb = get_imdb(imdb_name)
            print 'Loaded dataset `{:s}` for training'.format(imdb.name)
            imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
            print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD)
            roidb = get_training_roidb(imdb)
            return roidb

        roidbs = [get_roidb(s) for s in imdb_names.split('+')]
        roidb = roidbs[0]
        if len(roidbs) > 1:
            for r in roidbs[1:]:
                roidb.extend(r)
            imdb = datasets.imdb.imdb(imdb_names)
        else:
            imdb = get_imdb(imdb_names)
        return imdb, roidb
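The following sections go through each step of combined_roidb in turn.
1.1 get_imdb
First, imdb = get_imdb(imdb_name) calls get_imdb in faster-rcnn/lib/datasets/factory.py, which returns an instance of the pascal_voc class defined in faster-rcnn/lib/datasets/pascal_voc.py. The argument I pass to get_imdb is 'voc_2007_trainval', which corresponds to initializing pascal_voc with image_set='trainval' and year='2007'. In this pascal_voc instance, the dataset path is obtained as follows: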
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)

    def _get_default_path(self):
        """
        Return the default path where PASCAL VOC is expected to be installed.
        """
        return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + self._year)
As for cfg.DATA_DIR, it is determined by the following lines in faster-rcnn/lib/fast_rcnn/config.py:
    # Root directory of project
    __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))

    # Data directory
    __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
Therefore, with the arguments above, the dataset path is self._data_path=$CODE_DIR/faster-rcnn/data/VOCdevkit2007/VOC2007.
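The path arithmetic can be checked with a short Python 3 sketch. voc_data_path is a hypothetical helper, and /home/user/code is an assumed stand-in for $CODE_DIR; posixpath is used so the result is the same on every platform.

```python
import posixpath


def voc_data_path(data_dir, year):
    """Mirror the two joins done in pascal_voc.__init__ and _get_default_path."""
    devkit_path = posixpath.join(data_dir, "VOCdevkit" + year)
    return posixpath.join(devkit_path, "VOC" + year)


# Assumed location of the data directory (hypothetical $CODE_DIR).
data_dir = "/home/user/code/faster-rcnn/data"
data_path = voc_data_path(data_dir, "2007")
```

If the dataset lives elsewhere, the devkit_path argument of the pascal_voc constructor shown above replaces the default devkit directory.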
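1.2 imdb.set_proposal_method
Next, imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD) calls the set_proposal_method method of the imdb class in faster-rcnn/lib/datasets/imdb.py (pascal_voc inherits from imdb), which sets self.roidb_handler to the gt_roidb method of pascal_voc (because method='gt').
This step is very important, because gt_roidb is the function that reads the pascal_voc dataset and returns the information for all images. Its code is: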
    def gt_roidb(self):
        """
        Return the database of ground-truth regions of interest.

        This function loads/saves from/to a cache file to speed up future calls.
        """
        cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} gt roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        gt_roidb = [self._load_pascal_annotation(index)
                    for index in self.image_index]
        with open(cache_file, 'wb') as fid:
            cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
        print 'wrote gt roidb to {}'.format(cache_file)

        return gt_roidb
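The load-or-build caching pattern used by gt_roidb can be sketched in isolation. This is a Python 3 illustration (pickle instead of cPickle); load_or_build, expensive_build and the temporary cache file are hypothetical names, not part of the repository.

```python
import os
import pickle
import tempfile


def load_or_build(cache_file, build):
    """Return the cached object if cache_file exists, else build it and cache it."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fid:
            return pickle.load(fid)
    result = build()
    with open(cache_file, "wb") as fid:
        pickle.dump(result, fid, pickle.HIGHEST_PROTOCOL)
    return result


calls = []  # records how many times the slow path actually runs


def expensive_build():
    calls.append(1)
    return [{"boxes": [[0, 0, 9, 9]], "flipped": False}]


cache_file = os.path.join(tempfile.mkdtemp(), "demo_gt_roidb.pkl")
first = load_or_build(cache_file, expensive_build)   # builds and writes the cache
second = load_or_build(cache_file, expensive_build)  # served from the cache file
```

Because the cached file is returned without any freshness check, a stale .pkl has to be deleted by hand whenever the annotations change.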
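In gt_roidb, the code first checks whether cache_file exists. After reading the dataset annotations for the first time, it writes all of the annotation dictionaries into one file; since I created the dataset object with imdb_name='voc_2007_trainval', the corresponding file is faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl (cache_path points to the cache directory under cfg.DATA_DIR). If the file exists, the annotations are loaded directly from it; if not, _load_pascal_annotation is called to read the annotations of each image in the pascal_voc dataset into a dictionary. The code is: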
    def _load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0]
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        num_objs = len(objs)

        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        gt_classes = np.zeros((num_objs), dtype=np.int32)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
        # "Seg" area for pascal is just the box area
        seg_areas = np.zeros((num_objs), dtype=np.float32)

        # Load object bounding boxes into a data frame.
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes[ix, :] = [x1, y1, x2, y2]
            gt_classes[ix] = cls
            overlaps[ix, cls] = 1.0
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)

        overlaps = scipy.sparse.csr_matrix(overlaps)

        return {'boxes' : boxes,
                'gt_classes': gt_classes,
                'gt_overlaps' : overlaps,
                'flipped' : False,
                'seg_areas' : seg_areas}
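The XML handling can be tried without the dataset. The Python 3 sketch below parses a made-up annotation string (the objects and coordinates are invented for illustration) and applies the same two rules as above: skip objects marked difficult unless use_diff is set, and shift coordinates to 0-based. parse_objects is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented annotation in the PASCAL VOC layout, for illustration only.
ANNOTATION = """
<annotation>
  <object>
    <name>Dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def parse_objects(xml_text, use_diff=False):
    """Return (class name, [x1, y1, x2, y2]) pairs with 0-based coordinates."""
    root = ET.fromstring(xml_text.strip())
    parsed = []
    for obj in root.findall("object"):
        if not use_diff and int(obj.find("difficult").text) != 0:
            continue  # drop objects labeled as difficult
        bbox = obj.find("bndbox")
        box = [int(bbox.find(tag).text) - 1
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        parsed.append((obj.find("name").text.lower().strip(), box))
    return parsed


objects = parse_objects(ANNOTATION)
```

With the default use_diff=False only the first object survives, which is why an image whose objects are all marked difficult ends up with zero boxes.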
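It is worth noting that overlaps in the dictionary refers to the overlap ratio between each object in the image and the other ground-truth boxes. Judging from the code, however, it assumes by default that the objects (ground-truth boxes) in an image do not overlap one another, so overlaps has shape (num_objs, self.num_classes) and each row (along the first axis) has exactly one element equal to 1.0, in the column of the object's own class, with all other elements 0. Although this default does not match real annotations, it has no effect on later steps.
1.3 get_training_roidb
roidb = get_training_roidb(imdb) calls get_training_roidb in faster-rcnn/lib/fast_rcnn/train.py: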
    def get_training_roidb(imdb):
        """Returns a roidb (Region of Interest database) for use in training."""
        if cfg.TRAIN.USE_FLIPPED:
            print 'Appending horizontally-flipped training examples...'
            imdb.append_flipped_images()
            print 'done'

        print 'Preparing training data...'
        rdl_roidb.prepare_roidb(imdb)
        print 'done'

        return imdb.roidb
It performs 2 steps.
1.3.1 imdb.append_flipped_images()
    def append_flipped_images(self):
        num_images = self.num_images
        widths = self._get_widths()
        for i in xrange(num_images):
            boxes = self.roidb[i]['boxes'].copy()
            oldx1 = boxes[:, 0].copy()
            oldx2 = boxes[:, 2].copy()
            boxes[:, 0] = widths[i] - oldx2 - 1
            boxes[:, 2] = widths[i] - oldx1 - 1
            assert (boxes[:, 2] >= boxes[:, 0]).all()
            entry = {'boxes' : boxes,
                     'gt_overlaps' : self.roidb[i]['gt_overlaps'],
                     'gt_classes' : self.roidb[i]['gt_classes'],
                     'flipped' : True}
            self.roidb.append(entry)
        self._image_index = self._image_index * 2
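The coordinate arithmetic in append_flipped_images can be checked on one box. This Python 3 sketch uses plain lists instead of numpy arrays; flip_boxes and the example box are hypothetical.

```python
def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally in an image of the given width."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        new_x1 = width - x2 - 1  # the old right edge becomes the new left edge
        new_x2 = width - x1 - 1  # the old left edge becomes the new right edge
        assert new_x2 >= new_x1
        flipped.append([new_x1, y1, new_x2, y2])
    return flipped


boxes = [[47, 239, 194, 370]]             # 0-based box in a 500-pixel-wide image
flipped = flip_boxes(boxes, width=500)    # [[305, 239, 452, 370]]
restored = flip_boxes(flipped, width=500)
```

Flipping twice returns the original box, which is a quick sanity check that the -1 terms are consistent with 0-based pixel indexes.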
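This call invokes the append_flipped_images method of the imdb class in faster-rcnn/lib/datasets/imdb.py. It horizontally flips all bounding-box labels of every image in the dataset, sets 'flipped' to True in the image's dictionary, and appends this new dictionary to the original roidb list, so the list of image information becomes twice as long. Finally, the _image_index member of the dataset instance (the list of all image names) is duplicated, so its length also doubles. Note that self.roidb is a property of the imdb class (decorated with Python's built-in @property decorator). A property differs from a method in how it is accessed: a method named methodname is called as methodname(), whereas a property is accessed without (). The construction of self.roidb is shown in the code below. In addition, the @methodname.setter decorator turns a method into an assignable property, with the expression on the right-hand side of "=" passed to the method as its argument, as with @roidb_handler.setter in the code below.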
    @property
    def roidb_handler(self):
        return self._roidb_handler

    @roidb_handler.setter
    def roidb_handler(self, val):
        self._roidb_handler = val

    def set_proposal_method(self, method):
        method = eval('self.' + method + '_roidb')
        self.roidb_handler = method

    @property
    def roidb(self):
        # A roidb is a list of dictionaries, each with the following keys:
        #   boxes
        #   gt_overlaps
        #   gt_classes
        #   flipped
        if self._roidb is not None:
            return self._roidb
        self._roidb = self.roidb_handler()
        return self._roidb
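The lazy-loading behaviour of these properties can be reproduced with a toy class. This Python 3 sketch is an illustration only: TinyImdb and load_count are hypothetical, and getattr is swapped in for the eval call because it performs the same attribute lookup without evaluating a string.

```python
class TinyImdb(object):
    """Toy stand-in for imdb showing the handler and lazy roidb properties."""

    def __init__(self):
        self._roidb = None
        self._roidb_handler = self.gt_roidb
        self.load_count = 0  # counts how often the handler really runs

    @property
    def roidb_handler(self):
        return self._roidb_handler

    @roidb_handler.setter
    def roidb_handler(self, val):
        self._roidb_handler = val

    def set_proposal_method(self, method):
        # Same lookup as eval('self.' + method + '_roidb'), done with getattr.
        self.roidb_handler = getattr(self, method + "_roidb")

    @property
    def roidb(self):
        if self._roidb is None:
            self._roidb = self.roidb_handler()  # built on first access only
        return self._roidb

    def gt_roidb(self):
        self.load_count += 1
        return [{"flipped": False}]


imdb = TinyImdb()
imdb.set_proposal_method("gt")
first = imdb.roidb   # triggers gt_roidb()
second = imdb.roidb  # returns the stored list
```

The second access returns the very same list object, which is why append_flipped_images can call self.roidb.append(entry) and have the change persist.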
1.3.2 rdl_roidb.prepare_roidb(imdb)
    def prepare_roidb(imdb):
        """Enrich the imdb's roidb by adding some derived quantities that
        are useful for training. This function precomputes the maximum
        overlap, taken over ground-truth boxes, between each ROI and
        each ground-truth box. The class with maximum overlap is also
        recorded.
        """
        sizes = [PIL.Image.open(imdb.image_path_at(i)).size
                 for i in xrange(imdb.num_images)]
        roidb = imdb.roidb
        for i in xrange(len(imdb.image_index)):
            roidb[i]['image'] = imdb.image_path_at(i)
            roidb[i]['width'] = sizes[i][0]
            roidb[i]['height'] = sizes[i][1]
            # need gt_overlaps as a dense array for argmax
            gt_overlaps = roidb[i]['gt_overlaps'].toarray()
            # max overlap with gt over classes (columns)
            max_overlaps = gt_overlaps.max(axis=1)
            # gt class that had the max overlap
            max_classes = gt_overlaps.argmax(axis=1)
            roidb[i]['max_classes'] = max_classes
            roidb[i]['max_overlaps'] = max_overlaps
            # sanity checks
            # max overlap of 0 => class should be zero (background)
            zero_inds = np.where(max_overlaps == 0)[0]
            assert all(max_classes[zero_inds] == 0)
            # max overlap > 0 => class should not be zero (must be a fg class)
            nonzero_inds = np.where(max_overlaps > 0)[0]
            assert all(max_classes[nonzero_inds] != 0)
This call invokes prepare_roidb in faster-rcnn/lib/roi_data_layer/roidb.py, which adds 5 keys to each image's dictionary: 'image' (the full path of the image), 'width' (the image width), 'height' (the image height), 'max_classes', and 'max_overlaps'.
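For ground-truth boxes, the row-wise max and argmax simply recover 1.0 and the class label. A Python 3 sketch with nested lists in place of the dense numpy array; max_and_argmax is a hypothetical helper, and the class indices and the 21-class count (20 VOC classes plus background) are assumptions for illustration.

```python
def max_and_argmax(rows):
    """Row-wise maximum and index of the maximum, like max/argmax with axis=1."""
    max_overlaps = [max(row) for row in rows]
    max_classes = [row.index(max(row)) for row in rows]
    return max_overlaps, max_classes


num_classes = 21       # assumed: 20 object classes + background at index 0
gt_classes = [12, 15]  # assumed class indices of two objects in one image

# One-hot rows, as built by overlaps[ix, cls] = 1.0 in _load_pascal_annotation.
gt_overlaps = [[1.0 if c == cls else 0.0 for c in range(num_classes)]
               for cls in gt_classes]

max_overlaps, max_classes = max_and_argmax(gt_overlaps)
```

Since no row is all zeros and no ground-truth box has class 0, both sanity checks at the end of prepare_roidb pass for entries built this way.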
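This completes the construction of roidb. To summarize: the final roidb is a list containing the information for every image in the dataset (and its horizontal flip), and each image's information (stored in a dictionary) is one element of the list. The structure of each entry is: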
    {
        'boxes' : boxes,                # bounding boxes: xmin, ymin, xmax, ymax (0-based pixel indexes),
                                        # shape: (num_objs, 4), dtype=np.uint16
        'gt_classes': gt_classes,       # gt class label (background is 0), shape: (num_objs,), dtype=np.int32
        'gt_overlaps' : overlaps,       # each obj's max overlap with one of gt, shape: (num_objs, self.num_classes), dtype=np.float32
        'flipped' : False,
        'seg_areas' : seg_areas,        # area of each obj in the image, shape: (num_objs,), dtype=np.float32
        'image' : image_full_path,
        'width' : image_width,
        'height' : image_height,
        'max_classes' : max_classes,    # equal to gt_classes, shape: (num_objs,), dtype=np.int64
        'max_overlaps' : max_overlaps,  # all elements are 1.0, shape: (num_objs,), dtype=np.float32
    }
1.4 get_output_dir
Line 112 of train_net.py, output_dir = get_output_dir(imdb), calls get_output_dir in faster-rcnn/lib/fast_rcnn/config.py:
    def get_output_dir(imdb, net=None):
        """Return the directory where experimental artifacts are placed.
        If the directory does not exist, it is created.

        A canonical path is built using the name from an imdb and a network
        (if not None).
        """
        outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
        if net is not None:
            outdir = osp.join(outdir, net.name)
        if not os.path.exists(outdir):
            os.makedirs(outdir)
        return outdir
__C.EXP_DIR in this function is set to faster_rcnn_end2end in faster_rcnn_end2end.yml, so the final outdir=$CODE_DIR/faster-rcnn/output/faster_rcnn_end2end/voc_2007_trainval.
1.5 train_net
The roidb, output_dir, and other values obtained above are passed as arguments to train the network. This calls train_net in faster-rcnn/lib/fast_rcnn/train.py:
    def train_net(solver_prototxt, roidb, output_dir,
                  pretrained_model=None, max_iters=40000):
        """Train a Fast R-CNN network."""

        roidb = filter_roidb(roidb)
        sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                           pretrained_model=pretrained_model)

        print 'Solving...'
        model_paths = sw.train_model(max_iters)
        print 'done solving'
        return model_paths
1.5.1 filter_roidb
roidb = filter_roidb(roidb) calls filter_roidb to filter the roidb obtained above further, according to the following conditions:
    # Defaults in config.py; faster_rcnn_end2end.yml overrides BG_THRESH_LO to 0.0
    __C.TRAIN.FG_THRESH = 0.5
    __C.TRAIN.BG_THRESH_HI = 0.5
    __C.TRAIN.BG_THRESH_LO = 0.1

    def filter_roidb(roidb):
        """Remove roidb entries that have no usable RoIs."""

        def is_valid(entry):
            # Valid images have:
            #   (1) At least one foreground RoI OR
            #   (2) At least one background RoI
            overlaps = entry['max_overlaps']
            # find boxes with sufficient overlap
            fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
            # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
            bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                               (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
            # image is only valid if such boxes exist
            valid = len(fg_inds) > 0 or len(bg_inds) > 0
            return valid

        num = len(roidb)
        filtered_roidb = [entry for entry in roidb if is_valid(entry)]
        num_after = len(filtered_roidb)
        print 'Filtered {} roidb entries: {} -> {}'.format(num - num_after,
                                                           num, num_after)
        return filtered_roidb
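Ordinary annotations generally pass this check. An entry is valid if it meets either of the 2 conditions above, and every ground-truth box has max_overlaps equal to 1.0, which is at least cfg.TRAIN.FG_THRESH, so any image with at least one retained object satisfies the first condition.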
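The filter is easy to exercise on toy entries. A Python 3 sketch with plain lists; the thresholds are the config.py defaults quoted above, and the two entries are invented.

```python
FG_THRESH = 0.5
BG_THRESH_HI = 0.5
BG_THRESH_LO = 0.1


def is_valid(entry):
    """An entry is usable if it has at least one foreground or background RoI."""
    overlaps = entry["max_overlaps"]
    has_fg = any(o >= FG_THRESH for o in overlaps)
    has_bg = any(BG_THRESH_LO <= o < BG_THRESH_HI for o in overlaps)
    return has_fg or has_bg


roidb = [
    {"max_overlaps": [1.0, 1.0]},  # ordinary image with two gt boxes
    {"max_overlaps": []},          # image left with no objects at all
]
filtered = [entry for entry in roidb if is_valid(entry)]
```

Only the entry with no boxes is dropped, for example an image whose objects were all marked difficult and excluded when the annotations were loaded.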
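1.5.2 SolverWrapper
The initializer of this class mainly performs the operations shown below.
The configuration parameters used in the function have the following values: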
cfg.TRAIN.HAS_RPN=True
cfg.TRAIN.BBOX_REG=True
cfg.TRAIN.BBOX_NORMALIZE_TARGETS=True
cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED=True
    def __init__(self, solver_prototxt, roidb, output_dir,
                 pretrained_model=None):
        """Initialize the SolverWrapper."""
        self.output_dir = output_dir

        if (cfg.TRAIN.HAS_RPN and cfg.TRAIN.BBOX_REG and
            cfg.TRAIN.BBOX_NORMALIZE_TARGETS):
            # RPN can only use precomputed normalization because there are no
            # fixed statistics to compute a priori
            assert cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED

        if cfg.TRAIN.BBOX_REG:
            print 'Computing bounding-box regression targets...'
            self.bbox_means, self.bbox_stds = \
                    rdl_roidb.add_bbox_regression_targets(roidb)
            print 'done'

        self.solver = caffe.SGDSolver(solver_prototxt)
        if pretrained_model is not None:
            print ('Loading pretrained model '
                   'weights from {:s}').format(pretrained_model)
            self.solver.net.copy_from(pretrained_model)

        self.solver_param = caffe_pb2.SolverParameter()
        with open(solver_prototxt, 'rt') as f:
            pb2_text_format.Merge(f.read(), self.solver_param)

        self.solver.net.layers[0].set_roidb(roidb)
1.5.2.1 add_bbox_regression_targets
In the initializer, self.bbox_means, self.bbox_stds = rdl_roidb.add_bbox_regression_targets(roidb) calls add_bbox_regression_targets in faster-rcnn/lib/roi_data_layer/roidb.py.
The configuration parameters used in the function have the following values:
cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED=True
cfg.TRAIN.BBOX_NORMALIZE_TARGETS=True
    def add_bbox_regression_targets(roidb):
        """Add information needed to train bounding-box regressors."""
        assert len(roidb) > 0
        assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'

        num_images = len(roidb)
        # Infer number of classes from the number of columns in gt_overlaps
        num_classes = roidb[0]['gt_overlaps'].shape[1]
        for im_i in xrange(num_images):
            rois = roidb[im_i]['boxes']
            max_overlaps = roidb[im_i]['max_overlaps']
            max_classes = roidb[im_i]['max_classes']
            roidb[im_i]['bbox_targets'] = \
                    _compute_targets(rois, max_overlaps, max_classes)

        if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
            # Use fixed / precomputed "means" and "stds" instead of empirical values
            means = np.tile(
                    np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1))
            stds = np.tile(
                    np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1))
        else:
            # Compute values needed for means and stds
            # var(x) = E(x^2) - E(x)^2
            class_counts = np.zeros((num_classes, 1)) + cfg.EPS
            sums = np.zeros((num_classes, 4))
            squared_sums = np.zeros((num_classes, 4))
            for im_i in xrange(num_images):
                targets = roidb[im_i]['bbox_targets']
                for cls in xrange(1, num_classes):
                    cls_inds = np.where(targets[:, 0] == cls)[0]
                    if cls_inds.size > 0:
                        class_counts[cls] += cls_inds.size
                        sums[cls, :] += targets[cls_inds, 1:].sum(axis=0)
                        squared_sums[cls, :] += \
                                (targets[cls_inds, 1:] ** 2).sum(axis=0)

            means = sums / class_counts
            stds = np.sqrt(squared_sums / class_counts - means ** 2)

        print 'bbox target means:'
        print means
        print means[1:, :].mean(axis=0) # ignore bg class
        print 'bbox target stdevs:'
        print stds
        print stds[1:, :].mean(axis=0) # ignore bg class

        # Normalize targets
        if cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
            print "Normalizing targets"
            for im_i in xrange(num_images):
                targets = roidb[im_i]['bbox_targets']
                for cls in xrange(1, num_classes):
                    cls_inds = np.where(targets[:, 0] == cls)[0]
                    roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :]
                    roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :]
        else:
            print "NOT normalizing targets"

        # These values will be needed for making predictions
        # (the predicts will need to be unnormalized and uncentered)
        return means.ravel(), stds.ravel()
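add_bbox_regression_targets first computes the regression targets of all bounding boxes (note: these are not the box coordinates themselves), then normalizes the targets using preset means and standard deviations: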
    __C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
    __C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
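Because the regression targets from gt to gt are all 0, they remain 0 after normalization, so I think this step is somewhat redundant. The functions used here include _compute_targets and bbox_transform.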
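The claim that gt-to-gt targets are zero can be verified with a small Python 3 sketch. The repository's bbox_transform is not quoted in this post, so the function below is an assumption: it follows the usual Faster R-CNN box parameterization (center offsets scaled by box size, log ratios of width and height) with a +1 pixel-width convention, and normalize is a hypothetical helper.

```python
import math

BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)


def bbox_transform(ex, gt):
    """Regression targets (dx, dy, dw, dh) that map box ex onto box gt."""
    ex_w = ex[2] - ex[0] + 1.0
    ex_h = ex[3] - ex[1] + 1.0
    ex_cx = ex[0] + 0.5 * ex_w
    ex_cy = ex[1] + 0.5 * ex_h
    gt_w = gt[2] - gt[0] + 1.0
    gt_h = gt[3] - gt[1] + 1.0
    gt_cx = gt[0] + 0.5 * gt_w
    gt_cy = gt[1] + 0.5 * gt_h
    return [(gt_cx - ex_cx) / ex_w,
            (gt_cy - ex_cy) / ex_h,
            math.log(gt_w / ex_w),
            math.log(gt_h / ex_h)]


def normalize(targets, means, stds):
    """Subtract the per-coordinate mean and divide by the per-coordinate std."""
    return [(t - m) / s for t, m, s in zip(targets, means, stds)]


gt_box = [47, 239, 194, 370]
same = normalize(bbox_transform(gt_box, gt_box),
                 BBOX_NORMALIZE_MEANS, BBOX_NORMALIZE_STDS)

# A box shifted right by exactly its own width gives dx = 1 before scaling.
shifted = bbox_transform([0, 0, 9, 9], [10, 0, 19, 9])
```

The normalization matters later in training, when targets are computed between proposals and ground-truth boxes and are no longer all zero.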
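1.5.2.2 set_roidb
In the SolverWrapper initializer, the next steps construct a caffe solver object and load the parameters of the pretrained model. Finally, self.solver.net.layers[0].set_roidb(roidb) passes the roidb above into the first layer of the network, the input-data layer. The code of set_roidb is shown below.
The configuration parameters used in the function have the following values: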
cfg.TRAIN.USE_PREFETCH=False
cfg.TRAIN.ASPECT_GROUPING=True
    def set_roidb(self, roidb):
        """Set the roidb to be used by this layer during training."""
        self._roidb = roidb
        self._shuffle_roidb_inds()
        if cfg.TRAIN.USE_PREFETCH:
            self._blob_queue = Queue(10)
            self._prefetch_process = BlobFetcher(self._blob_queue,
                                                 self._roidb,
                                                 self._num_classes)
            self._prefetch_process.start()
            # Terminate the child process when the parent exits
            def cleanup():
                print 'Terminating BlobFetcher'
                self._prefetch_process.terminate()
                self._prefetch_process.join()
            import atexit
            atexit.register(cleanup)

    def _shuffle_roidb_inds(self):
        """Randomly permute the training roidb."""
        if cfg.TRAIN.ASPECT_GROUPING:
            widths = np.array([r['width'] for r in self._roidb])
            heights = np.array([r['height'] for r in self._roidb])
            horz = (widths >= heights)
            vert = np.logical_not(horz)
            horz_inds = np.where(horz)[0]
            vert_inds = np.where(vert)[0]
            inds = np.hstack((
                np.random.permutation(horz_inds),
                np.random.permutation(vert_inds)))
            inds = np.reshape(inds, (-1, 2))
            row_perm = np.random.permutation(np.arange(inds.shape[0]))
            inds = np.reshape(inds[row_perm, :], (-1,))
            self._perm = inds
        else:
            self._perm = np.random.permutation(np.arange(len(self._roidb)))
        self._cur = 0
The functions used here are set_roidb and _shuffle_roidb_inds. This completes the data-preparation stage of faster-rcnn.
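The aspect-grouping shuffle in _shuffle_roidb_inds can be sketched with the standard library. This Python 3 version is an illustration, not the layer's code: shuffle_inds is a hypothetical name, the random module replaces numpy, and the image sizes are invented.

```python
import random


def shuffle_inds(widths, heights, rng):
    """Shuffle indices so that each consecutive pair shares one orientation."""
    horz = [i for i, (w, h) in enumerate(zip(widths, heights)) if w >= h]
    vert = [i for i, (w, h) in enumerate(zip(widths, heights)) if w < h]
    rng.shuffle(horz)
    rng.shuffle(vert)
    inds = horz + vert
    # Cut into pairs, shuffle the pairs, then flatten again.
    pairs = [inds[i:i + 2] for i in range(0, len(inds), 2)]
    rng.shuffle(pairs)
    return [i for pair in pairs for i in pair]


# Invented sizes: four landscape images and two portrait images.
widths = [500, 500, 333, 333, 500, 500]
heights = [375, 375, 500, 500, 333, 333]
perm = shuffle_inds(widths, heights, random.Random(0))
```

The pairing only lines up when the landscape and portrait counts are both even, which holds after flipping doubles every image. The numpy reshape to (-1, 2) in the original additionally requires an even total, whereas this sketch tolerates a trailing single index.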
