Faster RCNN PyTorch Code Walkthrough


Code source:

https://github.com/jwyang/faster-rcnn.pytorch

 

1. EasyDict and yaml

The author uses EasyDict to hold and import the configuration parameters:

# `pip install easydict` if you don't have it
from easydict import EasyDict as edict

__C = edict()
# Consumers can get config by:
#   from fast_rcnn_config import cfg
cfg = __C

  

EasyDict ("easy dict") simply changes the dictionary lookup style from myDict['key'] to attribute-style myDict.key:

from easydict import EasyDict as edict

myedict = edict()

cfg = myedict


myedict.a = 100

myedict.TRAIN = edict()
myedict.TRAIN.LEARNING_RATE = 0.001
myedict.TRAIN.MOMENTUM = 0.9


print(cfg.a)
print(cfg.TRAIN.LEARNING_RATE)

  

Note that cfg = myedict is an assignment by reference: cfg and myedict are the same object, so when myedict changes, cfg reflects exactly the same change.

 

If all the keys of a plain dict are strings, it can be converted directly into an EasyDict with myEasyDict = EasyDict(myDict), as in the sketch below.
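
A quick sketch of that conversion (the plain_cfg dict below is just an illustrative example, not the repo's config):

from easydict import EasyDict

# A plain dict whose keys are all strings; nested dicts are converted recursively.
plain_cfg = {"EXP_DIR": "vgg16",
             "TRAIN": {"LEARNING_RATE": 0.001, "MOMENTUM": 0.9}}

easy_cfg = EasyDict(plain_cfg)

print(easy_cfg.TRAIN.LEARNING_RATE)    # 0.001, attribute-style access
print(easy_cfg["TRAIN"]["MOMENTUM"])   # 0.9, dict-style access still works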

 

The author also stores the preset parameters in a file using the YAML format:

EXP_DIR: vgg16
TRAIN:
  HAS_RPN: True
  BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
  RPN_POSITIVE_OVERLAP: 0.7
  RPN_BATCHSIZE: 256
  PROPOSAL_METHOD: gt
  BG_THRESH_LO: 0.0
  BATCH_SIZE: 256
  LEARNING_RATE: 0.01
TEST:
  HAS_RPN: True
POOLING_MODE: align
CROP_RESIZE_WITH_MAX_POOL: False

  

A YAML file must follow these rules:

  • Case-sensitive;
  • Indentation expresses the hierarchy;
  • Indent with spaces, not with tabs;
  • The number of spaces per indent is not fixed; elements at the same level only need to be left-aligned;
  • Strings do not need quotes, unless they contain special characters;
  • Comments start with #.

 

A YAML file can be loaded like this:

import yaml
from easydict import EasyDict as edict


with open("vgg16.yml", 'r') as f:
    yaml_cfg1 = yaml.load(f, Loader=yaml.FullLoader)

with open("vgg16.yml", 'r') as f:
    yaml_cfg2 = edict(yaml.load(f, Loader=yaml.FullLoader))


print(yaml_cfg1) # a plain dict
print(yaml_cfg2) # an EasyDict

print(yaml_cfg1["TRAIN"]["RPN_POSITIVE_OVERLAP"])
print(yaml_cfg2.TRAIN.RPN_POSITIVE_OVERLAP)
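
In the repo itself, the EasyDict loaded from the YAML file is then merged into the default __C by cfg_from_file, so only the keys present in the file override the defaults. A simplified sketch of such a recursive merge (the repo's _merge_a_into_b additionally checks that types match):

def merge_into(src, dst):
    # Recursively copy values from src (the loaded YAML config) into dst (the defaults).
    for key, value in src.items():
        if key not in dst:
            raise KeyError('Unknown config key: {}'.format(key))
        if isinstance(value, dict) and isinstance(dst[key], dict):
            merge_into(value, dst[key])
        else:
            dst[key] = value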

 

 

2. Processing the input

Four tensors are initialized first: im_data, im_info, num_boxes, and gt_boxes.

# initilize the tensor holder here.
  im_data = torch.FloatTensor(1)
  im_info = torch.FloatTensor(1)
  num_boxes = torch.LongTensor(1)
  gt_boxes = torch.FloatTensor(1)

  # ship to cuda
  if args.cuda > 0:
    im_data = im_data.cuda()
    im_info = im_info.cuda()
    num_boxes = num_boxes.cuda()
    gt_boxes = gt_boxes.cuda()

  # make variable
  im_data = Variable(im_data, volatile=True)
  im_info = Variable(im_info, volatile=True)
  num_boxes = Variable(num_boxes, volatile=True)
  gt_boxes = Variable(gt_boxes, volatile=True)
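
Variable(..., volatile=True) is the pre-0.4 PyTorch idiom for running inference without tracking gradients; on PyTorch >= 0.4, volatile is ignored (with a warning) and the equivalent, shown here only as a rough sketch rather than the repo's code, is plain tensors plus a torch.no_grad() context:

import torch

# Plain tensors; Variable wrappers are no longer needed.
im_data = torch.zeros(1)
if torch.cuda.is_available():
    im_data = im_data.cuda()

# Disable gradient tracking around the forward pass instead of volatile=True.
with torch.no_grad():
    pass  # e.g. fasterRCNN(im_data, im_info, gt_boxes, num_boxes)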

  

All the images in the folder are gathered into imglist:

# Set up webcam or get image directories
  if webcam_num >= 0 :
    cap = cv2.VideoCapture(webcam_num)
    num_images = 0
  else:
    imglist = os.listdir(args.image_dir)
    num_images = len(imglist)

  

The code then enters a loop; the number of iterations equals the number of input images:

while (num_images >= 0):
      total_tic = time.time()
      if webcam_num == -1:
        num_images -= 1

  

One image is read in; if it has a single channel, it is replicated three times to make three channels, and then converted from RGB to BGR:

else:
        im_file = os.path.join(args.image_dir, imglist[num_images])
        # im = cv2.imread(im_file)
        im_in = np.array(imread(im_file))
      if len(im_in.shape) == 2:
        im_in = im_in[:,:,np.newaxis]
        im_in = np.concatenate((im_in,im_in,im_in), axis=2)
      # rgb -> bgr
      im = im_in[:,:,::-1]
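
The [:, :, ::-1] slice just reverses the channel axis, which is all the RGB-to-BGR conversion amounts to; a tiny standalone illustration:

import numpy as np

px = np.array([[[10, 20, 30]]])   # a 1x1 image, one pixel in RGB order: R=10, G=20, B=30
print(px[:, :, ::-1])             # [[[30 20 10]]] -> the same pixel in BGR order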

  

Next the code enters the blobs, im_scales = _get_image_blob(im) function.

It first subtracts the pixel means, then rescales the image proportionally so that the shorter side becomes 600 (capping the longer side at cfg.TEST.MAX_SIZE), and returns the rescaled image together with the scale factor.

def _get_image_blob(im):
  """Converts an image into a network input.
  Arguments:
    im (ndarray): a color image in BGR order
  Returns:
    blob (ndarray): a data blob holding an image pyramid
    im_scale_factors (list): list of image scales (relative to im) used
      in the image pyramid
  """
  im_orig = im.astype(np.float32, copy=True)
  im_orig -= cfg.PIXEL_MEANS

  im_shape = im_orig.shape
  im_size_min = np.min(im_shape[0:2])
  im_size_max = np.max(im_shape[0:2])

  processed_ims = []
  im_scale_factors = []

  for target_size in cfg.TEST.SCALES:
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
      im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
    im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
            interpolation=cv2.INTER_LINEAR)
    im_scale_factors.append(im_scale)
    processed_ims.append(im)

  # Create a blob to hold the input images
  blob = im_list_to_blob(processed_ims)

  return blob, np.array(im_scale_factors)
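
im_list_to_blob itself is not shown above; it zero-pads every image in the list to the largest height/width and stacks them into a single (N, H, W, 3) array. A minimal sketch along those lines (not necessarily the repo's exact implementation):

import numpy as np

def im_list_to_blob(ims):
    # Pad each image to the max height/width in the list and stack into one blob.
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
    for i, im in enumerate(ims):
        blob[i, :im.shape[0], :im.shape[1], :] = im
    return blob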

  

im_info_np holds the resized image's height and width together with the scale factor:

im_info_np = np.array([[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], dtype=np.float32)

  

The layout is then converted from NHWC to NCHW:

im_data_pt = im_data_pt.permute(0, 3, 1, 2)
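
A quick illustration of what this permute does to the axes (the 600x800 shape is just an example):

import torch

x = torch.zeros(1, 600, 800, 3)    # NHWC: (batch, height, width, channels)
y = x.permute(0, 3, 1, 2)          # NCHW: (batch, channels, height, width)
print(y.shape)                     # torch.Size([1, 3, 600, 800])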

  

Then the forward function is entered:

rois, cls_prob, bbox_pred, \
      rpn_loss_cls, rpn_loss_box, \
      RCNN_loss_cls, RCNN_loss_bbox, \
      rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)

  

RCNN_base here is the front portion of VGG16 with the last pooling layer removed; its output feature map is 1/16 of the original image size and has 512 channels:

base_feat = self.RCNN_base(im_data)
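
A sketch of how such a backbone can be built from torchvision's vgg16, mirroring the repo's approach of dropping the last layer of vgg.features (the 600x800 input is just an example):

import torch
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg16()
# Keep everything in vgg16.features except the final max-pool, so the total stride is 16.
RCNN_base = nn.Sequential(*list(vgg.features.children())[:-1])

x = torch.zeros(1, 3, 600, 800)
feat = RCNN_base(x)
print(feat.shape)   # torch.Size([1, 512, 37, 50]): about 1/16 of 600x800, 512 channels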

  

The RPN module:

rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)
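
RCNN_rpn slides a 3x3 conv over the 512-channel base feature map, predicts an objectness score and four box-regression deltas for every anchor at every spatial location, and a proposal layer then turns those predictions into rois (plus the two RPN losses during training). A minimal sketch of just the prediction heads, not the repo's full _RPN module:

import torch
import torch.nn as nn

class TinyRPNHead(nn.Module):
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.rpn_conv = nn.Conv2d(in_channels, 512, 3, padding=1)  # 3x3 "sliding window"
        self.cls_score = nn.Conv2d(512, num_anchors * 2, 1)        # bg/fg score per anchor
        self.bbox_pred = nn.Conv2d(512, num_anchors * 4, 1)        # 4 box deltas per anchor

    def forward(self, base_feat):
        x = torch.relu(self.rpn_conv(base_feat))
        return self.cls_score(x), self.bbox_pred(x)

head = TinyRPNHead()
scores, deltas = head(torch.zeros(1, 512, 37, 50))
print(scores.shape, deltas.shape)   # torch.Size([1, 18, 37, 50]) torch.Size([1, 36, 37, 50])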

  

 

