Object Detection: Faster-RCNN PyTorch Code Explained (Data Preprocessing)

This article is a repost. 2018-10-01 11:34, category: Object Detection

First, the GitHub repository of the original code author: https://github.com/chenyuntc/simple-faster-rcnn-pytorch (I am not the author of the code; this post only explains it)

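Today I finished reading train.py, the last file of the simple-faster-rcnn-pytorch-master code, so it is time for a proper summary. I plan to write four posts that analyze the PyTorch implementation of Faster-RCNN in detail. Their contents and structure are as follows: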

1 Faster-RCNN data loading and preprocessing (corresponding to the /simple-faster-rcnn-pytorch-master/data folder): https://www.cnblogs.com/kerwins-AC/p/9734381.html

2 Faster-RCNN model preparation (corresponding to the /simple-faster-rcnn-pytorch-master/model/utils/ folder): https://www.cnblogs.com/kerwins-AC/p/9752679.html

3 Faster-RCNN model proper (corresponding to the /simple-faster-rcnn-pytorch-master/model/ folder): not yet finished

4 Faster-RCNN training code (corresponding to /simple-faster-rcnn-pytorch-master/train.py and trainer.py): https://www.cnblogs.com/kerwins-AC/p/9728731.html

This post covers the data preprocessing part of the code, i.e. the files in the /simple-faster-rcnn-pytorch-master/data folder.

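We start with dataset.py. It defines three normalization helpers (inverse_normalize, pytorch_normalze, caffe_normalize), the preprocess function, and the Transform, Dataset and TestDataset classes.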

As usual, we go through it one function at a time and look at what each one does.

1 The function def inverse_normalize(img) is as follows:

def inverse_normalize(img):
    if opt.caffe_pretrain:
        # undo caffe_normalize: add the mean back, then BGR -> RGB
        img = img + (np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1))
        return img[::-1, :, :]
    # approximate un-normalize for visualize
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255

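This function undoes the normalization applied in preprocess, turning a network input back into a viewable image. It first reads opt.caffe_pretrain to check whether a Caffe-pretrained backbone is used. If so, the image was normalized Caffe-style (BGR channel order, mean subtracted), so the function adds the mean back and reverses the channel axis to go from BGR to RGB. Otherwise it approximately undoes the PyTorch normalization: multiply by 0.225, add 0.45, clip to [0, 1] and scale to [0, 255]. This is only approximate because a single mean and standard deviation are used for all three channels. In both cases the result is an RGB image with values on the 0-255 scale.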
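To make the difference between the two branches concrete, here is a small NumPy-only check. It is a sketch under stated assumptions: opt.caffe_pretrain is replaced by an explicit caffe_pretrain argument, and tvtsf.Normalize by the equivalent channel-wise NumPy arithmetic, so the names pytorch_normalize_np and CAFFE_MEAN are hypothetical. It normalizes a random RGB image each way and measures how far inverse_normalize lands from the original 0-255 values.

```python
import numpy as np

# Per-channel mean copied from the repository's caffe_normalize / inverse_normalize.
CAFFE_MEAN = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)


def caffe_normalize(img):
    """CHW RGB in [0, 1] -> BGR, scaled to [0, 255], mean subtracted (float32)."""
    img = img[[2, 1, 0], :, :] * 255
    return (img - CAFFE_MEAN).astype(np.float32)


def pytorch_normalize_np(img):
    """Hypothetical NumPy stand-in for tvtsf.Normalize with the ImageNet mean/std."""
    mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
    std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
    return (img - mean) / std


def inverse_normalize(img, caffe_pretrain):
    # caffe_pretrain is an explicit argument here instead of opt.caffe_pretrain.
    if caffe_pretrain:
        img = img + CAFFE_MEAN
        return img[::-1, :, :]  # BGR -> RGB
    # Approximate: a single mean (0.45) and std (0.225) for all channels.
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255


rng = np.random.default_rng(0)
rgb = rng.random((3, 4, 5))  # toy CHW RGB image with values in [0, 1)

exact = inverse_normalize(caffe_normalize(rgb), caffe_pretrain=True)
approx = inverse_normalize(pytorch_normalize_np(rgb), caffe_pretrain=False)
exact_err = np.abs(exact - rgb * 255).max()    # only float32 rounding error
approx_err = np.abs(approx - rgb * 255).max()  # several gray levels
print(exact_err, approx_err)
```

The Caffe branch round-trips up to float32 rounding, while the PyTorch branch is off by about 11 gray levels in the blue channel, which is fine for visualization but not for recovering exact pixel values.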

2 The function def pytorch_normalze(img) (spelled this way in the repository) is as follows:

def pytorch_normalze(img):
    """
    https://github.com/pytorch/vision/issues/223
    return approx. -2.1~2.6 RGB
    """
    normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])
    img = normalize(t.from_numpy(img))
    return img.numpy()

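The function first builds the normalizer normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), i.e. the ImageNet per-channel mean and standard deviation used by torchvision's pretrained models. It then normalizes the image with img = normalize(t.from_numpy(img)), which computes (x - mean[c]) / std[c] for each channel c, and finally converts the result back to a NumPy array.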
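The per-channel arithmetic can be checked without PyTorch. The sketch below (normalize_like_tvtsf is a hypothetical name; NumPy is used only so it runs without torch installed) applies the same (x - mean[c]) / std[c] formula to an all-black and an all-white pixel, which shows that the output of pytorch_normalze spans roughly [-2.12, 2.64].

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def normalize_like_tvtsf(img):
    """Channel-wise (x - mean[c]) / std[c] on a CHW float array in [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD


black = normalize_like_tvtsf(np.zeros((3, 1, 1))).ravel()
white = normalize_like_tvtsf(np.ones((3, 1, 1))).ravel()
print(black)  # about [-2.118, -2.036, -1.804]
print(white)  # about [ 2.249,  2.429,  2.640]
```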

3 The function def caffe_normalize(img) is as follows:

def caffe_normalize(img):
    """
    return appr -125-125 BGR
    """
    img = img[[2, 1, 0], :, :]  # RGB-BGR
    img = img * 255
    mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)
    img = (img - mean).astype(np.float32, copy=True)
    return img

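Caffe models expect images in BGR order, so img[[2, 1, 0], :, :] converts RGB to BGR. The image is then multiplied by 255 (preprocess has already divided it by 255), and mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1) sets the per-channel mean.

Subtracting this mean from the image completes the Caffe-style normalization. Unlike the PyTorch version there is no division by a standard deviation, so the values stay on the 0-255 scale, shifted to be roughly centered on zero.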

4 The function def preprocess(img, min_size=600, max_size=1000) is as follows:

def preprocess(img, min_size=600, max_size=1000):
    """Preprocess an image for feature extraction.

    The length of the shorter edge is scaled to :obj:`min_size`.
    After the scaling, if the length of the longer edge is longer than
    :obj:`max_size`, the image is scaled to fit the longer edge
    to :obj:`max_size`.

    After resizing the image, the image is normalized with either
    caffe_normalize or pytorch_normalze, depending on opt.caffe_pretrain.

    Args:
        img (~numpy.ndarray): An image. This is in CHW and RGB format.
            The range of its value is :math:`[0, 255]`.

    Returns:
        ~numpy.ndarray: A preprocessed image.

    """
    C, H, W = img.shape
    scale1 = min_size / min(H, W)
    scale2 = max_size / max(H, W)
    scale = min(scale1, scale2)
    img = img / 255.
    img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False)
    # after resizing, the shorter edge is at most min_size
    # and the longer edge is at most max_size
    if opt.caffe_pretrain:
        normalize = caffe_normalize
    else:
        normalize = pytorch_normalze
    return normalize(img)

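preprocess is the image-processing function. C, H, W = img.shape reads the number of channels, the height and the width of the image.

scale1 = min_size / min(H, W)

scale2 = max_size / max(H, W)

scale = min(scale1, scale2) sets the resize factor. The reasoning is intuitive: scale1 would bring the shorter edge to min_size, scale2 would bring the longer edge to max_size, and taking the smaller of the two guarantees that the shorter edge ends up at most min_size and the longer edge at most max_size.

img = img / 255. rescales the pixel values to [0, 1].

img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False) resizes the image accordingly. As long as the aspect ratio (longer edge / shorter edge) does not exceed max_size / min_size (1000 / 600 here), the shorter edge becomes exactly min_size; for more elongated images the longer edge becomes max_size and the shorter edge ends up below min_size.

Finally, depending on whether opt.caffe_pretrain is set, the function applies either the Caffe normalization or the PyTorch normalization defined above.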
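The choice of scale can be checked with plain arithmetic. preprocess_scale below is a hypothetical helper that repeats only the size computation at the start of preprocess (no resizing, no normalization), applied to a typical 375 x 500 VOC-sized image and to a very wide 300 x 1000 image.

```python
def preprocess_scale(H, W, min_size=600, max_size=1000):
    """Return (scale, new_H, new_W) as computed at the start of preprocess()."""
    scale1 = min_size / min(H, W)  # brings the shorter edge to min_size
    scale2 = max_size / max(H, W)  # brings the longer edge to max_size
    scale = min(scale1, scale2)    # the smaller factor satisfies both limits
    return scale, H * scale, W * scale


print(preprocess_scale(375, 500))   # typical VOC landscape image
print(preprocess_scale(300, 1000))  # very wide image
```

For the first image scale1 = 1.6 is the smaller factor and the output is about 600 x 800. For the second, scale2 = 1.0 wins, so the image keeps its 300 x 1000 size and its shorter edge stays below min_size.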

5 The class Transform(object) is as follows:

class Transform(object):

    def __init__(self, min_size=600, max_size=1000):
        self.min_size = min_size
        self.max_size = max_size

    def __call__(self, in_data):
        img, bbox, label = in_data
        _, H, W = img.shape
        img = preprocess(img, self.min_size, self.max_size)
        _, o_H, o_W = img.shape
        scale = o_H / H
        bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W))

        # horizontally flip
        img, params = util.random_flip(
            img, x_random=True, return_param=True)
        bbox = util.flip_bbox(
            bbox, (o_H, o_W), x_flip=params['x_flip'])

        return img, bbox, label, scale

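__init__ sets the minimum and maximum image size; in this PyTorch code min_size=600 and max_size=1000.

__call__ unpacks img, bbox, label from in_data: the image, its bounding boxes and their labels.

_, H, W = img.shape then reads the height and width of the image.

img = preprocess(img, self.min_size, self.max_size) resizes the image under the min/max size constraints and then normalizes it.

_, o_H, o_W = img.shape reads the shape of the resized image.

scale = o_H / H divides the height after resizing by the height before, giving the scale factor.

bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W)) rescales the bounding boxes to match the resized image.

img, params = util.random_flip(img, x_random=True, return_param=True) randomly flips the image horizontally. This is data augmentation: a left-right flip (not a rotation) makes the network more robust to mirrored objects.

The bounding boxes are then flipped with util.flip_bbox using the same params['x_flip'] decision, so they stay aligned with the image rather than being flipped independently at random. Finally the function returns img, bbox, label, scale.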
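The bounding-box bookkeeping can be sketched in NumPy. resize_bbox and flip_bbox_x below are hypothetical re-implementations written for illustration, not the repository's util functions, and they assume boxes stored as (y_min, x_min, y_max, x_max) rows. The last lines flip a mask of the box region the same way the image is flipped and confirm that it lines up with the flipped box, which is why the flip decision from random_flip has to be reused for the boxes.

```python
import numpy as np

# Assumption: boxes are rows of (y_min, x_min, y_max, x_max).
# Both helpers are hypothetical re-implementations, not data/util.py itself.


def resize_bbox(bbox, in_size, out_size):
    """Scale boxes from an (H, W) image to an (o_H, o_W) image."""
    bbox = bbox.copy()
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= y_scale
    bbox[:, [1, 3]] *= x_scale
    return bbox


def flip_bbox_x(bbox, size):
    """Mirror boxes horizontally inside an image of size (H, W)."""
    _, W = size
    bbox = bbox.copy()
    x_min = W - bbox[:, 3]  # the old right edge becomes the new left edge
    x_max = W - bbox[:, 1]
    bbox[:, 1] = x_min
    bbox[:, 3] = x_max
    return bbox


H, W, o_H, o_W = 375, 500, 600, 800
box = np.array([[50.0, 100.0, 200.0, 300.0]])
resized = resize_bbox(box, (H, W), (o_H, o_W))  # about [[80, 160, 320, 480]]
flipped = flip_bbox_x(resized, (o_H, o_W))       # about [[80, 320, 320, 640]]

# Consistency check: flip a mask of the box region the way the image is flipped.
mask = np.zeros((o_H, o_W), dtype=bool)
y0, x0, y1, x1 = resized[0].round().astype(int)
mask[y0:y1, x0:x1] = True
cols = np.nonzero(mask[:, ::-1].any(axis=0))[0]
print(flipped, cols.min(), cols.max() + 1)  # mask columns match the flipped box
```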

6 The class Dataset is as follows:

class Dataset:
    def __init__(self, opt):
        self.opt = opt
        self.db = VOCBboxDataset(opt.voc_data_dir)
        self.tsf = Transform(opt.min_size, opt.max_size)

    def __getitem__(self, idx):
        ori_img, bbox, label, difficult = self.db.get_example(idx)

        img, bbox, label, scale = self.tsf((ori_img, bbox, label))
        # TODO: check whose stride is negative to fix this instead copy all
        # some of the strides of a given numpy array are negative.
        return img.copy(), bbox.copy(), label.copy(), scale

    def __len__(self):
        return len(self.db)

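__init__ stores self.opt = opt and creates self.db = VOCBboxDataset(opt.voc_data_dir) and self.tsf = Transform(opt.min_size, opt.max_size).

__getitem__ can simply be understood as fetching the examples one by one from the dataset directory. For each example it calls the Transform above, which resizes and normalizes the image under the min/max size constraints, rescales the bboxes, and randomly flips both together. It then returns img, bbox, label, scale; the labels pass through unchanged and the difficult flag is not used for training. The .copy() calls work around NumPy arrays with negative strides, as the TODO comment in the code notes.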
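The TODO comment about negative strides can be reproduced with NumPy alone. A horizontal flip written as slicing with ::-1 (an assumption about how util.random_flip flips the image) returns a view with a negative stride, and torch.from_numpy rejects arrays with negative strides, which is presumably why __getitem__ returns copies. .copy() materializes a fresh C-contiguous array whose strides are all positive.

```python
import numpy as np

img = np.arange(3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)  # toy CHW image
flipped = img[:, :, ::-1]  # horizontal flip as a view: no data is copied
fixed = flipped.copy()     # a new array laid out in C order

print(flipped.strides)  # the last (width) stride is negative
print(fixed.strides)    # all strides are positive again
print(flipped.flags['C_CONTIGUOUS'], fixed.flags['C_CONTIGUOUS'])
```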

7 The class TestDataset is as follows:

class TestDataset:
    def __init__(self, opt, split='test', use_difficult=True):
        self.opt = opt
        self.db = VOCBboxDataset(opt.voc_data_dir, split=split, use_difficult=use_difficult)

    def __getitem__(self, idx):
        ori_img, bbox, label, difficult = self.db.get_example(idx)
        img = preprocess(ori_img)
        return img, ori_img.shape[1:], bbox, label, difficult

    def __len__(self):
        return len(self.db)

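TestDataset does a similar job but reads a different part of the dataset. Its __init__(self, opt, split='test', use_difficult=True) passes split='test' to VOCBboxDataset, so it loads the VOC test split rather than the training split used by Dataset, and use_difficult=True keeps objects marked as difficult so that evaluation can take them into account. When processing images it does not call Transform: there is no random flip, because test images should be evaluated as they are, and the ground-truth bboxes are not rescaled. Instead it calls preprocess() directly to resize the image under the min/max size constraints and normalize it. __getitem__ then returns the processed image, the original image size ori_img.shape[1:] (i.e. (H, W)), and the ground-truth bbox, label and difficult flags in original-image coordinates. That concludes dataset.py.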
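Returning ori_img.shape[1:] matters because predictions are made on the resized image while the ground-truth boxes stay in original-image coordinates. Below is a minimal sketch of mapping predicted boxes back, using the hypothetical helper rescale_to_original and assuming (y_min, x_min, y_max, x_max) boxes; the repository's model code presumably performs an equivalent step when predicting, but this is not its implementation.

```python
import numpy as np


def rescale_to_original(pred_bbox, input_size, original_size):
    """Map boxes from the preprocessed image size back to the original (H, W)."""
    y_scale = original_size[0] / input_size[0]
    x_scale = original_size[1] / input_size[1]
    out = pred_bbox.astype(np.float64).copy()
    out[:, [0, 2]] *= y_scale  # y_min, y_max
    out[:, [1, 3]] *= x_scale  # x_min, x_max
    return out


original_size = (375, 500)  # what TestDataset returns as ori_img.shape[1:]
input_size = (600, 800)     # image size after preprocess()
pred = np.array([[80.0, 160.0, 320.0, 480.0]])
restored = rescale_to_original(pred, input_size, original_size)
print(restored)  # [[ 50. 100. 200. 300.]]
```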
