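Faster R-CNN and RetinaNet both preprocess image data before feeding it to the network, in broadly the same way as the blog post describes.
In practice a third step can be added: padding the image width and height up to a multiple of 32, as RetinaNet does. Below is a simple PyTorch data preprocessing module: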
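import numpy as np
import skimage.transform
import torch


class Resizer():
    """Resize so the shorter side is targetSize, then zero-pad to a multiple of pad_N."""

    def __call__(self, sample, targetSize=608, maxSize=1024, pad_N=32):
        image, anns = sample['img'], sample['ann']
        rows, cols = image.shape[:2]
        smaller_size, larger_size = min(rows, cols), max(rows, cols)
        # Scale the shorter side to targetSize, unless that would push the
        # longer side past maxSize; in that case cap the longer side instead.
        scale = targetSize / smaller_size
        if larger_size * scale > maxSize:
            scale = maxSize / larger_size
        image = skimage.transform.resize(image, (int(round(rows*scale)),
                                                 int(round(cols*scale))),
                                         mode='constant')
        # Assumes a 3-D (H, W, C) image.
        rows, cols, cns = image.shape[:3]
        # Pad up to the next multiple of pad_N. The outer modulo avoids adding
        # a full pad_N when a side is already aligned.
        pad_w = (pad_N - cols % pad_N) % pad_N
        pad_h = (pad_N - rows % pad_N) % pad_N
        new_image = np.zeros((rows + pad_h, cols + pad_w, cns), dtype=np.float32)
        new_image[:rows, :cols, :] = image.astype(np.float32)
        # Rescale the box coordinates (first four columns) in place;
        # anns must be a float array for this to work.
        anns[:, :4] *= scale
        return {'img': torch.from_numpy(new_image),
                'ann': torch.from_numpy(anns),
                'scale': scale}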
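
To make the resize and padding arithmetic easier to check by hand, here is a minimal sketch that computes only the shapes, without touching any pixels. The helper name `compute_resized_shape` and its lowercase parameter names are hypothetical and not part of the module above. It also assumes that a side already divisible by `pad_n` receives no extra padding.

```python
def compute_resized_shape(rows, cols, target_size=608, max_size=1024, pad_n=32):
    """Return (scale, resized (rows, cols), padded (rows, cols)).

    Hypothetical helper: it mirrors the scale and padding arithmetic of the
    Resizer module, but only on the image dimensions.
    """
    smaller, larger = min(rows, cols), max(rows, cols)
    # Scale the shorter side to target_size ...
    scale = target_size / smaller
    # ... unless the longer side would then exceed max_size.
    if larger * scale > max_size:
        scale = max_size / larger
    new_rows, new_cols = int(round(rows * scale)), int(round(cols * scale))
    # Assumption: pad only up to the next multiple of pad_n (0 if aligned).
    pad_h = (pad_n - new_rows % pad_n) % pad_n
    pad_w = (pad_n - new_cols % pad_n) % pad_n
    return scale, (new_rows, new_cols), (new_rows + pad_h, new_cols + pad_w)


# A 480x640 image: the shorter side becomes 608, the width is padded 811 -> 832.
scale, resized, padded = compute_resized_shape(480, 640)
print(resized, padded)  # (608, 811) (608, 832)

# A very wide 500x2000 image: the longer side is capped at 1024 instead.
scale, resized, padded = compute_resized_shape(500, 2000)
print(resized, padded)  # (256, 1024) (256, 1024)
```

The returned `scale` is the same factor that has to be applied to the bounding-box coordinates, and dividing predicted boxes by it maps them back to the original image.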