Deep-learning "alchemy" basics: single crop vs. multiple crops



What are single crop and multiple crops?

For a classification network such as AlexNet, single-crop and multiple-crop evaluation at test time give different results[1]; multiple crops effectively apply data augmentation to the test images.

As shicaiyang (星空下的巫師) explains[2], training of course uses random crops, but testing has its own tricks:

  • Single crop: simply resize the test image to some scale (e.g. 256xN) and take the center crop (the central region, e.g. 224x224) as the CNN input to evaluate the model
  • Multiple crops come in several concrete forms, which you can choose yourself, for example:
    • 10 crops: take the (top-left, bottom-left, top-right, bottom-right, center) crops plus their horizontal flips. The network's predictions on these 10 crops are averaged as the final prediction.
    • 144 crops: slightly more involved; taking ImageNet as an example:
      • First resize the image to 4 scales (e.g. 256xN, 320xN, 384xN, 480xN)
      • At each scale, take square regions at 3 positions (leftmost, center, rightmost)
      • For each square region, take the 10 crops of 224x224 described above, giving 4x3x10=120 crops
      • Also resize each square region directly to 224x224 along with its horizontal flip, giving another 4x3x2=24 crops
      • In total 120+24=144 crops; the average of the predictions over all crops is the model's output for the test image
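The 10-crop scheme above can be sketched in plain NumPy (a minimal illustration, not the exact pipeline of any particular framework; the H x W x C array layout is an assumption):

```python
import numpy as np

def ten_crops(img, size=224):
    """Return the 4 corner crops, the center crop, and their horizontal
    flips of an H x W x C image -- 10 crops in total."""
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - size),                 # top-left, top-right
               (h - size, 0), (h - size, w - size),   # bottom-left, bottom-right
               ((h - size) // 2, (w - size) // 2)]    # center
    crops = [img[y:y + size, x:x + size] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]              # horizontal flips
    return np.stack(crops)                            # shape: (10, size, size, C)
```

The 144-crop scheme applies the same function to 4 scales x 3 square regions (120 crops) and adds the resized squares plus their flips (24 crops).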

The 10-crop procedure above is mentioned in the ZFNet paper[3]:

The model was trained on the ImageNet 2012 training set (1.3 million images, spread over 1000 different classes). Each RGB image was preprocessed by resizing the smallest dimension to 256, cropping the center 256x256 region, subtracting the per-pixel mean (across all images) and then using 10 different sub-crops of size 224x224 (corners + center with(out) horizontal flips).

In fact, the earlier original AlexNet paper (Krizhevsky et al., 2012) describes this in more detail: multiple crops were used to combat overfitting:

The easiest and most common method to reduce overfitting on image data is to artificially enlarge
the dataset using label-preserving transformations (e.g., [25, 4, 5]). We employ two distinct forms
of data augmentation, both of which allow transformed images to be produced from the original
images with very little computation, so the transformed images do not need to be stored on disk.
In our implementation, the transformed images are generated in Python code on the CPU while the
GPU is training on the previous batch of images. So these data augmentation schemes are, in effect,
computationally free.

The first form of data augmentation consists of generating image translations and horizontal reflections.
We do this by extracting random 224x224 patches (and their horizontal reflections) from the
256x256 images and training our network on these extracted patches. This increases the size of our
training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent.
Without this scheme, our network suffers from substantial overfitting, which would have
forced us to use much smaller networks. At test time, the network makes a prediction by extracting
five 224 x 224 patches (the four corner patches and the center patch) as well as their horizontal
reflections (hence ten patches in all), and averaging the predictions made by the network’s softmax
layer on the ten patches.
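The first form of augmentation can be sketched as follows (a minimal NumPy illustration, not AlexNet's actual code; note that the paper's factor of 2048 corresponds roughly to 32 x 32 positions x 2 flips, while the exact count of distinct patches is (256-224+1)^2 x 2 = 2178):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patch(img, size=224):
    """Random translation + random horizontal flip, as used during
    AlexNet training on 256x256 images."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)   # offset in [0, h - size]
    x = rng.integers(0, w - size + 1)   # offset in [0, w - size]
    patch = img[y:y + size, x:x + size]
    if rng.integers(0, 2):              # flip with probability 1/2
        patch = patch[:, ::-1]
    return patch
```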

However, in the AlexNet example that ships with Caffe (models/bvlc_alexnet/train_val.prototxt), both the TRAIN and the TEST phase use a single crop. This is done by setting crop_size in the data layer of the prototxt:

name: "AlexNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}

Looking at how Caffe[4] actually implements this crop in code: during training, the offset is a random number in [0, im_height - crop_size] (and likewise for the width), while during testing the fixed center offset (im_height - crop_size) / 2 is used. The implementation lives in src/caffe/data_transformer.cpp:

  int h_off = 0;
  int w_off = 0;
  if (crop_size) {
    height = crop_size;
    width = crop_size;
    // We only do random crop when we do training.
    if (phase_ == TRAIN) {
      h_off = Rand(datum_height - crop_size + 1);
      w_off = Rand(datum_width - crop_size + 1);
    } else {
      h_off = (datum_height - crop_size) / 2;
      w_off = (datum_width - crop_size) / 2;
    }
  }
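The same logic can be mirrored in Python (a sketch of the behavior, not Caffe's code; `rand` stands in for Caffe's `Rand`):

```python
import random

def crop_offsets(datum_height, datum_width, crop_size, is_train,
                 rand=random.randrange):
    """Reproduce Caffe's DataTransformer offsets: random crop position
    during training, fixed center crop during testing."""
    if is_train:
        h_off = rand(datum_height - crop_size + 1)  # in [0, H - crop_size]
        w_off = rand(datum_width - crop_size + 1)   # in [0, W - crop_size]
    else:
        h_off = (datum_height - crop_size) // 2     # center
        w_off = (datum_width - crop_size) // 2
    return h_off, w_off
```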

Conclusion

So for the Caffe framework: within a single iteration, Caffe's data layer performs a single crop. Across many iterations, the training phase can be regarded as multiple crops, since the offset is random and differs between iterations. At test time, however, Caffe's data layer really is single crop; to get multiple crops you have to do it yourself, for example by cropping each original image into several copies on disk beforehand, or by running the test through pycaffe and generating the multiple crops on the fly.
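A pycaffe-style test-time multi-crop average (pycaffe ships `caffe.io.oversample` for the 10-crop case) can be sketched framework-free; `predict` below is a hypothetical stand-in for a forward pass that returns class probabilities:

```python
import numpy as np

def multicrop_predict(img, predict, size=224):
    """Average a model's predictions over the 4 corner crops, the center
    crop, and their horizontal flips (the classic 10-crop evaluation)."""
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - size), (h - size, 0),
               (h - size, w - size), ((h - size) // 2, (w - size) // 2)]
    crops = []
    for y, x in offsets:
        c = img[y:y + size, x:x + size]
        crops += [c, c[:, ::-1]]                  # crop + horizontal flip
    return np.mean([predict(c) for c in crops], axis=0)
```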

References

  1. https://www.zhihu.com/question/268494717/answer/356102226 (He Qinyao's comment under the answer by 王晉東不在家)
  2. http://caffecn.cn/?/question/428
  3. Visualizing and Understanding Convolutional Networks, arXiv:1311.2901
  4. https://github.com/BVLC/caffe

