Paper notes: "Understanding the Effective Receptive Field in Deep Convolutional Neural Networks"

Reposted article, 2019-09-05 22:14

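The receptive field is a key concept for understanding and diagnosing whether a CNN works. Image content outside a neuron's receptive field has no effect on that neuron's value, so it is important to make sure the receptive field covers all of the relevant image region.
For tasks that make a prediction for every pixel of the output image, each output pixel should have a sufficiently large receptive field so that no critical information is missed when the prediction is made.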

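Ways to enlarge the receptive field: in theory, stacking more layers grows the receptive field linearly, because each additional convolution filter extends it by a fixed amount. Subsampling (for example pooling) also enlarges it, and does so multiplicatively. In practice, networks usually combine both techniques.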
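The difference between linear and multiplicative growth can be checked with a few lines of arithmetic. The sketch below is not taken from the paper; it assumes a plain chain of layers, each described only by a kernel size and a stride, and ignores padding and dilation. The helper name `receptive_field` and the example layer configurations are illustrative choices.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) of one unit in the last layer.

    `layers` is a list of (kernel_size, stride) pairs ordered from input to output.
    """
    rf = 1    # a single input pixel covers only itself
    jump = 1  # spacing, in input pixels, between neighbouring units of the current layer
    for ksize, stride in layers:
        rf += (ksize - 1) * jump  # each layer adds (k - 1) steps of the current spacing
        jump *= stride            # subsampling widens the spacing for all later layers
    return rf


# Stride-1 3x3 convolutions only: the receptive field grows linearly with depth.
linear = [receptive_field([(3, 1)] * depth) for depth in range(1, 5)]
print(linear)  # [3, 5, 7, 9]

# A stride-2 2x2 pooling layer after each convolution: the increments double every stage.
pooled = [receptive_field([(3, 1), (2, 2)] * depth) for depth in range(1, 5)]
print(pooled)  # [4, 10, 22, 46]
```

Without subsampling every extra layer adds the same two pixels. With pooling the increments are 6, 12 and 24, which is why the two techniques are typically combined.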

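Convnets stacked to different depths give a single neuron in the final output map different receptive field sizes. A larger receptive field means the neuron should be able to learn relationships between objects that are farther apart.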

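Empirically, the deeper the network, the larger the patch it can perceive, but this comes at a higher cost in computation and time. The receptive field size of a given architecture can be derived by tracing back from a single output unit to the input, as the following MATLAB function does: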

function receptive_field_sizes()


% compute input size from a given output size
f = @(output_size, ksize, stride) (output_size - 1) * stride + ksize;


%% n=1 discriminator

% fix the output size to 1 and derive the receptive field in the input
out = ...
f(f(f(1, 4, 1), ...   % conv2 -> conv3
             4, 1), ...   % conv1 -> conv2
             4, 2);       % input -> conv1

fprintf('n=1 discriminator receptive field size: %d\n', out);


%% n=2 discriminator

% fix the output size to 1 and derive the receptive field in the input
out = ...
f(f(f(f(1, 4, 1), ...   % conv3 -> conv4
             4, 1), ...   % conv2 -> conv3
             4, 2), ...   % conv1 -> conv2
             4, 2);       % input -> conv1

fprintf('n=2 discriminator receptive field size: %d\n', out);


%% n=3 discriminator

% fix the output size to 1 and derive the receptive field in the input
out = ...
f(f(f(f(f(1, 4, 1), ...   % conv4 -> conv5
             4, 1), ...   % conv3 -> conv4
             4, 2), ...   % conv2 -> conv3
             4, 2), ...   % conv1 -> conv2
             4, 2);       % input -> conv1

fprintf('n=3 discriminator receptive field size: %d\n', out);


%% n=4 discriminator

% fix the output size to 1 and derive the receptive field in the input
out = ...
f(f(f(f(f(f(1, 4, 1), ...   % conv5 -> conv6
             4, 1), ...   % conv4 -> conv5
             4, 2), ...   % conv3 -> conv4
             4, 2), ...   % conv2 -> conv3
             4, 2), ...   % conv1 -> conv2
             4, 2);       % input -> conv1

fprintf('n=4 discriminator receptive field size: %d\n', out);


%% n=5 discriminator

% fix the output size to 1 and derive the receptive field in the input
out = ...
f(f(f(f(f(f(f(1, 4, 1), ...   % conv6 -> conv7
             4, 1), ...   % conv5 -> conv6
             4, 2), ...   % conv4 -> conv5
             4, 2), ...   % conv3 -> conv4
             4, 2), ...   % conv2 -> conv3
             4, 2), ...   % conv1 -> conv2
             4, 2);       % input -> conv1

fprintf('n=5 discriminator receptive field size: %d\n', out);
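The nested calls above follow one pattern: n stride-2 layers followed by two stride-1 layers, all with kernel size 4. As a cross-check, here is a loop-based Python sketch of the same traceback. The function names `input_size` and `discriminator_receptive_field` are illustrative and do not appear in the original listing.

```python
def input_size(output_size, ksize, stride):
    # Same formula as the anonymous function f in the MATLAB listing.
    return (output_size - 1) * stride + ksize


def discriminator_receptive_field(n):
    """Trace one output unit back to the input for the n-th configuration."""
    # Strides listed from input to output; every layer uses kernel size 4.
    strides = [2] * n + [1, 1]
    size = 1  # fix the output size to 1
    for stride in reversed(strides):  # walk from the last layer back to the input
        size = input_size(size, 4, stride)
    return size


sizes = {n: discriminator_receptive_field(n) for n in range(1, 6)}
print(sizes)  # {1: 16, 2: 34, 3: 70, 4: 142, 5: 286}
```

Each added stride-2 layer roughly doubles the receptive field, which matches the multiplicative effect of subsampling described earlier.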

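The authors find that not all pixels in the receptive field contribute equally to an output unit: intuitively, pixels at the center of the receptive field have a much larger impact on the output.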

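In the forward pass, pixels at the center of the receptive field can reach the output through many different paths, whereas pixels near the edge have comparatively few. As a result, in the backward pass the gradients that arrive along these paths give the central pixels a much larger gradient magnitude. In the paper's words: "In the forward pass, central pixels can propagate information to the output through many different paths, while the pixels in the outer area of the receptive field have very few paths to propagate its impact. In the backward pass, gradients from an output unit are propagated across all the paths, and therefore the central pixels have a much larger magnitude for the gradient from that output."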
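The path-counting argument can be made concrete with a small sketch. It assumes the simplest setting: a 1D stack of stride-1 convolutions with all weights equal to one and no nonlinearities, so that the gradient reaching each input pixel is just the number of paths from that pixel to the output unit. The function name `path_counts` is illustrative.

```python
def path_counts(ksize, depth):
    """Count the paths from each input pixel to one output unit (1D, stride 1)."""
    counts = [1]  # start at the output unit itself
    for _ in range(depth):
        # Going back one layer, every unit is fed by `ksize` neighbouring units.
        widened = [0] * (len(counts) + ksize - 1)
        for i, c in enumerate(counts):
            for offset in range(ksize):
                widened[i + offset] += c
        counts = widened
    return counts


counts = path_counts(ksize=3, depth=3)
print(counts)  # [1, 3, 6, 7, 6, 3, 1]
```

With three layers of kernel size 3, the receptive field is 7 pixels wide, but the central pixel has 7 paths to the output while each edge pixel has only one. As depth increases the profile becomes increasingly bell-shaped, consistent with the central pixels dominating the gradient.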

Experiments