The authors note that, to strengthen the representational power of a network, existing work has concentrated on improving spatial encoding. In this paper they instead focus on channel-wise information and propose the "Squeeze-and-Excitation" (SE) block, which explicitly makes the network attend to the relationships between channels ("adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels"). SENets won first place at ILSVRC 2017 with a top-5 error of 2.251%.
Some earlier architecture designs focus on spatial dependencies:
Inception architectures: embedding multi-scale processes in its modules
ResNet, stacked hourglass networks
Spatial attention: Spatial Transformer Networks
The authors' design rationale:
we investigate a different aspect of architectural design - the channel relationship. Our goal is to improve the representational power of a network by explicitly modelling the interdependencies between the channels of its convolutional features. To achieve this, we propose a mechanism that allows the network to perform feature recalibration, through which it can learn to use global information to selectively emphasise informative features and suppress less useful ones.
The authors want to recalibrate the convolutional features; from the later sections, my understanding is that this amounts to weighting the channels.
Related work
Network architectures:
VGGNets, Inception models, BN, ResNet, DenseNet, Dual Path Networks
Other approaches: grouped convolution, multi-branch convolution, cross-channel correlations
This approach reflects an assumption that channel relationships can
be formulated as a composition of instance-agnostic functions with local receptive fields.
Attention, gating mechanisms
SE block
\(F_{tr}: X \in R^{W' \times H' \times C'} \rightarrow U \in R^{W \times H \times C}\)
Let \(V = [v_1, v_2, ..., v_C]\) denote the learned set of filter kernels, where \(v_c\) is the parameters of the c-th filter. The output of \(F_{tr}\) is then \(U = [u_1, u_2, ..., u_C]\), with \(u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s\).
Here \(v_c^s\) is the 2D kernel acting on a single input channel: each newly produced channel is the sum of the convolutions of all input channels with the corresponding kernels. Channel dependencies are therefore implicitly embedded in \(v_c\), but they are entangled with the spatial correlations captured by the filters; the authors' goal is to make the network attend more explicitly to the useful information. This is done in two steps, Squeeze and Excitation.
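As a toy illustration of \(F_{tr}\) (a minimal numpy/scipy sketch with made-up shapes, not the paper's code), each output channel sums the per-channel convolutions, so the inter-channel dependencies are folded into \(v_c\):

import numpy as np
from scipy.signal import correlate2d

Hp, Wp, Cp = 8, 8, 3              # input H', W', C'
C, k = 4, 3                       # output channels, kernel size
X = np.random.randn(Cp, Hp, Wp)
V = np.random.randn(C, Cp, k, k)  # V = [v_1, ..., v_C]; v_c has one 2D kernel v_c^s per input channel

U = np.stack([
    sum(correlate2d(X[s], V[c, s], mode='valid') for s in range(Cp))  # u_c = sum_s v_c^s * x^s
    for c in range(C)
])
print(U.shape)  # (4, 6, 6): C output channels of size (Hp - k + 1, Wp - k + 1)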
Squeeze
Problem with existing networks: because convolution operates over a local receptive field, each output unit can only make use of the spatial information inside that field.
To mitigate this, the Squeeze operation encodes global spatial information into a channel descriptor, implemented with global average pooling: \(z_c = F_{sq}(u_c) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_c(i, j)\).
In other words, the mean of each channel is used as its global description.
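A one-line numpy sketch of the Squeeze step (toy array, not the paper's code):

import numpy as np

U = np.random.randn(4, 6, 6)   # (C, H, W) output of F_tr
z = U.mean(axis=(1, 2))        # z_c = average of u_c over all spatial positions
print(z.shape)                 # (4,): one descriptor per channel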
Excitation: Adaptive Recalibration
To make use of the information aggregated by Squeeze, a second operation is introduced, and it must satisfy two requirements: it has to be flexible enough to learn nonlinear interactions between channels, and it has to learn a non-mutually-exclusive relationship. My understanding of the latter is non-exclusivity: presumably several channels can be emphasised at the same time, rather than a single one winning out.
The excitation is \(s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))\), where \(\delta\) is the ReLU, \(W_1 \in R^{\frac{C}{r} \times C}\) and \(W_2 \in R^{C \times \frac{C}{r}}\). \(W_1\) is a bottleneck that reduces the number of channels and \(W_2\) restores it; the reduction ratio \(r\) is set to 16. Finally \(U\) is rescaled by \(s\), which is simply channel-wise weighting, and this gives the output of the block.
\(F_{scale}\) denotes channel-wise multiplication between the feature map \(u_c \in R^{W \times H}\) and the scalar \(s_c\), i.e. \(\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c\).
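A minimal numpy sketch of Excitation plus the final rescaling (toy shapes and random, untrained weights; r is kept small only for readability, the paper uses r = 16):

import numpy as np

C, r = 4, 2
U = np.random.randn(C, 6, 6)         # output of F_tr
z = U.mean(axis=(1, 2))              # Squeeze

W1 = np.random.randn(C // r, C)      # bottleneck: C -> C/r
W2 = np.random.randn(C, C // r)      # expansion:  C/r -> C
relu = lambda x: np.maximum(x, 0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

s = sigmoid(W2 @ relu(W1 @ z))       # per-channel gates in (0, 1)
X_tilde = s[:, None, None] * U       # F_scale: channel-wise multiplication
print(s.shape, X_tilde.shape)        # (4,) (4, 6, 6)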
The activations act as channel weights
adapted to the input-specific descriptor z. In this regard,
SE blocks intrinsically introduce dynamics conditioned on
the input, helping to boost feature discriminability
- Example
SE blocks can be conveniently attached to other network architectures. - MXNet code
import mxnet as mx  # the snippet assumes bn3, num_filter, ratio and name are defined by the enclosing network definition
# Squeeze: global average pooling collapses each channel of bn3 to a single descriptor
squeeze = mx.sym.Pooling(data=bn3, global_pool=True, kernel=(7, 7), pool_type='avg', name=name + '_squeeze')
squeeze = mx.sym.Flatten(data=squeeze, name=name + '_flatten')
# Excitation: bottleneck FC (num_filter * ratio hidden units), ReLU, FC back to num_filter, sigmoid gate
excitation = mx.sym.FullyConnected(data=squeeze, num_hidden=int(num_filter * ratio), name=name + '_excitation1')
excitation = mx.sym.Activation(data=excitation, act_type='relu', name=name + '_excitation1_relu')
excitation = mx.sym.FullyConnected(data=excitation, num_hidden=num_filter, name=name + '_excitation2')
excitation = mx.sym.Activation(data=excitation, act_type='sigmoid', name=name + '_excitation2_sigmoid')
# Scale: reshape the gates to (N, C, 1, 1) and rescale the feature map channel-wise
bn3 = mx.sym.broadcast_mul(bn3, mx.sym.reshape(data=excitation, shape=(-1, num_filter, 1, 1)))
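For context, here is a self-contained sketch (assuming MXNet 1.x; se_gate and the input size are illustrative, not from the paper) that wraps the snippet above into a reusable gate and verifies that the output shape matches the input, so the block can be dropped into any architecture:

import mxnet as mx

def se_gate(x, num_filter, ratio=1.0 / 16, name='se'):
    # Squeeze
    s = mx.sym.Pooling(data=x, global_pool=True, kernel=(7, 7), pool_type='avg', name=name + '_squeeze')
    s = mx.sym.Flatten(data=s, name=name + '_flatten')
    # Excitation
    e = mx.sym.FullyConnected(data=s, num_hidden=int(num_filter * ratio), name=name + '_excitation1')
    e = mx.sym.Activation(data=e, act_type='relu', name=name + '_excitation1_relu')
    e = mx.sym.FullyConnected(data=e, num_hidden=num_filter, name=name + '_excitation2')
    e = mx.sym.Activation(data=e, act_type='sigmoid', name=name + '_excitation2_sigmoid')
    # Scale
    return mx.sym.broadcast_mul(x, mx.sym.reshape(data=e, shape=(-1, num_filter, 1, 1)))

data = mx.sym.Variable('data')
out = se_gate(data, num_filter=64)
_, out_shapes, _ = out.infer_shape(data=(1, 64, 56, 56))
print(out_shapes)  # [(1, 64, 56, 56)]: same shape as the input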
Network architecture
Experiments
References:
[1] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks." arXiv preprint arXiv:1709.01507 (2017).
You are welcome to follow the WeChat official account vision_home to learn together; papers and resources are shared from time to time.