The authors note that, to strengthen the representational power of a network, existing work has concentrated on improving spatial encoding. In this paper they instead focus on channel-wise information and propose the "Squeeze-and-Excitation" (SE) block, which explicitly makes the network attend to the relationships between channels ("adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels"). SENets won first place at ILSVRC 2017 with a top-5 error of 2.251%.
Some earlier architecture designs focus on spatial dependencies:
Inception architectures: embedding multi-scale processes in its modules
ResNet, stacked hourglass networks
Spatial attention: Spatial Transformer Networks
The authors' design rationale:
we investigate a different aspect of architectural design - the channel relationship. Our goal is to improve the representational power of a network by explicitly modelling the interdependencies between the channels of its convolutional features. To achieve this, we propose a mechanism that allows the network to perform feature recalibration, through which it can learn to use global information to selectively emphasise informative features and suppress less useful ones.
The authors want to recalibrate the convolutional features; from the later sections, my understanding is that this amounts to weighting the channels.
Related work
Network architectures:
VGGNets, Inception models, BN, ResNet, DenseNet, Dual Path Networks
Other approaches: grouped convolution, multi-branch convolution, cross-channel correlations
This approach reflects an assumption that channel relationships can
be formulated as a composition of instance-agnostic functions with local receptive fields.
Attention, gating mechanisms
SE block
\(F_{tr}: X \in R^{W' \times H' \times C'} \rightarrow U \in R^{W \times H \times C}\)
Let \(V = [v_1, v_2, ..., v_C]\) denote the learned set of filter kernels, where \(v_c\) is the parameters of the c-th filter. The output of \(F_{tr}\) is then \(U = [u_1, u_2, ..., u_C]\), with \(u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s\).
Here \(v_c^s\) is the 2D kernel acting on a single input channel: each newly produced channel is the sum of the convolutions of all input channels with the corresponding kernels. Channel dependencies are therefore implicitly embedded in \(v_c\), but they are entangled with the spatial correlations captured by the filters; the authors' goal is to make the network attend more explicitly to the useful information. This is done in two steps, Squeeze and Excitation.
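As a toy illustration of \(F_{tr}\) (a minimal numpy/scipy sketch with made-up shapes, not the paper's code), each output channel sums the per-channel convolutions, so the inter-channel dependencies are folded into \(v_c\):

import numpy as np
from scipy.signal import correlate2d

Hp, Wp, Cp = 8, 8, 3              # input H', W', C'
C, k = 4, 3                       # output channels, kernel size
X = np.random.randn(Cp, Hp, Wp)
V = np.random.randn(C, Cp, k, k)  # V = [v_1, ..., v_C]; v_c has one 2D kernel v_c^s per input channel

U = np.stack([
    sum(correlate2d(X[s], V[c, s], mode='valid') for s in range(Cp))  # u_c = sum_s v_c^s * x^s
    for c in range(C)
])
print(U.shape)  # (4, 6, 6): C output channels of size (Hp - k + 1, Wp - k + 1)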
Squeeze
Problem with existing networks: because convolution operates over a local receptive field, each output unit can only make use of the spatial information inside that field.
To mitigate this, the Squeeze operation encodes global spatial information into a channel descriptor, implemented with global average pooling: \(z_c = F_{sq}(u_c) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_c(i, j)\).
In other words, the mean of each channel is used as its global description.
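A one-line numpy sketch of the Squeeze step (toy array, not the paper's code):

import numpy as np

U = np.random.randn(4, 6, 6)   # (C, H, W) output of F_tr
z = U.mean(axis=(1, 2))        # z_c = average of u_c over all spatial positions
print(z.shape)                 # (4,): one descriptor per channel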
Excitation: Adaptive Recalibration
To make use of the information aggregated by Squeeze, a second operation is introduced, and it must satisfy two requirements: it has to be flexible enough to learn nonlinear interactions between channels, and it has to learn a non-mutually-exclusive relationship. My understanding of the latter is non-exclusivity: presumably several channels can be emphasised at the same time, rather than a single one winning out.
The excitation is \(s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))\), where \(\delta\) is the ReLU, \(W_1 \in R^{\frac{C}{r} \times C}\) and \(W_2 \in R^{C \times \frac{C}{r}}\). \(W_1\) is a bottleneck that reduces the number of channels and \(W_2\) restores it; the reduction ratio \(r\) is set to 16. Finally \(U\) is rescaled by \(s\), which is simply channel-wise weighting, and this gives the output of the block.
\(F_{scale}\) denotes channel-wise multiplication between the feature map \(u_c \in R^{W \times H}\) and the scalar \(s_c\), i.e. \(\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c\).
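A minimal numpy sketch of Excitation plus the final rescaling (toy shapes and random, untrained weights; r is kept small only for readability, the paper uses r = 16):

import numpy as np

C, r = 4, 2
U = np.random.randn(C, 6, 6)         # output of F_tr
z = U.mean(axis=(1, 2))              # Squeeze

W1 = np.random.randn(C // r, C)      # bottleneck: C -> C/r
W2 = np.random.randn(C, C // r)      # expansion:  C/r -> C
relu = lambda x: np.maximum(x, 0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

s = sigmoid(W2 @ relu(W1 @ z))       # per-channel gates in (0, 1)
X_tilde = s[:, None, None] * U       # F_scale: channel-wise multiplication
print(s.shape, X_tilde.shape)        # (4,) (4, 6, 6)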
The activations act as channel weights
adapted to the input-specific descriptor z. In this regard,
SE blocks intrinsically introduce dynamics conditioned on
the input, helping to boost feature discriminability
- Example
SE blocks can be conveniently attached to other network architectures. - MXNet code
import mxnet as mx  # the snippet assumes bn3, num_filter, ratio and name are defined by the enclosing network definition
# Squeeze: global average pooling collapses each channel of bn3 to a single descriptor
squeeze = mx.sym.Pooling(data=bn3, global_pool=True, kernel=(7, 7), pool_type='avg', name=name + '_squeeze')
squeeze = mx.sym.Flatten(data=squeeze, name=name + '_flatten')
# Excitation: bottleneck FC (num_filter * ratio hidden units), ReLU, FC back to num_filter, sigmoid gate
excitation = mx.sym.FullyConnected(data=squeeze, num_hidden=int(num_filter * ratio), name=name + '_excitation1')
excitation = mx.sym.Activation(data=excitation, act_type='relu', name=name + '_excitation1_relu')
excitation = mx.sym.FullyConnected(data=excitation, num_hidden=num_filter, name=name + '_excitation2')
excitation = mx.sym.Activation(data=excitation, act_type='sigmoid', name=name + '_excitation2_sigmoid')
# Scale: reshape the gates to (N, C, 1, 1) and rescale the feature map channel-wise
bn3 = mx.sym.broadcast_mul(bn3, mx.sym.reshape(data=excitation, shape=(-1, num_filter, 1, 1)))
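For context, here is a self-contained sketch (assuming MXNet 1.x; se_gate and the input size are illustrative, not from the paper) that wraps the snippet above into a reusable gate and verifies that the output shape matches the input, so the block can be dropped into any architecture:

import mxnet as mx

def se_gate(x, num_filter, ratio=1.0 / 16, name='se'):
    # Squeeze
    s = mx.sym.Pooling(data=x, global_pool=True, kernel=(7, 7), pool_type='avg', name=name + '_squeeze')
    s = mx.sym.Flatten(data=s, name=name + '_flatten')
    # Excitation
    e = mx.sym.FullyConnected(data=s, num_hidden=int(num_filter * ratio), name=name + '_excitation1')
    e = mx.sym.Activation(data=e, act_type='relu', name=name + '_excitation1_relu')
    e = mx.sym.FullyConnected(data=e, num_hidden=num_filter, name=name + '_excitation2')
    e = mx.sym.Activation(data=e, act_type='sigmoid', name=name + '_excitation2_sigmoid')
    # Scale
    return mx.sym.broadcast_mul(x, mx.sym.reshape(data=e, shape=(-1, num_filter, 1, 1)))

data = mx.sym.Variable('data')
out = se_gate(data, num_filter=64)
_, out_shapes, _ = out.infer_shape(data=(1, 64, 56, 56))
print(out_shapes)  # [(1, 64, 56, 56)]: same shape as the input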
Network architecture
Experiments
References:
[1] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks." arXiv preprint arXiv:1709.01507 (2017).
You are welcome to follow the WeChat official account vision_home to learn together; papers and resources are shared from time to time.