【ECCV2020】 Context-Gated Convolution

Reposted article, 2020-08-14 16:47. Tags: attention mechanism / paper recommendations

Paper: https://arxiv.org/abs/1910.05577

Code: https://github.com/XudongLinthu/context-gated-convolution

This work comes from Columbia University and Tencent AI Lab, and it is another plug-and-play module.

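The paper's motivation is: "Neurons do change their function according to contexts and task." Conventional CNNs do not have this property. Some existing methods, which the authors group under the name global feature interaction, are shown in the figure below. These methods (non-local, SENet, CBAM, etc.) reason that, since the convolutional layer lacks this ability, feature interaction should be applied to the features before the convolution. They still cannot model the convolution kernel itself, that is, they cannot achieve "changing the structure of correlations over neuronal ensembles".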

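The proposed Context-Gated Convolution (CGC) treats the convolutional layer as an "adaptive processor" that can adjust the convolution kernel weights according to the semantic information in the image.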

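This is not straightforward to implement. For an input feature map of size \((c, h, w)\) and an output feature map of size \((o, h, w)\), the convolution kernel has \(o\times c \times k\times k\) parameters, so generating a gate of that size directly would be costly. The gate is therefore decomposed into two tensors, of sizes \(o\times k\times k\) and \(c \times k \times k\).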
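To make the size argument concrete, here is a small arithmetic sketch. The helper name `gate_sizes` and the layer dimensions are illustrative assumptions, not values taken from the paper; it only counts how many gate entries must be generated with and without the decomposition.

```python
def gate_sizes(o, c, k):
    """Return (full gate entries, decomposed gate entries) for one conv layer."""
    full = o * c * k * k                 # one gate value per kernel weight
    decomposed = o * k * k + c * k * k   # two smaller tensors, combined later
    return full, decomposed


full, decomposed = gate_sizes(o=256, c=256, k=3)
print(full, decomposed)  # 589824 4608
```

For this hypothetical 256-to-256 channel 3x3 layer, the decomposed form needs 128 times fewer generated values than the full gate.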

Even so, this is still fairly complex, so the authors further borrow the idea of depth-wise separable convolution.

The overall architecture is shown in the figure below. It contains three key modules: the context encoding module, the channel interacting module, and the gate decoding module.

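1. Context encoding module

Given input features of size \(c\times h\times w\), pooling first reduces them to \(c\times h'\times w'\); the \(h'\times w'\) dimension is then projected to a vector of length \(d\). The paper notes that when \(d\) is not specified, \((k_1\times k_2)/2\) is used. The output of this module has size \(c\times d\). Because it feeds two downstream modules, two separate BN layers are used. The code is as follows: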

# the context encoding module
self.ce = nn.Linear(ws*ws, num_lat, False)            
self.ce_bn = nn.BatchNorm1d(in_channels)
self.ci_bn2 = nn.BatchNorm1d(in_channels)
# activation function is relu
self.act = nn.ReLU(inplace=True)
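The excerpt above only declares the layers. As a minimal sketch of how they might be wired together, assuming the pooled size equals the kernel size and that the default \(d\) is rounded down, the forward pass could look like this. The class name `ContextEncoder` and the attribute names `bn_a`/`bn_b` are hypothetical, not taken from the repository.

```python
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_channels, kernel_size, num_lat=None):
        super().__init__()
        ws = kernel_size  # assumption: pool to h' = w' = kernel size
        if num_lat is None:
            # default d = (k1 * k2) / 2 from the text; flooring is an assumption
            num_lat = (kernel_size * kernel_size) // 2
        self.pool = nn.AdaptiveAvgPool2d((ws, ws))
        self.ce = nn.Linear(ws * ws, num_lat, bias=False)
        # two independent BN layers, one per downstream branch
        self.bn_a = nn.BatchNorm1d(in_channels)
        self.bn_b = nn.BatchNorm1d(in_channels)

    def forward(self, x):
        ctx = self.pool(x).flatten(2)  # (b, c, h'*w')
        lat = self.ce(ctx)             # (b, c, d)
        return self.bn_a(lat), self.bn_b(lat)


enc = ContextEncoder(in_channels=16, kernel_size=3)
x = torch.randn(2, 16, 32, 32)
to_gate, to_channel = enc(x)
print(to_gate.shape, to_channel.shape)
```

Both outputs have shape \((b, c, d)\); each branch gets its own normalization statistics even though they share the same linear projection.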

2. Channel interacting module

This module transforms the \(c\times d\) features into \(o\times d\) features. For efficiency, a grouped FC layer is used here. The code is as follows:

# the number of groups in the channel interacting module
if in_channels // 16:
    self.g = 16
else:
    self.g = in_channels
# the channel interacting module
self.ci = nn.Linear(self.g, out_channels // (in_channels // self.g), bias=False)
self.ci_bn = nn.BatchNorm1d(out_channels)
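One way to read the grouped FC is that the \(c\) channels are split into groups of size `g`, and a single small linear layer is shared across all groups. The sketch below follows that reading. The class name `ChannelInteraction` is hypothetical, BN and ReLU are omitted, and it assumes `in_channels` is divisible by `g` and `out_channels` by the number of groups.

```python
import torch
import torch.nn as nn


class ChannelInteraction(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.g = 16 if in_channels // 16 else in_channels  # channels per group
        self.n_groups = in_channels // self.g
        self.out_channels = out_channels
        # one linear layer shared by all groups
        self.ci = nn.Linear(self.g, out_channels // self.n_groups, bias=False)

    def forward(self, lat):
        b, c, d = lat.shape
        x = lat.reshape(b, self.n_groups, self.g, d).transpose(2, 3)  # (b, groups, d, g)
        x = self.ci(x)                                                # (b, groups, d, o / groups)
        return x.transpose(2, 3).reshape(b, self.out_channels, d)     # (b, o, d)


ci = ChannelInteraction(in_channels=64, out_channels=128)
lat = torch.randn(2, 64, 4)
oc = ci(lat)
n_params = sum(p.numel() for p in ci.parameters())
print(oc.shape, n_params)
```

With 64 input and 128 output channels, the shared layer holds 16 x 32 = 512 weights, compared with 64 x 128 = 8192 for a plain FC layer over the channel dimension.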

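3. Gate decoding module

This module takes two inputs. For the \(c\times d\) input, an FC layer produces a \(c\times k_1 \times k_2\) feature; for the \(o\times d\) input, an FC layer produces an \(o\times k_1 \times k_2\) feature. The two features are then replicated (broadcast) along the output-channel and input-channel dimensions respectively and added, giving an \(o\times c \times k_1 \times k_2\) tensor, and a sigmoid is applied to obtain the gate. The code is as follows: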

# produce gate
out = self.sig(out.view(b, 1, c, self.ks, self.ks) + oc.view(b, self.oc, 1, self.ks, self.ks))
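The line above relies on broadcasting: the size-1 axes are expanded implicitly, so no explicit copy is needed. A standalone sketch with random stand-in tensors (the sizes and the names `gate_c` and `gate_o` are illustrative assumptions) shows the resulting shape:

```python
import torch

b, c, o, k = 2, 8, 16, 3
gate_c = torch.randn(b, c, k * k)  # stand-in for the decoded c x (k*k) branch
gate_o = torch.randn(b, o, k * k)  # stand-in for the decoded o x (k*k) branch

# (b, 1, c, k, k) + (b, o, 1, k, k) broadcasts to (b, o, c, k, k)
gate = torch.sigmoid(gate_c.view(b, 1, c, k, k) + gate_o.view(b, o, 1, k, k))
print(gate.shape)  # torch.Size([2, 16, 8, 3, 3])
```

Each entry `gate[n, i, j]` is the sigmoid of the sum of the `j`-th input-channel map and the `i`-th output-channel map for sample `n`.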

Finally, the resulting gate is fused with the convolution kernel by element-wise multiplication.
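Because the gate differs per sample, the modulated kernel does too, so a single standard convolution call no longer applies to the whole batch. One possible way to apply it, sketched here under the assumption of stride 1 and no dilation, is to unfold the input into patches and use a batched matrix product. The function name `gated_conv2d` is hypothetical.

```python
import torch
import torch.nn.functional as F


def gated_conv2d(x, weight, gate, padding=1):
    """x: (b, c, h, w); weight: (o, c, k, k); gate: (b, o, c, k, k).

    Sketch only: assumes stride 1 and dilation 1.
    """
    b, c, h, w = x.shape
    o, _, k, _ = weight.shape
    # element-wise gating gives one kernel per sample: (b, o, c*k*k)
    w_mod = (gate * weight.unsqueeze(0)).reshape(b, o, c * k * k)
    patches = F.unfold(x, kernel_size=k, padding=padding)  # (b, c*k*k, L)
    out = torch.matmul(w_mod, patches)                     # (b, o, L)
    h_out = h + 2 * padding - k + 1
    w_out = w + 2 * padding - k + 1
    return out.reshape(b, o, h_out, w_out)


x = torch.randn(2, 4, 8, 8)
weight = torch.randn(6, 4, 3, 3)
gate = torch.sigmoid(torch.randn(2, 6, 4, 3, 3))
y = gated_conv2d(x, weight, gate)
print(y.shape)
```

With a gate of all ones, this reduces to an ordinary convolution, which is a convenient sanity check.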

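Because grouped FC is used at the key step, the computational cost does not increase significantly, while performance improves because every position of the convolution kernel receives its own weight (an attention mechanism). See the paper's experiments on ImageNet and CIFAR10 for details.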

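One interesting experiment in the paper is the feature map visualization. In the first column, ResNet does not capture the goldfish very accurately, whereas CGC captures the goldfish region precisely.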

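Some thoughts

This is the first time I have seen weights assigned point by point to a convolution kernel, which I find quite interesting. In the gate decoding module, the \(c\times k_1 \times k_2\) and \(o\times k_1 \times k_2\) features are each replicated along one direction to form an \(o\times c \times k_1 \times k_2\) feature, which reminded me of strip pooling from Ming-Ming Cheng's group. However, strip pooling still assigns weights to feature maps, whereas this work assigns weights to the convolution kernel.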
