[Attention Mechanisms in CV] The scSE Module for Semantic Segmentation

Reposted article, originally published 2020-01-16 11:26. Tags: semantic segmentation / computer vision / deep learning column / attention

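Foreword: This post introduces scSE, an attention module for semantic segmentation. scSE is very similar to the BAM module covered earlier, but it has only been applied and tested on semantic segmentation, where it gives a fairly large accuracy improvement.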

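The paper that proposes scSE is "Concurrent Spatial and Channel 'Squeeze & Excitation' in Fully Convolutional Networks". It improves on the SE module by proposing three variants, cSE, sSE, and scSE, and shows experimentally that these modules strengthen meaningful features and suppress uninformative ones. The experiments use two medical datasets, the MALC Dataset and the Visceral Dataset.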

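Most semantic segmentation models follow a U-Net-like encoder-decoder design: they downsample first, then upsample back to the size of the input image. An SE module can be added after each convolutional layer to refine the information in the feature map. The original post illustrates this arrangement with a figure.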
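
As a rough illustration of where such a module sits, the sketch below wraps a convolutional stage with an optional recalibration step. The names `ConvBlockWithRecalibration`, `ChannelGate`, and the `recalibration` argument are hypothetical and do not come from the paper or the repository; the gate is only a stand-in for any module that maps [bs, c, h, w] to [bs, c, h, w].

```python
import torch
import torch.nn as nn


class ConvBlockWithRecalibration(nn.Module):
    """Conv -> BN -> ReLU, followed by an optional recalibration module.

    `recalibration` is a hypothetical hook: any module that maps
    [bs, c, h, w] -> [bs, c, h, w], for example an SE-style block.
    """

    def __init__(self, in_channels, out_channels, recalibration=None):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Fall back to a no-op so the block also works without attention.
        self.recalibration = recalibration if recalibration is not None else nn.Identity()

    def forward(self, x):
        return self.recalibration(self.conv(x))


class ChannelGate(nn.Module):
    """Stand-in SE-style gate, used only to exercise the hook above."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # the [bs, c, 1, 1] mask broadcasts over h and w


x = torch.randn(2, 3, 32, 32)
block = ConvBlockWithRecalibration(3, 16, recalibration=ChannelGate(16))
y = block(x)
print("out shape:", y.shape)
```

The spatial size is unchanged by the recalibration step, so in an encoder-decoder network the block can be dropped in after any stage without touching the skip connections.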

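The three modules derived from SE are introduced in turn below. The original post first explains the legend used in its diagrams: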

cSE module:

This module resembles the channel attention branch of the BAM module covered earlier. Its implementation is easy to follow from the diagram; the steps are:

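Apply global average pooling to the feature map, reducing it from [C, H, W] to [C, 1, 1].
Process the result with two 1×1 convolutions, which yields a C-dimensional vector.
Normalize that vector with a sigmoid to obtain the channel mask.
Multiply the mask channel-wise with the input to obtain the recalibrated feature map.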

import torch
import torch.nn as nn


class cSE(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.Conv_Squeeze = nn.Conv2d(in_channels,
                                      in_channels // 2,
                                      kernel_size=1,
                                      bias=False)
        self.Conv_Excitation = nn.Conv2d(in_channels // 2,
                                         in_channels,
                                         kernel_size=1,
                                         bias=False)
        self.norm = nn.Sigmoid()

    def forward(self, U):
        z = self.avgpool(U)  # shape: [bs, c, h, w] to [bs, c, 1, 1]
        z = self.Conv_Squeeze(z)  # shape: [bs, c/2, 1, 1]
        z = self.Conv_Excitation(z)  # shape: [bs, c, 1, 1]
        z = self.norm(z)
        return U * z.expand_as(U)


if __name__ == "__main__":
    bs, c, h, w = 10, 3, 64, 64
    in_tensor = torch.ones(bs, c, h, w)

    c_se = cSE(c)
    print("in shape:", in_tensor.shape)
    out_tensor = c_se(in_tensor)
    print("out shape:", out_tensor.shape)
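
One detail worth noting: the paper's channel excitation, like the original SE block, places a ReLU between the two layers, whereas the listing above applies the two 1×1 convolutions back to back, which makes them equivalent to a single low-rank linear map. The following is a minimal sketch of a variant with the ReLU added. The class name `cSEWithReLU` and the `reduction` parameter are assumptions made for this example, not part of the referenced repository.

```python
import torch
import torch.nn as nn


class cSEWithReLU(nn.Module):
    """Channel squeeze-and-excitation with a ReLU between the two 1x1 convs."""

    def __init__(self, in_channels, reduction=2):
        super().__init__()
        # Keep at least one hidden channel for very small inputs.
        hidden = max(in_channels // reduction, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                    # [bs, c, 1, 1]
            nn.Conv2d(in_channels, hidden, kernel_size=1, bias=False),  # [bs, c/r, 1, 1]
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, in_channels, kernel_size=1, bias=False),  # [bs, c, 1, 1]
            nn.Sigmoid(),
        )

    def forward(self, U):
        # The per-channel mask broadcasts over the spatial dimensions.
        return U * self.gate(U)


c_se_relu = cSEWithReLU(8)
in_tensor = torch.randn(4, 8, 16, 16)
mask = c_se_relu.gate(in_tensor)
out_tensor = c_se_relu(in_tensor)
print("mask shape:", mask.shape)
print("out shape:", out_tensor.shape)
```

Whether the extra non-linearity matters in practice would need to be checked on the target dataset; the sketch only shows where it goes.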

sSE module:

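The diagram in the original post shows the spatial attention mechanism. It differs considerably from the one in BAM and is much simpler to implement. The steps are: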

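Apply a 1×1 convolution directly to the feature map, reducing it from [C, H, W] to a [1, H, W] map.
Apply a sigmoid activation to obtain the spatial attention map.
Multiply the attention map with the original feature map to complete the spatial recalibration.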

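NOTE: The order matters here: the 1×1 convolution comes first and the sigmoid second. This cannot be read off the diagram and has to be taken from the text of the paper.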

import torch
import torch.nn as nn


class sSE(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.Conv1x1 = nn.Conv2d(in_channels, 1, kernel_size=1, bias=False)
        self.norm = nn.Sigmoid()

    def forward(self, U):
        q = self.Conv1x1(U) # U:[bs,c,h,w] to q:[bs,1,h,w]
        q = self.norm(q)
        return U * q  # the [bs, 1, h, w] mask broadcasts across channels


if __name__ == "__main__":
    bs, c, h, w = 10, 3, 64, 64
    in_tensor = torch.ones(bs, c, h, w)

    s_se = sSE(c)
    print("in shape:", in_tensor.shape)
    out_tensor = s_se(in_tensor)
    print("out shape:", out_tensor.shape)
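
To make the three steps concrete, the sketch below traces them with plain tensor operations and checks that a 1×1 convolution with one output channel is just a per-pixel weighted sum over the input channels. The random `weight` tensor is a hypothetical stand-in for the learned kernel of `Conv1x1`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bs, c, h, w = 2, 4, 5, 5
U = torch.randn(bs, c, h, w)
weight = torch.randn(1, c, 1, 1)  # hypothetical 1x1 kernel: c channels -> 1 channel

# Step 1: the 1x1 convolution squeezes the channel dimension at every pixel.
q = F.conv2d(U, weight)                             # [bs, 1, h, w]
q_manual = (U * weight).sum(dim=1, keepdim=True)    # same weighted sum, written out

# Step 2: the sigmoid turns the scores into a spatial mask.
mask = torch.sigmoid(q)

# Step 3: one mask value per pixel is shared by all c channels via broadcasting.
out = U * mask

print("conv matches weighted sum:", torch.allclose(q, q_manual, atol=1e-5))
print("mask shape:", mask.shape)
print("out shape:", out.shape)
```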

scSE module:

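scSE runs the two previous modules in parallel, much like the parallel branches in BAM. The input passes through sSE and cSE separately, and the two outputs are added to give a more precisely recalibrated feature map. Here is the code: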

import torch
import torch.nn as nn


class sSE(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.Conv1x1 = nn.Conv2d(in_channels, 1, kernel_size=1, bias=False)
        self.norm = nn.Sigmoid()

    def forward(self, U):
        q = self.Conv1x1(U)  # U:[bs,c,h,w] to q:[bs,1,h,w]
        q = self.norm(q)
        return U * q  # the [bs, 1, h, w] mask broadcasts across channels

class cSE(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.Conv_Squeeze = nn.Conv2d(in_channels, in_channels // 2, kernel_size=1, bias=False)
        self.Conv_Excitation = nn.Conv2d(in_channels//2, in_channels, kernel_size=1, bias=False)
        self.norm = nn.Sigmoid()

    def forward(self, U):
        z = self.avgpool(U)  # shape: [bs, c, h, w] to [bs, c, 1, 1]
        z = self.Conv_Squeeze(z)  # shape: [bs, c/2, 1, 1]
        z = self.Conv_Excitation(z)  # shape: [bs, c, 1, 1]
        z = self.norm(z)
        return U * z.expand_as(U)

class scSE(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.cSE = cSE(in_channels)
        self.sSE = sSE(in_channels)

    def forward(self, U):
        U_sse = self.sSE(U)
        U_cse = self.cSE(U)
        return U_cse+U_sse

if __name__ == "__main__":
    bs, c, h, w = 10, 3, 64, 64
    in_tensor = torch.ones(bs, c, h, w)

    sc_se = scSE(c)
    print("in shape:",in_tensor.shape)
    out_tensor = sc_se(in_tensor)
    print("out shape:", out_tensor.shape)
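
Because both branches scale the same input, adding their outputs is equivalent to multiplying the input once by the sum of the two masks: U·z + U·q = U·(z + q). The short check below illustrates this with random stand-in masks (they are not produced by trained modules). It also shows that the combined mask lies in (0, 2), so a position that both branches rate highly can be amplified rather than merely preserved.

```python
import torch

torch.manual_seed(0)
U = torch.randn(2, 4, 8, 8)
z = torch.sigmoid(torch.randn(2, 4, 1, 1))  # stand-in channel mask (cSE branch)
q = torch.sigmoid(torch.randn(2, 1, 8, 8))  # stand-in spatial mask (sSE branch)

added_outputs = U * z + U * q    # what scSE.forward computes
combined_mask = z + q            # broadcasts to [2, 4, 8, 8]
single_multiply = U * combined_mask

print("equivalent:", torch.allclose(added_outputs, single_multiply, atol=1e-5))
print("combined mask shape:", combined_mask.shape)
```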

NOTE: I could not find an official implementation, so the code here is a PyTorch implementation written from the description in the paper.
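
To get a feel for how lightweight the modules are, the sketch below counts the parameters each branch adds for a feature map with 64 channels and compares them with a single 3×3 convolution of the same width. The channel count and the 3×3 reference layer are assumptions chosen for illustration; the layer shapes mirror the listings above.

```python
import torch.nn as nn


def count_params(module):
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


c = 64  # assumed channel count, for illustration only

# Same layers as in the cSE listing: c -> c/2 -> c, both 1x1 and bias-free.
cse_layers = nn.Sequential(
    nn.Conv2d(c, c // 2, kernel_size=1, bias=False),
    nn.Conv2d(c // 2, c, kernel_size=1, bias=False),
)
# Same layer as in the sSE listing: c -> 1, 1x1 and bias-free.
sse_layer = nn.Conv2d(c, 1, kernel_size=1, bias=False)
# Reference point: one ordinary 3x3 convolution with c input and output channels.
reference_conv = nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False)

cse_params = count_params(cse_layers)          # 2 * c * (c / 2) = c^2
sse_params = count_params(sse_layer)           # c
reference_params = count_params(reference_conv)  # 9 * c^2

print("cSE:", cse_params)
print("sSE:", sse_params)
print("scSE total:", cse_params + sse_params)
print("3x3 conv:", reference_params)
```

With these shapes the cSE branch costs c² weights and the sSE branch only c, so the scSE total stays a small fraction of one 3×3 convolution.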

All three modules are easy to implement and are only slightly more complex than the SE module. Next, the experiments:

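The authors evaluated three semantic segmentation networks on each of the two datasets. According to the reported results, the scSE module brings an improvement of 2-9%, considerably more than the roughly 1% that BAM, CBAM, and SE bring to classification networks.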

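Beyond that, adding the scSE module improves segmentation at a fine-grained level and produces smoother segmentation boundaries, which works well for medical image segmentation.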

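Postscript: I came across this paper in a Zhihu post about a Kaggle image segmentation competition, and it took me a long time to get around to reading it carefully. The gains it delivers are genuinely good, but the experiments are limited to image segmentation. Readers may want to try the module on image classification, object detection, and other tasks to evaluate it further.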

References:

Paper: http://arxiv.org/pdf/1803.02579v2.pdf

Core code: https://github.com/pprp/SimpleCVReproduction/tree/master/attention/scSE
