Computer Vision - Semantic Segmentation (Part 2)

Reposted article, dated 2018-10-22 12:14. Category: Machine Learning

Introduction

Many U-Net-like neural networks have been proposed.

U-Net is well suited to medical image segmentation and natural image generation.

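It performs well on medical image segmentation for two reasons:

It reuses low-level features (concatenated at the same resolution), which compensates for the information missing during upsampling.
Medical image datasets are usually small, so low-level features matter a great deal.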

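This is not limited to medical images: on binary semantic segmentation problems in general, U-Net-like architectures achieve good results. Models such as LinkNet, Large Kernel and Tiramisu also perform well, but not as well as U-Net-like architectures.

This post is based mainly on my own experiments in the Kaggle TGS Salt Identification Challenge, together with experimental results shared by others.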

1. Loss functions

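The most common loss is binary cross entropy loss combined with dice coeff loss.
The former is a pixel-level loss.
The latter is an image-level or batch-level loss, well suited to problems that use IoU as the evaluation metric.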
Online bootstrapped cross entropy loss
Used in FRRN, for example; it is a form of hard example mining.
Lovasz loss
From the paper The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
It is also well suited to problems that use IoU as the evaluation metric.
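
To make the BCE plus Dice combination concrete, here is a minimal sketch, together with a simplified bootstrapped variant that averages only the hardest pixels. It is written in plain Python over flat lists of per-pixel probabilities so the arithmetic stays visible; in a real training loop the same formulas would be applied to tensors. The function names, the smooth term, dice_weight and keep_ratio are assumptions for illustration, and the top-k selection is one simple way to bootstrap, not necessarily the exact scheme of any particular paper.

```python
import math


def pixel_bce(p, t, eps=1e-7):
    """Binary cross entropy for one pixel with predicted probability p and label t."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def bce_loss(probs, targets):
    """Pixel-level term: mean BCE over all pixels."""
    return sum(pixel_bce(p, t) for p, t in zip(probs, targets)) / len(probs)


def soft_dice_loss(probs, targets, smooth=1.0):
    """Image-level term: 1 - soft Dice, computed over the whole mask at once."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def bce_dice_loss(probs, targets, dice_weight=1.0):
    """Weighted sum of the pixel-level and image-level terms."""
    return bce_loss(probs, targets) + dice_weight * soft_dice_loss(probs, targets)


def bootstrapped_bce_loss(probs, targets, keep_ratio=0.25):
    """Mean BCE over only the hardest pixels (those with the largest loss)."""
    per_pixel = sorted((pixel_bce(p, t) for p, t in zip(probs, targets)), reverse=True)
    k = max(1, int(len(per_pixel) * keep_ratio))
    return sum(per_pixel[:k]) / k


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.6, 0.4]   # predicted foreground probabilities
    targets = [1, 0, 1, 0]         # ground-truth mask, flattened
    print("bce + dice:", bce_dice_loss(probs, targets))
    print("bootstrapped bce:", bootstrapped_bce_loss(probs, targets, keep_ratio=0.5))
```

Because the Dice term is a ratio over the whole mask, it responds to overlap rather than to individual pixels, which is why it tracks IoU more closely than BCE alone. The bootstrapped variant is never smaller than the plain mean, since it discards the easiest pixels.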

2. Network backbone

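Popular backbones include SE-ResNeXt101, SE-ResNeXt50 and SE-ResNet101. In my view, when the dataset is not particularly large, the differences between them are small.
Because of GPU memory limits, I used ResNet34.
In some earlier work on instance detection and instance segmentation, ResNet50 also performed about as well as ResNet101.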

3. Attention-based UNet

Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks
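The SE block in SE-Net reweights the different channels of the feature maps.
This paper generalizes that form of attention: the block used in SE-Net corresponds to cSELayer, sSELayer reweights the different spatial positions instead, and scSELayer combines the two.
The experiments in the paper show that placing these attention-gated blocks after the encoder and decoder stages gives better results than using no attention.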

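import torch
import torch.nn as nn


class cSELayer(nn.Module):
    """Channel SE: gate each channel using globally pooled statistics."""
    def __init__(self, channel, reduction=2):
        super(cSELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ELU(inplace=True),  # ELU is used here; the original SE block uses ReLU
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # squeeze: one value per channel
        y = self.fc(y).view(b, c, 1, 1)   # excitation: per-channel gate in (0, 1)
        return x * y                      # gate is broadcast over H and W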


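class sSELayer(nn.Module):
    """Spatial SE: gate each position using a 1x1 convolution across channels."""
    def __init__(self, channel):
        super(sSELayer, self).__init__()
        self.fc = nn.Conv2d(channel, 1, kernel_size=1)  # (B, C, H, W) -> (B, 1, H, W)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.fc(x)
        y = self.sigmoid(y)  # per-position gate in (0, 1)
        return x * y         # gate is broadcast over channels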


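class scSELayer(nn.Module):
    """Concurrent spatial and channel SE: apply both gates to the input and add the results."""
    def __init__(self, channels, reduction=2):
        super(scSELayer, self).__init__()
        self.sSE = sSELayer(channels)
        self.cSE = cSELayer(channels, reduction=reduction)

    def forward(self, x):
        sx = self.sSE(x)  # spatially recalibrated features
        cx = self.cSE(x)  # channel-wise recalibrated features
        x = sx + cx       # combined by element-wise addition
        return x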

4. Context

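import torch.nn.functional as F


class Dblock(nn.Module):
    """Cascade of dilated 3x3 convolutions (rates 1, 2, 4) used as a center block."""
    def __init__(self, channel):
        super(Dblock, self).__init__()
        # padding equals dilation, so each 3x3 convolution keeps the spatial size
        self.dilate1 = nn.Conv2d(channel, channel, kernel_size=3, dilation=1, padding=1)
        self.dilate2 = nn.Conv2d(channel, channel, kernel_size=3, dilation=2, padding=2)
        self.dilate3 = nn.Conv2d(channel, channel, kernel_size=3, dilation=4, padding=4)
        # initialize all convolution biases to zero
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        # each stage takes the previous stage's output, so the receptive field keeps growing
        dilate1_out = F.relu(self.dilate1(x), inplace=True)
        dilate2_out = F.relu(self.dilate2(dilate1_out), inplace=True)
        dilate3_out = F.relu(self.dilate3(dilate2_out), inplace=True)

        # sum the input with every stage to mix features from several receptive fields
        out = x + dilate1_out + dilate2_out + dilate3_out
        return out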

OCNet: Object Context Network for Scene Parsing
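For semantic segmentation, a model needs both high-level contextual information (global information) and resolving power (the local detail of the image). UNet improves local information through concatenation. How, then, can better global information be obtained? The OCNet paper discusses the center block that sits in the middle of the UNet structure.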

5. Hypercolumns

Hypercolumns for Object Segmentation and Fine-grained Localization


        # decoder path: each stage fuses the previous output with the matching encoder feature
        d5 = self.decoder5(center)
        d4 = self.decoder4(d5, e4)
        d3 = self.decoder3(d4, e3)
        d2 = self.decoder2(d3, e2)
        d1 = self.decoder1(d2, e1)

        # hypercolumn: upsample every decoder output to d1's resolution
        # and concatenate them along the channel dimension
        f = torch.cat((
            d1,
            F.interpolate(d2, scale_factor=2, mode='bilinear', align_corners=False),
            F.interpolate(d3, scale_factor=4, mode='bilinear', align_corners=False),
            F.interpolate(d4, scale_factor=8, mode='bilinear', align_corners=False),
            F.interpolate(d5, scale_factor=16, mode='bilinear', align_corners=False),
        ), 1)
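
The concatenation above only works if every upsampled decoder output ends up at the same spatial size as d1. As a rough sanity check, the sketch below does that shape arithmetic in plain Python. The helper name hypercolumn_shape and the 64-channel, 128x128 shapes are assumptions chosen for illustration, not values taken from the original model.

```python
def hypercolumn_shape(decoder_shapes, scale_factors):
    """Return the (channels, height, width) of the concatenated hypercolumn.

    decoder_shapes: list of (channels, height, width), one per decoder output.
    scale_factors: integer upsampling factor applied to each output.
    """
    upsampled = [
        (c, h * s, w * s)
        for (c, h, w), s in zip(decoder_shapes, scale_factors)
    ]
    sizes = {(h, w) for _, h, w in upsampled}
    if len(sizes) != 1:
        # concatenation along channels requires identical spatial sizes
        raise ValueError("spatial sizes differ after upsampling: %s" % sorted(sizes))
    height, width = sizes.pop()
    return (sum(c for c, _, _ in upsampled), height, width)


if __name__ == "__main__":
    # hypothetical d1..d5 shapes: each stage halves the resolution of the previous one
    shapes = [(64, 128, 128), (64, 64, 64), (64, 32, 32), (64, 16, 16), (64, 8, 8)]
    print(hypercolumn_shape(shapes, [1, 2, 4, 8, 16]))  # (320, 128, 128)
```

The channel count of the hypercolumn is the sum of the channel counts of all decoder outputs, so the layer that consumes f has to be sized accordingly.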

6. Deep supervision
