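SENet & Semantic Segmentation: Study Notes
Contents: optimizing and improving the HybridSN hyperspectral classification network from the previous session; studying and implementing SENet; notes on two lecture videos: Li Xia (Peking University), "Self-Attention Mechanisms and Low-Rank Reconstruction in Semantic Segmentation", and Prof. Cheng Ming-Ming (Nankai University), "Recent Advances in Image Semantic Segmentation".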
Optimizing and Improving the HybridSN Hyperspectral Classification Network
On using Dropout
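- The previous experiment's code used Dropout, so `net.train()` and `net.eval()` must be called at the corresponding stages.
- `net.train()` puts the model in training mode: Dropout randomly zeroes activations (which helps prevent overfitting), and BatchNorm normalizes with the statistics of the current batch while updating its running mean and variance.
- `net.eval()` puts the model in evaluation mode: Dropout is disabled, and BatchNorm uses the running mean and variance accumulated during training instead of per-batch statistics.
- Because the network's behavior is fixed at test time, repeated test runs give identical results.
- Accuracy: about 95.5%.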
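A minimal sketch of the difference, assuming only PyTorch. The toy model here is hypothetical (not the HybridSN code); it contains Dropout and BatchNorm, and two forward passes on the same input differ in training mode but match exactly in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy model: Linear -> BatchNorm -> ReLU -> Dropout
net = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(p=0.4))
x = torch.randn(4, 8)

net.train()                      # Dropout active, BatchNorm uses batch statistics
a, b = net(x), net(x)
train_same = torch.equal(a, b)   # False: each call draws a new dropout mask

net.eval()                       # Dropout off, BatchNorm uses running statistics
with torch.no_grad():
    c, d = net(x), net(x)
eval_same = torch.equal(c, d)    # True: the forward pass is deterministic

print(train_same, eval_same)
```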
Model improvement: 2D convolution first, then 3D convolutions
# Model improvement: apply a 2D convolution first, then the 3D convolutions
import torch
import torch.nn as nn

class_num = 16

class HybridSN(nn.Module):
    def __init__(self):
        super(HybridSN, self).__init__()
        # 2D convolution: input (30, 25, 25), 64 kernels of 3x3x30 ==> (64, 23, 23)
        self.conv4_2d = nn.Sequential(
            nn.Conv2d(30, 64, (3, 3)),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # Three 3D convolutions
        # conv1: (1, 64, 23, 23), 8 kernels of 7x3x3 ==> (8, 58, 21, 21)
        self.conv1_3d = nn.Sequential(
            nn.Conv3d(1, 8, (7, 3, 3)),
            nn.BatchNorm3d(8),
            nn.ReLU()
        )
        # conv2: (8, 58, 21, 21), 16 kernels of 5x3x3 ==> (16, 54, 19, 19)
        self.conv2_3d = nn.Sequential(
            nn.Conv3d(8, 16, (5, 3, 3)),
            nn.BatchNorm3d(16),
            nn.ReLU()
        )
        # conv3: (16, 54, 19, 19), 32 kernels of 3x3x3 ==> (32, 52, 17, 17)
        self.conv3_3d = nn.Sequential(
            nn.Conv3d(16, 32, (3, 3, 3)),
            nn.BatchNorm3d(32),
            nn.ReLU()
        )
        self.fn1 = nn.Linear(480896, 256)  # 32*52*17*17; run once and print out.size() to check
        self.fn2 = nn.Linear(256, 128)
        self.fn_out = nn.Linear(128, class_num)
        self.drop = nn.Dropout(p=0.4)
        # Adding Softmax here stopped the loss from decreasing during training. A likely cause, if the
        # loss is nn.CrossEntropyLoss: it already applies log-softmax to raw logits, so an extra Softmax
        # squashes the outputs into (0, 1) and flattens the gradients. Return raw logits instead.
        # self.soft = nn.Softmax(dim=1)

    def forward(self, x):
        # Reduce to 2D: (B, 1, 30, 25, 25) -> (B, 30, 25, 25)
        out = x.view(x.shape[0], x.shape[2], x.shape[3], x.shape[4])
        out = self.conv4_2d(out)
        # Add a depth axis: (B, 64, 23, 23) -> (B, 1, 64, 23, 23)
        out = out.view(out.shape[0], 1, out.shape[1], out.shape[2], out.shape[3])
        out = self.conv1_3d(out)
        out = self.conv2_3d(out)
        out = self.conv3_3d(out)
        # Flatten to B rows; the number of columns is inferred
        out = out.view(out.shape[0], -1)
        out = self.fn1(out)
        out = self.drop(out)
        out = self.fn2(out)
        out = self.drop(out)
        out = self.fn_out(out)
        return out

# Random input to check that the network runs end to end
x = torch.randn(1, 1, 30, 25, 25)
net = HybridSN()
y = net(x)
print(y.shape)
print(y)
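- Because the 2D convolution comes first, the original input (30, 25, 25) passes through 64 kernels of 3x3x30 to give (64, 23, 23) before the 3D convolutions. The depth seen by the 3D convolutions grows from 30 to 64, so the flattened feature grows to 480,896 values (versus 18,496 in the 3D-first version) and the parameter count rises sharply, mostly in `fn1`. Training therefore takes noticeably longer, but accuracy also improves.
- Accuracy: about 97.3%.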
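The shape arithmetic above can be checked without building the model. A small sketch (helper names are hypothetical) that traces the unpadded, stride-1 convolutions for both orderings and counts the weights of the first fully connected layer:

```python
def valid_out(size, kernel):
    # Output length of an unpadded, stride-1 convolution along one axis
    return size - kernel + 1

# (depth kernel, output channels) of the three 3D convolutions; all are 3x3 spatially
CONV3D = [(7, 8), (5, 16), (3, 32)]

def flat_2d_first(bands=30, patch=25):
    h = valid_out(patch, 3)   # Conv2d 3x3 over all bands -> 64 maps of h x h
    d = 64                    # the 64 maps become the depth axis of the 3D convs
    for k_d, c in CONV3D:
        d, h = valid_out(d, k_d), valid_out(h, 3)
    return c * d * h * h

def flat_3d_first(bands=30, patch=25):
    d, h = bands, patch
    for k_d, c in CONV3D:
        d, h = valid_out(d, k_d), valid_out(h, 3)
    h = valid_out(h, 3)       # Conv2d 3x3 on the (c * d) reshaped maps -> 64 maps
    return 64 * h * h

def linear_params(in_f, out_f):
    return in_f * out_f + out_f   # weights + biases

for name, n in [("2D first", flat_2d_first()), ("3D first", flat_3d_first())]:
    print(name, n, linear_params(n, 256))
# 2D first 480896 123109632
# 3D first 18496 4735232
```

So `fn1` alone holds roughly 26 times more parameters in the 2D-first variant, which accounts for most of the longer training time.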
Introducing an attention mechanism
# Introducing an attention mechanism
import torch
import torch.nn as nn

class_num = 16

class Attention_Block(nn.Module):
    # SE-style channel attention; `size` must equal the spatial size of the input feature map
    def __init__(self, planes, size):
        super(Attention_Block, self).__init__()
        self.globalAvgPool = nn.AvgPool2d(size, stride=1)
        self.fc1 = nn.Linear(planes, round(planes / 16))
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(round(planes / 16), planes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        residual = x
        out = self.globalAvgPool(x)                        # squeeze: (B, C, 1, 1)
        out = out.view(out.shape[0], out.shape[1])
        out = self.fc1(out)                                # excitation: C -> C/16 -> C
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.shape[0], out.shape[1], 1, 1)
        out = out * residual                               # rescale each channel
        return out

class HybridSN(nn.Module):
    def __init__(self):
        super(HybridSN, self).__init__()
        # Three 3D convolutions
        # conv1: (1, 30, 25, 25), 8 kernels of 7x3x3 ==> (8, 24, 23, 23)
        self.conv1_3d = nn.Sequential(
            nn.Conv3d(1, 8, (7, 3, 3)),
            nn.BatchNorm3d(8),
            nn.ReLU()
        )
        # conv2: (8, 24, 23, 23), 16 kernels of 5x3x3 ==> (16, 20, 21, 21)
        self.conv2_3d = nn.Sequential(
            nn.Conv3d(8, 16, (5, 3, 3)),
            nn.BatchNorm3d(16),
            nn.ReLU()
        )
        # conv3: (16, 20, 21, 21), 32 kernels of 3x3x3 ==> (32, 18, 19, 19)
        self.conv3_3d = nn.Sequential(
            nn.Conv3d(16, 32, (3, 3, 3)),
            nn.BatchNorm3d(32),
            nn.ReLU()
        )
        # 2D convolution: (576, 19, 19), 64 kernels of 3x3 ==> (64, 17, 17)
        self.conv4_2d = nn.Sequential(
            nn.Conv2d(576, 64, (3, 3)),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # Attention blocks
        self.layer1 = self.make_layer(Attention_Block, planes=576, size=19)
        self.layer2 = self.make_layer(Attention_Block, planes=64, size=17)
        # Fully connected layers with 256 and 128 nodes, both followed by Dropout with p = 0.1
        self.fn1 = nn.Linear(18496, 256)
        self.fn2 = nn.Linear(256, 128)
        self.fn_out = nn.Linear(128, class_num)
        self.drop = nn.Dropout(p=0.1)
        # Adding Softmax here stopped the loss from decreasing; likely because nn.CrossEntropyLoss
        # (if used) already applies log-softmax to the raw logits
        # self.soft = nn.Softmax(dim=1)

    def make_layer(self, block, planes, size):
        layers = []
        layers.append(block(planes, size))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1_3d(x)
        out = self.conv2_3d(out)
        out = self.conv3_3d(out)
        # Merge 32*18 into channels for the 2D convolution: (576, 19, 19)
        out = out.view(out.shape[0], out.shape[1] * out.shape[2], out.shape[3], out.shape[4])
        # Attention around the 2D convolution
        out = self.layer1(out)
        out = self.conv4_2d(out)
        out = self.layer2(out)
        # Flatten to B rows of 18496 values (the column count is inferred)
        out = out.view(out.shape[0], -1)
        out = self.fn1(out)
        out = self.drop(out)
        out = self.fn2(out)
        out = self.drop(out)
        out = self.fn_out(out)
        # out = self.soft(out)
        return out

# Random input to check that the network runs end to end
x = torch.randn(1, 1, 30, 25, 25)
net = HybridSN()
y = net(x)
print(y.shape)
print(y)
- The network clearly converges faster during training, and training is very stable; the final test accuracy reaches about 99%.
SENet
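Core idea: each channel of the input feature map is squeezed into a single value, an excitation step turns these values into one weight per channel, and each weight is multiplied with its channel. Reweighting the channels this way produces a new feature map.
Network structure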
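- \(X \to U\)
  - \(F_{tr}\) is an ordinary convolution.
- \(U \to \widetilde X\)
  - Squeeze: \(F_{sq}(\cdot)\)
    - Apply global average pooling to each channel of U, giving a 1x1xC tensor: \(z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j)\).
    - Averaging over the whole channel lets the scale be computed from global information about that channel.
    - Since the goal is to model the relationships between channels, discarding the spatial correlations within each channel does no harm here.
    - The result describes the value distribution of the layer's C feature maps.
  - Excitation: \(F_{ex}(\cdot, W)\)
    - \(s = F_{ex}(z,W) = \sigma(g(z,W)) = \sigma(W_2\delta(W_1z))\), where \(\delta\) is ReLU and \(\sigma\) is the sigmoid.
    - The 1x1xC vector first passes through a fully connected layer with \(W_1 \in \mathbb{R}^{\frac{C}{r}\times C}\).
      - r is a reduction ratio (16 in the paper); it shrinks the number of channels and thus the computation.
      - Fully connected layers are used to exploit the correlations between channels when producing the weights.
    - Then a ReLU.
    - Then a second fully connected layer with \(W_2 \in \mathbb{R}^{C\times\frac{C}{r}}\).
    - Finally a sigmoid restricts the weights to (0, 1).
  - Scale: each channel of U is multiplied by its weight, \(\widetilde x_c = F_{scale}(u_c, s_c) = s_c \cdot u_c\).
- By controlling the scale, important features are strengthened and unimportant ones weakened, making the extracted features more discriminative.
- The authors also give two practical examples of inserting the SE block into existing networks: SE-Inception and SE-ResNet.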
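A minimal SE block sketch in PyTorch, assuming a 4D input (B, C, H, W). Unlike the SE-ResNet code in the next section, which hard-codes the pooling window per stage, this sketch uses `nn.AdaptiveAvgPool2d(1)` for the squeeze so it works for any spatial size. The class name `SEBlock` is hypothetical.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight the C channels of a (B, C, H, W) tensor."""
    def __init__(self, channels, r=16):
        super().__init__()
        hidden = max(1, channels // r)
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # F_sq: global average pooling -> (B, C, 1, 1)
        self.excite = nn.Sequential(             # F_ex: sigma(W2 * delta(W1 * z))
            nn.Linear(channels, hidden),         # W1: C -> C/r
            nn.ReLU(inplace=True),               # delta
            nn.Linear(hidden, channels),         # W2: C/r -> C
            nn.Sigmoid(),                        # sigma: weights in (0, 1)
        )

    def forward(self, u):
        b, c, _, _ = u.shape
        z = self.squeeze(u).view(b, c)           # z: (B, C)
        s = self.excite(z).view(b, c, 1, 1)      # s: (B, C, 1, 1)
        return u * s                             # F_scale: s_c * u_c

se = SEBlock(64)
u = torch.randn(2, 64, 17, 17)
print(se(u).shape)   # torch.Size([2, 64, 17, 17])
```

The block keeps the input shape, so it can be dropped after any convolution; in SE-ResNet it is applied to the residual branch before the identity addition.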
Code implementation
The implementation code is taken from the linked source.
import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo

__all__ = ['SENet', 'se_resnet_18', 'se_resnet_34', 'se_resnet_50', 'se_resnet_101',
           'se_resnet_152']


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

        # SE branch. The pooling window equals the stage's feature-map size,
        # which assumes a 224x224 input (56, 28, 14, 7 for the four stages)
        if planes == 64:
            self.globalAvgPool = nn.AvgPool2d(56, stride=1)
        elif planes == 128:
            self.globalAvgPool = nn.AvgPool2d(28, stride=1)
        elif planes == 256:
            self.globalAvgPool = nn.AvgPool2d(14, stride=1)
        elif planes == 512:
            self.globalAvgPool = nn.AvgPool2d(7, stride=1)
        self.fc1 = nn.Linear(in_features=planes, out_features=round(planes / 16))
        self.fc2 = nn.Linear(in_features=round(planes / 16), out_features=planes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        # Squeeze-and-excitation: rescale each channel before the residual addition
        original_out = out
        out = self.globalAvgPool(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.size(0), out.size(1), 1, 1)
        out = out * original_out

        out += residual
        out = self.relu(out)

        return out
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)

        # SE branch; fixed pooling sizes assume a 224x224 input
        if planes == 64:
            self.globalAvgPool = nn.AvgPool2d(56, stride=1)
        elif planes == 128:
            self.globalAvgPool = nn.AvgPool2d(28, stride=1)
        elif planes == 256:
            self.globalAvgPool = nn.AvgPool2d(14, stride=1)
        elif planes == 512:
            self.globalAvgPool = nn.AvgPool2d(7, stride=1)
        # Reduction ratio 16 relative to the planes * 4 output channels
        self.fc1 = nn.Linear(in_features=planes * 4, out_features=round(planes / 4))
        self.fc2 = nn.Linear(in_features=round(planes / 4), out_features=planes * 4)
        self.sigmoid = nn.Sigmoid()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        # Squeeze-and-excitation: rescale each channel before the residual addition
        original_out = out
        out = self.globalAvgPool(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.size(0), out.size(1), 1, 1)
        out = out * original_out

        out += residual
        out = self.relu(out)

        return out
class SENet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(SENet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
def se_resnet_18(pretrained=False, **kwargs):
    """Constructs an SE-ResNet-18 model.

    Args:
        pretrained (bool): unused; this implementation never loads pretrained weights
    """
    model = SENet(BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def se_resnet_34(pretrained=False, **kwargs):
    """Constructs an SE-ResNet-34 model.

    Args:
        pretrained (bool): unused; this implementation never loads pretrained weights
    """
    model = SENet(BasicBlock, [3, 4, 6, 3], **kwargs)
    return model


def se_resnet_50(pretrained=False, **kwargs):
    """Constructs an SE-ResNet-50 model.

    Args:
        pretrained (bool): unused; this implementation never loads pretrained weights
    """
    model = SENet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def se_resnet_101(pretrained=False, **kwargs):
    """Constructs an SE-ResNet-101 model.

    Args:
        pretrained (bool): unused; this implementation never loads pretrained weights
    """
    model = SENet(Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def se_resnet_152(pretrained=False, **kwargs):
    """Constructs an SE-ResNet-152 model.

    Args:
        pretrained (bool): unused; this implementation never loads pretrained weights
    """
    model = SENet(Bottleneck, [3, 8, 36, 3], **kwargs)
    return model
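The pooling sizes 56, 28, 14 and 7 hard-coded in `BasicBlock` and `Bottleneck` (and the final `AvgPool2d(7)`) only match a 224x224 input. A small sketch, using the standard output-size formula floor((n + 2p - k) / s) + 1, traces the spatial size through the stem and the four stages; the function names are hypothetical.

```python
def conv_out(n, k, s=1, p=0):
    # Output size of a convolution or pooling layer along one axis
    return (n + 2 * p - k) // s + 1

def stage_sizes(n):
    n = conv_out(n, 7, s=2, p=3)    # conv1: 7x7, stride 2, padding 3
    n = conv_out(n, 3, s=2, p=1)    # maxpool: 3x3, stride 2, padding 1
    sizes = [n]                     # layer1 keeps stride 1
    for _ in range(3):              # layer2..layer4 start with a stride-2, padded 3x3 conv
        n = conv_out(n, 3, s=2, p=1)
        sizes.append(n)
    return sizes

print(stage_sizes(224))  # [56, 28, 14, 7]
print(stage_sizes(256))  # [64, 32, 16, 8]
```

For any other input size the SE pooling no longer reduces to 1x1, so `fc1` receives the wrong number of features (or, for smaller inputs, the pooling window exceeds the feature map). Replacing each fixed `nn.AvgPool2d(...)` with `nn.AdaptiveAvgPool2d(1)` is a common way to remove this dependency.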
Self-Attention Mechanisms and Low-Rank Reconstruction in Semantic Segmentation
Semantic segmentation
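- The original networks mainly performed image classification: convolutional layers plus fully connected layers produce a single class label.
- If the last few layers also use convolutions and the result is upsampled, the network outputs an n×n map (a prediction per pixel) instead.
- In such a fully convolutional network, however large the kernels are, each output is always limited by its receptive field.
- Semantic segmentation, however, needs a larger receptive field.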
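To make the receptive-field limit concrete: for a stack of layers with kernel sizes \(k_l\) and strides \(s_l\), the receptive field of one output unit is \(r = 1 + \sum_l (k_l - 1)\prod_{m<l} s_m\). A small sketch (the function name is hypothetical):

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Returns the receptive field of one output unit."""
    r, jump = 1, 1              # jump: input-pixel distance between adjacent units
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(receptive_field([(3, 1)] * 3))       # 7: three stacked 3x3 convs
print(receptive_field([(3, 1)] * 10))      # 21: ten stacked 3x3 convs
print(receptive_field([(7, 2), (3, 2)]))   # 11: a ResNet-style 7x7 conv + 3x3 max-pool stem
```

With stride 1 the receptive field grows only linearly with depth, so a purely local network needs many layers (or downsampling) before each pixel can see the whole object.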
Nonlocal Networks
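- For a convolutional layer, the receptive field is the kernel size: only a local region is considered, so the operation is "local". "Non-local" means the receptive field can be very large rather than a local neighborhood (a fully connected layer is non-local).
- To predict information about an object, features should be gathered from as many positions in the whole image as possible, taking into account the relation between the current pixel and all other pixels.
  - That is, the feature at each position is reweighted by the similarity between pairs of points:
  - \(y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i,x_j)\, g(x_j)\)
  - \(f(x_i,x_j) = e^{\theta(x_i)^T \phi(x_j)}\) measures the affinity between \(x_i\) and \(x_j\), \(C(x)\) is a normalization factor (for this choice of f, \(C(x) = \sum_{\forall j} f(x_i,x_j)\)), and \(g(x_j)\) is a transformation of the reference pixel \(x_j\).
- How it works: [Figure: the non-local operation]
- There are several ways to compute the similarity f (Gaussian, embedded Gaussian, dot product, concatenation); the paper finds the differences small, so a convenient one is chosen.
- For images, the paper implements the embeddings as 1x1 convolutions, i.e. \(\theta\) and \(\phi\) (and \(g\)) are all 1x1 convolutions.
- \(z_i = W_z y_i + x_i\)
  - This forms a residual structure.
  - It therefore becomes a block component that can be inserted directly into an existing network.
  - The paper's experiments support the necessity and effectiveness of these components.
- Relation to fully connected layers:
  - If the relation between two points is no longer computed from their features but is simply a learned weight \(w_{ij}\),
  - \(g(x_j) = x_j\),
  - and the normalization factor is 1,
  - then the operation becomes a fully connected layer; a fully connected layer can thus be seen as a special case of the non-local operation.
- Concrete implementation: [Figure: non-local block implementation]
- However, when the input feature map is large, the non-local computation becomes very expensive (the affinity matrix has \((HW)^2\) entries), so it is only used in deeper layers (high-level semantic layers).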
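A minimal sketch of an embedded-Gaussian non-local block for 2D feature maps, following the formulas above: \(\theta\), \(\phi\), \(g\) and \(W_z\) are 1x1 convolutions, and \(f/C(x)\) becomes a softmax over j. The inner width C/2 follows the paper's bottleneck choice. As an assumption, \(W_z\) itself is zero-initialized so the block starts as the identity (the paper instead zero-initializes a BatchNorm placed after \(W_z\)); the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2d(nn.Module):
    """Embedded-Gaussian non-local block for (B, C, H, W) inputs: z = W_z y + x."""
    def __init__(self, channels):
        super().__init__()
        inter = max(1, channels // 2)
        self.theta = nn.Conv2d(channels, inter, 1)   # theta(x_i)
        self.phi = nn.Conv2d(channels, inter, 1)     # phi(x_j)
        self.g = nn.Conv2d(channels, inter, 1)       # g(x_j)
        self.w_z = nn.Conv2d(inter, channels, 1)     # W_z: back to C channels
        nn.init.zeros_(self.w_z.weight)              # block starts as the identity: z = x
        nn.init.zeros_(self.w_z.bias)

    def forward(self, x):
        b, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        phi = self.phi(x).flatten(2)                      # (B, C', HW)
        g = self.g(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        # f(x_i, x_j) / C(x) = softmax over j of theta_i^T phi_j: an (HW x HW) matrix
        attn = F.softmax(torch.bmm(theta, phi), dim=-1)   # (B, HW, HW)
        y = torch.bmm(attn, g)                            # y_i = sum_j attn_ij * g(x_j)
        y = y.transpose(1, 2).reshape(b, -1, h, w)        # (B, C', H, W)
        return self.w_z(y) + x                            # residual: z_i = W_z y_i + x_i

blk = NonLocalBlock2d(16)
x = torch.randn(2, 16, 8, 8)
print(blk(x).shape)   # torch.Size([2, 16, 8, 8])
```

The `attn` tensor has \((HW)^2\) entries per image, which is exactly why the block is reserved for deep, low-resolution feature maps.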
Recent Advances in Image Semantic Segmentation
Res2Net
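- To make better use of multi-scale information, Res2Net splits the features inside a single ResNet block into further groups at multiple scales, so that scale information is fully exploited within one block.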
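A sketch of the multi-scale split inside a Res2Net-style block, assuming the hierarchical scheme from the Res2Net paper: the channels are split into s subsets, \(y_1 = x_1\), \(y_2 = K_2(x_2)\), and \(y_i = K_i(x_i + y_{i-1})\) for later subsets, where each \(K_i\) is a 3x3 convolution. The surrounding 1x1 convolutions, BatchNorm, ReLU and residual connection of a full block are omitted, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class Res2NetSplit(nn.Module):
    """Hierarchical multi-scale 3x3 stage of a Res2Net-style block (sketch)."""
    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # One 3x3 conv K_i for every subset except the first
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1) for _ in range(scales - 1)]
        )

    def forward(self, x):
        xs = torch.chunk(x, self.scales, dim=1)  # split channels into s subsets
        ys = [xs[0]]                             # y_1 = x_1 (identity)
        prev = None
        for i, conv in enumerate(self.convs, start=1):
            inp = xs[i] if prev is None else xs[i] + prev
            prev = conv(inp)                     # y_i = K_i(x_i + y_{i-1})
            ys.append(prev)
        return torch.cat(ys, dim=1)              # concatenate back to C channels

m = Res2NetSplit(32, scales=4)
print(m(torch.randn(2, 32, 9, 9)).shape)   # torch.Size([2, 32, 9, 9])
```

Because each subset also receives the previous subset's output, the effective receptive field grows from subset to subset (3x3, 5x5, 7x7, ...), giving several scales within one block.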
Strip Pooling
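- Strip pooling
- Standard pooling windows are usually square, but real scenes contain long, thin objects, so the aim is to capture long-range features along one direction.
- Set the width or height of a standard spatial pooling kernel to 1, so that each output averages all elements of a whole row or a whole column.
- SP module (SPM)
  - For an input x (H×W), two pathways handle horizontal and vertical strip pooling respectively, and each result is expanded back to the input size (H×W).
  - The two pathways are fused by addition, followed by a 1×1 convolution and finally a sigmoid; the result is multiplied element-wise with the input x.
  - In effect, this computes a weight map giving a weight for every pixel position, which makes it resemble SENet's attention mechanism.
  - In addition, any two pixels can be connected through this kind of bridging (via the row and column strips they share with an intermediate pixel), so more global information is captured.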
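A minimal sketch of the strip pooling module described above. As assumptions, a 1D convolution (kernel 3) is applied along each pooled strip before expansion, as in the Strip Pooling paper; BatchNorm and other details are omitted, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Sketch of a strip pooling module (SPM) for (B, C, H, W) inputs."""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average each row    -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # average each column -> (B, C, 1, W)
        # 1D convolutions along each strip
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)    # 1x1 conv after fusion

    def forward(self, x):
        b, c, h, w = x.shape
        yh = self.conv_h(self.pool_h(x)).expand(b, c, h, w)  # expand back to H x W
        yw = self.conv_w(self.pool_w(x)).expand(b, c, h, w)
        gate = torch.sigmoid(self.fuse(yh + yw))              # per-pixel weights in (0, 1)
        return x * gate

sp = StripPooling(8)
print(sp(torch.randn(2, 8, 10, 14)).shape)   # torch.Size([2, 8, 10, 14])
```

Each gate value at (i, j) depends on the whole row i and the whole column j, which is the long-range, strip-shaped context the module is designed to capture.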