Fully Convolutional Networks (FCN)
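FCN is the pioneering deep learning architecture for image segmentation, and many later network designs evolved from it.
Image segmentation is classification at the pixel level.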
The basic framework for semantic segmentation:
an FCN front end (or a network built on it, such as SegNet, DeconvNet, or DeepLab) + a CRF/MRF back end
Paper link
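Compared with a conventional classification network, FCN replaces the fully connected layers with convolutional layers and then upsamples with deconvolution (transposed convolution) layers, producing a feature map with the same spatial size as the input image. The backbone used here is VGG.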
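Two points matter most:
- The fully connected layers are replaced with convolutional layers, and upsampling is done by deconvolution (transposed convolution).
- The outputs of different layers are added together to strengthen the representational power of the feature map.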
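The first point rests on the fact that a fully connected unit applied to a flattened feature map computes the same value as a convolution whose kernel covers the whole map. Below is a minimal pure-Python sketch of that equivalence for one channel and one output unit. The helper names `fully_connected` and `conv2d_valid` and all toy values are assumptions made for illustration; they do not come from the paper or the repository.

```python
def fully_connected(flat_input, weights, bias):
    """One output unit of a fully connected layer: dot product plus bias."""
    return sum(x * w for x, w in zip(flat_input, weights)) + bias


def conv2d_valid(image, kernel, bias):
    """Single-channel 2D cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 2x2 feature map and the weights of one fully connected unit.
feature_map = [[1.0, 2.0], [3.0, 4.0]]
fc_weights = [0.5, -1.0, 2.0, 0.25]
fc_bias = 0.1

# Reshape the FC weights into a 2x2 kernel (row-major, matching the flattening order).
kernel = [fc_weights[0:2], fc_weights[2:4]]

flat = [v for row in feature_map for v in row]
fc_out = fully_connected(flat, fc_weights, fc_bias)
conv_out = conv2d_valid(feature_map, kernel, fc_bias)  # 1x1 output

# On a larger 3x3 input the same kernel slides and yields a 2x2 score map.
larger = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]
score_map = conv2d_valid(larger, kernel, fc_bias)
```

Because the kernel simply slides over a larger input, the converted network is no longer tied to one fixed input size, and it outputs a spatial map of scores instead of a single vector.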
Deconvolution (transposed convolution)
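For a detailed derivation of deconvolution (also called transposed convolution), see: <https://blog.csdn.net/LoseInVain/article/details/81098502>
Put simply, it reverses the shape change of a convolution. Take a 4x4 matrix A and a 3x3 kernel C with stride=1: convolving A with C gives a 2x2 matrix B. A transposed convolution goes the other way: starting from the 2x2 matrix B, it produces a 4x4 output with the same size as A. It restores only the shape, not the original values of A, and its kernel weights are learned during training.
Transposed convolution is one way to perform upsampling.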
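The 4x4 to 2x2 and back to 4x4 example can be traced by hand. The following is a small pure-Python sketch, not the PyTorch implementation: `conv2d` and `conv_transpose2d` are hypothetical helpers written for this illustration, and the values of A and C are made up. The transposed convolution is written in its "scatter" form, where every input value adds a scaled copy of the kernel to the output.

```python
def conv2d(image, kernel):
    """Single-channel cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_transpose2d(image, kernel, stride=1):
    """Transposed convolution: each input value scatters a scaled kernel copy."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(image), len(image[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += image[i][j] * kernel[di][dj]
    return out


# A: 4x4 input with values 0..15; C: 3x3 kernel (both chosen arbitrarily).
A = [[float(4 * r + c) for c in range(4)] for r in range(4)]
C = [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]]

B = conv2d(A, C)               # 2x2
A_up = conv_transpose2d(B, C)  # 4x4 again, but with different values than A
```

`A_up` has the shape of `A` but not its contents, which is why "transposed convolution" is a more accurate name than "deconvolution".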
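Skip connections (skip layer)
If only the output of the feature-extraction part (the part of VGG before the fully connected layers) is upsampled back to the input image size, the resulting features are too coarse. FCN therefore upsamples feature maps from different layers and combines them.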
Code walkthrough
Source code: https://github.com/pochih/FCN-pytorch
The core code is as follows:
import torch.nn as nn


class FCNs(nn.Module):

    def __init__(self, pretrained_net, n_class):
        super().__init__()
        self.n_class = n_class
        self.pretrained_net = pretrained_net
        self.relu = nn.ReLU(inplace=True)
        # Each transposed convolution doubles the spatial size (stride=2).
        self.deconv1 = nn.ConvTranspose2d(512, 512, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn1 = nn.BatchNorm2d(512)
        self.deconv2 = nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn2 = nn.BatchNorm2d(256)
        self.deconv3 = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.deconv4 = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn4 = nn.BatchNorm2d(64)
        self.deconv5 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn5 = nn.BatchNorm2d(32)
        # 1x1 convolution that maps the 32 channels to per-class scores.
        self.classifier = nn.Conv2d(32, n_class, kernel_size=1)

    def forward(self, x):
        output = self.pretrained_net(x)
        x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
        x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
        x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)
        x2 = output['x2']  # size=(N, 128, x.H/4,  x.W/4)
        x1 = output['x1']  # size=(N, 64, x.H/2,  x.W/2)

        score = self.bn1(self.relu(self.deconv1(x5)))     # size=(N, 512, x.H/16, x.W/16)
        score = score + x4                                # element-wise add, size=(N, 512, x.H/16, x.W/16)
        score = self.bn2(self.relu(self.deconv2(score)))  # size=(N, 256, x.H/8, x.W/8)
        score = score + x3                                # element-wise add, size=(N, 256, x.H/8, x.W/8)
        score = self.bn3(self.relu(self.deconv3(score)))  # size=(N, 128, x.H/4, x.W/4)
        score = score + x2                                # element-wise add, size=(N, 128, x.H/4, x.W/4)
        score = self.bn4(self.relu(self.deconv4(score)))  # size=(N, 64, x.H/2, x.W/2)
        score = score + x1                                # element-wise add, size=(N, 64, x.H/2, x.W/2)
        score = self.bn5(self.relu(self.deconv5(score)))  # size=(N, 32, x.H, x.W)
        score = self.classifier(score)                    # size=(N, n_class, x.H/1, x.W/1)

        return score  # size=(N, n_class, x.H/1, x.W/1)
In train.py:
vgg_model = VGGNet(requires_grad=True, remove_fc=True)
fcn_model = FCNs(pretrained_net=vgg_model, n_class=n_class)
Here we focus on the forward function of FCNs:
def forward(self, x):
    output = self.pretrained_net(x)
    x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
    x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
    x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)
    x2 = output['x2']  # size=(N, 128, x.H/4,  x.W/4)
    x1 = output['x1']  # size=(N, 64, x.H/2,  x.W/2)

    score = self.bn1(self.relu(self.deconv1(x5)))     # size=(N, 512, x.H/16, x.W/16)
    score = score + x4                                # element-wise add, size=(N, 512, x.H/16, x.W/16)
    score = self.bn2(self.relu(self.deconv2(score)))  # size=(N, 256, x.H/8, x.W/8)
    score = score + x3                                # element-wise add, size=(N, 256, x.H/8, x.W/8)
    score = self.bn3(self.relu(self.deconv3(score)))  # size=(N, 128, x.H/4, x.W/4)
    score = score + x2                                # element-wise add, size=(N, 128, x.H/4, x.W/4)
    score = self.bn4(self.relu(self.deconv4(score)))  # size=(N, 64, x.H/2, x.W/2)
    score = score + x1                                # element-wise add, size=(N, 64, x.H/2, x.W/2)
    score = self.bn5(self.relu(self.deconv5(score)))  # size=(N, 32, x.H, x.W)
    score = self.classifier(score)                    # size=(N, n_class, x.H/1, x.W/1)

    return score  # size=(N, n_class, x.H/1, x.W/1)
As the code shows, the input to FCNs has shape (batch_size, c, h, w) and the output has shape (batch_size, n_class, h, w).
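Since the output holds one score per class for every pixel, a segmentation map can be obtained by taking the highest-scoring class at each pixel. Here is a pure-Python sketch for a single image; `predict_labels` and the toy `scores` are assumptions made for illustration and are not taken from the repository.

```python
def predict_labels(scores):
    """scores: nested list shaped (n_class, H, W). Returns an (H, W) label map."""
    n_class = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_class), key=lambda c: scores[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Toy scores for 3 classes on a 2x2 image.
scores = [
    [[2.0, 0.1], [0.3, -1.0]],   # class 0
    [[0.5, 1.5], [0.2, 0.0]],    # class 1
    [[-1.0, 0.0], [0.9, 3.0]],   # class 2
]
label_map = predict_labels(scores)  # [[0, 1], [2, 2]]
```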
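The input first passes through the VGG feature-extraction layers, which produce the feature maps. The feature maps after the 5 max-pooling layers have the following sizes: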
x5 = output['x5'] # size=(N, 512, x.H/32, x.W/32)
x4 = output['x4'] # size=(N, 512, x.H/16, x.W/16)
x3 = output['x3'] # size=(N, 256, x.H/8, x.W/8)
x2 = output['x2'] # size=(N, 128, x.H/4, x.W/4)
x1 = output['x1'] # size=(N, 64, x.H/2, x.W/2)
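Starting from pool5, the feature map is upsampled by a factor of 2 and added element-wise to the output of the preceding pooling layer, and this is repeated at each stage (ResNet uses a similar element-wise addition). As a result, the upsampled feature maps carry richer and more precise information, which should make the model more robust.
Take a 224x224 input image as an example. The output of pool4 is (N, 512, 14, 14) and the output of pool5 is (N, 512, 7, 7). Deconvolution turns the pool5 output into (N, 512, 14, 14), which is added element-wise to the pool4 output, still giving (N, 512, 14, 14). Upsampling this result gives (N, 256, 28, 28), which is added to the pool3 output. Continuing in the same way eventually gives (N, 64, 112, 112).
One more deconvolution then upsamples to (N, 32, 224, 224), and a final 1x1 convolution gives (N, n_class, 224, 224), that is, n_class feature maps of size 224x224.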
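The sizes in this walkthrough can be checked with the output-size formula documented for `nn.ConvTranspose2d`: out = (in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The sketch below applies it with the hyperparameters used in `FCNs`. The helper names `conv_transpose_out_size` and `decoder_shapes` are made up for this example, and it assumes the input height and width are multiples of 32.

```python
def conv_transpose_out_size(size, kernel_size=3, stride=2, padding=1,
                            dilation=1, output_padding=1):
    """Output length of a transposed convolution along one spatial axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def decoder_shapes(height, width):
    """(channels, H, W) after each of the five deconv layers in FCNs."""
    out_channels = [512, 256, 128, 64, 32]
    h, w = height // 32, width // 32  # spatial size of pool5 (assumes multiples of 32)
    shapes = []
    for channels in out_channels:
        h = conv_transpose_out_size(h)
        w = conv_transpose_out_size(w)
        shapes.append((channels, h, w))
    return shapes


shapes = decoder_shapes(224, 224)
for shape in shapes:
    print(shape)
```

With kernel_size=3, stride=2, padding=1 and output_padding=1 the formula reduces to out = 2 * in, so each stage exactly doubles the height and width. That is what lets the upsampled map line up with the preceding pooling output for the element-wise add.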
As upsampling proceeds, the resulting feature maps show more and more detail (the original post illustrates this with a figure).
Loss function
criterion = nn.BCEWithLogitsLoss()
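The loss function is binary cross-entropy. torch provides two functions that compute binary cross-entropy:
- BCELoss()
- BCEWithLogitsLoss()
The latter first applies a sigmoid to map its input into the range 0 to 1 and then computes the former, that is, BCEWithLogitsLoss = Sigmoid + BCELoss. Combining the two steps in one layer is also more numerically stable than applying Sigmoid and BCELoss separately.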
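The identity BCEWithLogitsLoss = Sigmoid + BCELoss can be verified numerically without torch. Below is a pure-Python sketch with mean reduction. The function names and sample values are assumptions for illustration, and the fused version uses the standard stable rewrite max(x, 0) - x*t + log(1 + exp(-|x|)), not PyTorch's actual source.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(probs, targets):
    """Mean binary cross-entropy on probabilities in (0, 1)."""
    total = sum(t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
                for p, t in zip(probs, targets))
    return -total / len(probs)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits."""
    total = sum(max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
                for x, t in zip(logits, targets))
    return total / len(logits)


logits = [2.0, -1.0, 0.5, -3.0]
targets = [1.0, 0.0, 1.0, 0.0]

two_step = bce_loss([sigmoid(x) for x in logits], targets)
fused = bce_with_logits(logits, targets)
print(abs(two_step - fused) < 1e-9)
```

The fused form never evaluates log(0) for large-magnitude logits, which is the practical reason to prefer it during training.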
For a worked example, see: https://blog.csdn.net/qq_22210253/article/details/85222093