Introduction
The ResNet classification network is currently one of the most widely used CNN feature-extraction networks.
The general intuition is that the deeper (more complex, more parameters) a deep-learning model is, the stronger its expressive power. Following this principle, CNN classifiers grew from AlexNet's 7 layers to VGG's 16 and 19 layers, and later to GoogLeNet's 22 layers. It then turned out that, once a deep CNN reaches a certain depth, simply stacking more layers no longer improves classification performance; instead the network converges more slowly and accuracy on the test dataset gets worse. Even after ruling out issues such as overfitting on a small dataset, an excessively deep network still shows lower classification accuracy than a shallower one.
Naively increasing the number of layers leads to vanishing and exploding gradients. Normalized initialization and intermediate normalization layers (Batch Normalization) were introduced to deal with this, but they exposed another problem, degradation: as depth increases, accuracy on the training set saturates and then drops. This is not caused by overfitting, because overfitting would show up as better, not worse, performance on the training set.
A residual learning block uses its parameterized layers to learn a residual representation between input and output, rather than using them to learn the input-to-output mapping directly, as conventional CNNs (e.g. AlexNet/VGG) do. Experiments show that learning the residual with ordinary parameterized layers is much easier (faster convergence) and more effective (higher classification accuracy can be reached by stacking more layers) than learning the input-output mapping directly.
The ResNet network
Residual: the difference between an observed value and an estimated value.
ResNet adds shortcut connections to the network, letting the original input flow directly to later layers. A block then no longer has to learn the whole output, only the residual on top of the previous block's output: the output of the shallower layer is copied and added to the deeper output, so the input is routed around the block straight to its output. This guarantees that the later layers of a deep network can at least realize an identity shortcut connection, preserving the information carried by the input; the network as a whole only needs to learn the part where input and output differ, which simplifies the learning target and makes optimization easier.
\(F(x)=H(x)-x\), where \(x\) is the estimate (the feature map output by the previous ResNet block); \(x\) is usually referred to as the identity and is carried by a skip connection. \(F(x)\) is the residual function learned by the block, and \(H(x)\) is the deep output, the observed value. When the features represented by \(x\) are already good enough, training automatically drives \(F(x)\) towards 0.
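As a minimal sketch of how this formula maps onto code (illustrative only; the block name and channel choices here are assumptions, and the full implementation appears in the code section below), a residual block computes \(H(x)=F(x)+x\) by adding the input back onto the output of a small stack of layers:

import torch.nn as nn

class ToyResidualBlock(nn.Module):
    """Toy illustration of H(x) = F(x) + x; not the implementation given below."""
    def __init__(self, channels):
        super().__init__()
        # F(x): a small stack of parameterized layers that learns the residual
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # identity skip connection: add the unchanged input back on
        return self.residual(x) + x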
The residual module changes how information propagates forward and backward, which makes the network easier to optimize: during back-propagation the gradient gains an additional, simpler path to flow through. It also tends to keep the parameter values in a module small, so the parameters respond more sensitively to the back-propagated loss. This does not fundamentally solve the problem of a small back-propagated signal, but with smaller parameters the relative effect of that signal is larger, and the scheme also acts as a mild regularizer.
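To make the gradient argument concrete (a standard derivation for an identity shortcut, stated here for completeness): writing the block as \(H(x)=F(x)+x\), the gradient of a loss \(L\) with respect to the block input is
\[
\frac{\partial L}{\partial x} \;=\; \frac{\partial L}{\partial H}\cdot\frac{\partial H}{\partial x} \;=\; \frac{\partial L}{\partial H}\left(\frac{\partial F}{\partial x} + 1\right).
\]
The constant \(1\) contributed by the shortcut gives the loss gradient a direct path back to earlier layers, so it cannot vanish merely because \(\partial F/\partial x\) is small.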
Network structure
- basic block: the channels that the input X lacks relative to the output Y are simply zero-padded so the two can be added, and two 3x3 convolutions in series form one residual block;
- bottleneck block: a 1x1 convolution implements the \(W_s\) projection so that the final input and output channel counts match, and three convolutions (1x1, 3x3, 1x1) in series form one residual module. The 1x1 convolutions shrink and then restore the feature-map depth, so the number of filters in the 3x3 convolution is not dictated by the previous layer's output and does not constrain the next module; they add non-linearity and reduce the depth of the intermediate output, cutting the computational cost. This design exists to save computation, and with it overall training time, rather than to change the final model accuracy; a rough parameter count is sketched right after this list.
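The sketch below counts convolution weights only (ignoring BatchNorm and biases); the 256/64-channel figures follow the bottleneck example in the original ResNet paper and are used purely for illustration:

# rough conv parameter counts (weights only), for illustration
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

# basic-style block working directly on 256 channels: two 3x3 convs
basic = 2 * conv_params(256, 256, 3)            # 1,179,648 parameters

# bottleneck: 1x1 reduce to 64, 3x3 at 64 channels, 1x1 expand back to 256
bottleneck = (conv_params(256, 64, 1)
              + conv_params(64, 64, 3)
              + conv_params(64, 256, 1))        # 69,632 parameters

print(basic, bottleneck)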
Code implementation
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Basic Block for resnet 18 and resnet 34"""

    # BasicBlock and BottleNeck blocks have different output sizes;
    # the class attribute `expansion` is used to distinguish them
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # residual function
        self.residual_function = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * BasicBlock.expansion, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels * BasicBlock.expansion)
        )
        # shortcut
        self.shortcut = nn.Sequential()
        # if the shortcut output dimension does not match the residual function,
        # use a 1x1 convolution to match the dimension
        if stride != 1 or in_channels != BasicBlock.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * BasicBlock.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * BasicBlock.expansion)
            )

    def forward(self, x):
        return nn.ReLU(inplace=True)(self.residual_function(x) + self.shortcut(x))
class BottleNeck(nn.Module):
    """Residual block for resnet over 50 layers"""

    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.residual_function = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, stride=stride, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * BottleNeck.expansion, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels * BottleNeck.expansion),
        )
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * BottleNeck.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * BottleNeck.expansion, stride=stride, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels * BottleNeck.expansion)
            )

    def forward(self, x):
        return nn.ReLU(inplace=True)(self.residual_function(x) + self.shortcut(x))
class ResNet(nn.Module):
    def __init__(self, block, num_block, num_classes=100):
        super().__init__()
        self.in_channels = 64
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True))
        # we use a different input size than the original paper,
        # so conv2_x's stride is 1
        self.conv2_x = self._make_layer(block, 64, num_block[0], 1)
        self.conv3_x = self._make_layer(block, 128, num_block[1], 2)
        self.conv4_x = self._make_layer(block, 256, num_block[2], 2)
        self.conv5_x = self._make_layer(block, 512, num_block[3], 2)
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride):
        """make resnet layers (by 'layer' we do not mean a single network
        layer such as a conv layer; one layer here may contain more than
        one residual block)
        Args:
            block: block type, basic block or bottleneck block
            out_channels: output depth (channel number) of this layer
            num_blocks: how many blocks per layer
            stride: the stride of the first block of this layer
        Return:
            a resnet layer
        """
        # we have num_blocks blocks per layer; the stride of the first block
        # could be 1 or 2, the other blocks always use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        output = self.conv1(x)
        output = self.conv2_x(output)
        output = self.conv3_x(output)
        output = self.conv4_x(output)
        output = self.conv5_x(output)
        output = self.avg_pool(output)
        output = output.view(output.size(0), -1)
        output = self.fc(output)
        return output
def resnet18():
    """return a ResNet 18 object"""
    return ResNet(BasicBlock, [2, 2, 2, 2])


def resnet34():
    """return a ResNet 34 object"""
    return ResNet(BasicBlock, [3, 4, 6, 3])


def resnet50():
    """return a ResNet 50 object"""
    return ResNet(BottleNeck, [3, 4, 6, 3])


def resnet101():
    """return a ResNet 101 object"""
    return ResNet(BottleNeck, [3, 4, 23, 3])


def resnet152():
    """return a ResNet 152 object"""
    return ResNet(BottleNeck, [3, 8, 36, 3])
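A quick sanity check of the model built above (assuming CIFAR-style 32x32 RGB inputs, which is what the 3x3 conv1 without max-pooling and the num_classes=100 default suggest):

if __name__ == "__main__":
    model = resnet18()
    x = torch.randn(2, 3, 32, 32)   # a batch of two 32x32 RGB images
    logits = model(x)
    print(logits.shape)             # torch.Size([2, 100])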