VGG
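AlexNet built on LeNet by adding several convolutional layers and changing the kernel sizes and the number of output channels per layer, and it achieved very good results. However, it did not offer a simple, general recipe for designing such networks.
VGG did: it showed that a deep model can be built by repeatedly stacking a simple basic block.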
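Paper: https://arxiv.org/abs/1409.1556
The VGG architecture is shown below:
The figure above lists VGG configurations of different depths, which is where the familiar names VGG16, VGG19, and so on come from.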
VGG BLOCK
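The design idea behind VGG is to keep stacking 3x3 convolution kernels to make the model deeper. VGG Net showed that increasing depth is a very effective way to improve a model's learning capacity.
As the figure above shows, two consecutive 3x3 convolutions have the same receptive field as a single 5x5 convolution, but the former applies two nonlinear transformations while the latter applies only one. This is the key reason why stacking small kernels improves feature learning. In addition, two 3x3 kernels have fewer parameters than one 5x5 kernel (2x3x3 = 18 < 5x5 = 25 weights per input/output channel pair).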
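The comparison above can be checked with a little arithmetic. The sketch below is plain Python; the helper names `receptive_field` and `conv_weights` are made up for this illustration, and it assumes stride-1 convolutions with the same number of channels C going in and out, with biases ignored.

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 layer widens the field by k - 1
    return rf


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` in and out, no bias."""
    return kernel_size * kernel_size * channels * channels


C = 64  # assumed channel count, chosen only for illustration
two_3x3 = 2 * conv_weights(3, C)
one_5x5 = conv_weights(5, C)

print(receptive_field([3, 3]), receptive_field([5]))  # 5 5
print(two_3x3, one_5x5)  # 73728 102400
```

The same reasoning extends to three stacked 3x3 layers, which cover a 7x7 field with 27 weights per channel pair instead of 49.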
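The basic VGG building block is n 3x3 convolutions (with padding 1) followed by a 2x2 max pooling layer with stride 2. The convolutions leave the height and width unchanged, and the pooling layer halves them.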
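To see why the sizes work out this way, the standard output-size formula floor((n + 2p - k) / s) + 1 can be applied layer by layer. The following sketch is plain Python with a hypothetical helper `out_size`; it assumes a 224x224 input, the size used later in this post, and five pooling layers, as in each VGG configuration listed in this post.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output width (or height) of a conv/pool layer for input size n."""
    return (n + 2 * padding - kernel) // stride + 1


n = 224
# A 3x3 convolution with padding 1 and stride 1 keeps the size.
same = out_size(n, kernel=3, stride=1, padding=1)

# Each VGG block ends with a 2x2 max pool of stride 2, which halves the size.
sizes = [n]
for _ in range(5):  # five pooling layers in total
    n = out_size(n, kernel=2, stride=2)
    sizes.append(n)

print(same)   # 224
print(sizes)  # [224, 112, 56, 28, 14, 7]
```

The final 7x7 map with 512 channels is where the 512*7*7 input size of the first fully connected layer comes from.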
This gives the following code:
import torch
from torch import nn

def make_layers(in_channels, cfg):
    layers = []
    previous_channel = in_channels  # number of output channels of the previous layer
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(previous_channel, v, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            previous_channel = v
    conv = nn.Sequential(*layers)
    return conv
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
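cfgs defines the structure of the different VGG variants; for example, 'A' is VGG11. Each number is the number of output channels of a convolution, and 'M' stands for max pooling.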
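The names VGG11, VGG13, VGG16 and VGG19 count the layers that carry weights: the convolutions listed in a configuration plus the three fully connected layers of the classifier. A small sketch in plain Python (the helper `count_weight_layers` is hypothetical, and `cfgs` is repeated here only so the snippet runs on its own):

```python
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def count_weight_layers(cfg, num_fc_layers=3):
    """Count conv layers in cfg (every entry that is not 'M') plus FC layers."""
    num_conv = sum(1 for v in cfg if v != 'M')
    return num_conv + num_fc_layers


depths = {name: count_weight_layers(cfg) for name, cfg in cfgs.items()}
print(depths)  # {'A': 11, 'B': 13, 'D': 16, 'E': 19}
```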
The model can then be defined as follows:
class VGG(nn.Module):
    # init_weights is accepted but not used in this definition
    def __init__(self, input_channels, cfg, num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        # output shape for one 224x224 image: torch.Size([1, 512, 7, 7])
        self.conv = make_layers(input_channels, cfg)
        self.fc = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, img):
        feature = self.conv(img)
        # flatten to (batch, 512*7*7) before the fully connected layers
        output = self.fc(feature.view(img.shape[0], -1))
        return output
The output shape of the convolutional part can be checked with the following test code:
# conv = make_layers(1,cfgs['A'])
# X = torch.randn((1,1,224,224))
# out = conv(X)
# #print(out.shape)
Loading the data
batch_size,num_workers=4,4
train_iter,test_iter = learntorch_utils.load_data(batch_size,num_workers,resize=224)
Raising batch_size to 8 here already exhausts my GPU memory...
Defining the model
net = VGG(1,cfgs['A']).cuda()
Defining the loss function
loss = nn.CrossEntropyLoss()
Defining the optimizer
opt = torch.optim.Adam(net.parameters(),lr=0.001)
Defining the evaluation function
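def test():
    acc_sum, n = 0, 0
    was_training = net.training
    net.eval()  # switch off training-only behavior such as dropout
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in test_iter:
            X, y = X.cuda(), y.cuda()
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]  # count samples, so a smaller last batch is handled
    net.train(was_training)  # restore the previous mode
    return acc_sum / n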
Training
import time

num_epochs = 3
def train():
    for epoch in range(num_epochs):
        train_l_sum, batch, n, acc_sum = 0, 0, 0, 0
        start = time.time()
        for X, y in train_iter:
            X, y = X.cuda(), y.cuda()
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]  # number of images seen so far
            l = loss(y_hat, y)
            opt.zero_grad()
            l.backward()
            opt.step()

            train_l_sum += l.item()
            batch += 1

            # CrossEntropyLoss already averages over the batch, so dividing the
            # running sum by the number of batches gives the mean loss per image
            # (exact when all batches have the same size)
            mean_loss = train_l_sum / batch
            time_batch = time.time() - start  # time elapsed since the epoch started
            print('epoch %d,batch %d,train_loss %.3f,time %.3f' %
                  (epoch + 1, batch, mean_loss, time_batch))

        print('***************************************')
        mean_loss = train_l_sum / batch  # mean loss per image
        train_acc = acc_sum / n  # training accuracy
        test_acc = test()  # test accuracy
        end = time.time()
        time_per_epoch = end - start
        print('epoch %d,train_loss %f,train_acc %f,test_acc %f,time %f' %
              (epoch + 1, mean_loss, train_acc, test_acc, time_per_epoch))

train()
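A note on the loss bookkeeping: nn.CrossEntropyLoss uses reduction='mean' by default, so the value it returns for a batch is already the average over the images in that batch. The sketch below re-implements that reduction in plain Python on made-up logits (the helper `cross_entropy_mean` is hypothetical and only mirrors the default behaviour for illustration). It shows that averaging the per-batch values over the number of batches reproduces the per-image mean when the batches have the same size.

```python
import math


def cross_entropy_mean(logits, targets):
    """Mean cross-entropy over a batch: -log softmax(logits)[target]."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[t]
    return total / len(targets)


# Made-up logits for 4 images and 3 classes, split into 2 batches of 2.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
targets = [0, 1, 2, 0]

per_image_mean = cross_entropy_mean(logits, targets)
batch_means = [cross_entropy_mean(logits[i:i + 2], targets[i:i + 2]) for i in (0, 2)]
mean_of_batch_means = sum(batch_means) / len(batch_means)

print(abs(per_image_mean - mean_of_batch_means) < 1e-12)  # True
```

So the running sum of l.item() should be divided by the number of batches; dividing by the number of images as well would shrink the reported loss by a factor of batch_size.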
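Each hidden fully connected layer has 4096 neurons, so the model has a very large number of parameters and trains slowly. On a GTX 1050 with 4 GB of memory, one epoch takes a bit over an hour.
Full code: https://github.com/sdu2011/learn_pytorch
With batch_size=4, convergence is extremely slow; without enough iterations the model underfits badly, and even the accuracy on the training set is low.
Because of the fully connected layers, the parameter count is very large, which makes training slow and uses a lot of GPU memory, so batch_size cannot be raised much. The model is therefore changed to: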
class VGG(nn.Module):
    def __init__(self, input_channels, cfg, num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        # output shape for one 224x224 image: torch.Size([1, 512, 7, 7])
        self.conv = make_layers(input_channels, cfg)
        self.fc = nn.Sequential(
            nn.Linear(512*7*7, 512),
            # inplace=True saves GPU memory, see https://www.cnblogs.com/wanghui-garcia/p/10642665.html
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, img):
        feature = self.conv(img)
        # flatten to (batch, 512*7*7) before the fully connected layers
        output = self.fc(feature.view(img.shape[0], -1))
        return output
The fully connected layers are reduced to 512 neurons and batch_size is raised to 16. Training is much faster.
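How much smaller the classifier becomes can be estimated by counting the weights and biases of its nn.Linear layers. The sketch below is plain Python; `linear_params` and `classifier_params` are hypothetical helpers, and the layer sizes are the ones used in the two model definitions in this post with num_classes=10.

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


def classifier_params(hidden, in_features=512 * 7 * 7, num_classes=10):
    """Total parameters of the three-layer classifier with the given hidden width."""
    return (linear_params(in_features, hidden)
            + linear_params(hidden, hidden)
            + linear_params(hidden, num_classes))


big = classifier_params(4096)
small = classifier_params(512)

print(big)                    # 119586826
print(small)                  # 13113354
print(round(big / small, 1))  # 9.1
```

Dropout adds no parameters, so the roughly ninefold reduction comes entirely from the narrower hidden layers.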