Convolutional Neural Networks
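In the previous article, we handled 28 x 28 images by flattening each one into a one-dimensional vector of length 784 and feeding it into fully connected layers to train a classifier. This approach has two main problems:
- Pixels that are adjacent within the same column of the image can end up far apart in this vector, so the patterns they form may be hard for the model to recognize.
- For large input images, fully connected layers easily make the model too large. Suppose the input is a color photo 1000 pixels high and 1000 pixels wide (3 channels). Even if the fully connected layer still has only 256 outputs, its weight matrix has shape \(3,000,000\times 256\). At 4 bytes per float parameter, it takes 3,000,000 x 256 x 4 bytes = 3,072,000,000 bytes = 3,000,000 KB, i.e. roughly 3 GB of memory or GPU memory.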
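Both numbers in the list above can be checked with a few lines of plain Python. This sketch assumes row-major flattening, which is what `view`/`reshape` do on a contiguous tensor, so pixel (row, col) lands at index `row * 28 + col`. The helper `flat_index` is a hypothetical name used only for illustration.

```python
# Row-major flattening of a 28 x 28 image: pixel (row, col) -> index row * width + col
width = 28

def flat_index(row, col, width=width):
    return row * width + col

# Horizontally adjacent pixels stay neighbours in the vector...
horizontal_gap = flat_index(10, 6) - flat_index(10, 5)
# ...but vertically adjacent pixels (same column) end up a whole row apart.
vertical_gap = flat_index(11, 5) - flat_index(10, 5)
print(horizontal_gap, vertical_gap)  # 1 28

# Weight matrix of a fully connected layer on a 1000 x 1000 RGB image with 256 outputs
in_features = 3 * 1000 * 1000
out_features = 256
bytes_per_float32 = 4
weight_bytes = in_features * out_features * bytes_per_float32
print(weight_bytes)                      # 3072000000
print(weight_bytes // 1024)              # 3000000 (KB)
print(round(weight_bytes / 1024**3, 2))  # 2.86 (GiB), i.e. about 3 GB
```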
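Convolution alleviates both problems effectively. A convolutional layer slides a small kernel over the image, so it keeps neighboring pixels together, and its parameter count does not depend on the input size. For convolution, pooling and related operations, see the pinned article https://www.cnblogs.com/sdu20112013/p/10149529.html.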
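To make the parameter argument concrete, here is a rough count for the same 1000 x 1000 RGB input. The convolutional layer is hypothetical: 256 output channels and 5 x 5 kernels, chosen only to match the 256 outputs of the fully connected layer. Both counts include one bias per output, as PyTorch's `nn.Linear` and `nn.Conv2d` do by default.

```python
# Fully connected layer on a flattened 1000 x 1000 RGB image, 256 outputs (weights + biases)
fc_params = 3 * 1000 * 1000 * 256 + 256
# Hypothetical conv layer on the same image: 256 output channels, 5 x 5 kernels (weights + biases)
conv_params = 3 * 256 * 5 * 5 + 256
print(fc_params, conv_params)  # 768000256 19456
# The conv count stays the same whether the input is 28 x 28 or 4000 x 4000.
```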
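LeNet
LeNet is one of the early convolutional neural networks. Its structure is shown in the figure below.
LeNet's structure is fairly simple: two repetitions of convolution, activation and pooling, followed by three fully connected layers. The convolutional layers use 5 x 5 kernels, and the pooling layers use a 2 x 2 window with stride 2.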
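For our 28 x 28 inputs, the convolutional block outputs a tensor of shape [batch,16,4,4], which must be reshaped into [batch,16x4x4] before it is fed to the fully connected layers. You can think of the convolutions as extracting 16x4x4=256 features from each sample. These features are used to recognize spatial patterns in the image, such as lines and parts of objects.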
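The 4 x 4 comes from applying the usual output-size rule, `(size + 2 * padding - kernel) // stride + 1`, once per layer. The sketch below traces it by hand. It assumes no padding and PyTorch's default floor rounding. The Sigmoid layers are left out because they do not change the shape.

```python
def out_size(size, kernel, stride=1, padding=0):
    # output side length of Conv2d / MaxPool2d with dilation 1 (rounded down, PyTorch's default)
    return (size + 2 * padding - kernel) // stride + 1

layers = [("Conv2d 5x5", 5, 1), ("MaxPool2d 2x2", 2, 2),
          ("Conv2d 5x5", 5, 1), ("MaxPool2d 2x2", 2, 2)]
size = 28
trace = [size]
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    trace.append(size)
    print(f"{name}: {size} x {size}")
print(trace)             # [28, 24, 12, 8, 4]
print(16 * size * size)  # 256 features per sample (16 channels of 4 x 4)
```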
The fully connected block contains 3 fully connected layers with 120, 84 and 10 outputs respectively, where 10 is the number of classes.
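As a rough sense of scale, the sketch below counts LeNet's parameters by hand (weights plus biases). The totals should match `sum(p.numel() for p in net.parameters())` for the model defined below, though that cross-check is left to the reader. The helper names `linear_params` and `conv_params` are illustrative only.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out          # weight matrix + one bias per output

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # k x k kernels + one bias per output channel

fc_sizes = [16 * 4 * 4, 120, 84, 10]
fc_total = sum(linear_params(a, b) for a, b in zip(fc_sizes, fc_sizes[1:]))
conv_total = conv_params(1, 6, 5) + conv_params(6, 16, 5)
print(fc_total, conv_total, fc_total + conv_total)  # 41854 2572 44426
```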
import torch
from torch import nn

net0 = nn.Sequential(
    nn.Conv2d(1, 6, 5),  # in_channels, out_channels, kernel_size
    nn.Sigmoid(),
    nn.MaxPool2d(2, 2),  # kernel_size, stride
    nn.Conv2d(6, 16, 5),
    nn.Sigmoid(),
    nn.MaxPool2d(2, 2)
)
batch_size = 64
X = torch.randn((batch_size, 1, 28, 28))
out = net0(X)
print(out.shape)
Output:
torch.Size([64, 16, 4, 4])
This is where the shape [batch,16,4,4] mentioned above comes from.
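To see how the shape evolves step by step, you can iterate over the `nn.Sequential` and apply one layer at a time. This is a small sketch that rebuilds the same `net0` so it runs on its own.

```python
import torch
from torch import nn

net0 = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
)

X = torch.randn(64, 1, 28, 28)
shapes = []
for layer in net0:            # nn.Sequential yields its sub-modules in order
    X = layer(X)
    shapes.append(tuple(X.shape))
    print(layer.__class__.__name__, tuple(X.shape))
```

The shapes go from (64, 6, 24, 24) after the first convolution, to (64, 6, 12, 12) after the first pooling, then (64, 16, 8, 8), and finally (64, 16, 4, 4).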
Model definition
With this, we can define LeNet:
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, 5),  # in_channels, out_channels, kernel_size
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2),  # kernel_size, stride
            nn.Conv2d(6, 16, 5),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2)
        )
        self.fc = nn.Sequential(
            nn.Linear(16*4*4, 120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output
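In forward(), feature.view(img.shape[0], -1) reshapes the [batch,16,4,4] feature map into [batch,256] before it is fed to the fully connected layers.
We train on the GPU, so all of the network's parameters must be moved to GPU memory: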
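The same reshape can also be written with `torch.flatten(x, 1)` or the `nn.Flatten()` module, which flattens from dimension 1 by default. This sketch uses a random tensor as a stand-in for the output of `self.conv(img)` and checks that all three forms agree. One practical difference is that `view` requires a contiguous tensor, while `flatten` copies if it has to.

```python
import torch
from torch import nn

feature = torch.randn(64, 16, 4, 4)        # stand-in for self.conv(img)
a = feature.view(feature.shape[0], -1)     # what forward() does
b = torch.flatten(feature, start_dim=1)    # functional equivalent
c = nn.Flatten()(feature)                  # module equivalent (start_dim=1 by default)
print(a.shape)  # torch.Size([64, 256])
```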
net = LeNet().cuda()
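`.cuda()` raises an error on a machine without a CUDA GPU. A common device-agnostic pattern picks the device once and moves both the model and each batch with `.to(device)`. In this sketch, the variable name `device` and the small `nn.Linear` stand-in for `LeNet()` are assumptions made to keep it short.

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(256, 10).to(device)   # stand-in for LeNet()
X = torch.randn(4, 256, device=device)  # inputs must live on the same device as the parameters
print(next(model.parameters()).device, model(X).shape)
```

With this pattern, the training and evaluation loops would use `X, y = X.to(device), y.to(device)` instead of `.cuda()`.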
Data loading
import torch
from torch import nn
import sys
sys.path.append("..")
import learntorch_utils
batch_size,num_workers=64,4
train_iter,test_iter = learntorch_utils.load_data(batch_size,num_workers)
load_data is defined in learntorch_utils.py as follows:
import torch
import torchvision
import torchvision.transforms as transforms

def load_data(batch_size, num_workers):
    mnist_train = torchvision.datasets.FashionMNIST(root='/home/sc/disk/keepgoing/learn_pytorch/Datasets/FashionMNIST',
                                                    train=True, download=True,
                                                    transform=transforms.ToTensor())
    mnist_test = torchvision.datasets.FashionMNIST(root='/home/sc/disk/keepgoing/learn_pytorch/Datasets/FashionMNIST',
                                                   train=False, download=True,
                                                   transform=transforms.ToTensor())
    train_iter = torch.utils.data.DataLoader(
        mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(
        mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
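To see what `test_iter` yields without downloading anything, this sketch builds a `DataLoader` over random tensors. The tensors have the same shapes as the FashionMNIST test set (10,000 grayscale 1 x 28 x 28 images, integer labels 0 to 9). The random data is only a stand-in for the real data set.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for the FashionMNIST test set
images = torch.rand(10000, 1, 28, 28)
labels = torch.randint(0, 10, (10000,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=False)

X, y = next(iter(loader))
print(X.shape, y.shape)   # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(loader))        # 157 batches
sizes = [xb.shape[0] for xb, _ in loader]
print(sizes[-1])          # 16: 10,000 = 156 * 64 + 16, so the last batch is smaller
```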
Define the loss function
l = nn.CrossEntropyLoss()
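`nn.CrossEntropyLoss` expects the raw outputs of the last `nn.Linear` layer (no softmax in the model) and integer class labels. It is equivalent to `log_softmax` followed by the negative log-likelihood loss, averaged over the batch by default (`reduction='mean'`). The sketch below checks this on random logits.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(64, 10)            # raw outputs of the last Linear layer
targets = torch.randint(0, 10, (64,))   # class indices, as yielded by the DataLoader

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
# by hand: minus the mean log-probability assigned to the correct class
by_hand = -F.log_softmax(logits, dim=1)[torch.arange(64), targets].mean()
print(ce.item(), manual.item(), by_hand.item())
```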
Define the optimizer
opt = torch.optim.Adam(net.parameters(),lr=0.01)
Define the evaluation function
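def test():
    acc_sum = 0
    batch = 0
    with torch.no_grad():  # gradients are not needed during evaluation
        for X, y in test_iter:
            X, y = X.cuda(), y.cuda()
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            batch += 1
    # note: batch * batch_size overcounts the samples when the last batch is smaller than
    # batch_size (10,000 test images / 64 leaves 16 in the last batch), so acc is slightly low
    print('acc:%f' % (acc_sum / (batch * batch_size)))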
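One way to avoid the denominator problem is to count the real number of samples as you go. The sketch below does that and also disables gradient tracking. The function name `evaluate`, its `device` argument, and the constant `AlwaysZero` test model are hypothetical, chosen only so the check runs without FashionMNIST.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

def evaluate(net, data_iter, device):
    """Accuracy over a whole data set, dividing by the true sample count."""
    net.eval()                          # turn off training-only behaviour (e.g. dropout); harmless for LeNet
    correct, n = 0, 0
    with torch.no_grad():               # no gradients needed for evaluation
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            correct += (net(X).argmax(dim=1) == y).sum().item()
            n += y.shape[0]             # the last batch may hold fewer than batch_size samples
    net.train()
    return correct / n

# Hypothetical check: a "model" that always predicts class 0, on 10,000 samples labelled 0
labels = torch.zeros(10000, dtype=torch.long)
loader = DataLoader(TensorDataset(torch.rand(10000, 4), labels), batch_size=64)

class AlwaysZero(nn.Module):
    def forward(self, X):
        out = torch.zeros(X.shape[0], 10)
        out[:, 0] = 1.0
        return out

acc = evaluate(AlwaysZero(), loader, torch.device("cpu"))
print(acc)                  # 1.0
print(10000 / (157 * 64))   # about 0.9952: what acc_sum / (batch * batch_size) would report
```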
Training
- Forward pass
- Compute the loss
- Zero the gradients, then backpropagate
- Update the parameters
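The "zero the gradients" step matters because PyTorch's `backward()` adds to `.grad` instead of overwriting it. The tiny sketch below uses a single scalar weight `w` and a toy loss whose gradient is known: 0.5 * (3w)^2, whose derivative is 9w. It shows the accumulation, and how clearing `.grad` before each `backward()` (the job of `opt.zero_grad()`) avoids it.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

def loss_fn(w):
    return 0.5 * (3 * w) ** 2       # d(loss)/dw = 9 * w = 9 at w = 1

grads_without_zeroing = []
for _ in range(2):
    loss_fn(w).backward()
    grads_without_zeroing.append(w.grad.item())
print(grads_without_zeroing)        # [9.0, 18.0]: backward() accumulates into .grad

grads_with_zeroing = []
for _ in range(2):
    w.grad = None                   # clear the gradient first, as opt.zero_grad() does
    loss_fn(w).backward()
    grads_with_zeroing.append(w.grad.item())
print(grads_with_zeroing)           # [9.0, 9.0]
```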
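num_epochs = 5
def train():
    for epoch in range(num_epochs):
        train_l_sum, batch = 0, 0
        for X, y in train_iter:
            X, y = X.cuda(), y.cuda()  # move the tensors to GPU memory
            y_hat = net(X)             # forward pass
            loss = l(y_hat, y)         # compute the loss; nn.CrossEntropyLoss applies log-softmax internally
            opt.zero_grad()            # clear the gradients
            loss.backward()            # backpropagate to compute the gradients
            opt.step()                 # update the parameters using the gradients
            train_l_sum += loss.item()
            batch += 1
        # loss.item() is already a batch mean, so this prints the mean loss divided by batch_size
        print('epoch %d,train_loss %f' % (epoch + 1, train_l_sum / (batch * batch_size)))
        test()

train()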
The output is as follows:
epoch 1,train_loss 0.011750
acc:0.799064
epoch 2,train_loss 0.006442
acc:0.855195
epoch 3,train_loss 0.005401
acc:0.857584
epoch 4,train_loss 0.004946
acc:0.874602
epoch 5,train_loss 0.004631
acc:0.874403
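Because `loss.item()` is already averaged over the batch, one way to report a true per-sample training loss is to multiply each batch mean by the batch size and divide by the number of samples seen. The sketch below runs the same four training steps with that bookkeeping. It uses a device-agnostic setup and a made-up, learnable toy task (labels produced by a fixed random linear map) with an `nn.Linear` model, so it runs quickly without FashionMNIST or LeNet. It is not the setup that produced the numbers above.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy task: 10 classes defined by a fixed random linear map
X_all = torch.randn(2000, 20)
y_all = (X_all @ torch.randn(20, 10)).argmax(dim=1)
loader = DataLoader(TensorDataset(X_all, y_all), batch_size=64, shuffle=True)

model = nn.Linear(20, 10).to(device)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(5):
    total, n = 0.0, 0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        loss = loss_fn(model(X), y)        # forward pass + loss (mean over the batch)
        opt.zero_grad()                    # clear old gradients
        loss.backward()                    # backpropagate
        opt.step()                         # update the parameters
        total += loss.item() * y.shape[0]  # undo the batch mean so each sample counts once
        n += y.shape[0]
    epoch_losses.append(total / n)
    print(f"epoch {epoch + 1}, train_loss {total / n:.4f}")
```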
