PyTorch and CNN Image Classification
PyTorch is an open-source Python machine learning library based on Torch, used for applications such as natural language processing. It is developed primarily by Facebook's artificial intelligence group. It provides powerful GPU acceleration and also supports dynamic neural networks, something many mainstream frameworks such as TensorFlow did not support at the time. PyTorch offers two high-level features:
1. Tensor computation with strong GPU acceleration (like NumPy)
2. Deep neural networks built on an automatic differentiation system.
Besides Facebook, organizations such as Twitter, CMU, and Salesforce have adopted PyTorch.
This article uses the CIFAR-10 dataset for image classification. The dataset consists of small color images divided into ten classes. Some example images are shown below:

Check whether a GPU is available
The images in the dataset are 32x32x3. It is best to use a GPU to speed up training.
import torch
import numpy as np

# check if CUDA (GPU support) is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.')
else:
    print('CUDA is available!')
Result:
CUDA is available!
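The `train_on_gpu` flag above is what the rest of this article uses. As an aside (not part of the original code), a common device-agnostic alternative is to build a single `torch.device` object up front and reuse it everywhere:

```python
import torch

# Device-agnostic alternative to the train_on_gpu flag: pick one device
# object up front and reuse it for the model and every batch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```

With this pattern, `model.to(device)` and `data.to(device)` replace the scattered `.cuda()` calls seen later in the article.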
Load the data
Downloading the dataset may be slow; please be patient. Load the training and test data, split the training data into a training set and a validation set, then create a DataLoader for each dataset.
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# load 16 images per batch
batch_size = 16
# percentage of training set to use as validation
valid_size = 0.2

# convert data to torch.FloatTensor and normalize it
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

# the 10 classes for image classification
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
View a batch of samples from the training set
import matplotlib.pyplot as plt
%matplotlib inline

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

# obtain one batch of samples
dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() was removed in newer PyTorch
images = images.numpy()  # convert images to numpy for display

# plot the images, with class names as titles
fig = plt.figure(figsize=(25, 4))
# display 16 images
for idx in np.arange(16):
    ax = fig.add_subplot(2, 16 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])
Result:

Examine one image in more detail
The images were normalized above. The red, green, and blue (RGB) color channels can be viewed as three separate grayscale images.
rgb_img = np.squeeze(images[3])
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize=(36, 36))
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max() / 2.5
    for x in range(width):
        for y in range(height):
            val = round(img[x][y], 2) if img[x][y] != 0 else 0
            ax.annotate(str(val), xy=(y, x),
                        horizontalalignment='center',
                        verticalalignment='center', size=8,
                        color='white' if img[x][y] < thresh else 'black')
Result:

Define the structure of the convolutional neural network
Here we define the structure of the CNN. It will include the following:
- Convolutional layers: these can be thought of as filtering the image with multiple filters (an operation commonly called convolution) to extract image features.
- In PyTorch, we usually define a convolutional layer with nn.Conv2d and specify the following arguments:

nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

Convolution with a 3x3 window and stride 1
§ in_channels is the input depth. For a grayscale image, depth = 1
§ out_channels is the output depth, i.e. the number of filtered images you want to obtain
§ kernel_size is the size of the convolution kernel (usually 3, meaning a 3x3 kernel)
§ stride and padding have default values, but you should set them according to the spatial (x, y) size you want the output to have.
- Pooling layers: max pooling is used here, which keeps the maximum pixel value within a window of a given size.
- In a 2x2 window, the maximum of the four values is kept.
- Because max pooling is better at picking out salient features such as image edges, it suits image-classification tasks.
- A max pooling layer usually follows a convolutional layer and shrinks the x-y dimensions of its input.
- The usual "linear + dropout" layers help avoid overfitting and produce the 10-class output.
In the figure below, you can see that this is a neural network with two convolutional layers.
Output size of a convolutional layer
To compute the output size of a given convolutional layer, we can perform the following calculation.
Assume the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P, and the stride is S. The output size is then given by:

OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1

Example: with input size (H=7, W=7), filter size (FH=3, FW=3), padding P=0, and stride S=1, the output size is (OH=5, OW=5). With S=2, the output size is (OH=3, OW=3).
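The formula and the worked example can be checked with a small helper function (illustrative only, not part of the original code):

```python
def conv_output_size(h, w, fh, fw, padding=0, stride=1):
    """Output (OH, OW) of a convolution, per the formula above:
    OH = (H + 2P - FH) / S + 1, and likewise for OW."""
    oh = (h + 2 * padding - fh) // stride + 1
    ow = (w + 2 * padding - fw) // stride + 1
    return oh, ow

# the worked example: 7x7 input, 3x3 filter, P=0, S=1 -> 5x5; S=2 -> 3x3
print(conv_output_size(7, 7, 3, 3))            # (5, 5)
print(conv_output_size(7, 7, 3, 3, stride=2))  # (3, 3)
# the network below: 32x32 input, 3x3 filter, P=1, S=1 keeps the size
print(conv_output_size(32, 32, 3, 3, padding=1))  # (32, 32)
```

This also explains why the `padding=1` used in the network below leaves the spatial size unchanged: only the pooling layers shrink it.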
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees a 32x32x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees a 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees an 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layer (500 -> 10)
        self.fc2 = nn.Linear(500, 10)
        # dropout layer (p=0.3)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer (raw class scores)
        x = self.fc2(x)
        return x

# create a complete CNN
model = Net()
print(model)

# use the GPU if available
if train_on_gpu:
    model.cuda()
Result:
Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1024, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
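Why `in_features=1024` for fc1? Each `padding=1` convolution keeps the spatial size, and each 2x2 max pool halves it: 32 -> 16 -> 8 -> 4. A quick standalone shape trace (a sketch using the same layer shapes, independent of the `Net` class) confirms this:

```python
import torch
import torch.nn as nn

# Trace the shapes through three conv/pool stages with the same
# hyperparameters as the network above.
convs = [nn.Conv2d(3, 16, 3, padding=1),
         nn.Conv2d(16, 32, 3, padding=1),
         nn.Conv2d(32, 64, 3, padding=1)]
pool = nn.MaxPool2d(2, 2)

x = torch.zeros(1, 3, 32, 32)  # dummy batch of one 32x32 RGB image
for conv in convs:
    x = pool(conv(x))
    print(tuple(x.shape))
# (1, 16, 16, 16) -> (1, 32, 8, 8) -> (1, 64, 4, 4), so 64*4*4 = 1024
```

This matches the `x.view(-1, 64 * 4 * 4)` flatten step in `forward`.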
Choose a loss function and an optimizer

import torch.optim as optim
# use cross-entropy loss
criterion = nn.CrossEntropyLoss()
# use stochastic gradient descent with learning rate lr=0.01
optimizer = optim.SGD(model.parameters(), lr=0.01)
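The article uses plain SGD. A common variant (an aside, not what is trained below) is SGD with momentum, which often converges faster on this kind of task; a sketch with a hypothetical stand-in model:

```python
import torch.nn as nn
import torch.optim as optim

tiny = nn.Linear(4, 2)  # stand-in model, for illustration only
optimizer = optim.SGD(tiny.parameters(), lr=0.01, momentum=0.9)
print(optimizer.defaults["lr"], optimizer.defaults["momentum"])
```

Swapping this in for the article's optimizer would change the training curves, so the logged losses below apply to plain SGD only.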
Train the convolutional neural network
Note how the training and validation losses decrease over time; if the validation loss keeps increasing, that suggests overfitting. (In fact, in the example below, if n_epochs is set to 40, you can observe overfitting!)
# number of epochs to train the model
n_epochs = 30

valid_loss_min = np.Inf  # track change in validation loss

for epoch in range(1, n_epochs + 1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item() * data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()
    for data, target in valid_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update average validation loss
        valid_loss += loss.item() * data.size(0)

    # calculate average losses
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)

    # print training and validation losses
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    # save the model if the validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(model.state_dict(), 'model_cifar.pt')
        valid_loss_min = valid_loss
Result:
Epoch: 1 	Training Loss: 2.065666 	Validation Loss: 1.706993
Validation loss decreased (inf --> 1.706993). Saving model ...
Epoch: 2 	Training Loss: 1.609919 	Validation Loss: 1.451288
Validation loss decreased (1.706993 --> 1.451288). Saving model ...
Epoch: 3 	Training Loss: 1.426175 	Validation Loss: 1.294594
Validation loss decreased (1.451288 --> 1.294594). Saving model ...
Epoch: 4 	Training Loss: 1.307891 	Validation Loss: 1.182497
Validation loss decreased (1.294594 --> 1.182497). Saving model ...
Epoch: 5 	Training Loss: 1.200655 	Validation Loss: 1.118825
Validation loss decreased (1.182497 --> 1.118825). Saving model ...
Epoch: 6 	Training Loss: 1.115498 	Validation Loss: 1.041203
Validation loss decreased (1.118825 --> 1.041203). Saving model ...
Epoch: 7 	Training Loss: 1.047874 	Validation Loss: 1.020686
Validation loss decreased (1.041203 --> 1.020686). Saving model ...
Epoch: 8 	Training Loss: 0.991542 	Validation Loss: 0.936289
Validation loss decreased (1.020686 --> 0.936289). Saving model ...
Epoch: 9 	Training Loss: 0.942437 	Validation Loss: 0.892730
Validation loss decreased (0.936289 --> 0.892730). Saving model ...
Epoch: 10 	Training Loss: 0.894279 	Validation Loss: 0.875833
Validation loss decreased (0.892730 --> 0.875833). Saving model ...
Epoch: 11 	Training Loss: 0.859178 	Validation Loss: 0.838847
Validation loss decreased (0.875833 --> 0.838847). Saving model ...
Epoch: 12 	Training Loss: 0.822664 	Validation Loss: 0.823634
Validation loss decreased (0.838847 --> 0.823634). Saving model ...
Epoch: 13 	Training Loss: 0.787049 	Validation Loss: 0.802566
Validation loss decreased (0.823634 --> 0.802566). Saving model ...
Epoch: 14 	Training Loss: 0.749585 	Validation Loss: 0.785852
Validation loss decreased (0.802566 --> 0.785852). Saving model ...
Epoch: 15 	Training Loss: 0.721540 	Validation Loss: 0.772729
Validation loss decreased (0.785852 --> 0.772729). Saving model ...
Epoch: 16 	Training Loss: 0.689508 	Validation Loss: 0.768470
Validation loss decreased (0.772729 --> 0.768470). Saving model ...
Epoch: 17 	Training Loss: 0.662432 	Validation Loss: 0.758518
Validation loss decreased (0.768470 --> 0.758518). Saving model ...
Epoch: 18 	Training Loss: 0.632324 	Validation Loss: 0.750859
Validation loss decreased (0.758518 --> 0.750859). Saving model ...
Epoch: 19 	Training Loss: 0.616094 	Validation Loss: 0.729692
Validation loss decreased (0.750859 --> 0.729692). Saving model ...
Epoch: 20 	Training Loss: 0.588593 	Validation Loss: 0.729085
Validation loss decreased (0.729692 --> 0.729085). Saving model ...
Epoch: 21 	Training Loss: 0.571516 	Validation Loss: 0.734009
Epoch: 22 	Training Loss: 0.545541 	Validation Loss: 0.721433
Validation loss decreased (0.729085 --> 0.721433). Saving model ...
Epoch: 23 	Training Loss: 0.523696 	Validation Loss: 0.720512
Validation loss decreased (0.721433 --> 0.720512). Saving model ...
Epoch: 24 	Training Loss: 0.508577 	Validation Loss: 0.728457
Epoch: 25 	Training Loss: 0.483033 	Validation Loss: 0.722556
Epoch: 26 	Training Loss: 0.469563 	Validation Loss: 0.742352
Epoch: 27 	Training Loss: 0.449316 	Validation Loss: 0.726019
Epoch: 28 	Training Loss: 0.442354 	Validation Loss: 0.713364
Validation loss decreased (0.720512 --> 0.713364). Saving model ...
Epoch: 29 	Training Loss: 0.421807 	Validation Loss: 0.718615
Epoch: 30 	Training Loss: 0.404595 	Validation Loss: 0.729914
Load the saved model

model.load_state_dict(torch.load('model_cifar.pt'))

Result:
<All keys matched successfully>
Test the trained network
Test your trained model on the test data! A "good" result would be a CNN that reaches roughly 70% accuracy on these test images.
# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# iterate over test data
for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item() * data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    # calculate test accuracy for each object class
    for i in range(len(target)):  # len(target) also handles a smaller final batch
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# average test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Result:
Test Loss: 0.708721

Test Accuracy of airplane: 82% (826/1000)
Test Accuracy of automobile: 81% (818/1000)
Test Accuracy of bird: 65% (659/1000)
Test Accuracy of cat: 59% (590/1000)
Test Accuracy of deer: 75% (757/1000)
Test Accuracy of dog: 56% (565/1000)
Test Accuracy of frog: 81% (812/1000)
Test Accuracy of horse: 82% (823/1000)
Test Accuracy of ship: 86% (866/1000)
Test Accuracy of truck: 84% (848/1000)

Test Accuracy (Overall): 75% (7564/10000)
Visualize results on test samples

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)  # dataiter.next() was removed in newer PyTorch
images.numpy()

# move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# get sample outputs
output = model(images)
# convert output probabilities to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy()) if not train_on_gpu else np.squeeze(preds_tensor.cpu().numpy())

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(16):
    ax = fig.add_subplot(2, 16 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images.cpu()[idx])
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx] == labels[idx].item() else "red"))
Result:

References:
Andrew Ng's deep learning course notes
"Deep Learning from Scratch" (深度學習入門:基於Python的理論與實現)
https://pytorch.org/docs/stable/nn.html#
https://github.com/udacity/deep-learning-v2-pytorch
