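PyTorch and CNN Image Classification
PyTorch is an open-source machine learning library for Python, based on Torch and used for applications such as natural language processing. It was developed primarily by Facebook's artificial intelligence research group. Besides strong GPU acceleration, it supports dynamic neural networks, a capability that frameworks built around static computation graphs, such as early versions of TensorFlow, lacked. PyTorch provides two high-level features:
1. Tensor computation (like NumPy) with strong GPU acceleration
2. Deep neural networks built on an automatic differentiation system
In addition to Facebook, organizations such as Twitter, GMU, and Salesforce have adopted PyTorch.
This article uses the CIFAR-10 dataset for image classification. The dataset consists of small color images divided into ten classes. Some example images are shown in the figure below: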
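Checking whether a GPU is available
Each image in the dataset is 32x32x3 (32 by 32 pixels, 3 color channels). It is best to use a GPU to speed up training.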
import torch
import numpy as np

# check whether a GPU can be used
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.')
else:
    print('CUDA is available!')

Result:
CUDA is available!
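Loading the data
Downloading the data may be slow, so please be patient. Load the training and test data, split the training data into a training set and a validation set, and then create a DataLoader for each dataset.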
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# load 16 images per batch
batch_size = 16
# fraction of the training set to use as validation
valid_size = 0.2

# convert the data to torch.FloatTensor and normalize it
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
    num_workers=num_workers)

# the 10 image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
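As a rough check on the split described in this section, the following sketch reproduces the index arithmetic with the standard library only. It assumes the CIFAR-10 training set holds 50,000 images, and `split_indices` is a hypothetical helper written for illustration, not part of torchvision or PyTorch.

```python
import math
import random

def split_indices(num_items, valid_fraction, seed=0):
    """Shuffle 0..num_items-1 and split off the first part for validation."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    split = int(math.floor(valid_fraction * num_items))
    # Same slicing as in the article: the tail trains, the head validates.
    return indices[split:], indices[:split]

# Assumption: 50,000 training images, as in the standard CIFAR-10 release.
train_idx, valid_idx = split_indices(50_000, 0.2)
batch_size = 16

print(len(train_idx), len(valid_idx))          # 40000 10000
print(math.ceil(len(train_idx) / batch_size))  # 2500 training batches per epoch
print(math.ceil(len(valid_idx) / batch_size))  # 625 validation batches per epoch
```

Because both samplers draw from disjoint index lists over the same dataset, no image is used for both training and validation.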
Viewing a batch of samples from the training set
import matplotlib.pyplot as plt
%matplotlib inline

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from (C, H, W) to (H, W, C)

# get one batch of samples
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()  # convert images to numpy for display

# display the images, titled with their class names
fig = plt.figure(figsize=(25, 4))
# show 16 images
for idx in np.arange(16):
    # subplot grid sizes must be integers, hence the integer division
    ax = fig.add_subplot(2, 16 // 2, idx+1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])

Result: (figure: 16 training images, each titled with its class name)
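Viewing one image in more detail
The pixel values shown here have been normalized. The red, green, and blue (RGB) color channels can be viewed as three separate grayscale images.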
rgb_img = np.squeeze(images[3])
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize=(36, 36))
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max()/2.5
    # annotate every pixel with its normalized value
    for x in range(width):
        for y in range(height):
            val = round(img[x][y], 2) if img[x][y] != 0 else 0
            ax.annotate(str(val), xy=(y, x),
                        horizontalalignment='center',
                        verticalalignment='center', size=8,
                        color='white' if img[x][y] < thresh else 'black')

Result: (figure: the red, green, and blue channels as grayscale images annotated with pixel values)
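The values annotated on each channel are normalized pixel values, not the original 0 to 255 intensities. The sketch below walks through that arithmetic, assuming ToTensor scales each 8-bit value to [0, 1] and Normalize with mean 0.5 and standard deviation 0.5 then maps it to [-1, 1]. The helper names are hypothetical and exist only for this illustration.

```python
def to_unit_range(pixel):
    """Scale an 8-bit intensity (0..255) to [0, 1], as ToTensor does."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel operation of Normalize."""
    return (value - mean) / std

def unnormalize(value):
    """Invert the normalization, matching img / 2 + 0.5 in imshow."""
    return value / 2 + 0.5

for pixel in (0, 128, 255):
    normalized = normalize(to_unit_range(pixel))
    restored = unnormalize(normalized)
    # Print the raw intensity, its normalized value, and the recovered intensity.
    print(pixel, round(normalized, 3), round(restored * 255))
```

This is why imshow divides by 2 and adds 0.5 before plotting: matplotlib expects float images in [0, 1], while the loader yields values in [-1, 1].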
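Defining the convolutional neural network architecture
Here we define the architecture of a CNN. It includes the following:
- Convolutional layers: these can be thought of as filtering the image with multiple filters (the convolution operation) to extract image features.
- In PyTorch, a convolutional layer is usually defined with nn.Conv2d, specifying the following parameters:
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
(Figure: convolution with a 3x3 window and a stride of 1)
§ in_channels is the input depth. For a grayscale image the depth is 1; for an RGB image, such as those in CIFAR-10, it is 3.
§ out_channels is the output depth, that is, the number of filtered images you want to produce.
§ kernel_size is the size of the convolution kernel (commonly 3, meaning a 3x3 kernel).
§ stride and padding have default values, but they should be set according to the spatial size (x, y) you want the output to have.
- Pooling layers: max pooling is used here, which takes the maximum pixel value within a window of a given size.
- In a 2x2 window, it takes the maximum of the four values.
- Because max pooling is good at picking out important features such as image edges, it suits image classification tasks.
- A max pooling layer usually follows a convolutional layer and shrinks the x-y dimensions of its input.
- The usual "linear + dropout" layers help avoid overfitting and produce the 10-class output.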
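To make the convolution and max pooling steps concrete, here is a minimal pure-Python sketch on a single-channel 4x4 input. The helpers `conv2d_valid` and `max_pool_2x2` are hypothetical names written for illustration; they are not PyTorch functions, and nn.Conv2d additionally handles multiple channels, a bias term, padding, and stride.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.

    This is the sliding-window product sum (cross-correlation) that deep
    learning libraries refer to as convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool_2x2(image):
    """Take the maximum of each non-overlapping 2x2 window (stride 2)."""
    return [
        [
            max(image[i][j], image[i][j + 1],
                image[i + 1][j], image[i + 1][j + 1])
            for j in range(0, len(image[0]) - 1, 2)
        ]
        for i in range(0, len(image) - 1, 2)
    ]

image = [
    [1, 2, 0, 1],
    [3, 1, 1, 0],
    [0, 2, 4, 1],
    [1, 0, 2, 3],
]
# A simple vertical-edge filter: left column minus right column.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(conv2d_valid(image, kernel))  # [[-1, 3], [-3, -1]]
print(max_pool_2x2(image))          # [[3, 1], [2, 4]]
```

The 3x3 filter turns the 4x4 input into a 2x2 feature map, and 2x2 pooling halves each spatial dimension while keeping the strongest response in each window.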
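The figure below shows a neural network with 2 convolutional layers; the network defined in the code later in this section uses 3.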
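Output size of a convolutional layer
To compute the output size of a given convolutional layer, we can carry out the following calculation.
Let the input size be (H, W), the filter size (FH, FW), the output size (OH, OW), the padding P, and the stride S. The output size is then given by:
OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1
Example: with input size (H=7, W=7), filter size (FH=3, FW=3), padding P=0, and stride S=1, the output size is (OH=5, OW=5). With S=2, the output size becomes (OH=3, OW=3).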
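The calculation can be sketched in a few lines of Python. The function name `conv_output_size` is hypothetical, and the integer division reflects the assumption that a non-exact division is rounded down, which is how nn.Conv2d and nn.MaxPool2d behave by default. The second half traces a 32x32 image through three rounds of 3x3 convolution with padding 1 followed by 2x2 max pooling, which explains the 64 * 4 * 4 input size of the first linear layer in this article's network.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one spatial dimension: (size + 2P - F) // S + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# The worked example from the text: 7x7 input, 3x3 filter, no padding.
print(conv_output_size(7, 3, padding=0, stride=1))  # 5
print(conv_output_size(7, 3, padding=0, stride=2))  # 3

# Trace a 32x32 CIFAR-10 image through three conv + pool stages.
size = 32
for _ in range(3):
    size = conv_output_size(size, 3, padding=1, stride=1)  # convolution keeps the size
    size = conv_output_size(size, 2, padding=0, stride=2)  # pooling halves it
print(size)              # 4
print(64 * size * size)  # 1024 features enter the first linear layer
```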
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees the 32x32x3 image)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees a 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees an 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layer (500 -> 10)
        self.fc2 = nn.Linear(500, 10)
        # dropout layer (p=0.3)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten the feature maps into a vector per image
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer: raw class scores, no activation
        x = self.fc2(x)
        return x

# create a complete CNN
model = Net()
print(model)

# use the GPU if available
if train_on_gpu:
    model.cuda()
Result:
Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1024, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
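The layer shapes in this summary are enough to estimate how many learnable parameters the network has. The following is a back-of-the-envelope sketch, not output from PyTorch; `conv_params` and `linear_params` are hypothetical helpers that count weights plus biases, and pooling and dropout are left out because they have no learnable parameters.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights (out x in x k x k) plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights (out x in) plus one bias per output feature."""
    return out_features * in_features + out_features

layers = {
    "conv1": conv_params(3, 16, 3),
    "conv2": conv_params(16, 32, 3),
    "conv3": conv_params(32, 64, 3),
    "fc1": linear_params(1024, 500),
    "fc2": linear_params(500, 10),
}
for name, count in layers.items():
    print(f"{name}: {count}")

total = sum(layers.values())
print("total:", total)  # total: 541094
```

Under these assumptions, the first linear layer holds the large majority of the parameters (512,500 of 541,094), which is one reason dropout is applied around it.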
Choosing the loss function and the optimizer
import torch.optim as optim
# use the cross-entropy loss
criterion = nn.CrossEntropyLoss()
# use stochastic gradient descent with learning rate lr=0.01
optimizer = optim.SGD(model.parameters(), lr=0.01)
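For intuition about these two choices, the sketch below computes the cross-entropy of a single example by hand and applies one plain SGD update. It is an illustration using only the standard library: `cross_entropy` and `sgd_step` are hypothetical helpers, while nn.CrossEntropyLoss itself works on batches of raw scores and averages over the batch by default.

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    largest = max(logits)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]

def sgd_step(weights, grads, lr=0.01):
    """Plain SGD update without momentum: w <- w - lr * grad."""
    return [w - lr * g for w, g in zip(weights, grads)]

# A model that scores all 10 classes equally has a loss of ln(10).
print(cross_entropy([0.0] * 10, target=3))  # about 2.3026

# A confident, correct prediction has a loss close to 0.
print(cross_entropy([10.0] + [0.0] * 9, target=0))

# One update step with the learning rate used in this article.
print(sgd_step([1.0, -2.0], grads=[0.5, -1.0], lr=0.01))
```

The value ln(10), roughly 2.30, is a useful reference point: a 10-class model that has learned nothing should start near it, and the loss should fall from there as training proceeds.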
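Training the convolutional neural network model
Note how the training and validation losses decrease over time; if the validation loss keeps increasing, this indicates possible overfitting. (In fact, in the example below, setting n_epochs to 40 reveals overfitting!)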
# number of epochs to train the model
n_epochs = 30

valid_loss_min = np.inf  # track change in validation loss

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()
    with torch.no_grad():  # gradients are not needed for validation
        for data, target in valid_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update the running validation loss
            valid_loss += loss.item()*data.size(0)

    # calculate average losses
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)

    # print the training and validation losses
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    # save the model if the validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(model.state_dict(), 'model_cifar.pt')
        valid_loss_min = valid_loss
Result:
Epoch: 1    Training Loss: 2.065666    Validation Loss: 1.706993
Validation loss decreased (inf --> 1.706993). Saving model ...
Epoch: 2    Training Loss: 1.609919    Validation Loss: 1.451288
Validation loss decreased (1.706993 --> 1.451288). Saving model ...
Epoch: 3    Training Loss: 1.426175    Validation Loss: 1.294594
Validation loss decreased (1.451288 --> 1.294594). Saving model ...
Epoch: 4    Training Loss: 1.307891    Validation Loss: 1.182497
Validation loss decreased (1.294594 --> 1.182497). Saving model ...
Epoch: 5    Training Loss: 1.200655    Validation Loss: 1.118825
Validation loss decreased (1.182497 --> 1.118825). Saving model ...
Epoch: 6    Training Loss: 1.115498    Validation Loss: 1.041203
Validation loss decreased (1.118825 --> 1.041203). Saving model ...
Epoch: 7    Training Loss: 1.047874    Validation Loss: 1.020686
Validation loss decreased (1.041203 --> 1.020686). Saving model ...
Epoch: 8    Training Loss: 0.991542    Validation Loss: 0.936289
Validation loss decreased (1.020686 --> 0.936289). Saving model ...
Epoch: 9    Training Loss: 0.942437    Validation Loss: 0.892730
Validation loss decreased (0.936289 --> 0.892730). Saving model ...
Epoch: 10    Training Loss: 0.894279    Validation Loss: 0.875833
Validation loss decreased (0.892730 --> 0.875833). Saving model ...
Epoch: 11    Training Loss: 0.859178    Validation Loss: 0.838847
Validation loss decreased (0.875833 --> 0.838847). Saving model ...
Epoch: 12    Training Loss: 0.822664    Validation Loss: 0.823634
Validation loss decreased (0.838847 --> 0.823634). Saving model ...
Epoch: 13    Training Loss: 0.787049    Validation Loss: 0.802566
Validation loss decreased (0.823634 --> 0.802566). Saving model ...
Epoch: 14    Training Loss: 0.749585    Validation Loss: 0.785852
Validation loss decreased (0.802566 --> 0.785852). Saving model ...
Epoch: 15    Training Loss: 0.721540    Validation Loss: 0.772729
Validation loss decreased (0.785852 --> 0.772729). Saving model ...
Epoch: 16    Training Loss: 0.689508    Validation Loss: 0.768470
Validation loss decreased (0.772729 --> 0.768470). Saving model ...
Epoch: 17    Training Loss: 0.662432    Validation Loss: 0.758518
Validation loss decreased (0.768470 --> 0.758518). Saving model ...
Epoch: 18    Training Loss: 0.632324    Validation Loss: 0.750859
Validation loss decreased (0.758518 --> 0.750859). Saving model ...
Epoch: 19    Training Loss: 0.616094    Validation Loss: 0.729692
Validation loss decreased (0.750859 --> 0.729692). Saving model ...
Epoch: 20    Training Loss: 0.588593    Validation Loss: 0.729085
Validation loss decreased (0.729692 --> 0.729085). Saving model ...
Epoch: 21    Training Loss: 0.571516    Validation Loss: 0.734009
Epoch: 22    Training Loss: 0.545541    Validation Loss: 0.721433
Validation loss decreased (0.729085 --> 0.721433). Saving model ...
Epoch: 23    Training Loss: 0.523696    Validation Loss: 0.720512
Validation loss decreased (0.721433 --> 0.720512). Saving model ...
Epoch: 24    Training Loss: 0.508577    Validation Loss: 0.728457
Epoch: 25    Training Loss: 0.483033    Validation Loss: 0.722556
Epoch: 26    Training Loss: 0.469563    Validation Loss: 0.742352
Epoch: 27    Training Loss: 0.449316    Validation Loss: 0.726019
Epoch: 28    Training Loss: 0.442354    Validation Loss: 0.713364
Validation loss decreased (0.720512 --> 0.713364). Saving model ...
Epoch: 29    Training Loss: 0.421807    Validation Loss: 0.718615
Epoch: 30    Training Loss: 0.404595    Validation Loss: 0.729914
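The training log can be analyzed with a few lines of standard-library Python. The sketch below replays the checkpointing rule from the training loop on the logged validation losses and adds a patience-based early-stopping check. Early stopping is not used in this article; it is shown here only as a common way to act on a flattening validation loss, and both helper functions are hypothetical.

```python
# Validation losses for epochs 1..30, copied from the training log.
valid_losses = [
    1.706993, 1.451288, 1.294594, 1.182497, 1.118825, 1.041203,
    1.020686, 0.936289, 0.892730, 0.875833, 0.838847, 0.823634,
    0.802566, 0.785852, 0.772729, 0.768470, 0.758518, 0.750859,
    0.729692, 0.729085, 0.734009, 0.721433, 0.720512, 0.728457,
    0.722556, 0.742352, 0.726019, 0.713364, 0.718615, 0.729914,
]

def checkpoint_epochs(losses):
    """Epochs (1-based) at which the loss is at or below the best seen so far."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss <= best:
            saved.append(epoch)
            best = loss
    return saved

def early_stop_epoch(losses, patience):
    """Epoch at which `patience` consecutive epochs passed without improvement."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss <= best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

saved = checkpoint_epochs(valid_losses)
print(len(saved), saved[-1])              # 23 28
print(early_stop_epoch(valid_losses, 4))  # 27
print(early_stop_epoch(valid_losses, 5))  # None
```

The model was saved 23 times, the last time at epoch 28, so the file model_cifar.pt holds the epoch-28 weights. After epoch 20 the validation loss hovers around 0.72 while the training loss keeps falling, which is the early sign of overfitting mentioned in this section.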
Loading the model
model.load_state_dict(torch.load('model_cifar.pt'))

Result:
<All keys matched successfully>
Testing the trained network
Test your trained model on the test data! A "good" result is a CNN that reaches roughly 70% accuracy on these test images.
# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# iterate over test data
with torch.no_grad():  # gradients are not needed for testing
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item()*data.size(0)
        # convert output scores to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label
        correct_tensor = pred.eq(target.view_as(pred))
        correct = correct_tensor.cpu().numpy()
        # calculate test accuracy for each object class;
        # use the actual batch length so a smaller final batch is handled
        for i in range(target.size(0)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

# average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Result:
Test Loss: 0.708721

Test Accuracy of airplane: 82% (826/1000)
Test Accuracy of automobile: 81% (818/1000)
Test Accuracy of  bird: 65% (659/1000)
Test Accuracy of   cat: 59% (590/1000)
Test Accuracy of  deer: 75% (757/1000)
Test Accuracy of   dog: 56% (565/1000)
Test Accuracy of  frog: 81% (812/1000)
Test Accuracy of horse: 82% (823/1000)
Test Accuracy of  ship: 86% (866/1000)
Test Accuracy of truck: 84% (848/1000)

Test Accuracy (Overall): 75% (7564/10000)
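The per-class counts in these results can be summarized with a short standard-library sketch. The numbers are copied from the results reported in this section; the variable names are chosen for this illustration.

```python
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
# Correct predictions per class, out of 1000 test images each.
class_correct = [826, 818, 659, 590, 757, 565, 812, 823, 866, 848]
class_total = [1000] * 10

overall = 100.0 * sum(class_correct) / sum(class_total)
print(f"overall: {overall:.2f}%")    # overall: 75.64%
# The %2d format used in the test script truncates toward zero.
print('truncated: %2d%%' % overall)  # truncated: 75%

# Rank classes from weakest to strongest.
ranked = sorted(zip(classes, class_correct), key=lambda pair: pair[1])
print(ranked[:2])  # [('dog', 565), ('cat', 590)]
print(ranked[-1])  # ('ship', 866)
```

The printed percentages are truncated rather than rounded, so 826/1000 appears as 82%. The weakest classes are dog and cat, and the strongest is ship.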
Displaying results for test samples
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# get sample outputs
output = model(images)
# convert output scores to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.cpu().numpy())

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(16):
    # subplot grid sizes must be integers, hence the integer division
    ax = fig.add_subplot(2, 16 // 2, idx+1, xticks=[], yticks=[])
    imshow(images[idx].cpu().numpy())
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx].item() else "red"))

Result: (figure: 16 test images titled "predicted (true)", green when the prediction is correct and red otherwise)
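References:
Andrew Ng Deep Learning Notes (《吳恩達深度學習筆記》)
Deep Learning from Scratch: Theory and Implementation with Python (《深度學習入門:基於Python的理論與實現》)
https://pytorch.org/docs/stable/nn.html#
https://github.com/udacity/deep-learning-v2-pytorch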