Author: 如縷清風
This post is the blogger's original work. Please do not repost without permission: https://www.cnblogs.com/warren2123/p/15035742.html
I. Introduction
This post builds a simple convolutional neural network with PyTorch to classify Fashion-MNIST, an introductory image-classification dataset.
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with one of 10 classes. Sample Fashion-MNIST images are shown in the figure below.
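For later reference when breaking results down by class, the 10 labels can also be listed in code. The ordering below is the standard Fashion-MNIST labelling (0 through 9) and is assumed to match the csv files used here:

# Fashion-MNIST class names, indexed by their numeric label (0-9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']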
II. Building a Convolutional Neural Network with PyTorch
Since Fashion-MNIST is fairly simple (single-channel grayscale images), stacking a few convolutional layers combined with some hyperparameter tuning easily reaches over 91% accuracy. The model building in this post is divided into five parts: data loading and preprocessing, building the convolutional neural network, defining the hyperparameters and evaluation method, parameter updates (training), and optimization.
1. Data loading and preprocessing
The Python libraries used in this post are listed below. GPU acceleration is used when a GPU is available; otherwise computation falls back to the CPU.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from pathlib import Path

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Since the files are in csv format, Pandas is used to read them. The data has 785 columns: the label column holds the class label, and the remaining 784 columns are the 28x28 pixel values.
DATA_PATH = Path('data/')

train = pd.read_csv(DATA_PATH / "fashion-mnist_train.csv")
train.head()
By subclassing PyTorch's Dataset, we define a class for handling the Fashion-MNIST csv files.
class FashionMNISTDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        data = pd.read_csv(csv_file)
        # Column 0 is the label; columns 1-784 are pixels, reshaped to 1x28x28
        self.X = np.array(data.iloc[:, 1:]).reshape(-1, 1, 28, 28).astype(float)
        self.Y = np.array(data.iloc[:, 0])
        del data
        self.len = len(self.X)

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        item = self.X[idx]
        label = self.Y[idx]
        return (item, label)
First, the FashionMNISTDataset class defined above reshapes the data into 1x28x28 images; the data is then loaded in batches with DataLoader.
train_dataset = FashionMNISTDataset(csv_file=DATA_PATH / "fashion-mnist_train.csv")
test_dataset = FashionMNISTDataset(csv_file=DATA_PATH / "fashion-mnist_test.csv")

# BATCH_SIZE is set in the hyperparameter section below and must be defined before this runs
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)
2. Building the convolutional neural network
The convolutional neural network in this post uses three convolutional layers, two pooling layers, and one fully connected layer. Each convolutional layer applies batch normalization to improve the model's generalization.
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # 1x28x28 -> 16x28x28 (5x5 conv, padding 2 keeps the spatial size)
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(5, 5), padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU()
        )
        # 16x28x28 -> 16x14x14
        self.pool1 = nn.MaxPool2d(2)
        # 16x14x14 -> 32x12x12
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=(3, 3)),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        # 32x12x12 -> 64x10x10
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=(3, 3)),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # 64x10x10 -> 64x5x5
        self.pool2 = nn.MaxPool2d(2)
        self.fc = nn.Linear(5 * 5 * 64, 10)

    def forward(self, x):
        out = self.pool1(self.layer1(x))
        out = self.pool2(self.layer3(self.layer2(out)))
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
The structure of the CNN model is shown in the figure below.
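The layer configuration can also be inspected directly from PyTorch; a minimal sketch that instantiates a model purely for inspection (the instance actually used for training is created in the hyperparameter section below):

model = CNN()
print(model)  # lists layer1, pool1, layer2, layer3, pool2 and fc with their settings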
3. Defining hyperparameters and the evaluation method
The model's hyperparameters are the learning rate, the number of training epochs and the batch size. The loss function is cross-entropy and the optimizer is Adam.
LR = 0.01
EPOCHES = 50
BATCH_SIZE = 256

cnn = CNN().to(device)  # instantiate the model before building the optimizer
criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)
4. Parameter updates (training)
Training now begins. The loss of every batch is recorded, and a progress line is printed every 100 batches, i.e. twice per epoch (the printed total of 234 iterations comes from 60,000 // 256).
losses = []
for epoch in range(EPOCHES):
    for i, (images, labels) in enumerate(train_loader):
        images = images.float().to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # record the loss of every batch, print every 100 batches
        losses.append(loss.cpu().data.item())
        if (i + 1) % 100 == 0:
            print('Epoch : %d/%d, Iter : %d/%d, Loss : %.4f' % (
                epoch + 1, EPOCHES, i + 1,
                len(train_dataset) // BATCH_SIZE,
                loss.data.item()
            ))
The training output is shown below:
Epoch : 1/50, Iter : 100/234, Loss : 0.4174
Epoch : 1/50, Iter : 200/234, Loss : 0.4561
Epoch : 2/50, Iter : 100/234, Loss : 0.3844
Epoch : 2/50, Iter : 200/234, Loss : 0.2597
Epoch : 3/50, Iter : 100/234, Loss : 0.2444
Epoch : 3/50, Iter : 200/234, Loss : 0.3232
Epoch : 4/50, Iter : 100/234, Loss : 0.2470
Epoch : 4/50, Iter : 200/234, Loss : 0.2128
Epoch : 5/50, Iter : 100/234, Loss : 0.2989
Epoch : 5/50, Iter : 200/234, Loss : 0.2139
......
Epoch : 45/50, Iter : 100/234, Loss : 0.0293
Epoch : 45/50, Iter : 200/234, Loss : 0.0318
Epoch : 46/50, Iter : 100/234, Loss : 0.0179
Epoch : 46/50, Iter : 200/234, Loss : 0.0131
Epoch : 47/50, Iter : 100/234, Loss : 0.0189
Epoch : 47/50, Iter : 200/234, Loss : 0.0704
Epoch : 48/50, Iter : 100/234, Loss : 0.0510
Epoch : 48/50, Iter : 200/234, Loss : 0.0939
Epoch : 49/50, Iter : 100/234, Loss : 0.0441
Epoch : 49/50, Iter : 200/234, Loss : 0.0585
Epoch : 50/50, Iter : 100/234, Loss : 0.0356
Epoch : 50/50, Iter : 200/234, Loss : 0.0246
The training loss can be visualized as a curve (figure below).
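The losses list recorded above can be plotted with matplotlib; a minimal sketch (styling choices here are illustrative, not the original figure's):

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.plot(losses)              # one loss value per training batch
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.title('CNN training loss on Fashion-MNIST')
plt.show()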
The model's accuracy on the test set is computed as follows:
cnn.eval()
correct = 0
total = 0
for images, labels in test_loader:
    images = images.float().to(device)
    outputs = cnn(images).cpu()
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print('CNN test set accuracy: %.2f %%' % (100 * correct / total))
# Output: CNN test set accuracy: 90.78 %
5. Optimization
Continue training the same model with the learning rate reduced to one tenth of the original value and a further 20 epochs:
cnn.train()
LR = LR / 10
EPOCHES = 20
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

losses = []
for epoch in range(EPOCHES):
    for i, (images, labels) in enumerate(train_loader):
        images = images.float().to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        losses.append(loss.cpu().data.item())
        if (i + 1) % 100 == 0:
            print('Epoch : %d/%d, Iter : %d/%d, Loss : %.4f' % (
                epoch + 1, EPOCHES, i + 1,
                len(train_dataset) // BATCH_SIZE,
                loss.data.item()
            ))
The training output after this adjustment is shown below:
Epoch : 1/20, Iter : 100/234, Loss : 0.0042
Epoch : 1/20, Iter : 200/234, Loss : 0.0055
Epoch : 2/20, Iter : 100/234, Loss : 0.0027
Epoch : 2/20, Iter : 200/234, Loss : 0.0016
Epoch : 3/20, Iter : 100/234, Loss : 0.0040
Epoch : 3/20, Iter : 200/234, Loss : 0.0035
Epoch : 4/20, Iter : 100/234, Loss : 0.0010
Epoch : 4/20, Iter : 200/234, Loss : 0.0015
Epoch : 5/20, Iter : 100/234, Loss : 0.0009
Epoch : 5/20, Iter : 200/234, Loss : 0.0013
......
Epoch : 15/20, Iter : 100/234, Loss : 0.0004
Epoch : 15/20, Iter : 200/234, Loss : 0.0002
Epoch : 16/20, Iter : 100/234, Loss : 0.0005
Epoch : 16/20, Iter : 200/234, Loss : 0.0007
Epoch : 17/20, Iter : 100/234, Loss : 0.0002
Epoch : 17/20, Iter : 200/234, Loss : 0.0003
Epoch : 18/20, Iter : 100/234, Loss : 0.0003
Epoch : 18/20, Iter : 200/234, Loss : 0.0001
Epoch : 19/20, Iter : 100/234, Loss : 0.0009
Epoch : 19/20, Iter : 200/234, Loss : 0.0002
Epoch : 20/20, Iter : 100/234, Loss : 0.0002
Epoch : 20/20, Iter : 200/234, Loss : 0.0001
The training loss after the optimization step can be visualized in the same way, using the plotting code shown earlier.
The optimized model's accuracy on the test set is shown below (91.89%):
cnn.eval()
correct = 0
total = 0
for images, labels in test_loader:
    images = images.float().to(device)
    outputs = cnn(images).cpu()
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print('CNN test set accuracy: %.2f %%' % (100 * correct / total))
# Output: CNN test set accuracy: 91.89 %
III. Summary
By building a convolutional neural network model, this post classifies the Fashion-MNIST image data, and the optimized model reaches 91.89% test accuracy. The current model has reached a bottleneck and cannot improve accuracy much further; to raise the accuracy, a more complex architecture or a deeper network could be used. Breaking the predictions down by class, the classes with accuracy below 90% are T-shirt/top, Pullover and Shirt, with Shirt at only 75%.
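The per-class figures quoted above can be reproduced with a short loop over test_loader; a minimal sketch, reusing the class_names list from the introduction (the loop structure here is illustrative, not necessarily the code originally used):

cnn.eval()
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for images, labels in test_loader:
        images = images.float().to(device)
        outputs = cnn(images).cpu()
        _, predicted = torch.max(outputs, 1)
        # tally correct predictions separately for each of the 10 labels
        for label, pred in zip(labels, predicted):
            class_total[label] += 1
            class_correct[label] += int(label == pred)

for name, c, t in zip(class_names, class_correct, class_total):
    print('%-12s accuracy: %.2f %%' % (name, 100 * c / t))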