Image Classification (CIFAR-10) on Kaggle
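So far, we have been using Gluon's data package to obtain image datasets directly in tensor format. In practice, however, image datasets often exist as image files. In this section, we start from the raw image files and organize, read, and convert them to tensor format step by step. We have experimented with the CIFAR-10 dataset before; it is an important dataset in computer vision. Now we will apply what we learned in the previous sections to take part in a Kaggle competition that addresses the CIFAR-10 image classification problem.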
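The competition's web address is https://www.kaggle.com/c/cifar-10
Fig. 1 shows the information on the competition webpage. To submit results, you first need to register an account on the Kaggle website.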
Fig. 1 CIFAR-10 image classification competition webpage information. The dataset for the competition can be accessed by clicking the “Data” tab.
First, import the packages or modules required for the competition.
import collections
from d2l import mxnet as d2l
import math
from mxnet import autograd, gluon, init, npx
from mxnet.gluon import nn
import os
import pandas as pd
import shutil
import time
npx.set_np()
1. Obtaining and Organizing the Dataset
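The competition data is divided into a training set and a testing set. The training set contains 50,000 images. The testing set contains 300,000 images, of which 10,000 are used for scoring, while the other 290,000 are non-scoring images included to prevent people from labeling the testing set manually and submitting the results. The images in both datasets are PNG files, 32 pixels in height and width, with three color channels (RGB). The images cover 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The upper-left corner of Fig. 1 shows some images of airplanes, automobiles, and birds from the dataset.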
1.1. Downloading the Dataset
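After logging in to Kaggle, click the “Data” tab on the CIFAR-10 image classification competition webpage shown in Fig. 1 and download the dataset by clicking the “Download All” button. After unzipping the downloaded file in ../data, and then unzipping train.7z and test.7z inside it, you will find the entire dataset in the following paths: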
- ../data/cifar-10/train/[1-50000].png
- ../data/cifar-10/test/[1-300000].png
- ../data/cifar-10/trainLabels.csv
- ../data/cifar-10/sampleSubmission.csv
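Here the “train” and “test” folders contain the training and testing images, respectively, trainLabels.csv holds the labels of the training images, and sampleSubmission.csv is a sample submission file. To make it easier to get started, we provide a small-scale sample of the dataset: it contains the first 1000 training images and 5 random testing images. To use the full dataset of the Kaggle competition, set the following demo variable to False.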
#@save
d2l.DATA_HUB['cifar10_tiny'] = (d2l.DATA_URL + 'kaggle_cifar10_tiny.zip',
                                '2068874e4b9a9f0fb07ebe0ad2b29754449ccacd')
# If you use the full dataset downloaded for the Kaggle competition, set the
# demo variable to False
demo = True
if demo:
    data_dir = d2l.download_extract('cifar10_tiny')
else:
    data_dir = '../data/cifar-10/'
1.2. Organizing the Dataset
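We need to organize the dataset to facilitate model training and testing. Let us first read the labels from the csv file. The following function returns a dictionary that maps each file name without its extension to its label.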
#@save
def read_csv_labels(fname):
    """Read fname to return a name to label dictionary."""
    with open(fname, 'r') as f:
        # Skip the file header line (column name)
        lines = f.readlines()[1:]
    tokens = [l.rstrip().split(',') for l in lines]
    return dict(((name, label) for name, label in tokens))
labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
print('# training examples:', len(labels))
print('# classes:', len(set(labels.values())))
# training examples: 1000
# classes: 10
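The same header-skipping parse can be sketched with Python's standard csv module, which also copes with quoted fields should a file contain any. This is a minimal sketch, assuming the file starts with an `id,label` header like trainLabels.csv; the helper `read_labels_from_text` and the three sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

def read_labels_from_text(text):
    """Hypothetical helper: map each id (file name without extension) to its label."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (column names)
    return {name: label for name, label in reader}

# Hypothetical sample in the same layout as trainLabels.csv
sample = "id,label\n1,frog\n2,truck\n3,truck\n"
labels_demo = read_labels_from_text(sample)
print(labels_demo)                     # {'1': 'frog', '2': 'truck', '3': 'truck'}
print(len(set(labels_demo.values())))  # 2
```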
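Next, we define the reorg_train_valid function to split a validation set off the original training set. The argument valid_ratio is the ratio of the number of examples in the validation set to the number of examples in the original training set. More concretely, let n be the number of images of the class with the fewest examples and r be the ratio; then max(⌊nr⌋, 1) images of each class are used for the validation set. Take valid_ratio=0.1 as an example. Since the original training set has 50,000 images, 45,000 of them will be used for training when tuning hyperparameters and stored under the path “train_valid_test/train”, while the other 5,000 images will be stored as the validation set under the path “train_valid_test/valid”. Every training image is also copied into “train_valid_test/train_valid”, which is used later to retrain on all labeled data. After the data is organized, images of the same class are placed under the same folder so that we can read them later.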
#@save
def copyfile(filename, target_dir):
    """Copy a file into a target directory."""
    # Create the target directory (and any parents) if it does not exist yet
    os.makedirs(target_dir, exist_ok=True)
    shutil.copy(filename, target_dir)
#@save
def reorg_train_valid(data_dir, labels, valid_ratio):
    # The number of examples of the class with the least examples in the
    # training dataset
    n = collections.Counter(labels.values()).most_common()[-1][1]
    # The number of examples per class for the validation set
    n_valid_per_label = max(1, math.floor(n * valid_ratio))
    label_count = {}
    for train_file in os.listdir(os.path.join(data_dir, 'train')):
        label = labels[train_file.split('.')[0]]
        fname = os.path.join(data_dir, 'train', train_file)
        # Copy to train_valid_test/train_valid with a subfolder per class
        copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                     'train_valid', label))
        if label not in label_count or label_count[label] < n_valid_per_label:
            # Copy to train_valid_test/valid
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'valid', label))
            label_count[label] = label_count.get(label, 0) + 1
        else:
            # Copy to train_valid_test/train
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'train', label))
    return n_valid_per_label
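The split rule can be checked without touching the disk. The sketch below, using a hypothetical helper `plan_split`, applies the same max(⌊nr⌋, 1) rule to a label dictionary and counts how many images of each class would land in valid and train. It assumes, like reorg_train_valid, that the first images seen for each class fill the validation quota; in the real function that order comes from os.listdir, so exactly which files end up in valid is not specified.

```python
import collections
import math

def plan_split(labels, valid_ratio):
    """Hypothetical helper: count how many images per class go to 'valid' and
    'train' under the rule used by reorg_train_valid."""
    # n is the size of the smallest class
    n = min(collections.Counter(labels.values()).values())
    n_valid_per_label = max(1, math.floor(n * valid_ratio))
    plan = collections.defaultdict(lambda: {'valid': 0, 'train': 0})
    for name, label in labels.items():
        # The first n_valid_per_label images of each class fill the validation quota
        subset = 'valid' if plan[label]['valid'] < n_valid_per_label else 'train'
        plan[label][subset] += 1
    return n_valid_per_label, dict(plan)

# Synthetic stand-in for the full competition: 10 classes x 5000 images each
full = {str(i): 'class%d' % (i % 10) for i in range(1, 50001)}
per_label, plan = plan_split(full, 0.1)
print(per_label)                               # 500
print(sum(p['valid'] for p in plan.values()))  # 5000
print(sum(p['train'] for p in plan.values()))  # 45000
```

With these synthetic ids and class names, the sketch reproduces the 500 images per class, 5,000 validation images, and 45,000 training images described above. For a very small class the max(·, 1) term still guarantees one validation image per class.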
The reorg_test function below organizes the testing set so that it can be read conveniently during prediction.
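#@save
def reorg_test(data_dir):
    """Copy the test images into train_valid_test/test/unknown."""
    # ImageFolderDataset expects one subfolder per class, so the unlabeled
    # test images go into a single placeholder class folder named 'unknown'
    for test_file in os.listdir(os.path.join(data_dir, 'test')):
        copyfile(os.path.join(data_dir, 'test', test_file),
                 os.path.join(data_dir, 'train_valid_test', 'test',
                              'unknown'))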
We use a single function to invoke the read_csv_labels, reorg_train_valid, and reorg_test functions defined above.
def reorg_cifar10_data(data_dir, valid_ratio):
    labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
    reorg_train_valid(data_dir, labels, valid_ratio)
    reorg_test(data_dir)
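We set the batch size to 1 only for the demo dataset. During actual training and testing, use the full dataset of the Kaggle competition and set the batch size to a larger integer, such as 128. We hold out 10% of the training examples as the validation set for tuning hyperparameters.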
batch_size = 1 if demo else 128
valid_ratio = 0.1
reorg_cifar10_data(data_dir, valid_ratio)
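After reorg_cifar10_data has run, a quick sanity check is that, for every class, the counts in train and valid add up to the count in train_valid. The hypothetical helper `count_images` below, a sketch using only the standard library, counts the files in each subset/class folder. Here it is exercised on a tiny fake tree in a temporary directory; on real data you would pass `os.path.join(data_dir, 'train_valid_test')`.

```python
import os
import tempfile

def count_images(root):
    """Hypothetical sanity check: count files per (subset, class) folder under root."""
    counts = {}
    for subset in sorted(os.listdir(root)):
        subset_dir = os.path.join(root, subset)
        for label in sorted(os.listdir(subset_dir)):
            counts[(subset, label)] = len(os.listdir(os.path.join(subset_dir, label)))
    return counts

# Usage on a tiny fake tree (hypothetical file names) built in a temporary directory
with tempfile.TemporaryDirectory() as root:
    for subset, label, name in [('train', 'cat', '1.png'), ('valid', 'cat', '2.png'),
                                ('train_valid', 'cat', '1.png'),
                                ('train_valid', 'cat', '2.png'),
                                ('test', 'unknown', '1.png')]:
        os.makedirs(os.path.join(root, subset, label), exist_ok=True)
        # Create an empty placeholder file
        open(os.path.join(root, subset, label, name), 'wb').close()
    counts = count_images(root)
print(counts)
```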
2. Image Augmentation
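To cope with overfitting, we use image augmentation. For example, adding transforms.RandomFlipLeftRight() flips images horizontally at random, and transforms.Normalize() standardizes each of the three RGB channels. Below we list some of these operations; you can choose to use or modify them as needed.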
transform_train = gluon.data.vision.transforms.Compose([
    # Magnify the image to a square of 40 pixels in both height and width
    gluon.data.vision.transforms.Resize(40),
    # Randomly crop a square image of 40 pixels in both height and width to
    # produce a small square of 0.64 to 1 times the area of the original
    # image, and then shrink it to a square of 32 pixels in both height and
    # width
    gluon.data.vision.transforms.RandomResizedCrop(32, scale=(0.64, 1.0),
                                                   ratio=(1.0, 1.0)),
    gluon.data.vision.transforms.RandomFlipLeftRight(),
    gluon.data.vision.transforms.ToTensor(),
    # Normalize each channel of the image
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
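To make the augmentation steps concrete, here is a NumPy-only sketch of what transform_train does to a single 32×32 RGB image. It is a simplification, not Gluon's implementation: it resizes with nearest-neighbour indexing instead of interpolation, and it draws the crop area uniformly from 0.64 to 1 times the 40×40 image, so with ratio=(1.0, 1.0) the crop side lies between √(0.64·1600) = 32 and 40 pixels. The function names are hypothetical.

```python
import numpy as np

MEAN = np.array([0.4914, 0.4822, 0.4465]).reshape(3, 1, 1)
STD = np.array([0.2023, 0.1994, 0.2010]).reshape(3, 1, 1)

def resize_nearest(img, size):
    """Nearest-neighbour resize of an HWC image to size x size (a simplification;
    Gluon's transforms interpolate instead)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment_train(img, rng):
    """Sketch of transform_train for one HWC uint8 image."""
    img = resize_nearest(img, 40)
    area = rng.uniform(0.64, 1.0) * 40 * 40
    side = int(round(np.sqrt(area)))        # ratio=(1.0, 1.0) keeps the crop square
    top = rng.integers(0, 40 - side + 1)
    left = rng.integers(0, 40 - side + 1)
    img = resize_nearest(img[top:top + side, left:left + side], 32)
    if rng.random() < 0.5:
        img = img[:, ::-1]                  # horizontal flip with probability 0.5
    x = img.transpose(2, 0, 1).astype(np.float32) / 255  # HWC -> CHW in [0, 1]
    return (x - MEAN) / STD                 # per-channel normalization

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
out = augment_train(image, rng)
print(out.shape)  # (3, 32, 32)
```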
To keep the output deterministic during testing, we only normalize the images and apply no random augmentation.
transform_test = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
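Since transform_test contains no random operation, the same image always maps to the same tensor. The sketch below, a hypothetical NumPy re-creation rather than the Gluon code, spells out the arithmetic of ToTensor followed by Normalize: each channel value is divided by 255 and then standardized as (x - mean) / std with the per-channel constants used above.

```python
import numpy as np

MEAN = np.array([0.4914, 0.4822, 0.4465])
STD = np.array([0.2023, 0.1994, 0.2010])

def transform_test_np(img):
    """Sketch of transform_test for one HWC uint8 image: ToTensor (HWC -> CHW,
    divide by 255) followed by per-channel (x - mean) / std."""
    x = img.transpose(2, 0, 1).astype(np.float64) / 255.0
    return (x - MEAN[:, None, None]) / STD[:, None, None]

pixel = np.array([[[255, 0, 128]]], dtype=np.uint8)  # one RGB pixel, shape (1, 1, 3)
print(transform_test_np(pixel).ravel())
```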
3. Reading the Dataset
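Next, we create ImageFolderDataset instances to read the organized dataset containing the original image files, where each example consists of an image and its label.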
train_ds, valid_ds, train_valid_ds, test_ds = [
    gluon.data.vision.ImageFolderDataset(
        os.path.join(data_dir, 'train_valid_test', folder))
    for folder in ['train', 'valid', 'train_valid', 'test']]
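We attach the image augmentation operations defined above to each DataLoader through transform_first. The validation set is only used to evaluate the model while tuning hyperparameters, so it uses transform_test to keep its output deterministic. For the final predictions, we will train the model on the combined training and validation sets (train_valid) to make full use of all the labeled data.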
train_iter, train_valid_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_train), batch_size, shuffle=True,
    last_batch='keep') for dataset in (train_ds, train_valid_ds)]
valid_iter, test_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='keep') for dataset in (valid_ds, test_ds)]
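Because all four loaders use last_batch='keep', the final mini-batch of an epoch may be smaller than batch_size instead of being dropped. This matters most for test_iter, where every image needs a prediction. The hypothetical helper below only does the arithmetic, assuming the 45,000-image train subset of the full dataset and a batch size of 128.

```python
import math

def batch_sizes(num_examples, batch_size):
    """Sizes of the mini-batches produced when the final, smaller batch is kept."""
    num_batches = math.ceil(num_examples / batch_size)
    return [min(batch_size, num_examples - i * batch_size) for i in range(num_batches)]

sizes = batch_sizes(45000, 128)  # the train subset of the full dataset
print(len(sizes), sizes[-1])     # 352 72
```

For the 300,000 test images, the same arithmetic gives 2,344 batches with 96 images in the last one.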
4. Defining the Model
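We build the residual block on top of the HybridBlock class so that the network can be hybridized (compiled into a symbolic graph) for better execution efficiency.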
class Residual(nn.HybridBlock):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1,
                               strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1,
                                   strides=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()
    def hybrid_forward(self, F, X):
        Y = F.npx.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return F.npx.relu(Y + X)
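The 1×1 convolution in Residual exists so that the shortcut matches the main path. With strides=2, the two 3×3 convolutions halve the height and width (and num_channels may differ from the number of input channels), so X has to be transformed before computing Y + X. The sketch below applies the standard output-size formula ⌊(h + 2p - k)/s⌋ + 1 to a 32×32 input; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Output height/width of a 2D convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Main path of a downsampling Residual block (strides=2) on a 32 x 32 input
main = conv_out(conv_out(32, 3, padding=1, stride=2), 3, padding=1)
# Shortcut without the 1x1 convolution keeps the input size
shortcut_identity = 32
# Shortcut with the 1x1 convolution (use_1x1conv=True, strides=2)
shortcut_1x1 = conv_out(32, 1, stride=2)
print(main, shortcut_identity, shortcut_1x1)  # 16 32 16
```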
Next, we define the ResNet-18 model.
def resnet18(num_classes):
    net = nn.HybridSequential()
    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
            nn.BatchNorm(), nn.Activation('relu'))
    def resnet_block(num_channels, num_residuals, first_block=False):
        blk = nn.HybridSequential()
        for i in range(num_residuals):
            if i == 0 and not first_block:
                blk.add(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                blk.add(Residual(num_channels))
        return blk
    net.add(resnet_block(64, 2, first_block=True),
            resnet_block(128, 2),
            resnet_block(256, 2),
            resnet_block(512, 2))
    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    return net
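Tracing a 32×32 CIFAR-10 image through resnet18 shows that, with a stride-1 3×3 stem and no pooling layer, the spatial size only shrinks in the three downsampling stages, ending at 4×4 before global average pooling turns the 512 feature maps into a 512-dimensional vector for Dense(num_classes). The sketch also counts the 18 weight layers behind the model's name: the stem convolution, the 16 3×3 convolutions in the residual blocks, and the final dense layer (the 1×1 shortcut convolutions are not counted). `resnet18_trace` is a hypothetical helper.

```python
def resnet18_trace(height=32, width=32):
    """Trace (channels, height, width) after the stem and after each stage of the
    resnet18 above, and count the weight layers on the main path."""
    shapes = [(64, height, width)]  # stem: 3x3 conv, stride 1, padding 1
    weight_layers = 1
    for i, num_channels in enumerate([64, 128, 256, 512]):
        if i > 0:
            # First Residual of the stage: 3x3 conv, padding 1, stride 2,
            # i.e. floor((h + 2 - 3) / 2) + 1
            height, width = (height - 1) // 2 + 1, (width - 1) // 2 + 1
        weight_layers += 2 * 2      # 2 Residual blocks x 2 3x3 convs each
        shapes.append((num_channels, height, width))
    weight_layers += 1              # Dense(num_classes) after GlobalAvgPool2D
    return shapes, weight_layers

shapes, layers = resnet18_trace()
print(shapes)  # [(64, 32, 32), (64, 32, 32), (128, 16, 16), (256, 8, 8), (512, 4, 4)]
print(layers)  # 18
```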
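The CIFAR-10 image classification challenge uses 10 categories. Before training starts, we initialize the model parameters with Xavier random initialization.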
def get_net(ctx):
    num_classes = 10
    net = resnet18(num_classes)
    net.initialize(ctx=ctx, init=init.Xavier())
    return net
loss = gluon.loss.SoftmaxCrossEntropyLoss()
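init.Xavier() draws each weight from a zero-centered distribution whose scale depends on the layer's fan-in and fan-out. The NumPy sketch below is modeled on the formula documented for mxnet.init.Xavier with its defaults (rnd_type='uniform', factor_type='avg', magnitude=3): weights are drawn uniformly from [-s, s] with s = sqrt(3 / ((fan_in + fan_out) / 2)), where a convolution's fans include its kernel area. Treat it as an illustration and check the documentation of your MXNet version for the exact behavior; `xavier_uniform` is a hypothetical name.

```python
import numpy as np

def xavier_uniform(shape, magnitude=3.0, rng=None):
    """Sketch of Xavier uniform initialization with 'avg' fan scaling."""
    rng = np.random.default_rng() if rng is None else rng
    # Kernel area (e.g. 3 * 3) for convolutions, 1 for dense layers
    receptive_field = int(np.prod(shape[2:])) if len(shape) > 2 else 1
    fan_in = shape[1] * receptive_field
    fan_out = shape[0] * receptive_field
    scale = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    return rng.uniform(-scale, scale, size=shape), scale

# A 3x3 convolution with 64 input and 64 output channels (weight shape OIHW)
w, scale = xavier_uniform((64, 64, 3, 3), rng=np.random.default_rng(0))
print(round(float(scale), 4))  # 0.0722
```

Here fan_in = fan_out = 64 · 9 = 576, so s = sqrt(3/576) ≈ 0.0722.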
5. Defining the Training Functions
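We select the model and tune the hyperparameters according to the model's performance on the validation set. Next, we define the model training function train. It records the training time of each epoch, which helps compare the time costs of different models.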
def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
          lr_decay):
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        # Decay the learning rate every lr_period epochs
        if epoch > 0 and epoch % lr_period == 0:
            trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        for X, y in train_iter:
            y = y.astype('float32').as_in_ctx(ctx)
            with autograd.record():
                y_hat = net(X.as_in_ctx(ctx))
                l = loss(y_hat, y).sum()
            l.backward()
            # Normalize by the actual batch size: with last_batch='keep' the
            # final batch can be smaller than batch_size
            trainer.step(y.shape[0])
            train_l_sum += float(l)
            train_acc_sum += float((y_hat.argmax(axis=1) == y).sum())
            n += y.size
        time_s = "time %.2f sec" % (time.time() - start)
        if valid_iter is not None:
            valid_acc = d2l.evaluate_accuracy_gpu(net, valid_iter)
            epoch_s = ("epoch %d, loss %f, train acc %f, valid acc %f, "
                       % (epoch + 1, train_l_sum / n, train_acc_sum / n,
                          valid_acc))
        else:
            epoch_s = ("epoch %d, loss %f, train acc %f, " %
                       (epoch + 1, train_l_sum / n, train_acc_sum / n))
        print(epoch_s + time_s + ', lr ' + str(trainer.learning_rate))
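train accumulates summed losses and correct-prediction counts and divides by the number of examples n, rather than averaging per-batch means. The difference shows up when the last batch is smaller, as it can be with last_batch='keep'. The NumPy sketch below reproduces this bookkeeping on two hypothetical batches; `epoch_metrics` is an illustrative name.

```python
import numpy as np

def epoch_metrics(batches):
    """Accumulate per-example sums over mini-batches and divide by the number of
    examples. Each batch is (per_example_losses, logits, labels)."""
    loss_sum, correct, n = 0.0, 0.0, 0
    for losses, logits, labels in batches:
        loss_sum += float(losses.sum())
        correct += float((logits.argmax(axis=1) == labels).sum())
        n += labels.size
    return loss_sum / n, correct / n

# Two hypothetical batches of sizes 2 and 1 (the last batch is smaller)
batches = [
    (np.array([0.5, 1.5]), np.array([[2.0, 1.0], [0.0, 3.0]]), np.array([0, 0])),
    (np.array([1.0]), np.array([[0.0, 1.0]]), np.array([1])),
]
print(epoch_metrics(batches))  # (1.0, 0.6666666666666666)
```

Here the per-example accuracy is 2/3, whereas averaging the two batch accuracies (0.5 and 1.0) would report 0.75.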
6. Training and Validating the Model
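Now we can train and validate the model. The following hyperparameters can all be tuned; for example, we can increase the number of epochs. Because lr_period and lr_decay are set to 80 and 0.1, respectively, the learning rate of the optimization algorithm is multiplied by 0.1 every 80 epochs. For simplicity, we only train for one epoch here.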
ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.1, 5e-4
lr_period, lr_decay, net = 80, 0.1, get_net(ctx)
net.hybridize()
train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)
epoch 1, loss 2.859060, train acc 0.100000, valid acc 0.100000, time 9.51 sec, lr 0.1
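The step schedule in train multiplies the learning rate by lr_decay at the start of every epoch whose 0-based index is a positive multiple of lr_period, which is equivalent to lr · lr_decay^⌊epoch / lr_period⌋. A small sketch of that closed form (the name `lr_at_epoch` is hypothetical) with the settings above, lr = 0.1, lr_period = 80, and lr_decay = 0.1:

```python
def lr_at_epoch(epoch, lr, lr_period, lr_decay):
    """Learning rate used in 0-based epoch `epoch` under the step schedule in train."""
    return lr * lr_decay ** (epoch // lr_period)

for epoch in [0, 79, 80, 159, 160]:
    # train prints epochs starting from 1
    print(epoch + 1, lr_at_epoch(epoch, 0.1, 80, 0.1))
```

So epochs 1 to 80 (as printed by train) use 0.1, epochs 81 to 160 use 0.01, and so on; with num_epochs = 1 the decay never triggers.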
7. Classifying the Testing Set and Submitting Results on Kaggle
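Once we are satisfied with the model design and hyperparameters, we retrain the model on all the labeled training data (including the validation set) and classify the testing set.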
net, preds = get_net(ctx), []
net.hybridize()
train(net, train_valid_iter, None, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)
for X, _ in test_iter:
    y_hat = net(X.as_in_ctx(ctx))
    preds.extend(y_hat.argmax(axis=1).astype(int).asnumpy())
# ImageFolderDataset reads the test files in sorted file-name order
# ('1.png', '10.png', '100.png', ...), so sort the ids as strings to match
sorted_ids = list(range(1, len(test_ds) + 1))
sorted_ids.sort(key=lambda x: str(x))
df = pd.DataFrame({'id': sorted_ids, 'label': preds})
# Map predicted class indices back to class names
df['label'] = df['label'].apply(lambda x: train_valid_ds.synsets[x])
df.to_csv('submission.csv', index=False)
epoch 1, loss 2.873863, train acc 0.106000, time 9.55 sec, lr 0.1
After executing the code above, we get a submission.csv file whose format meets the requirements of the Kaggle competition.
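The id sort in the prediction code relies on the order in which ImageFolderDataset reads the test folder. Assuming, as in MXNet's implementation, that file names are listed in sorted order, the test images come out as 1.png, 10.png, 100.png, and so on, which is exactly the order of the ids sorted as strings. The sketch below checks that correspondence for 12 images and pairs hypothetical predictions with their ids; `align_predictions` and the two-class synsets list are illustrative.

```python
def align_predictions(num_test, preds, synsets):
    """Pair predictions (class indices, in the test folder's sorted file-name
    order) with image ids and return (id, class name) rows in numeric id order."""
    ids_in_file_order = sorted(range(1, num_test + 1), key=str)
    rows = [(i, synsets[p]) for i, p in zip(ids_in_file_order, preds)]
    return sorted(rows)

# Sorted file names and string-sorted ids give the same order
file_names = sorted('%d.png' % i for i in range(1, 13))
print([int(name.split('.')[0]) for name in file_names])  # [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(range(1, 13), key=str))                     # [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9]

# Hypothetical predictions for 12 test images and a two-class synsets list
rows = align_predictions(12, [i % 2 for i in range(12)], ['airplane', 'automobile'])
print(rows[:3])  # [(1, 'airplane'), (2, 'airplane'), (3, 'automobile')]
```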
8. Summary
- We can create an ImageFolderDataset instance to read the dataset containing the original image files.
- We can use convolutional neural networks, image augmentation, and hybrid programming to take part in an image classification competition.