Image Classification (CIFAR-10) on Kaggle

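So far, we have been using Gluon's data package to obtain image datasets directly in tensor format. In practice, however, image datasets usually exist as image files. In this section, we start from the original image files and organize, read, and convert them to tensor format step by step. We have already experimented with the CIFAR-10 dataset, an important dataset in computer vision. Now we apply what we learned in the previous sections to take part in the Kaggle competition on CIFAR-10 image classification.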

The competition's web address is https://www.kaggle.com/c/cifar-10

Fig. 1 shows the information on the competition's webpage. To submit results, first register an account on the Kaggle website.

 

Fig. 1  CIFAR-10 image classification competition webpage information. The dataset for the competition can be accessed by clicking the “Data” tab.

First, import the packages and modules required for the competition.

import collections
from d2l import mxnet as d2l
import math
from mxnet import autograd, gluon, init, npx
from mxnet.gluon import nn
import os
import pandas as pd
import shutil
import time

npx.set_np()

1. Obtaining and Organizing the Dataset

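The competition data is divided into a training set and a testing set. The training set contains 50,000 images. The testing set contains 300,000 images, of which 10,000 are used for scoring; the other 290,000 are non-scoring images included to prevent participants from manually labeling the testing set and submitting those labels. The images in both datasets are in PNG format, 32 pixels in height and width, with three color channels (RGB). The images cover 10 categories: planes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks. The upper-left corner of Fig. 1 shows some images of planes, cars, and birds from the dataset.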

1.1. Downloading the Dataset

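After logging in to Kaggle, click the “Data” tab on the CIFAR-10 image classification competition webpage shown in Fig. 1 and download the dataset by clicking the “Download All” button. After unzipping the downloaded file in ../data, and unzipping train.7z and test.7z inside it, you will find the entire dataset in the following paths: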

  • ../data/cifar-10/train/[1-50000].png
  • ../data/cifar-10/test/[1-300000].png
  • ../data/cifar-10/trainLabels.csv
  • ../data/cifar-10/sampleSubmission.csv

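Here, the train and test folders contain the training and testing images respectively, trainLabels.csv holds the labels for the training images, and sampleSubmission.csv is a sample submission. To make it easier to get started, we provide a small-scale sample of the dataset: it contains the first 1000 training images and 5 random testing images. To use the full dataset of the Kaggle competition, set the following demo variable to False.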

#@save
d2l.DATA_HUB['cifar10_tiny'] = (d2l.DATA_URL + 'kaggle_cifar10_tiny.zip',
                                '2068874e4b9a9f0fb07ebe0ad2b29754449ccacd')

# If you use the full dataset downloaded for the Kaggle competition, set the
# demo variable to False
demo = True

if demo:
    data_dir = d2l.download_extract('cifar10_tiny')
else:
    data_dir = '../data/cifar-10/'
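The second element of the DATA_HUB entry is a 40-character hexadecimal string, which is the length of a SHA-1 digest. Assuming it serves as a checksum for the downloaded archive, the check can be sketched with the standard library alone. The helper name sha1_of_file and the temporary demo file are illustrative and are not part of d2l.

```python
import hashlib
import os
import tempfile

def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
    return sha1.hexdigest()

# Hypothetical stand-in for a downloaded archive
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.bin')
    with open(demo_path, 'wb') as f:
        f.write(b'hello')
    digest = sha1_of_file(demo_path)

# Digest of the same bytes computed in memory, used as the reference value
expected = hashlib.sha1(b'hello').hexdigest()
print(digest == expected, len(digest))
```

Comparing such a digest with the published one is a cheap way to detect a truncated or corrupted download before unpacking it.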

1.2. Organizing the Dataset

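We need to organize the dataset to facilitate model training and testing. Let us first read the labels from the CSV file. The following function returns a dictionary that maps each file name, without its extension, to its label.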

#@save
def read_csv_labels(fname):
    """Read fname to return a name to label dictionary."""
    with open(fname, 'r') as f:
        # Skip the file header line (column name)
        lines = f.readlines()[1:]
    tokens = [l.rstrip().split(',') for l in lines]
    return dict(((name, label) for name, label in tokens))

labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
print('# training examples:', len(labels))
print('# classes:', len(set(labels.values())))

# training examples: 1000
# classes: 10
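To see what the returned dictionary looks like, the sketch below runs the same parsing logic on a tiny CSV file written to a temporary directory. The three rows are made up for illustration; only the two-column id,label layout is taken from the function above.

```python
import os
import tempfile

def read_csv_labels(fname):
    """Read fname to return a name to label dictionary."""
    with open(fname, 'r') as f:
        lines = f.readlines()[1:]  # skip the header line
    tokens = [l.rstrip().split(',') for l in lines]
    return dict(((name, label) for name, label in tokens))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'trainLabels.csv')
    # Hypothetical rows in the id,label layout
    with open(csv_path, 'w') as f:
        f.write('id,label\n1,frog\n2,truck\n3,frog\n')
    toy_labels = read_csv_labels(csv_path)

print(toy_labels)
print(len(set(toy_labels.values())))
```

Note that the keys are strings, not integers, so they can be matched directly against file names such as 1.png once the extension is stripped.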

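Next, we define the reorg_train_valid function to split a validation set off the original training set. The argument valid_ratio is the ratio of the number of examples in the validation set to the number of examples in the original training set. Specifically, let n be the number of images in the class with the fewest examples and r be the ratio; we then use max(⌊nr⌋, 1) images from each class as the validation set. Take valid_ratio=0.1 as an example. Of the 50,000 original training images, 45,000 are used for training while tuning hyperparameters and are stored in the path “train_valid_test/train”, and the other 5,000 are stored as the validation set in the path “train_valid_test/valid”. After the data is organized, images of the same class are placed in the same folder so that we can read them later.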

#@save
def copyfile(filename, target_dir):
    """Copy a file into a target directory."""
    d2l.mkdir_if_not_exist(target_dir)
    shutil.copy(filename, target_dir)

#@save
def reorg_train_valid(data_dir, labels, valid_ratio):
    # The number of examples of the class with the least examples in the
    # training dataset
    n = collections.Counter(labels.values()).most_common()[-1][1]
    # The number of examples per class for the validation set
    n_valid_per_label = max(1, math.floor(n * valid_ratio))
    label_count = {}
    for train_file in os.listdir(os.path.join(data_dir, 'train')):
        label = labels[train_file.split('.')[0]]
        fname = os.path.join(data_dir, 'train', train_file)
        # Copy to train_valid_test/train_valid with a subfolder per class
        copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                     'train_valid', label))
        if label not in label_count or label_count[label] < n_valid_per_label:
            # Copy to train_valid_test/valid
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'valid', label))
            label_count[label] = label_count.get(label, 0) + 1
        else:
            # Copy to train_valid_test/train
            copyfile(fname, os.path.join(data_dir, 'train_valid_test',
                                         'train', label))
    return n_valid_per_label
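The per-class validation count max(⌊nr⌋, 1) can be checked without touching any files. The sketch below reuses the two lines that compute it and applies them to two made-up label tables: a balanced one with 5,000 images in each of 10 classes (an assumption about the full training set), and a tiny one where ⌊nr⌋ would be 0. The helper name valid_per_label is illustrative.

```python
import collections
import math

def valid_per_label(labels, valid_ratio):
    """Number of validation images taken from each class."""
    # Size of the class with the fewest examples
    n = collections.Counter(labels.values()).most_common()[-1][1]
    return max(1, math.floor(n * valid_ratio))

# Hypothetical balanced label table: 10 classes with 5000 images each
full = {str(i): 'class%d' % (i % 10) for i in range(50000)}
per_class = valid_per_label(full, 0.1)
num_classes = len(set(full.values()))
num_valid = per_class * num_classes
print(per_class, num_valid, len(full) - num_valid)

# Hypothetical tiny table where floor(n * r) is 0, so the lower bound applies
tiny = {'1': 'cat', '2': 'cat', '3': 'dog'}
tiny_per_class = valid_per_label(tiny, 0.1)
print(tiny_per_class)
```

Under the balanced assumption this gives 500 validation images per class, which matches the 45,000 / 5,000 split described in the text. The lower bound of 1 keeps every class represented in the validation set even on very small datasets.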

The reorg_test function below organizes the testing set so that it is easy to read during prediction.

#@save
def reorg_test(data_dir):
    for test_file in os.listdir(os.path.join(data_dir, 'test')):
        copyfile(os.path.join(data_dir, 'test', test_file),
                 os.path.join(data_dir, 'train_valid_test', 'test',
                              'unknown'))

Finally, we use one function to call the previously defined read_csv_labels, reorg_train_valid, and reorg_test functions.

def reorg_cifar10_data(data_dir, valid_ratio):
    labels = read_csv_labels(os.path.join(data_dir, 'trainLabels.csv'))
    reorg_train_valid(data_dir, labels, valid_ratio)
    reorg_test(data_dir)

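We set the batch size to 1 only for the demo dataset. During actual training and testing, use the complete dataset of the Kaggle competition and set batch_size to a larger integer, such as 128. We use 10% of the training examples as the validation set for tuning hyperparameters.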

batch_size = 1 if demo else 128
valid_ratio = 0.1
reorg_cifar10_data(data_dir, valid_ratio)

2. Image Augmentation

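To cope with overfitting, we use image augmentation. For example, adding transforms.RandomFlipLeftRight() flips images left to right at random. We can also standardize the three RGB channels of color images with transforms.Normalize(). Some of these operations are listed below; use or modify them as needed.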

transform_train = gluon.data.vision.transforms.Compose([
    # Magnify the image to a square of 40 pixels in both height and width
    gluon.data.vision.transforms.Resize(40),
    # Randomly crop a square image of 40 pixels in both height and width to
    # produce a small square of 0.64 to 1 times the area of the original
    # image, and then shrink it to a square of 32 pixels in both height and
    # width
    gluon.data.vision.transforms.RandomResizedCrop(32, scale=(0.64, 1.0),
                                                   ratio=(1.0, 1.0)),
    gluon.data.vision.transforms.RandomFlipLeftRight(),
    gluon.data.vision.transforms.ToTensor(),
    # Normalize each channel of the image
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
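The scale argument of RandomResizedCrop is a range of area fractions, so it is not obvious how large the crops are in pixels. Since ratio=(1.0, 1.0) forces a square crop, the side length is the square root of the area fraction times the side of the resized image. A short calculation, assuming the 40-pixel input produced by Resize(40):

```python
import math

resized = 40          # side length after Resize(40)
scale = (0.64, 1.0)   # area fraction range passed to RandomResizedCrop

# Side length of a square crop covering a given fraction of the image area
min_side, max_side = (math.sqrt(s) * resized for s in scale)
print('crop side between %.1f and %.1f pixels' % (min_side, max_side))
```

The smallest crop is therefore about 32 pixels wide, the same as the final output size, so the crop is only ever shrunk or kept at scale, never enlarged.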

To keep the output deterministic during testing, we only normalize the images.

transform_test = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize([0.4914, 0.4822, 0.4465],
                                           [0.2023, 0.1994, 0.2010])])
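As a worked example of what these two steps do to a single pixel, the sketch below first scales 8-bit values to the range [0, 1], as ToTensor does, and then subtracts the per-channel mean and divides by the per-channel standard deviation, as Normalize does. It is plain Python for illustration only; the helper name normalize_pixel is not part of Gluon.

```python
mean = [0.4914, 0.4822, 0.4465]
std = [0.2023, 0.1994, 0.2010]

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [v / 255 for v in rgb]
    return [(v - m) / s for v, m, s in zip(scaled, mean, std)]

black = normalize_pixel([0, 0, 0])
white = normalize_pixel([255, 255, 255])
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization, dark pixels map to negative values and bright pixels to positive ones, which centers the inputs around zero for each channel.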

3. Reading the Dataset

Next, we create ImageFolderDataset instances to read the organized dataset of original image files, where each example includes an image and a label.

train_ds, valid_ds, train_valid_ds, test_ds = [
    gluon.data.vision.ImageFolderDataset(
        os.path.join(data_dir, 'train_valid_test', folder))
    for folder in ['train', 'valid', 'train_valid', 'test']]

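We specify the image augmentation operations defined above in DataLoader. During training, we use the validation set only to evaluate the model, so its output must be deterministic. For prediction, we train the model on the combined training and validation sets to make full use of all labeled data.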

train_iter, train_valid_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_train), batch_size, shuffle=True,
    last_batch='keep') for dataset in (train_ds, train_valid_ds)]

valid_iter, test_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='keep') for dataset in (valid_ds, test_ds)]
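All four loaders pass last_batch='keep', so the final, smaller batch is still returned when the dataset size is not a multiple of batch_size. This matters for the testing set, where every image needs a prediction. The arithmetic can be sketched as follows; the helper num_batches is illustrative and covers only the 'keep' and 'discard' policies.

```python
import math

def num_batches(num_examples, batch_size, last_batch='keep'):
    """Batches per epoch under the 'keep' and 'discard' policies."""
    if last_batch == 'keep':
        return math.ceil(num_examples / batch_size)
    if last_batch == 'discard':
        return num_examples // batch_size
    raise ValueError('unsupported policy: %s' % last_batch)

# 45,000 training images with a batch size of 128
kept = num_batches(45000, 128)
dropped = num_batches(45000, 128, 'discard')
remainder = 45000 % 128
print(kept, dropped, remainder)
```

With these numbers the last batch holds 72 images, which would be lost in every epoch under the 'discard' policy.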

4. Defining the Model

We build the residual blocks on the HybridBlock class, which improves execution efficiency.

class Residual(nn.HybridBlock):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(num_channels, kernel_size=3, padding=1,
                               strides=strides)
        self.conv2 = nn.Conv2D(num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2D(num_channels, kernel_size=1,
                                   strides=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()

    def hybrid_forward(self, F, X):
        Y = F.npx.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return F.npx.relu(Y + X)

Next, we define the ResNet-18 model.

def resnet18(num_classes):
    net = nn.HybridSequential()
    net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
            nn.BatchNorm(), nn.Activation('relu'))

    def resnet_block(num_channels, num_residuals, first_block=False):
        blk = nn.HybridSequential()
        for i in range(num_residuals):
            if i == 0 and not first_block:
                blk.add(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                blk.add(Residual(num_channels))
        return blk

    net.add(resnet_block(64, 2, first_block=True),
            resnet_block(128, 2),
            resnet_block(256, 2),
            resnet_block(512, 2))
    net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    return net
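This variant starts with a 3×3 convolution of stride 1 and has no max-pooling layer, so a 32×32 input keeps its resolution until the first strided residual block. The sketch below traces the spatial size through the four stages using the standard convolution output-size formula. It is a hand calculation in plain Python, not a call into the network, and conv_out is an illustrative helper.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = conv_out(32, kernel=3, stride=1, padding=1)  # stem convolution
trace = [(64, size)]
for channels, first_block in [(64, True), (128, False),
                              (256, False), (512, False)]:
    if not first_block:
        # The first residual unit of the stage uses strides=2
        size = conv_out(size, kernel=3, stride=2, padding=1)
    trace.append((channels, size))
print(trace)

# The 1x1 shortcut with stride 2 and no padding halves the size the same way
shortcut = conv_out(32, kernel=1, stride=2, padding=0)
print(shortcut)
```

Each entry of trace is a (channels, side length) pair. The final feature map is small enough that global average pooling reduces it to one value per channel before the dense layer.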

The CIFAR-10 image classification challenge uses 10 categories. Before training begins, we apply Xavier random initialization to the model.

def get_net(ctx):
    num_classes = 10
    net = resnet18(num_classes)
    net.initialize(ctx=ctx, init=init.Xavier())
    return net

loss = gluon.loss.SoftmaxCrossEntropyLoss()
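To give a feel for the numbers this loss produces, here is a plain-Python sketch of softmax cross-entropy for a single example, assuming raw scores (logits) as input and an integer class label. It illustrates the formula and is not Gluon's implementation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax over logits for one integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# A model that scores all 10 classes equally
uniform = softmax_cross_entropy([0.0] * 10, label=3)
# A model that strongly prefers the correct class
confident = softmax_cross_entropy([10.0] + [0.0] * 9, label=0)
print(round(uniform, 4), round(math.log(10), 4))
print(confident < uniform)
```

A model that guesses uniformly over 10 classes has a loss of ln 10, roughly 2.30. That value is a useful baseline when reading the per-epoch loss printed during training.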

5. Defining the Training Functions

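We select the model and tune hyperparameters according to the model's performance on the validation set. Next, we define the model training function train. We record the training time of each epoch, which helps us compare the time costs of different models.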

def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
          lr_decay):
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        if epoch > 0 and epoch % lr_period == 0:
            trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        for X, y in train_iter:
            y = y.astype('float32').as_in_ctx(ctx)
            with autograd.record():
                y_hat = net(X.as_in_ctx(ctx))
                l = loss(y_hat, y).sum()
            l.backward()
            trainer.step(batch_size)
            train_l_sum += float(l)
            train_acc_sum += float((y_hat.argmax(axis=1) == y).sum())
            n += y.size
        time_s = "time %.2f sec" % (time.time() - start)
        if valid_iter is not None:
            valid_acc = d2l.evaluate_accuracy_gpu(net, valid_iter)
            epoch_s = ("epoch %d, loss %f, train acc %f, valid acc %f, "
                       % (epoch + 1, train_l_sum / n, train_acc_sum / n,
                          valid_acc))
        else:
            epoch_s = ("epoch %d, loss %f, train acc %f, " %
                       (epoch + 1, train_l_sum / n, train_acc_sum / n))
        print(epoch_s + time_s + ', lr ' + str(trainer.learning_rate))

6. Training and Validating the Model

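Now we can train and validate the model. The following hyperparameters can be tuned; for example, we can increase the number of epochs. Because lr_period and lr_decay are set to 80 and 0.1 respectively, the learning rate of the optimization algorithm is multiplied by 0.1 after every 80 epochs. For simplicity, we train for only one epoch here.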

ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.1, 5e-4
lr_period, lr_decay, net = 80, 0.1, get_net(ctx)
net.hybridize()
train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)

epoch 1, loss 2.859060, train acc 0.100000, valid acc 0.100000, time 9.51 sec, lr 0.1
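With only one epoch the learning rate never changes, so the step decay is not visible in the log above. The sketch below replays the same rule as the training loop (multiply by lr_decay at the start of every epoch whose 0-based index is a positive multiple of lr_period) over a hypothetical 200-epoch run and prints the rate around each boundary.

```python
def simulate(num_epochs, lr, lr_period, lr_decay):
    """Replay the decay rule from the training loop; return the rate per epoch."""
    rates = []
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % lr_period == 0:
            lr *= lr_decay
        rates.append(lr)
    return rates

# Hypothetical run length; the other values match the settings above
rates = simulate(200, lr=0.1, lr_period=80, lr_decay=0.1)
for epoch in (0, 79, 80, 159, 160):
    print('epoch %d: lr %.4g' % (epoch + 1, rates[epoch]))
```

Epochs 1 to 80 run at the initial rate, epochs 81 to 160 at one tenth of it, and so on. Repeated floating-point multiplication means the stored rate may differ from the exact decimal value in the last digits.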

7. Classifying the Testing Set and Submitting Results on Kaggle

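After obtaining a satisfactory model design and hyperparameters, we retrain the model on all training data (including the validation set) and classify the testing set.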

net, preds = get_net(ctx), []
net.hybridize()
train(net, train_valid_iter, None, num_epochs, lr, wd, ctx, lr_period,
      lr_decay)

for X, _ in test_iter:
    y_hat = net(X.as_in_ctx(ctx))
    preds.extend(y_hat.argmax(axis=1).astype(int).asnumpy())
sorted_ids = list(range(1, len(test_ds) + 1))
sorted_ids.sort(key=lambda x: str(x))
df = pd.DataFrame({'id': sorted_ids, 'label': preds})
df['label'] = df['label'].apply(lambda x: train_valid_ds.synsets[x])
df.to_csv('submission.csv', index=False)

epoch 1, loss 2.873863, train acc 0.106000, time 9.55 sec, lr 0.1
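The ids are sorted as strings rather than as numbers so that they line up with the order of the predictions. Assuming the dataset enumerates the test files in lexicographic filename order, the image 10.png comes before 2.png, and the id column has to follow the same order. A small illustration with 12 hypothetical files:

```python
# Ids sorted by their string form, as in the submission code
ids = list(range(1, 13))
ids.sort(key=lambda x: str(x))
print(ids)

# Hypothetical file names sorted lexicographically
file_names = sorted('%d.png' % i for i in range(1, 13))
ids_from_files = [int(name.split('.')[0]) for name in file_names]
print(ids_from_files)
print(ids == ids_from_files)
```

Sorting the ids numerically instead would pair most predictions with the wrong image id in the submission file.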

After executing the code above, we get a “submission.csv” file. Its format meets the Kaggle competition requirements.

8. Summary

  • We can create an ImageFolderDataset instance to read the dataset containing the original image files.
  • We can use convolutional neural networks, image augmentation, and hybrid programming to take part in an image classification competition.

 

