Getting Started with Keras (5): Building a ResNet for Image Classification on CIFAR-10


  This article shows how to use Keras to build the well-known ResNet neural network model and use it to classify images from the CIFAR-10 dataset.

Dataset Overview

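  CIFAR-10 is a labeled image dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is available at: https://www.cs.toronto.edu/~kriz/cifar.html
  The dataset contains 60000 color images of size 32x32 in 10 mutually exclusive classes, with 6000 images per class. It is split into a training set of 50000 images and a test set of 10000 images.
  The data is stored as 5 training batches and 1 test batch, each holding 10000 images. The test batch contains 1000 randomly selected images from each class. The training batches contain the remaining images, so an individual training batch may hold more images from some classes than from others.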
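The claim about uneven class counts within a training batch is easy to check once the labels of a batch are loaded. The following is a minimal sketch; `class_counts` is a helper name made up for this illustration, and the labels used here are synthetic stand-ins rather than real CIFAR-10 data.

```python
import numpy as np


def class_counts(labels, num_classes=10):
    """Return how many samples of each class appear in a list of integer labels."""
    return np.bincount(np.asarray(labels), minlength=num_classes)


# Synthetic labels standing in for one 10000-image batch (NOT real CIFAR-10 labels).
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=10000)

counts = class_counts(fake_labels)
print(counts)        # one count per class, generally not all equal
print(counts.sum())  # 10000
```

With a real batch, the same function could be applied to the list stored under the `b'labels'` key of the unpickled batch file.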
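  The figure below shows 10 sample images from each class:
[Figure: sample images from each class]
  This article uses the Python version of CIFAR-10, available for download at https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz and the extracted folder contains the following files:
[Figure: the Python version of the CIFAR-10 dataset]
  Let's try reading one of the images with a Python program and saving it to disk so that it can be viewed. The code is as follows: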

# -*- coding: utf-8 -*-
import cv2
import pickle

# Read the batch file
fpath = 'cifar-10-batches-py/data_batch_1'
with open(fpath, 'rb') as f:
    d = pickle.load(f, encoding='bytes')

data = d[b'data']
labels = d[b'labels']
data = data.reshape(data.shape[0], 3, 32, 32).transpose(0, 2, 3, 1)

# Save the image at index image_no
strings=['airplane', 'automobile', 'bird', 'cat', 'deer',
         'dog', 'frog', 'horse', 'ship', 'truck']
image_no = 1000
label = strings[labels[image_no]]
image = data[image_no,:,:,:]
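# CIFAR-10 stores channels in RGB order, while OpenCV writes BGR, so convert first
cv2.imwrite('%s.jpg' % label, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))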

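The result is as follows:
[Figure: the saved image]
The image is fairly blurry, but it is still recognizable as a vehicle, and it belongs to the truck class.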
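The reshape and transpose step in the script above depends on how each image is stored: a row of 3072 values holds 1024 red values, then 1024 green, then 1024 blue, each channel in row-major order. The sketch below illustrates that layout on a tiny synthetic batch (the two rows are made up for the example) so that the conversion to height x width x channel can be verified without the dataset.

```python
import numpy as np

# A fake batch of 2 images in the CIFAR-10 row layout:
# 1024 red values, then 1024 green, then 1024 blue, each plane row-major.
rows = np.zeros((2, 3072), dtype=np.uint8)
rows[0, :1024] = 255   # image 0: pure red
rows[1, 2048:] = 255   # image 1: pure blue

# (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), i.e. channels last
images = rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

print(images.shape)                # (2, 32, 32, 3)
print(images[0, 0, 0].tolist())    # [255, 0, 0]
print(images[1, 31, 31].tolist())  # [0, 0, 255]
```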

The ResNet Model

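  The classic model for image classification is the CNN. As more layers are added, however, a CNN exhibits a degradation problem: a deeper network performs worse than a somewhat shallower one. This is not caused by overfitting, because the gap already shows up on the training set. ResNet handles this problem well.
  ResNet stands for Residual Network, and it won the ImageNet competition in 2015. Its core idea is the residual block: Kaiming He et al. designed a skip connection (also called a shortcut connection) that gives the network a stronger ability to represent an identity mapping, which made it possible to build deeper networks while also improving their performance. The structure of a residual block is shown below:
[Figure: structure of a residual block]
Let F(x) = H(x) − x, where x is the output of the shallower layers, H(x) is the output of the deeper layers, and F(x) is the transformation performed by the two layers in between. When the features represented by x are already mature enough that any change to them would increase the loss, F(x) tends toward 0 during training and x is passed along the identity path unchanged. This achieves the original goal without adding computational cost: in the forward pass, once the output of the shallower layers is already good enough (optimal), the later layers of the deep network can act as an identity mapping.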
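To make the relation H(x) = F(x) + x concrete, here is a toy NumPy sketch. It is an illustration only: `residual_block` is a made-up helper, and F is built from two small dense layers instead of the convolutions a real ResNet uses. It shows that when the weights of F are zero, the block reduces to the identity mapping.

```python
import numpy as np


def residual_block(x, w1, w2):
    """Toy residual block: H(x) = F(x) + x, with F = linear -> ReLU -> linear."""
    f = np.maximum(x @ w1, 0.0) @ w2  # F(x), the residual branch
    return f + x                      # skip connection adds the input back


x = np.array([[1.0, -2.0, 3.0]])

# If the residual branch has learned F(x) = 0, the block passes x through unchanged.
zeros = np.zeros((3, 3))
out = residual_block(x, zeros, zeros)
print(np.array_equal(out, x))  # True
```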
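  Example residual blocks are shown in the figure below:
[Figure: example residual blocks]
The block on the left is used in the shallower ResNet34, and the block on the right in the deeper ResNet50/101/152. The right-hand design is also called a bottleneck, and it substantially reduces the number of parameters.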
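The parameter saving of the bottleneck can be estimated with simple arithmetic. The sketch below assumes the channel sizes used in the ResNet paper's illustration of the two designs (two 3x3 convolutions on 64 channels for the basic block; 1x1, 3x3, 1x1 convolutions with 256 input and output channels and 64 inner channels for the bottleneck) and ignores biases and batch normalization. `conv_params` is a helper name introduced for this example.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution, ignoring biases."""
    return k * k * c_in * c_out


# Basic block: two 3x3 convolutions on 64 channels.
basic_64 = 2 * conv_params(3, 64, 64)

# Bottleneck: 1x1 reduces 256 -> 64, 3x3 works on 64, 1x1 restores 64 -> 256.
bottleneck_256 = (conv_params(1, 256, 64)
                  + conv_params(3, 64, 64)
                  + conv_params(1, 64, 256))

# For comparison: two plain 3x3 convolutions applied directly to 256 channels.
plain_256 = 2 * conv_params(3, 256, 256)

print(basic_64, bottleneck_256, plain_256)  # 73728 69632 1179648
```

Under these assumptions the bottleneck processes 256-channel features with roughly the same number of weights as a basic block on 64 channels, and with about 17 times fewer weights than two 3x3 convolutions at full width.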
  This is only a brief introduction to ResNet; there are many more details worth studying.

Model Training

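  We use the ResNet model provided on the official Keras website to classify the CIFAR-10 images.
  The project structure is shown below:
[Figure: project structure]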
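The training script relies on a load_data() function imported from load_data.py. For reference, here is a minimal sketch of how such a loader could be written. It is an assumption based on the file layout of the Python version of the dataset (data_batch_1 to data_batch_5 plus test_batch) and is not the author's code; the `root` parameter, the `load_batch` helper, and the (N, 1) label shape are choices made for this illustration.

```python
import os
import pickle

import numpy as np


def load_batch(path):
    """Read one CIFAR-10 batch file and return (images, labels)."""
    with open(path, 'rb') as f:
        d = pickle.load(f, encoding='bytes')
    # (N, 3072) rows -> (N, 32, 32, 3) channels-last images
    data = d[b'data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.array(d[b'labels']).reshape(-1, 1)
    return data, labels


def load_data(root='cifar-10-batches-py'):
    """Return (x_train, y_train), (x_test, y_test) from the extracted archive."""
    train = [load_batch(os.path.join(root, 'data_batch_%d' % i))
             for i in range(1, 6)]
    x_train = np.concatenate([x for x, _ in train])
    y_train = np.concatenate([y for _, y in train])
    x_test, y_test = load_batch(os.path.join(root, 'test_batch'))
    return (x_train, y_train), (x_test, y_test)
```

Called as `(x_train, y_train), (x_test, y_test) = load_data()`, this would give image arrays whose `shape[1:]` is (32, 32, 3), which is what the training script uses as the input shape.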
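  The load_data.py script loads the dataset and splits it into a training set and a test set. The training script, which imports load_data from it, is shown in full below: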

# -*- coding: utf-8 -*-
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras.models import Model
import numpy as np
import os

# Select which GPUs to use; adjust to your machine's configuration (disabled by default)
# os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7,8"

from load_data import load_data

# Training parameters
batch_size = 32
epochs = 100
num_classes = 10

# Subtracting pixel mean improves accuracy
subtract_pixel_mean = True

n = 3

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 1

# Computed depth from supplied model parameter n
depth = n * 6 + 2

# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = load_data()
print('load data successfully!')

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('Begin model training...')


# Learning Rate Schedule
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr


# resnet layer
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):

    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x


def resnet_v1(input_shape, depth, num_classes=10):
    # ResNet Version 1 Model builder [a]
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model


model = resnet_v1(input_shape=input_shape, depth=depth, num_classes=num_classes)
model.compile(loss='categorical_crossentropy',
              # Note: newer Keras releases name this argument learning_rate instead of lr
              optimizer=Adam(lr=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)

# Prepare the model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'garbage_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             # Note: newer Keras releases report this metric as 'val_accuracy'
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)

lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]

# Run training, with data augmentation.
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)

# Fit the model on the batches generated by datagen.flow().
# Note: newer Keras releases deprecate fit_generator; model.fit accepts the same generator.
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    validation_data=(x_test, y_test),
                    epochs=epochs, verbose=1, workers=4,
                    callbacks=callbacks)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
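The learning rate schedule in the script has thresholds up to epoch 180, while epochs is set to 100. The sketch below re-implements the same thresholds as a standalone function to show which step is actually reached in a 100-epoch run. `lr_at` is a name made up for this illustration; the sketch assumes zero-based epoch indices, as passed by the LearningRateScheduler callback, and ignores any further reduction applied by ReduceLROnPlateau.

```python
import math


def lr_at(epoch, base_lr=1e-3):
    """Same thresholds as lr_schedule in the script, without the print call."""
    if epoch > 180:
        return base_lr * 0.5e-3
    if epoch > 160:
        return base_lr * 1e-3
    if epoch > 120:
        return base_lr * 1e-2
    if epoch > 80:
        return base_lr * 1e-1
    return base_lr


rates = [lr_at(e) for e in range(100)]

# Epoch indices at which the scheduled rate differs from the previous epoch.
changes = [e for e in range(1, 100)
           if not math.isclose(rates[e], rates[e - 1])]
print(changes)  # [81]
```

With 100 epochs the schedule therefore drops the rate only once, from 1e-3 to about 1e-4 at epoch index 81; the later thresholds would only matter for longer runs.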

The printed model structure is as follows:
[Figure: model structure]
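The overall shape of the network can also be worked out by hand from resnet_v1. The sketch below is a plain-Python walk through the same loop structure, not a replacement for model.summary(); `resnet_v1_outline` is a hypothetical helper. It counts one initial convolution, two convolutions per residual block in three stacks, and the final dense layer (the 1x1 projection shortcuts are not counted), and it tracks the feature map size and filter count of each stack.

```python
def resnet_v1_outline(n, input_size=32):
    """Return (depth, [(feature_map_size, num_filters) per stack]) for ResNet v1."""
    depth = 1 + 3 * n * 2 + 1  # initial conv + 3 stacks of n two-conv blocks + dense
    size, filters = input_size, 16
    stacks = []
    for stack in range(3):
        if stack > 0:
            size //= 2  # the first block of stacks 2 and 3 uses strides=2
        stacks.append((size, filters))
        filters *= 2    # filters double after every stack
    return depth, stacks


depth, stacks = resnet_v1_outline(3)
print(depth)   # 20
print(stacks)  # [(32, 16), (16, 32), (8, 64)]
```

For n = 3 this gives ResNet20, and the last stack produces 8x8 feature maps with 64 filters, which is why AveragePooling2D(pool_size=8) reduces each map to a single value before the Dense layer.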
  The model was trained on a GPU, with the following results:

Test loss: 0.4439272038936615
Test accuracy: 0.9128

[Figure: training output]

Summary

  This project is open source. The GitHub repository is at: https://github.com/percent4/resnet_4_cifar10
  Thanks for reading. Corrections and feedback are welcome.

