Preface

This article is broadly similar to the previous one on handwritten digit classification: images are the input, class labels are the output. The difference is that we no longer rely on a simple convolutional network plus fully connected layers. Since convolutional neural networks took off, many classic architectures have emerged, including VGG, ResNet, and AlexNet. Below we classify images from the CIFAR-10 dataset.
1. The CIFAR-10 Dataset and Reader Creation

The CIFAR-10 dataset is split into 5 training batches and 1 test batch, each containing 10,000 images. Each image is a 32*32 RGB image with a label. There are 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

I downloaded the [CIFAR-10 python version](http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) from the CIFAR-10 website. After downloading, extracting it yields six files, each of which is a pickled dictionary that can be read with the pickle module. The 'data' entry needs to be reshaped to 10000*32*32*3, where the dimensions stand for [N H W C]: 10,000 three-channel (RGB) images of size 32*32. These are then converted to the [N C H W] layout that PaddlePaddle reads. The 'labels' entry holds the 10,000 labels. With this, we can build a reader for CIFAR-10 (different from the official example), as follows:

```python
import os
import pickle
import numpy as np

def reader_creator(ROOT, istrain=True, cycle=False):
    def load_CIFAR_batch(filename):
        """Load a single batch of CIFAR."""
        with open(filename, 'rb') as f:
            # encoding='latin1' is needed to unpickle Python-2 data under Python 3
            datadict = pickle.load(f, encoding='latin1')
            X = datadict['data']
            Y = datadict['labels']
            # Recover (N C H W), then transpose to (N H W C)
            X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')
            Y = np.array(Y)
            return X, Y

    def reader():
        while True:
            if istrain:
                for b in range(1, 6):
                    f = os.path.join(ROOT, 'data_batch_%d' % b)
                    X, Y = load_CIFAR_batch(f)
                    length = X.shape[0]
                    for i in range(length):
                        yield X[i], Y[i]
                if not cycle:
                    break
            else:
                f = os.path.join(ROOT, 'test_batch')
                X, Y = load_CIFAR_batch(f)
                length = X.shape[0]
                for i in range(length):
                    yield X[i], Y[i]
                if not cycle:
                    break

    return reader
```
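The key step is the reshape/transpose of the raw 'data' array. A minimal sketch with a synthetic array (shapes reduced for clarity; the real batch is 10000*3*32*32) shows how the layout changes:

```python
import numpy as np

# Synthetic stand-in for one batch: 2 images, 3 channels, 4x4 pixels,
# flattened row-major exactly as in the pickled 'data' array.
raw = np.arange(2 * 3 * 4 * 4)

# Recover (N, C, H, W), then move channels last to get (N, H, W, C).
nchw = raw.reshape(2, 3, 4, 4)
nhwc = nchw.transpose(0, 2, 3, 1)

print(nchw.shape)     # (2, 3, 4, 4)
print(nhwc.shape)     # (2, 4, 4, 3)
# Pixel (0, 0) of image 0 now collects one value from each channel plane.
print(nhwc[0, 0, 0])  # [ 0 16 32]
```

Note that `transpose` only changes how the same buffer is indexed; no pixel data is copied until needed.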
2. The VGG Network

The VGG network is built on the idea of "smaller convolution kernels, more of them". Here I use the VGG network from the PaddlePaddle examples directly. It is worth noting that PaddlePaddle provides the function img_conv_group, which bundles convolution, pooling, and dropout into one group of operations. Following the VGG architecture, the convolutional layers can therefore be divided into 5 groups, followed by 3 fully connected layers to produce the result.
The PaddlePaddle example follows configuration D from the figure above, with dropout added:
```python
def vgg_bn_drop(input):
    def conv_block(ipt, num_filter, groups, dropouts):
        return fluid.nets.img_conv_group(
            input=ipt,
            # Filter count for each conv layer in the group: [num_filter, num_filter, ...]
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act='relu',
            conv_with_batchnorm=True,
            # Dropout rate for each conv layer in the group
            conv_batchnorm_drop_rate=dropouts,
            pool_size=2,
            pool_stride=2,
            pool_type='max')

    conv1 = conv_block(input, 64, 2, [0.3, 0])  # [0.3, 0] are the dropout rates of group 1's two layers, and so on
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')
    return predict
```
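Each of the five groups ends with a 2x2 max pool of stride 2, so the 32x32 input is halved five times before the fully connected layers. A quick sketch of the spatial sizes:

```python
# Trace the spatial size through the five conv groups (each halves H and W).
size = 32
filters = [64, 128, 256, 512, 512]
for i, f in enumerate(filters, 1):
    size //= 2  # 2x2 max pool, stride 2
    print('conv%d output: %d filters, %dx%d' % (i, f, size, size))
# conv5 therefore emits a 512x1x1 feature map, which feeds the 512-unit fc layers.
```

This also explains why CIFAR's 32x32 images work here without modifying the network: five halvings land exactly on 1x1.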
3. Training

The training program is the same as the example in the previous section; cross-entropy is again chosen as the loss function, so I won't belabor it.
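As a reminder, for a softmax output the cross-entropy loss is just the negative log of the probability assigned to the true class. A minimal numpy sketch (the probabilities are made-up illustrative values, not model output):

```python
import numpy as np

# One sample's softmax output over the 10 CIFAR-10 classes (made-up values).
probs = np.array([0.05, 0.05, 0.05, 0.55, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
label = 3  # true class index ('cat')

# Cross-entropy for a single sample: -log p(true class).
loss = -np.log(probs[label])
print(round(loss, 4))  # 0.5978
```

The per-batch loss reported during training is simply the mean of this quantity over the batch, which is what `fluid.layers.mean(cost)` computes below.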

```python
import sys

def train_network():
    predict = inference_network()
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, accuracy]

def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)

def train(data_path, save_path):
    BATCH_SIZE = 128
    EPOCH_NUM = 2

    train_reader = paddle.batch(
        paddle.reader.shuffle(reader_creator(data_path), buf_size=50000),
        batch_size=BATCH_SIZE)
    test_reader = paddle.batch(
        reader_creator(data_path, False),
        batch_size=BATCH_SIZE)

    def event_handler(event):
        if isinstance(event, fluid.EndStepEvent):
            if event.step % 100 == 0:
                print("\nPass %d, Epoch %d, Cost %f, Acc %f" %
                      (event.step, event.epoch, event.metrics[0], event.metrics[1]))
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, fluid.EndEpochEvent):
            avg_cost, accuracy = trainer.test(
                reader=test_reader, feed_order=['image', 'label'])
            print('\nTest with Pass {0}, Loss {1:2.2}, Acc {2:2.2}'.format(
                event.epoch, avg_cost, accuracy))
            if save_path is not None:
                trainer.save_params(save_path)

    place = fluid.CUDAPlace(0)
    trainer = fluid.Trainer(
        train_func=train_network,
        optimizer_func=optimizer_program,
        place=place)
    trainer.train(
        reader=train_reader,
        num_epochs=EPOCH_NUM,
        event_handler=event_handler,
        feed_order=['image', 'label'])
```
4. Inference Interface

The inference interface is also similar. Take special care that the image dimensions must be reordered into the [N C H W] layout!

```python
from PIL import Image
import numpy as np
import os

# Class names in CIFAR-10 label order
cifar_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                 'dog', 'frog', 'horse', 'ship', 'truck']

def infer(params_dir):
    place = fluid.CUDAPlace(0)
    inferencer = fluid.Inferencer(
        infer_func=inference_network,
        param_path=params_dir,
        place=place)

    # Prepare testing data.
    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # Transpose [H W C] to [C H W]
        im = im.transpose((2, 0, 1))
        im = im / 255.0
        # Add one dimension: [N C H W], N = 1
        im = np.expand_dims(im, axis=0)
        return im

    cur_dir = os.path.dirname(os.path.realpath(__file__))
    img = load_image(cur_dir + '/dog.png')

    # Inference
    results = inferencer.infer({'image': img})
    print(results)
    lab = np.argsort(results)
    # probs and lab are the results of one batch of data
    print("infer results: ", cifar_classes[lab[0][0][-1]])
```
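The last two lines pick the class with the highest probability: np.argsort returns indices in ascending order, so the last index is the arg-max. A small sketch with made-up probabilities shows the indexing:

```python
import numpy as np

cifar_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                 'dog', 'frog', 'horse', 'ship', 'truck']

# Fake inferencer output: a list holding one (1, 10) batch of softmax scores.
results = [np.array([[0.06, 0.13, 0.10, 0.10, 0.11,
                      0.08, 0.14, 0.09, 0.07, 0.12]])]

lab = np.argsort(results)   # indices sorted by ascending probability
top = lab[0][0][-1]         # last index = highest probability
print(cifar_classes[top])   # frog (index 6 holds the largest score, 0.14)
```

`np.argmax(results)` would give the same answer more directly; argsort is useful when you also want the runner-up classes.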
5. Results

Since I don't have a GPU server, I trained for only 50 epochs, which already took more than 8 hours. The training accuracy is only 15.6% and the test accuracy 17%; the results are unsatisfactory, and the image used for verification was also misclassified!
```
Pass 300, Epoch 49, Cost 2.261115, Acc 0.156250
.........................................................................................
Test with Pass 49, Loss 2.2, Acc 0.17
Classify the cifar10 images...
[array([[0.05997971, 0.13485196, 0.096842  , 0.09973737, 0.11053724,
         0.08180068, 0.13847008, 0.08627985, 0.06851784, 0.12298328]],
       dtype=float32)]
infer results:  frog
```
Closing Remarks

The network is fairly deep and the dataset fairly large, so training takes a long time; the GT840M in an ordinary laptop is at least better than nothing.
Code for this article: 02_cifar

Reference: book/03.image_classification/