【PaddlePaddle Series】Handwritten Digit Recognition

Recently, Baidu has been running one competition after another to promote PaddlePaddle, the deep learning framework it develops. Baidu bills PaddlePaddle as an "easy to learn, easy to use" open-source deep learning framework, yet material about it online is scarce. Baidu does provide plenty of documentation, in both Chinese and English, but the real problem is that when an error occurs it is very hard to find a solution on the web. To prepare for next year's Baidu competitions, I have started learning PaddlePaddle.

1. Installation

PaddlePaddle also supports CUDA-accelerated computation, but if you do not have an NVIDIA graphics card, install the CPU version.

CPU version: pip install paddlepaddle

The GPU version depends on which CUDA and cuDNN versions you have installed:

CUDA 9 + cuDNN 7.0: pip install paddlepaddle-gpu

CUDA 8 + cuDNN 7.0: pip install paddlepaddle-gpu==0.14.0.post87

CUDA 8 + cuDNN 5.0: pip install paddlepaddle-gpu==0.14.0.post85
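
To check that the install works, a minimal smoke test along these lines should run without errors (a sketch against the 0.14-era fluid API used throughout this post):

import paddle.fluid as fluid

# build and run an empty startup program as a smoke test
place = fluid.CPUPlace()  # use fluid.CUDAPlace(0) for the GPU build
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
print("PaddlePaddle is working")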

2. Handwritten Digit Recognition

Paddle's GitHub actually provides this example. However, parts of the example call PaddlePaddle's internal classes directly, which makes it hard for a reader to follow. This is especially true of the reader used for data feeding: looking at the code, a single function call takes care of the whole image input, and you cannot tell how it works. That is the focus of this post, and in my view it is one of the bigger differences from TensorFlow.

2.1 Network Construction

The program provides three network models. The code is self-explanatory, so little commentary is needed; it is listed directly below. Note that PaddlePaddle puts the image's channel dimension first, i.e. [C H W], as opposed to [H W C].
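
As a concrete illustration (with a hypothetical array, not taken from the example code), converting an HWC image to the CHW layout Paddle expects is a single transpose:

import numpy

img_hwc = numpy.zeros((28, 28, 1), dtype='float32')  # hypothetical [H W C] image
img_chw = img_hwc.transpose(2, 0, 1)                  # now [C H W] = (1, 28, 28)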

(1) Single fully connected layer + softmax

# a fully-connected-layer network using softmax as the activation function
def softmax_regression():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    return predict

(2) Multiple fully connected layers + softmax

# a 3-fully-connected-layer network using softmax as the activation function
def multilayer_perceptron():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    hidden = fluid.layers.fc(input=img, size=128, act='softmax')
    hidden = fluid.layers.fc(input=hidden, size=64, act='softmax')
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    return prediction

(3) Convolutional neural network

# traditional convolutional neural network
def cnn():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # first conv + pool
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    # second conv + pool
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    # output layer with softmax activation; size = 10 since there are 10 possible digits
    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
    return prediction
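
For orientation, the feature-map sizes work out as follows, assuming simple_img_conv_pool's defaults of an unpadded stride-1 convolution followed by 2x2 max pooling with stride 2:

# shape walkthrough for the CNN above (assuming default zero conv padding):
# input:            [1, 28, 28]
# conv 5x5, 20 ch:  [20, 24, 24]   (28 - 5 + 1 = 24)
# pool 2x2, s=2:    [20, 12, 12]
# conv 5x5, 50 ch:  [50, 8, 8]     (12 - 5 + 1 = 8)
# pool 2x2, s=2:    [50, 4, 4]
# fc + softmax:     [10]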

2.2 Building the Loss Function

Building the loss function in PaddlePaddle is largely the same as in TensorFlow, with two caveats: (1) in TensorFlow, cross-entropy is computed against a one-hot vector of the form [0 0 0 ... 1 ...]; in PaddlePaddle, cross-entropy is computed directly against an integer label. (2) The label's input dtype must be int64, even though the reader generator returns plain ints. I tried changing it to int32 and got an error, and int32 produced similar errors in other experiments.
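
To make difference (1) concrete, here is the same loss computed by hand in NumPy (a sketch with made-up numbers, not Paddle API): with an integer label, cross-entropy is just the negative log of the probability at that index, which equals the one-hot dot product TensorFlow would compute.

import numpy

probs = numpy.array([0.05, 0.05, 0.8, 0.1])  # hypothetical softmax output
label = 2                                    # integer label, as PaddlePaddle expects
loss = -numpy.log(probs[label])              # same value as -sum(one_hot * log(probs))
print(loss)                                  # ~0.223

With that in mind, the training program: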

def train_program():
    # if using dtype='int32', it reports errors!
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    # Here we can build the prediction network in different ways.
    predict = cnn()
    #predict = softmax_regression()
    #predict = multilayer_perceptron()
    # Calculate the cost from the prediction and label.
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    acc = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, acc]

PaddlePaddle trains with a Trainer: you build the training function train_program and pass it as a Trainer argument (covered in more detail below). Note that the function returns a list [avg_cost, acc]: the first element is used as the loss, while the following elements are optional and are printed out during training. Returning avg_cost is therefore mandatory; the rest are optional. One warning: do not put a constant in the list. Every element must change as training proceeds; if you insist on something like acc=1, training will raise an error.

2.3 Training

PaddlePaddle creates the trainer with fluid.Trainer. You need to configure the trainer's train_program (loss function), place (whether to use the GPU), and optimizer_program (optimizer), then call the train function to start training. See the program below:

def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)

if __name__ == "__main__":
    print("run mnist train\n")
    minst_prefix = '/home/dzqiu/DataSet/minst/'
    train_image_path = minst_prefix + 'train-images-idx3-ubyte.gz'
    train_label_path = minst_prefix + 'train-labels-idx1-ubyte.gz'
    test_image_path  = minst_prefix + 't10k-images-idx3-ubyte.gz'
    test_label_path  = minst_prefix + 't10k-labels-idx1-ubyte.gz'
    # reader_creator is explained in section 2.4 below
    train_reader = paddle.batch(
        paddle.reader.shuffle(  # shuffle randomizes the order within the buffer
            reader_creator(train_image_path, train_label_path, buffer_size=100),
            buf_size=500),
        batch_size=64)
    # no need to shuffle the test set
    test_reader = paddle.batch(
        reader_creator(test_image_path, test_label_path, buffer_size=100),
        batch_size=64)
    # if using the GPU, run 'export FLAGS_fraction_of_gpu_memory_to_use=0' first
    use_cuda = True
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(train_func=train_program,
                            place=place,
                            optimizer_func=optimizer_program)
    params_dirname = "recognize_digits_network.inference.model"
    lists = []

    def event_handler(event):
        if isinstance(event, fluid.EndStepEvent):    # fired at the end of every step
            if event.step % 100 == 0:
                print("Pass %d, Epoch %d, Cost %f, Acc %f"
                      % (event.step, event.epoch,
                         event.metrics[0],   # first value returned by train_program: avg_cost
                         event.metrics[1]))  # second value returned by train_program: acc
        if isinstance(event, fluid.EndEpochEvent):   # fired at the end of every epoch
            trainer.save_params(params_dirname)
            # trainer.test returns whatever train_program returns, so unpack accordingly
            avg_cost, acc = trainer.test(reader=test_reader,
                                         feed_order=['img', 'label'])
            print("Test with Epoch %d, avg_cost: %s, acc: %s"
                  % (event.epoch, avg_cost, acc))
            lists.append((event.epoch, avg_cost, acc))

    # Train the model now
    trainer.train(num_epochs=5, event_handler=event_handler,
                  reader=train_reader, feed_order=['img', 'label'])
    # find the best pass
    best = sorted(lists, key=lambda l: float(l[1]))[0]
    print('Best pass is %s, testing Avgcost is %s' % (best[0], best[1]))
    print('The classification accuracy is %.2f%%' % (float(best[2]) * 100))

2.4 Reading Training Data: the Reader

PaddlePaddle's official example reads the training data with nothing more than a call to paddle.dataset.mnist.train(). The wrapping makes it hard to understand what it actually does, and gives no hint of how to read a training set of your own. So I dug this function out of the source and simplified it into reader_creator, which reads the MNIST dataset directly.
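
For comparison, the fully wrapped version used by the official example looks like this (the batching and shuffling calls are the same ones used in section 2.3):

import paddle

# the built-in reader that this section re-implements by hand
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=500),
    batch_size=64)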

Now for the file format. In the label file, the first 8 bytes are a magic number and the item count; every byte after that is one label, a digit from 0 to 9. In the image file, the first 16 bytes hold dataset information (magic number, number of images, row count, and column count); every byte after that is one pixel, so reading 28*28 consecutive bytes in order yields one 28*28 image.
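
A quick standalone check of this header layout (a sketch, assuming the standard MNIST .gz files; the header fields are big-endian unsigned ints):

import gzip
import struct

with gzip.open('train-images-idx3-ubyte.gz', 'rb') as f:
    magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
    print(magic, num, rows, cols)   # expect 2051 60000 28 28

with gzip.open('train-labels-idx1-ubyte.gz', 'rb') as f:
    magic, num = struct.unpack('>II', f.read(8))
    print(magic, num)               # expect 2049 60000

Once the file layout is clear, the reader can be written as follows: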

def reader_creator(image_filename, label_filename, buffer_size):
    def reader():
        # decompress with an external command: zcat on Linux, gzcat on Windows
        if platform.system() == 'Linux':
            zcat_cmd = 'zcat'
        elif platform.system() == 'Windows':
            zcat_cmd = 'gzcat'
        else:
            raise NotImplementedError("This program is supported on Windows or Linux, \
                                      but your platform is " + platform.system())

        # create a subprocess to read the images
        sub_img = subprocess.Popen([zcat_cmd, image_filename], stdout=subprocess.PIPE)
        sub_img.stdout.read(16)  # skip the 16 header bytes; we already know the layout
        # create a subprocess to read the labels
        sub_lab = subprocess.Popen([zcat_cmd, label_filename], stdout=subprocess.PIPE)
        sub_lab.stdout.read(8)   # likewise, skip the 8 header bytes

        try:
            while True:  # wrapped in try, so hitting end-of-file exits cleanly
                # each label is a single unsigned byte, so read buffer_size bytes
                labels = numpy.fromfile(
                    sub_lab.stdout, 'ubyte', count=buffer_size).astype("int")
                if labels.size != buffer_size:
                    break
                # read buffer_size images of 28*28 bytes each, then reshape
                images = numpy.fromfile(
                    sub_img.stdout, 'ubyte',
                    count=buffer_size * 28 * 28).reshape(
                        buffer_size, 28, 28).astype("float32")
                # map each pixel into (-1, 1)
                images = images / 255.0 * 2.0 - 1.0
                for i in xrange(buffer_size):
                    # yield one image and its label; the order must match feed_order!
                    yield images[i, :], int(labels[i])
        finally:
            try:
                sub_img.terminate()  # terminate the image reader subprocess
            except:
                pass
            try:
                sub_lab.terminate()  # terminate the label reader subprocess
            except:
                pass
    return reader
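
A quick sanity check of the reader (hypothetical paths; next() works on the generator in both Python 2 and 3):

r = reader_creator('train-images-idx3-ubyte.gz',
                   'train-labels-idx1-ubyte.gz', buffer_size=100)
img, label = next(r())        # pull the first sample off the generator
print(img.shape, label)       # (28, 28) and an integer in 0..9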

2.5 Results

The training set contains 60,000 images; with buffer_size 100 and batch_size 64, each epoch takes 60000/64 ≈ 937 steps, which is why the step counter passes 900.

Pass 0, Batch 0, Cost 4.250958, Acc 0.062500
Pass 100, Batch 0, Cost 0.249865, Acc 0.953125
Pass 200, Batch 0, Cost 0.281933, Acc 0.906250
Pass 300, Batch 0, Cost 0.147851, Acc 0.953125
Pass 400, Batch 0, Cost 0.144059, Acc 0.968750
Pass 500, Batch 0, Cost 0.082035, Acc 0.953125
Pass 600, Batch 0, Cost 0.105593, Acc 0.984375
Pass 700, Batch 0, Cost 0.148170, Acc 0.968750
Pass 800, Batch 0, Cost 0.182150, Acc 0.937500
Pass 900, Batch 0, Cost 0.066323, Acc 0.968750
Test with Epoch 0, avg_cost: 0.07329441363440427, acc: 0.9762620192307693
Pass 0, Batch 1, Cost 0.157396, Acc 0.953125
Pass 100, Batch 1, Cost 0.050120, Acc 0.968750
Pass 200, Batch 1, Cost 0.086324, Acc 0.984375
Pass 300, Batch 1, Cost 0.002137, Acc 1.000000
Pass 400, Batch 1, Cost 0.173876, Acc 0.984375
Pass 500, Batch 1, Cost 0.059772, Acc 0.968750
Pass 600, Batch 1, Cost 0.035788, Acc 0.984375
Pass 700, Batch 1, Cost 0.008351, Acc 1.000000
Pass 800, Batch 1, Cost 0.022678, Acc 0.984375
Pass 900, Batch 1, Cost 0.021835, Acc 1.000000
Test with Epoch 1, avg_cost: 0.06836433922317389, acc: 0.9774639423076923
Pass 0, Batch 2, Cost 0.214221, Acc 0.937500
Pass 100, Batch 2, Cost 0.212448, Acc 0.953125
Pass 200, Batch 2, Cost 0.007266, Acc 1.000000
Pass 300, Batch 2, Cost 0.015241, Acc 1.000000
Pass 400, Batch 2, Cost 0.061948, Acc 0.984375
Pass 500, Batch 2, Cost 0.043950, Acc 0.984375
Pass 600, Batch 2, Cost 0.018946, Acc 0.984375
Pass 700, Batch 2, Cost 0.015527, Acc 0.984375
Pass 800, Batch 2, Cost 0.035185, Acc 0.984375
Pass 900, Batch 2, Cost 0.004890, Acc 1.000000
Test with Epoch 2, avg_cost: 0.05774364945361809, acc: 0.9822716346153846
Pass 0, Batch 3, Cost 0.031849, Acc 0.984375
Pass 100, Batch 3, Cost 0.059525, Acc 0.953125
Pass 200, Batch 3, Cost 0.022106, Acc 0.984375
Pass 300, Batch 3, Cost 0.006763, Acc 1.000000
Pass 400, Batch 3, Cost 0.056089, Acc 0.984375
Pass 500, Batch 3, Cost 0.018876, Acc 1.000000
Pass 600, Batch 3, Cost 0.010325, Acc 1.000000
Pass 700, Batch 3, Cost 0.010989, Acc 1.000000
Pass 800, Batch 3, Cost 0.026476, Acc 0.984375
Pass 900, Batch 3, Cost 0.007792, Acc 1.000000
Test with Epoch 3, avg_cost: 0.05476908334449968, acc: 0.9830729166666666
Pass 0, Batch 4, Cost 0.061547, Acc 0.984375
Pass 100, Batch 4, Cost 0.002315, Acc 1.000000
Pass 200, Batch 4, Cost 0.009715, Acc 1.000000
Pass 300, Batch 4, Cost 0.024202, Acc 0.984375
Pass 400, Batch 4, Cost 0.150663, Acc 0.968750
Pass 500, Batch 4, Cost 0.082586, Acc 0.984375
Pass 600, Batch 4, Cost 0.012232, Acc 1.000000
Pass 700, Batch 4, Cost 0.055258, Acc 0.984375
Pass 800, Batch 4, Cost 0.016068, Acc 1.000000
Pass 900, Batch 4, Cost 0.004945, Acc 1.000000
Test with Epoch 4, avg_cost: 0.041706092633705505, acc: 0.9865785256410257
Best pass is 4, testing Avgcost is 0.041706092633705505
The classification accuracy is 98.66%


2.6 Inference Interface

PaddlePaddle provides an inference interface; you just call it. One thing to note: the image must be converted into an [N C H W] tensor. For a single image N is of course 1, and since it is grayscale C is also 1. See the code below:

def load_image(file):
    im = Image.open(file).convert('L')
    im = im.resize((28, 28), Image.ANTIALIAS)
    # [N C H W]: N is added here, 1 for a single image; C is 1 for grayscale
    im = numpy.array(im).reshape(1, 1, 28, 28).astype(numpy.float32)
    im = im / 255.0 * 2.0 - 1.0
    return im

cur_dir = os.path.dirname(os.path.realpath(__file__))
img = load_image(cur_dir + '/infer_3.png')
inferencer = fluid.Inferencer(
    # infer_func=softmax_regression, # uncomment for softmax regression
    # infer_func=multilayer_perceptron, # uncomment for MLP
    infer_func=cnn,  # uncomment for LeNet5
    param_path=params_dirname,
    place=place)
results = inferencer.infer({'img': img})
lab = numpy.argsort(results)  # probs and lab are the results of one batch data
print("Label of infer_3.png is: %d" % lab[0][0][-1])

The inference output:

Label of infer_3.png is: 3

3. Closing Remarks

PaddlePaddle does differ from TensorFlow in places, and on top of that solutions to its errors are hard to find. I will open another post to collect and summarize the PaddlePaddle problems I run into, and this walkthrough of the official examples will also continue, updated every Wednesday. School is about to start, so time to push on.

Source code: GitHub

Reference: Paddle/book/02.recognize_digits

