TensorFlow CNN: Optimizing a Convolutional Neural Network with Batch Norm, Dropout, and Early Stopping


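When I studied the theory of convolutional neural networks I thought I understood it, but once I had to build one in code I found that many points were still fuzzy. In this post I again build a convolutional neural network on the MNIST dataset: first a basic model, then an improved one using Batch Norm, Dropout, and early stopping. Along the way I describe some problems I ran into while debugging and how I solved them.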

1. Building a basic convolutional neural network

Step 1: Prepare the data

The book "Hands-On Machine Learning with Scikit-Learn and TensorFlow" downloads the MNIST dataset with the following code.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")

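Downloading the data this way may raise a URLError because SSL certificate verification fails:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>

Adding the following two lines before the download turns off certificate verification. Note that this applies to every HTTPS request the process makes, so use it only for this download.
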
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

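Running this prints a long list of warnings, but there is no need to worry: the dataset still downloads. It also comes conveniently split into training, validation, and test sets, supports drawing mini-batches, and is already reshaped into a suitable input format (the training set, for example, has shape (55000, 784)). The trouble is that the download is very slow. It failed for me many times, and success came down to luck.

I prefer to download the raw data with tf.keras.datasets.mnist.load_data() and then split it, generate batches, and reshape it into a suitable input format myself.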

import tensorflow as tf
import numpy as np
import time
from datetime import timedelta

# Measure how long training takes
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval for printing; 10 seconds is shown as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation, and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and flatten each image; the training set becomes (60000, 784)
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
# Hold out the first 5,000 training samples as the validation set
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
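
One detail of shuffle_batch is easy to miss: np.array_split divides the shuffled indices into n_batches nearly equal parts, so no sample is dropped when the dataset size is not a multiple of batch_size. Instead, some batches come out slightly larger than batch_size. The sketch below reproduces only this size arithmetic in plain Python; batch_sizes is a hypothetical helper written for illustration, not part of the original code.

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of the batches that shuffle_batch would yield.

    Mirrors np.array_split(indices, n_batches): the first
    (n_samples % n_batches) parts get one extra element.
    """
    n_batches = n_samples // batch_size
    if n_batches == 0:
        raise ValueError("batch_size is larger than the dataset")
    base, extra = divmod(n_samples, n_batches)
    return [base + 1] * extra + [base] * (n_batches - extra)


# 55,000 training images in batches of 100: 550 equal batches.
sizes = batch_sizes(55000, 100)
print(len(sizes), min(sizes), max(sizes))  # 550 100 100

# 1,055 samples in batches of 100: 10 batches, none of them dropped.
print(batch_sizes(1055, 100))
```

For the 55,000-sample training set used here every batch has exactly 100 samples, so the distinction only matters for other dataset sizes.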

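Step 2: Configure the parameters

The network has two convolutional layers and one fully connected layer. Its structure is: input layer, convolutional layer 1, convolutional layer 2, max pooling layer, fully connected layer, output layer. Each convolutional layer consists of convolution kernels followed by a ReLU activation.

The first convolutional layer has 16 kernels of size (3, 3) with stride 1 and zero padding. The second has 32 kernels of size (3, 3) with stride 2, also with zero padding. As a rule of thumb, the deeper a convolutional layer sits, the more feature maps it should output and the smaller each feature map should be, which is achieved by adding kernels and increasing the stride (and, in some designs, the kernel size). This lets later layers extract higher-level features.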


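All neurons in a feature map share the same parameters (weights and bias), while different feature maps have different parameters. Each feature map extracts one kind of image feature, so more feature maps means more features extracted.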
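
To make parameter sharing concrete, here is a small sketch that counts the trainable parameters of the two convolutional layers described above and compares them with a fully connected layer that would produce the same number of outputs as the first convolutional layer. The helper names are hypothetical, and the dense layer is only a point of comparison, not something the model uses.

```python
def conv2d_param_count(kernel_size, in_channels, n_filters):
    """Weights plus biases of a square 2-D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * n_filters
    biases = n_filters
    return weights + biases


def dense_param_count(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


conv1 = conv2d_param_count(kernel_size=3, in_channels=1, n_filters=16)
conv2 = conv2d_param_count(kernel_size=3, in_channels=16, n_filters=32)
# A dense layer mapping 784 pixels to 28*28*16 outputs, for comparison only.
dense_equivalent = dense_param_count(28 * 28, 28 * 28 * 16)
print(conv1, conv2, dense_equivalent)  # 160 4640 9847040
```

Because each kernel is reused at every position of the image, the first convolutional layer needs 160 parameters where an unshared layer of the same output size would need almost 10 million.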

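Now for the related arithmetic. Suppose that, along one spatial dimension, a convolutional layer has n input neurons, kernel size m, stride s, and p zeros padded at each end. The number of output neurons along that dimension is (n - m + 2p)/s + 1. With the parameters below, the first convolutional layer has n = 28 per dimension (the 28*28 = 784 pixels form a 28 x 28 grid, and the formula applies to height and width separately), m = 3, and s = 1. Because padding="SAME" keeps the output size at 28, (28 - 3 + 2p)/1 + 1 = 28 gives p = 1, that is, one zero on each side.

For the second convolutional layer, however, the same formula gives a non-integer p: the output size is 14, and (28 - 3 + 2p)/2 + 1 = 14 gives p = 0.5. This puzzled me at first. The reason is that padding="SAME" does not require the same number of zeros on both sides. TensorFlow sets the output size to ceil(n/s) = 14 and adds only the total padding needed, here a single zero, placed at the right (and bottom) edge.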
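
To see where a non-integer p comes from, here is a small sketch of the padding="SAME" rule as I understand it from the TensorFlow documentation, applied along one spatial dimension (n = 28 here): the output size is fixed at ceil(n / stride), and the total padding is whatever is needed to reach it, with the extra zero going to the end when the total is odd. same_padding_1d is a hypothetical helper, not a TensorFlow function.

```python
import math


def same_padding_1d(n, kernel, stride):
    """Output size and (before, after) zero padding along one dimension,
    following the rule TensorFlow documents for padding="SAME"."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel - n, 0)
    before = total // 2
    after = total - before
    return out, before, after


print(same_padding_1d(28, 3, 1))  # (28, 1, 1)
print(same_padding_1d(28, 3, 2))  # (14, 0, 1)
```

With stride 1 the padding is symmetric (one zero on each side). With stride 2 only one zero is added in total, which is why solving the symmetric formula for p yields 0.5.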

# Input height, width, and number of channels
height = 28
width = 28
channels = 1
n_inputs = height * width

# Number of feature maps (filters), kernel size, and stride of each convolutional layer
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 2
conv2_pad = "SAME"

# Number of feature maps (channels) of the max pooling layer
pool3_fmaps = conv2_fmaps
# Number of neurons in the fully connected layer
n_fc1 = 32
n_outputs = 10
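
With these settings, the tensor shapes through the network can be worked out by hand. The sketch below does so in plain Python, assuming the SAME-padded convolutions produce ceil(n / stride) outputs per dimension and the pooling layer uses a 2x2 window with stride 2 and no padding, as in the model built in the next step. The helper names are hypothetical.

```python
import math


def conv_same_out(n, stride):
    """Spatial size after a convolution with padding="SAME"."""
    return math.ceil(n / stride)


def pool_valid_out(n, window, stride):
    """Spatial size after pooling with padding="VALID"."""
    return (n - window) // stride + 1


size = 28                                         # input: 28 x 28 x 1
after_conv1 = conv_same_out(size, stride=1)       # conv1: 28 x 28 x 16
after_conv2 = conv_same_out(after_conv1, stride=2)  # conv2: 14 x 14 x 32
after_pool = pool_valid_out(after_conv2, window=2, stride=2)  # pool3: 7 x 7 x 32
flat_features = 32 * after_pool * after_pool      # length of the flattened vector
print(after_conv1, after_conv2, after_pool, flat_features)  # 28 14 7 1568
```

The last number, 32 * 7 * 7 = 1568, is the size of the vector that the fully connected layer receives.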

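Step 3: Build the convolutional network

The code below follows the structure described above. Two points deserve attention. First, do not zero-pad in the max pooling layer, since the purpose of pooling is to reduce memory use and the number of parameters. Second, all feature maps must be flattened into a single vector before they are fed to the fully connected layer.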

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")

conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")
conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv2")

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
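    # Flatten all feature maps into one vector. conv2 (stride 2) and the 2x2 pooling each halve
    # the height and width, so a 28*28 map shrinks to 7*7, 1/16 of the original area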
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 7 * 7])

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat, n_fc1, activation=tf.nn.relu, name="fc1")

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1, n_outputs, name="output")
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

with tf.name_scope("train"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
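
For readers who want to see what the loss and accuracy ops compute, here is a plain-Python sketch of softmax cross-entropy with integer labels and of top-1 accuracy. It is an illustration of the math only, under the assumption that there are no ties between logits; the function names are hypothetical and it does not reproduce TensorFlow's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(logits, label):
    """Cross-entropy for one sample whose label is an integer class index."""
    return -math.log(softmax(logits)[label])


def top1_accuracy(batch_logits, labels):
    """Fraction of samples whose largest logit sits at the true label."""
    hits = [max(range(len(row)), key=row.__getitem__) == target
            for row, target in zip(batch_logits, labels)]
    return sum(hits) / len(hits)


batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
labels = [0, 2, 0]
losses = [sparse_cross_entropy(row, t) for row, t in zip(batch_logits, labels)]
print(sum(losses) / len(losses), top1_accuracy(batch_logits, labels))
```

In this toy batch the first two samples are classified correctly and the third is not, so the accuracy is 2/3, while the loss is the mean of the three per-sample cross-entropies.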

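Step 4: Train and evaluate the model

The biggest problem during training and evaluation is that the convolutional layers can run out of memory, especially during evaluation and testing. With a training batch size of 100 there is little trouble, but the validation set has 5,000 samples and the test set 10,000, and processing them in one pass consumes a great deal of memory. During testing I got the following error: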

ResourceExhaustedError: OOM when allocating tensor with shape[10000,16,29,29]...

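OOM stands for "Out of Memory": the GPU ran out of memory during the test phase. My GPU is a GTX 960M with 2 GB of video memory, of which only about 1.65 GB is actually available for training, which is rather small.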
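
A rough back-of-the-envelope check shows why the whole test set does not fit. The sketch below computes the size of just the one tensor named in the error message, assuming float32 values (4 bytes per element). tensor_bytes is a hypothetical helper, and the real footprint is higher because other activations have to be held in memory at the same time.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor (float32 by default)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


MIB = 1024 ** 2
full_test_set = tensor_bytes([10000, 16, 29, 29])
batch_of_1000 = tensor_bytes([1000, 16, 29, 29])
print(round(full_test_set / MIB, 1), round(batch_of_1000 / MIB, 1))  # 513.3 51.3
```

A single intermediate tensor for all 10,000 test images already takes about 513 MiB, roughly a third of the 1.65 GB available, while a batch of 1,000 images needs about 51 MiB for the same tensor.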

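There are several ways to deal with this. The first is to simplify the model, for example by reducing the number of feature maps, increasing the stride, or removing convolutional layers, but this usually hurts performance. The second is to switch from 32-bit to 16-bit floats. The third is to evaluate and test in mini-batches as well.

I tried simplifying the model and it did indeed reduce performance, so I chose the third method: during evaluation and testing the data is fed in 1,000 samples at a time and the per-batch accuracies are averaged. The final validation accuracy I recorded was 98.74%; the run logged below reached 98.86%.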
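
Averaging per-batch accuracies gives the exact overall accuracy only when every batch has the same size, which holds here because 5,000 and 10,000 are multiples of 1,000. A slightly more general sketch counts correct predictions instead, so a final partial batch is neither skipped nor over-weighted. predict_fn is a hypothetical callable standing in for the model; in the TensorFlow code its role is played by evaluating an op in the session.

```python
def batched_accuracy(predict_fn, X, y, batch_size=1000):
    """Accuracy over (X, y), evaluated batch_size samples at a time.

    predict_fn is a hypothetical callable that maps a list of inputs
    to a list of predicted labels.
    """
    correct = 0
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        predictions = predict_fn(X_batch)
        correct += sum(int(p == t) for p, t in zip(predictions, y_batch))
    return correct / len(X)


# Toy check: a "model" that predicts the parity of each input.
X_demo = list(range(2500))
y_demo = [x % 2 for x in X_demo[:2000]] + [0] * 500
print(batched_accuracy(lambda xs: [x % 2 for x in xs], X_demo, y_demo))  # 0.9
```

In the toy data the last 500 samples form a partial batch; they are still evaluated, and the result is the true fraction of correct predictions.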

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    
    n_epochs = 10
    batch_size = 100
    
with tf.Session() as sess:
    init.run()
    start_time = time.time()
    
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train,y_train,batch_size):
            
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            
        if epoch % 2 == 0 or epoch == n_epochs - 1:
            # Evaluate 1,000 samples at a time, then average the results
            acc_val = []
            for i in range(len(X_valid)//1000):
                acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000], y: y_valid[i*1000:(i+1)*1000]}))                 
            acc_val = np.mean(acc_val)            
            print('Epoch:{0:>4}, Train accuracy:{1:>7.2%},Validate accuracy:{2:7.2%}'.format(epoch,acc_train, acc_val))
            
    time_dif = get_time_dif(start_time)
    print("\nTime usage:", time_dif)
    acc_test = []
    # Test 1,000 samples at a time, then average the results
    for i in range(len(X_test)//1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000], y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))

Output:

Epoch:   0, Train accuracy: 98.00%,Validate accuracy: 97.12%
Epoch:   2, Train accuracy: 97.00%,Validate accuracy: 98.34%
Epoch:   4, Train accuracy:100.00%,Validate accuracy: 98.62%
Epoch:   6, Train accuracy:100.00%,Validate accuracy: 98.84%
Epoch:   8, Train accuracy: 99.00%,Validate accuracy: 98.68%
Epoch:   9, Train accuracy:100.00%,Validate accuracy: 98.86%

Time usage: 0:01:02

Test_accuracy: 98.68%

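2. Optimizing the convolutional neural network with Batch Norm, Dropout, and early stopping

The reference book improves the basic model with Dropout and early stopping but does not use Batch Norm. I found the author's early stopping code overly complicated, and I suggest my version below, which I find clearer.

I have not yet found code that applies Batch Norm to a convolutional network, so I implemented it from my own understanding. Which layers should use Batch Norm? My view is that it belongs in the convolutional and fully connected layers (including the output layer) but not in the pooling layer. Internal covariate shift should come mainly from the nonlinear transformations between layers, and the pooling output passes through no nonlinear activation, so applying Batch Norm in the following fully connected layer is enough.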
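
As a reminder of what a Batch Norm layer does, here is a minimal pure-Python sketch for a single feature. During training it normalizes with the statistics of the current mini-batch and updates moving averages; at inference time it normalizes with those moving averages instead. The class is a hypothetical illustration: the scale gamma and shift beta are kept fixed rather than learned, the momentum of 0.9 matches the value used in this post, and the epsilon of 1e-3 is an assumption.

```python
import math


class BatchNorm1D:
    """Batch normalization of a single feature (pure-Python illustration)."""

    def __init__(self, momentum=0.9, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.gamma = 1.0          # scale (learned in a real layer, fixed here)
        self.beta = 0.0           # shift (learned in a real layer, fixed here)
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, values, training):
        if training:
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # Update the running statistics used later at inference time.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        scale = self.gamma / math.sqrt(var + self.epsilon)
        return [scale * (v - mean) + self.beta for v in values]


bn = BatchNorm1D()
batch = [1.0, 2.0, 3.0, 4.0]
print(bn(batch, training=True))    # normalized with the batch statistics
print(bn.moving_mean, bn.moving_var)
print(bn(batch, training=False))   # normalized with the moving averages
```

The two modes give different outputs for the same input, which is why the training flag has to be set correctly and why the moving averages must actually be updated during training.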

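Dropout is applied after the pooling layer and after the fully connected layer, with drop rates of 0.25 and 0.5 respectively. Note the order: Batch Norm, then SELU activation, then Dropout.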
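
The following sketch shows the usual "inverted" form of dropout in plain Python: during training each value is zeroed with probability equal to the drop rate and the survivors are scaled up by 1 / (1 - rate), so the expected sum stays the same; at inference time the input passes through unchanged. The function is a hypothetical illustration, not the TensorFlow implementation.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate); at inference
    time the input passes through unchanged."""
    if not training or rate == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(42)
activations = [1.0] * 8
print(dropout(activations, rate=0.5, training=True, rng=rng))   # a mix of 0.0 and 2.0
print(dropout(activations, rate=0.5, training=False))           # unchanged
```

This is also why the same training flag that controls Batch Norm is passed to the dropout layers: nothing should be dropped when validating or testing.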

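The stride of the second convolutional layer is also set to 1, to obtain larger feature maps and more parameters.

The number of epochs is set to 20 and the batch size to 100. Because Batch Norm computes the mean and variance of every mini-batch, the batch size can be set somewhat larger. Training stops if the validation accuracy has not improved for 2,000 steps.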
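
The early stopping rule can be separated from the TensorFlow code. Here is a minimal sketch with simulated validation accuracies and a small patience so that it runs instantly; EarlyStopper is a hypothetical helper, and the comparison (stop once more than patience_steps have passed since the last improvement) follows the rule described above.

```python
class EarlyStopper:
    """Tracks the best validation score and signals when to stop."""

    def __init__(self, patience_steps):
        self.patience_steps = patience_steps
        self.best_score = float("-inf")
        self.last_improved_step = 0

    def update(self, step, score):
        """Record a validation score; return True if it is a new best."""
        if score > self.best_score:
            self.best_score = score
            self.last_improved_step = step
            return True
        return False

    def should_stop(self, step):
        """True once more than patience_steps have passed without a new best."""
        return step - self.last_improved_step > self.patience_steps


# Simulated validation accuracies, checked every 10 steps.
scores = [0.90, 0.95, 0.97, 0.96, 0.97, 0.96, 0.95, 0.96]
stopper = EarlyStopper(patience_steps=30)
stopped_at = None
for i, score in enumerate(scores):
    step = i * 10
    if stopper.update(step, score):
        pass  # a real training loop would save a checkpoint here
    if stopper.should_stop(step):
        stopped_at = step
        break
print(stopped_at, stopper.best_score)  # 60 0.97
```

The best score is first reached at step 20; a tie at step 40 does not count as an improvement, so training stops at step 60, the first step more than 30 steps past the last improvement.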

As a result, training stopped in epoch 18, at step 9921, with a best validation accuracy of 99.22% and a test accuracy of 98.94%.

import tensorflow as tf
import numpy as np
import time
from datetime import timedelta
from functools import partial

# Measure how long training takes
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval for printing; 10 seconds is shown as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation, and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

height = 28
width = 28
channels = 1
n_inputs = height * width

# The first convolutional layer has 16 kernels
# of size (3, 3)
# with stride 1;
# zero padding keeps the output the same size as the input
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 1
conv2_pad = "SAME"
# Drop 25% of the units after the pooling layer
conv2_dropout_rate = 0.25

pool3_fmaps = conv2_fmaps

n_fc1 = 32
# Drop 50% of the units in the fully connected layer
fc1_dropout_rate = 0.5

n_outputs = 10

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")
    
training = tf.placeholder_with_default(False, shape=[], name='training')
# Build a reusable batch norm layer. Moving averages track the overall mean and variance; momentum is 0.9
my_batch_norm_layer = partial(tf.layers.batch_normalization,
                              training=training, momentum=0.9)

with tf.name_scope("conv"):
    # The activation is applied after batch norm, so none is set here
    conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=None, name="conv1")
    # Apply batch norm, then the activation
    batch_norm1 = tf.nn.selu(my_batch_norm_layer(conv1))
    conv2 = tf.layers.conv2d(batch_norm1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=None, name="conv2")
    batch_norm2 = tf.nn.selu(my_batch_norm_layer(conv2))
   
with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(batch_norm2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten the feature maps into one vector
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 14 * 14])
    # Drop 25% of the units
    pool3_flat_drop = tf.layers.dropout(pool3_flat, conv2_dropout_rate, training=training)
    
with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat_drop, n_fc1, activation=None, name="fc1")
    # Apply batch norm to the fully connected layer, then the activation
    batch_norm4 = tf.nn.selu(my_batch_norm_layer(fc1))
    # Drop 50% of the units
    fc1_drop = tf.layers.dropout(batch_norm4, fc1_dropout_rate, training=training)
    
with tf.name_scope("output"):
    logits = tf.layers.dense(fc1_drop, n_outputs, name="output")
    logits_batch_norm = my_batch_norm_layer(logits)
    Y_proba = tf.nn.softmax(logits_batch_norm, name="Y_proba")

with tf.name_scope("loss_and_train"):
    learning_rate = 0.01
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_batch_norm, labels=y)
    loss = tf.reduce_mean(xentropy)

    optimizer = tf.train.AdamOptimizer(learning_rate)
    # The batch norm moving averages need these extra update ops
    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # Make every training step also run the batch norm updates
    with tf.control_dependencies(extra_update_ops):
        training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    # Evaluate on the same batch-normalized logits that the loss is computed from
    correct = tf.nn.in_top_k(logits_batch_norm, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    
n_epochs = 20
batch_size = 100

with tf.Session() as sess:
    init.run()
    start_time = time.time()
    
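    # total_batch counts training steps (one batch = one step)
    # best_acc_val holds the best validation accuracy so far
    # last_improved is the step at which validation accuracy last improved
    # Stop training if there has been no improvement for 2,000 steps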
    total_batch = 0
    best_acc_val = 0.0
    last_improved = 0
    require_improvement = 2000
    
    flag = False
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            
            sess.run(training_op, feed_dict={training:True, X: X_batch, y: y_batch})
            
            # Validate every 10 steps
            if total_batch % 10 == 0:
                acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
                # Evaluate 1,000 samples at a time, then average the results
                acc_val = []
                for i in range(len(X_valid)//1000):
                    acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000], y: y_valid[i*1000:(i+1)*1000]}))                 
                acc_val = np.mean(acc_val)
                
                # If validation accuracy improved, record it as the best result and save the model
                if acc_val > best_acc_val:
                    best_acc_val = acc_val
                    last_improved = total_batch
                    save_path = saver.save(sess, "./my_model_CNN_stop.ckpt")
                    improved_str = 'improved!'
                else:
                    improved_str = ''
                
                # Record the elapsed time and print the results; an improvement is flagged with: improved!
                time_dif = get_time_dif(start_time)
                msg = 'Epoch:{0:>4}, Iter: {1:>6}, Acc_Train: {2:>7.2%}, Acc_Val: {3:>7.2%}, Time: {4} {5}'
                print(msg.format(epoch, total_batch, acc_batch, acc_val, time_dif, improved_str))
            
            # Count the training step
            total_batch += 1

            # Stop training if there has been no improvement for 2,000 steps
            if total_batch - last_improved > require_improvement:
                print("Early stopping in  ",total_batch," step! And the best validation accuracy is ",best_acc_val, '.')
                # Break out of the current epoch
                flag = True
                break
        # Break out of the loop over epochs
        if flag:
            break        
    
with tf.Session() as sess:
    saver.restore(sess, "./my_model_CNN_stop.ckpt") 
    # Test 1,000 samples at a time, then average the results
    acc_test = []
    for i in range(len(X_test)//1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000], y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))

Output:

Early stopping in   9921  step! And the best validation accuracy is  0.9922 .
INFO:tensorflow:Restoring parameters from ./my_model_CNN_stop.ckpt

Test_accuracy: 98.94%

 

 

References:

Hands-On Machine Learning with Scikit-Learn and TensorFlow

