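When I studied the theory of convolutional neural networks I thought I understood it, but when it came to building one in code I found that many points were still fuzzy. This post again builds a convolutional neural network on the MNIST dataset: it starts with a basic model, then improves it with Batch Norm, Dropout and early stopping, and along the way describes the problems I ran into while debugging and how I solved them.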
I. Building a basic convolutional neural network
Step 1: Prepare the data
The book Hands on Machine Learning with Scikit-Learn and TensorFlow downloads the MNIST dataset with the following code.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")
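Downloading this way may raise the URLError shown below, which means that SSL certificate verification failed. Putting the two-line snippet that follows the error message in front of the download code disables certificate verification (note that this turns verification off for every HTTPS request made through the default context in the process, so it is best limited to this download).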
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
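Running it then prints a long list of WARNINGs, but there is no need to worry: the dataset still downloads successfully, and it conveniently comes already split into training, validation and test sets, with batching provided and the data reshaped into a suitable input format (for example, the training set already has shape (55000, 784)). The problem is that the download is very slow; it failed for me many times, and succeeding was purely a matter of luck.
I prefer to download the raw data with tf.keras.datasets.mnist.load_data() and then split it, generate the batches and reshape it into a suitable input format myself.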
import tensorflow as tf
import numpy as np
import time
from datetime import timedelta

# Measure the time spent on training
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval, e.g. 10 seconds is printed as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
# Scale pixel values to [0, 1] and reshape the training set to (60000, 784)
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
# Split off the first 5000 samples as the validation set
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    # Shuffle the sample indices, then yield one mini-batch at a time
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
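As a quick sanity check of the batching helper, here is a minimal sketch that re-declares shuffle_batch and runs it on a tiny synthetic array. The toy data and the batch size of 3 are made up for illustration; the point is only that one pass over the generator visits every sample exactly once.

```python
import numpy as np

def shuffle_batch(X, y, batch_size):
    """Yield shuffled mini-batches that together cover every sample once."""
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

# Toy data (hypothetical): 10 samples with 2 features, labels equal to the row index
X_toy = np.arange(20, dtype=np.float32).reshape(10, 2)
y_toy = np.arange(10, dtype=np.int32)

batches = list(shuffle_batch(X_toy, y_toy, batch_size=3))
batch_sizes = [len(y_b) for _, y_b in batches]
seen_labels = sorted(int(v) for _, y_b in batches for v in y_b)
print(batch_sizes)   # the sizes add up to 10
print(seen_labels)   # every label appears exactly once
```

Because np.array_split spreads any remainder over the batches, some batches can be slightly larger than batch_size when the dataset size is not a multiple of it. With 55000 training samples and a batch size of 100 this does not happen.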
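Step 2: Configure the parameters
The network has two convolutional layers and one fully connected layer, arranged as: input layer -> convolutional layer 1 -> convolutional layer 2 -> max pooling layer -> fully connected layer -> output layer. Each convolutional layer consists of convolution kernels followed by a ReLU activation.
The first convolutional layer has 16 kernels of size (3, 3) with stride 1 and zero padding. The second has 32 kernels of size (3, 3) with stride 2, also zero-padded. In general, the deeper a convolutional layer is, the more feature maps it should output and the smaller each feature map should be, which calls for more kernels and a larger stride (a larger kernel also shrinks the feature map when no padding is used). This lets the later layers extract higher-level features.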
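All neurons within one feature map share the same parameters (weights and bias), while different feature maps have different parameters. Each feature map extracts one feature of the image, so the more feature maps there are, the more image features are extracted.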
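To see what parameter sharing buys, here is a small sketch that counts the parameters of the two convolutional layers described above. The helper names conv2d_params and dense_params are my own, and the dense layer in the comparison is purely hypothetical: it is what a layer producing the same 28 x 28 x 16 outputs would cost without sharing.

```python
def conv2d_params(ksize, in_channels, n_filters):
    """Kernel weights plus one bias per filter, shared across all positions."""
    return ksize * ksize * in_channels * n_filters + n_filters

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

conv1_params = conv2d_params(ksize=3, in_channels=1, n_filters=16)
conv2_params = conv2d_params(ksize=3, in_channels=16, n_filters=32)
# Hypothetical unshared alternative to conv1: 784 inputs fully connected to 28*28*16 outputs
dense_equivalent = dense_params(28 * 28, 28 * 28 * 16)

print(conv1_params)      # 160
print(conv2_params)      # 4640
print(dense_equivalent)  # 9847040
```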
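Now for the related calculation. Along one spatial dimension, suppose the input to the convolutional layer has n neurons, the kernel size is m, the stride is s, and p zeros are padded at each end of the input; the output then has (n-m+2p)/s + 1 neurons. With the parameters below, the input to the first convolutional layer is a 28 by 28 image, so the formula applies to the height and the width separately with n=28 (not to the flattened 28*28=784 pixels), m=3 and s=1. Because padding="SAME" with stride 1 keeps the output the same size as the input, (28-3+2p)/1+1=28 gives p=1, that is, one zero is padded on each side.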
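The formula can be checked with a few lines of Python. This is a minimal sketch for one spatial dimension (n = 28 for the height or width of an MNIST image); the function name conv_output_size is my own.

```python
def conv_output_size(n, m, s, p):
    """Output length along one dimension: floor((n - m + 2p) / s) + 1."""
    return (n - m + 2 * p) // s + 1

same_like = conv_output_size(n=28, m=3, s=1, p=1)   # one zero on each side
no_padding = conv_output_size(n=28, m=3, s=1, p=0)  # no zero padding
print(same_like)   # 28
print(no_padding)  # 26
```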
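For the second convolutional layer, however, the same calculation gave me a non-integer p: with s=2 the output size is 14, and (28-3+2p)/2+1=14 gives p=0.5. I did not know how the operation proceeds in that case. As far as I can tell from TensorFlow's description of "SAME" padding, the padding does not have to be symmetric: the total number of zeros needed is computed first (here 1), and when that total is odd the extra zero goes at the end (right or bottom), so this layer pads 0 zeros before and 1 zero after.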
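The non-integer p goes away once padding is allowed to be asymmetric. The sketch below follows the rule for padding="SAME" as I understand it from TensorFlow's documentation: the output size is ceil(n / s), the total padding is whatever makes the last window fit, and any odd zero goes at the end. The function name same_padding is my own.

```python
import math

def same_padding(n, m, s):
    """Return (output_size, pad_before, pad_after) along one dimension."""
    out = math.ceil(n / s)
    total = max((out - 1) * s + m - n, 0)
    pad_before = total // 2
    pad_after = total - pad_before
    return out, pad_before, pad_after

layer1 = same_padding(n=28, m=3, s=1)
layer2 = same_padding(n=28, m=3, s=2)
print(layer1)  # (28, 1, 1)
print(layer2)  # (14, 0, 1)
```

For the second layer the total padding is a single zero, which is why solving the symmetric formula gives p = 0.5.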
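# Height, width and number of channels of the input
height = 28
width = 28
channels = 1
n_inputs = height * width

# Number of feature maps (filters), kernel size and stride of each convolutional layer
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 2
conv2_pad = "SAME"

# Number of feature maps (channels) of the max pooling layer
pool3_fmaps = conv2_fmaps

# Number of neurons in the fully connected layer and in the output layer
n_fc1 = 32
n_outputs = 10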
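Step 3: Build the convolutional network
The code below follows the network structure described above. Two points need attention: first, do not zero-pad in the max pooling layer, because the purpose of pooling is to reduce memory use and the number of parameters; second, before feeding into the fully connected layer, all feature maps must be flattened into a single vector.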
# Uses the TensorFlow 1.x graph API (tf.placeholder, tf.layers)
with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")

conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")
conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv2")

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten all feature maps into one vector. conv2 (stride 2) halves 28*28 to 14*14
    # and the 2*2 max pool halves it again to 7*7, i.e. 1/16 of the original area.
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 7 * 7])

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat, n_fc1, activation=tf.nn.relu, name="fc1")

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1, n_outputs, name="output")
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

with tf.name_scope("train"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
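The hard-coded 7 * 7 in the flatten step depends on the strides chosen earlier, so it helps to trace the shapes. Here is a minimal sketch in plain Python, assuming "SAME" padding for the convolutions and a "VALID" 2 x 2 max pool with stride 2 as in the code above; the helper names are my own.

```python
import math

def same_conv_out(n, stride):
    """Output size of a "SAME"-padded convolution along one dimension."""
    return math.ceil(n / stride)

def valid_pool_out(n, ksize, stride):
    """Output size of an unpadded ("VALID") pooling window along one dimension."""
    return (n - ksize) // stride + 1

def flat_size(height, width, conv_strides, n_fmaps, pool_ksize=2, pool_stride=2):
    """Return (height, width, flattened length) after the conv layers and one pool."""
    h, w = height, width
    for s in conv_strides:
        h, w = same_conv_out(h, s), same_conv_out(w, s)
    h = valid_pool_out(h, pool_ksize, pool_stride)
    w = valid_pool_out(w, pool_ksize, pool_stride)
    return h, w, n_fmaps * h * w

basic = flat_size(28, 28, conv_strides=[1, 2], n_fmaps=32)
stride_one = flat_size(28, 28, conv_strides=[1, 1], n_fmaps=32)
print(basic)       # (7, 7, 1568)
print(stride_one)  # (14, 14, 6272)
```

The second call shows what happens when both convolutional layers use stride 1: the flattened vector becomes four times longer, so the reshape has to use 14 * 14 instead of 7 * 7.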
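Step 4: Train and evaluate the model
The biggest problem during training and evaluation is that the convolutional layers can run out of memory, especially during evaluation and testing. Training uses batch_size=100, which is not a problem, but the validation set has 5,000 samples and the test set has 10,000, and feeding them through in one pass consumes a lot of memory. During testing I got the following error: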
ResourceExhaustedError: OOM when allocating tensor with shape[10000,16,29,29]...
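OOM stands for "Out of Memory"; the error means that memory ran out during the test phase. My GPU is a GTX 960M with 2 GB of video memory, of which about 1.65 GB is actually available for training, which is rather small.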
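There are several ways to deal with this. The first is to simplify the model, for example by reducing the number of feature maps in the convolutional layers, increasing the stride, or using fewer convolutional layers, but this usually hurts the model's performance. The second is to use 16-bit floats instead of 32-bit floats. The third is to evaluate and test in mini-batches as well.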
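A rough estimate shows why the error appears and how much the second and third options help. The sketch below only sizes the single tensor named in the error message; the helper tensor_bytes is my own, and a real forward pass holds several such tensors at once.

```python
import numpy as np

def tensor_bytes(shape, dtype=np.float32):
    """Memory needed for one dense tensor of the given shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

MIB = 1024 ** 2
full_fp32 = tensor_bytes((10000, 16, 29, 29))              # shape from the error message
full_fp16 = tensor_bytes((10000, 16, 29, 29), np.float16)  # same tensor in 16-bit floats
batch_fp32 = tensor_bytes((1000, 16, 29, 29))              # a batch of 1000 samples

print(round(full_fp32 / MIB))   # about 513 MiB
print(round(full_fp16 / MIB))   # about 257 MiB
print(round(batch_fp32 / MIB))  # about 51 MiB
```

One activation tensor for the whole test set already takes roughly a third of the 1.65 GB that is available, while a batch of 1,000 samples needs a tenth of that.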
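Simplifying the model lowers its performance; I tried it and that was indeed the case. So I chose the third method: during evaluation and testing the data is fed in batches of 1,000 samples and the batch accuracies are averaged. The final validation accuracy was 98.74%.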
with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

n_epochs = 10
batch_size = 100

with tf.Session() as sess:
    init.run()
    start_time = time.time()

    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Training accuracy on the last mini-batch of the epoch
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})

        if epoch % 2 == 0 or epoch == 9:
            # Evaluate 1000 samples at a time, then average the batch accuracies
            acc_val = []
            for i in range(len(X_valid)//1000):
                acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000],
                                                        y: y_valid[i*1000:(i+1)*1000]}))
            acc_val = np.mean(acc_val)
            print('Epoch:{0:>4}, Train accuracy:{1:>7.2%},Validate accuracy:{2:7.2%}'.format(epoch, acc_train, acc_val))

    time_dif = get_time_dif(start_time)
    print("\nTime usage:", time_dif)

    # Test 1000 samples at a time, then average the batch accuracies
    acc_test = []
    for i in range(len(X_test)//1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000],
                                                 y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))
Epoch:   0, Train accuracy: 98.00%,Validate accuracy: 97.12%
Epoch:   2, Train accuracy: 97.00%,Validate accuracy: 98.34%
Epoch:   4, Train accuracy:100.00%,Validate accuracy: 98.62%
Epoch:   6, Train accuracy:100.00%,Validate accuracy: 98.84%
Epoch:   8, Train accuracy: 99.00%,Validate accuracy: 98.68%
Epoch:   9, Train accuracy:100.00%,Validate accuracy: 98.86%

Time usage: 0:01:02

Test_accuracy: 98.68%
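The chunked evaluation can be wrapped in a small helper. This is a sketch under my own assumptions rather than the code used above: predict_fn is a hypothetical stand-in for whatever returns predicted class labels for a batch, and the helper counts correct predictions instead of averaging per-batch accuracies, so it also works when the dataset size is not a multiple of the batch size.

```python
import numpy as np

def batched_accuracy(predict_fn, X, y, batch_size=1000):
    """Accuracy over (X, y), evaluated in chunks of at most batch_size samples."""
    n_correct = 0
    for start in range(0, len(X), batch_size):
        X_chunk = X[start:start + batch_size]
        y_chunk = y[start:start + batch_size]
        n_correct += int(np.sum(predict_fn(X_chunk) == y_chunk))
    return n_correct / len(X)

# Hypothetical stand-in for the model: predicts class 1 when the first feature is positive
def toy_predict(X):
    return (X[:, 0] > 0).astype(np.int32)

X_toy = np.array([[1.0], [-1.0], [2.0], [-2.0], [3.0]], dtype=np.float32)
y_toy = np.array([1, 0, 1, 1, 1], dtype=np.int32)
toy_acc = batched_accuracy(toy_predict, X_toy, y_toy, batch_size=2)
print(toy_acc)  # 0.8
```

With 5,000 validation and 10,000 test samples and chunks of 1,000, both approaches give the same number, because every chunk has the same size.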
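II. Improving the convolutional neural network with Batch Norm, Dropout and early stopping
The reference book improves the basic CNN with Dropout and early stopping, but does not use Batch Norm. I found the author's early-stopping code too complicated and recommend the implementation below, which I find clearer.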
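I have not yet found code that applies Batch Norm in a convolutional neural network, so I implemented it based on my own understanding. Which layers should use Batch Norm? I think it belongs on the convolutional layers and the fully connected layers (including the output layer), but not on the pooling layer: internal covariate shift should mainly come from the nonlinear transformations between layers, and the output of the pooling layer is not passed through a nonlinear activation, so applying Batch Norm in the fully connected layer that follows is enough.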
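To make concrete what a Batch Norm layer computes, here is a minimal NumPy sketch for a 2-D input of shape (batch, features). It is an illustration under my own assumptions, not TensorFlow's implementation: the function names are made up, eps=1e-3 and momentum=0.9 are illustrative values, and for convolutional feature maps the statistics would be taken per channel over the batch and both spatial dimensions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, moving_mean, moving_var, momentum=0.9, eps=1e-3):
    """Normalize with the batch statistics and update the moving averages."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    new_mean = momentum * moving_mean + (1 - momentum) * mean
    new_var = momentum * moving_var + (1 - momentum) * var
    return gamma * x_hat + beta, new_mean, new_var

def batch_norm_infer(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Normalize with the stored moving averages instead of batch statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([[1.0, 2.0], [3.0, 6.0]])
gamma, beta = np.ones(2), np.zeros(2)
out, new_mean, new_var = batch_norm_train(x, gamma, beta, np.zeros(2), np.ones(2))
print(out.mean(axis=0))  # close to 0 for every feature
print(new_mean)          # moving mean has moved 10% of the way to the batch mean
```

The split into a training function and an inference function is the reason a training flag has to be fed to the layer, and the moving averages are extra state that must be updated alongside the weights.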
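Dropout is applied to the pooling layer and to the fully connected layer, with dropout rates of 0.25 and 0.5 respectively. Note the order: Batch Norm first, then the SELU activation, then Dropout.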
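For reference, this is roughly what a dropout layer does to its input. The sketch is a NumPy illustration of inverted dropout, in which the units that are kept are scaled by 1 / (1 - rate) so that the expected activation stays the same; the function name and the rng argument are my own.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((1000, 32), dtype=np.float32)
dropped = dropout(x, rate=0.5, training=True, rng=rng)
unchanged = dropout(x, rate=0.5, training=False, rng=rng)
print(np.unique(dropped))  # only 0 and 2 occur
print(dropped.mean())      # close to 1, the mean of the input
print(unchanged is x)      # True: nothing is dropped outside training
```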
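The stride of the second convolutional layer is also set to 1, to obtain larger feature maps and more parameters.
The number of epochs is set to 20 and batch size = 100; because Batch Norm computes the mean and variance of each mini-batch, the batch size can be set somewhat larger. If the validation accuracy has still not improved after 2,000 steps, training stops.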
As a result, training stopped at epoch 18, step 9921; the best validation accuracy was 99.22% and the test accuracy was 98.94%.
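The early-stopping rule itself is independent of TensorFlow. Here is a minimal sketch of the bookkeeping as a small class; the class name EarlyStopper and the validation accuracies in history are made up for illustration, and the patience of 3 steps stands in for the 2,000 steps used in the post.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when to stop."""

    def __init__(self, patience_steps):
        self.patience_steps = patience_steps
        self.best_acc = 0.0
        self.last_improved = 0

    def update(self, step, acc_val):
        """Record a validation result; return True if it is a new best."""
        if acc_val > self.best_acc:
            self.best_acc = acc_val
            self.last_improved = step
            return True
        return False

    def should_stop(self, step):
        """True once more than patience_steps have passed without improvement."""
        return step - self.last_improved > self.patience_steps

stopper = EarlyStopper(patience_steps=3)
history = [0.90, 0.95, 0.94, 0.93, 0.95, 0.92, 0.91]  # made-up validation accuracies
stop_step = None
for step, acc in enumerate(history):
    stopper.update(step, acc)
    if stopper.should_stop(step):
        stop_step = step
        break
print(stop_step, stopper.best_acc)  # 5 0.95
```

In a real training loop the model would be saved whenever update returns True, so that the best checkpoint can be restored after stopping.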
import tensorflow as tf
import numpy as np
import time
from datetime import timedelta
from functools import partial

# Measure the time spent on training
def get_time_dif(start_time):
    end_time = time.time()
    time_dif = end_time - start_time
    # timedelta formats the interval, e.g. 10 seconds is printed as 0:00:10
    return timedelta(seconds=int(round(time_dif)))

# Prepare the training, validation and test sets, and generate mini-batches
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

height = 28
width = 28
channels = 1
n_inputs = height * width

# The first convolutional layer has 16 kernels of size (3, 3) with stride 1;
# zero padding keeps the output the same size as the input
conv1_fmaps = 16
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 32
conv2_ksize = 3
conv2_stride = 1
conv2_pad = "SAME"
# Drop 25% of the neurons after the pooling layer
conv2_dropout_rate = 0.25

pool3_fmaps = conv2_fmaps

n_fc1 = 32
# Drop 50% of the neurons in the fully connected layer
fc1_dropout_rate = 0.5

n_outputs = 10

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")
    training = tf.placeholder_with_default(False, shape=[], name='training')

# A reusable batch norm layer. The global mean and variance are estimated
# with a moving average, using a momentum of 0.9
my_batch_norm_layer = partial(tf.layers.batch_normalization, training=training, momentum=0.9)

with tf.name_scope("conv"):
    # The activation comes after batch norm, so no activation is set here
    conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                             strides=conv1_stride, padding=conv1_pad,
                             activation=None, name="conv1")
    # Apply batch norm first, then the activation
    batch_norm1 = tf.nn.selu(my_batch_norm_layer(conv1))
    conv2 = tf.layers.conv2d(batch_norm1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                             strides=conv2_stride, padding=conv2_pad,
                             activation=None, name="conv2")
    batch_norm2 = tf.nn.selu(my_batch_norm_layer(conv2))

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(batch_norm2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten the feature maps into one vector (28*28 -> 14*14 after the 2*2 pool)
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 14 * 14])
    # Drop 25% of the neurons
    pool3_flat_drop = tf.layers.dropout(pool3_flat, conv2_dropout_rate, training=training)

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat_drop, n_fc1, activation=None, name="fc1")
    # Batch norm in the fully connected layer, then the activation
    batch_norm4 = tf.nn.selu(my_batch_norm_layer(fc1))
    # Drop 50% of the neurons
    fc1_drop = tf.layers.dropout(batch_norm4, fc1_dropout_rate, training=training)

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1_drop, n_outputs, name="output")
    logits_batch_norm = my_batch_norm_layer(logits)
    Y_proba = tf.nn.softmax(logits_batch_norm, name="Y_proba")

with tf.name_scope("loss_and_train"):
    learning_rate = 0.01
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_batch_norm, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    # The batch norm moving averages need to be updated as well
    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # Make the optimization step depend on the batch norm updates
    with tf.control_dependencies(extra_update_ops):
        training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    # Note: accuracy is computed from `logits`, while the loss and Y_proba use
    # `logits_batch_norm`; using logits_batch_norm here would be more consistent
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

n_epochs = 20
batch_size = 100

with tf.Session() as sess:
    init.run()
    start_time = time.time()

    # total_batch: total number of steps so far (one batch is one step)
    # best_acc_val: best validation accuracy so far
    # last_improved: step at which the validation accuracy last improved
    # require_improvement: stop if there is no improvement within 2000 steps
    total_batch = 0
    best_acc_val = 0.0
    last_improved = 0
    require_improvement = 2000

    flag = False
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={training: True, X: X_batch, y: y_batch})

            # Validate once every 10 steps
            if total_batch % 10 == 0:
                acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})

                # Evaluate 1000 samples at a time, then average the batch accuracies
                acc_val = []
                for i in range(len(X_valid)//1000):
                    acc_val.append(accuracy.eval(feed_dict={X: X_valid[i*1000:(i+1)*1000],
                                                            y: y_valid[i*1000:(i+1)*1000]}))
                acc_val = np.mean(acc_val)

                # If the validation accuracy improved, record it as the best and save the model
                if acc_val > best_acc_val:
                    best_acc_val = acc_val
                    last_improved = total_batch
                    save_path = saver.save(sess, "./my_model_CNN_stop.ckpt")
                    improved_str = 'improved!'
                else:
                    improved_str = ''

                # Record the elapsed time and print the validation result;
                # "improved!" is appended when the accuracy improved
                time_dif = get_time_dif(start_time)
                msg = 'Epoch:{0:>4}, Iter: {1:>6}, Acc_Train: {2:>7.2%}, Acc_Val: {3:>7.2%}, Time: {4} {5}'
                print(msg.format(epoch, total_batch, acc_batch, acc_val, time_dif, improved_str))

            # Count the step
            total_batch += 1

            # Stop training if there has been no improvement for 2000 steps
            if total_batch - last_improved > require_improvement:
                print("Early stopping in ", total_batch, " step! And the best validation accuracy is ", best_acc_val, '.')
                # Leave the loop over the batches of this epoch
                flag = True
                break

        # Leave the loop over all epochs
        if flag:
            break

with tf.Session() as sess:
    saver.restore(sess, "./my_model_CNN_stop.ckpt")
    # Test 1000 samples at a time, then average the batch accuracies
    acc_test = []
    for i in range(len(X_test)//1000):
        acc_test.append(accuracy.eval(feed_dict={X: X_test[i*1000:(i+1)*1000],
                                                 y: y_test[i*1000:(i+1)*1000]}))
    acc_test = np.mean(acc_test)
    print("\nTest_accuracy:{0:>7.2%}".format(acc_test))
Early stopping in 9921 step! And the best validation accuracy is 0.9922 .
INFO:tensorflow:Restoring parameters from ./my_model_CNN_stop.ckpt

Test_accuracy: 98.94%
References:
Hands on Machine Learning with Scikit-Learn and TensorFlow