MNIST handwritten digit recognition is a classic introductory exercise for neural networks, often called the "Hello World" of deep learning. This post reimplements it in Python on the TensorFlow framework, with detailed comments throughout; I hope it makes a useful reference.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(r"D:\ClassStudy\ImageProcessing\MNIST_DATA", one_hot=True)
batch_size = 100 #batch size of 100; with 55000 training samples, one epoch is 550 batches
learning_rate = 0.8
learning_rate_decay = 0.999
max_steps = 30000 #maximum number of training steps
training_step = tf.Variable(0,trainable=False) #variable holding the number of training steps taken; conventionally set to non-trainable. Completing one batch completes one step
def hidden_layer(input_tensor,weights1,biases1,weights2,biases2,layer_name):
    '''
    Forward propagation through the hidden layer and the output layer, using the relu() activation function
    '''
    layer1 = tf.nn.relu(tf.matmul(input_tensor,weights1)+biases1)
    return tf.matmul(layer1,weights2)+biases2
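To make the shapes concrete: the forward pass maps a [batch, 784] input to [batch, 500] and then to [batch, 10]. Here is a minimal NumPy sketch of the same computation; the random inputs and all the _demo names are made up for illustration only and are not part of the script.

import numpy as np
x_demo = np.random.rand(100, 784).astype(np.float32)   #one batch of flattened 28x28 images
w1_demo = 0.1 * np.random.randn(784, 500).astype(np.float32)
b1_demo = np.full(500, 0.1, dtype=np.float32)
w2_demo = 0.1 * np.random.randn(500, 10).astype(np.float32)
b2_demo = np.full(10, 0.1, dtype=np.float32)
layer1_demo = np.maximum(x_demo @ w1_demo + b1_demo, 0)  #relu
logits_demo = layer1_demo @ w2_demo + b2_demo
print(layer1_demo.shape, logits_demo.shape)              #(100, 500) (100, 10)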
x = tf.placeholder(tf.float32,[None,784],name='x-input')
y_ = tf.placeholder(tf.float32,[None,10],name='y-output')
#generate the hidden-layer weights: a 784*500 array, 392000 parameters in total; 500 is an empirical choice, and other widths would work too
weights1 = tf.Variable(tf.truncated_normal([784,500],stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1,shape=[500]))
#generate the output-layer weights: a 500*10 array, 5000 parameters in total; the 500 here matches the 500 columns of the hidden layer's output, and the output must have 10 columns because the digits 0-9 form 10 classes
weights2 = tf.Variable(tf.truncated_normal([500,10],stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1,shape=[10]))
#compute y via the network's forward pass; y is a matrix with 10 columns
y = hidden_layer(x,weights1,biases1,weights2,biases2,'y')
'''
To improve the final model's performance on test data when training the network with stochastic gradient descent, TensorFlow provides a way to apply a moving average to variables, commonly called the moving-average model
'''
#tf.train.ExponentialMovingAverage() initializes a moving-average class; it takes a decay-rate argument that controls how quickly the model updates.
#The moving-average algorithm maintains a shadow variable (shadow_variable) for every variable; the shadow variable starts at the variable's initial value, and whenever the variable changes, the shadow variable is updated by a fixed rule.
#The decay rate determines how quickly the moving-average model updates; it is usually set close to 1, and the larger it is, the more stable the model.
averages_class = tf.train.ExponentialMovingAverage(0.99,training_step)
#the apply() function of the moving-average class registers the variables whose moving averages should be maintained
averages_op = averages_class.apply(tf.trainable_variables())
#average() is a method of the moving-average class; it is what actually reads the shadow variable. Pass it the variable whose average you need.
#Here y is computed again using the moving averages, but keep in mind that a moving average is only a shadow variable.
average_y = hidden_layer(x,averages_class.average(weights1),
                         averages_class.average(biases1),
                         averages_class.average(weights2),
                         averages_class.average(biases2),'average_y')
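The shadow variable follows shadow = decay * shadow + (1 - decay) * variable, and when a step counter is supplied (as training_step is above), TensorFlow caps the effective decay at (1 + num_updates) / (10 + num_updates) so the average moves faster early in training. A plain-Python sketch of that rule, with made-up numbers, standalone from the script:

decay = 0.99
variable = 10.0   #pretend the trainable variable has jumped to 10.0
shadow = 0.0      #the shadow variable was initialized when the variable was 0.0
for num_updates in range(1, 5):
    d = min(decay, (1 + num_updates) / (10 + num_updates))  #capped decay
    shadow = d * shadow + (1 - d) * variable
    print('update %d: decay=%.3f shadow=%.4f' % (num_updates, d, shadow))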
#compute the cross-entropy loss; this function is meant for the case where each input sample belongs to exactly one class, which fits this task well.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))
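Since y_ holds one-hot labels, tf.argmax(y_,1) recovers the integer class indices that the sparse variant expects. The dense variant accepts the one-hot labels directly and gives the same per-example loss; as an aside (assuming TF 1.5+, where the _v2 op is available):

#equivalent dense form, feeding the one-hot labels directly:
#cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y)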
#with the cross entropy in hand, we can compute the L2 regularization of the weights and fold the regularization loss and the cross-entropy loss into a single total loss
regularizer = tf.contrib.layers.l2_regularizer(0.0001)
regularization = regularizer(weights1)+regularizer(weights2)
#total loss
loss = tf.reduce_mean(cross_entropy)+regularization
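As I understand the contrib API, l2_regularizer(scale)(w) computes scale * tf.nn.l2_loss(w), i.e. scale * sum(w**2) / 2. A tiny NumPy sketch of that arithmetic, with a made-up weight array:

import numpy as np
w = np.array([1.0, -2.0, 3.0])
penalty = 0.0001 * np.sum(w ** 2) / 2   #tf.nn.l2_loss includes the 1/2 factor
print(penalty)                          #0.0007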
#with the total loss settled, we still need an optimizer. Here we use the simplest one, stochastic gradient descent, with an exponentially decaying learning rate; the optimizer's minimize() method names the objective to be minimized.
learning_rate = tf.train.exponential_decay(learning_rate,training_step,mnist.train.num_examples/batch_size,learning_rate_decay)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=training_step)
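With the default staircase=False, exponential_decay computes decayed_lr = base_lr * decay_rate ** (global_step / decay_steps). Plugging in the values above (base rate 0.8, decay 0.999, decay_steps = 55000 / 100 = 550), a quick sketch of how gently the rate falls:

for step in (0, 550, 5500, 30000):
    print(step, 0.8 * 0.999 ** (step / 550))
#prints roughly 0.8, 0.7992, 0.7920, 0.7575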
#each pass over a batch must both update the network parameters via backpropagation and update every parameter's moving average; control_dependencies() bundles these into one combined operation
with tf.control_dependencies([train_step,averages_op]):
    train_op = tf.no_op(name='train')
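An equivalent and arguably more direct way to bundle the two updates is tf.group, which TF 1.x provides for exactly this purpose:

#equivalent alternative:
#train_op = tf.group(train_step, averages_op)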
#check whether the forward-pass results of the network that uses the moving-average model are correct
#equal() compares the two tensors element by element,
#returning True where they match and False otherwise
correct_prediction = tf.equal(tf.argmax(average_y,1),tf.argmax(y_,1))
#cast(x, dtype, name=None) is used here to convert the boolean result to float32
#averaging the resulting float32 values then gives the model's accuracy on this set of data
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
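The accuracy computation is just an elementwise comparison followed by a mean. In NumPy terms, with made-up predictions and labels:

import numpy as np
pred_classes = np.array([3, 1, 4, 1])   #argmax of the logits
true_classes = np.array([3, 1, 4, 5])   #argmax of the one-hot labels
print(np.mean((pred_classes == true_classes).astype(np.float32)))  #0.75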
'''
With all of the above in place, we can create a session and start training
'''
with tf.Session() as sess:
    #initialize the variables
    tf.global_variables_initializer().run()
    #prepare the validation data
    validate_feed = {x:mnist.validation.images,y_:mnist.validation.labels}
    #prepare the test data
    test_feed = {x:mnist.test.images,y_:mnist.test.labels}
    #training loop, up to max_steps steps; one batch per step
    for i in range(max_steps):
        if i % 1000 == 0:
            #evaluate the moving-average model on the validation data
            #validate_accuracy is multiplied by 100 to print it as a percentage
            validate_accuracy = sess.run(accuracy, feed_dict=validate_feed)
            print('After %d training step(s), validation accuracy '
                  'using average model is %g%%' % (i,validate_accuracy*100))
        #train.next_batch() reads a small slice of the training data as one batch via its batch_size argument
        xs,ys = mnist.train.next_batch(batch_size=batch_size)
        sess.run(train_op,feed_dict={x:xs,y_:ys})
    #finally measure accuracy on the test set, again scaled by 100 to get a percentage
    test_accuracy = sess.run(accuracy,feed_dict=test_feed)
    print('After %d training step(s), test accuracy using average '
          'model is %g%%' % (max_steps,test_accuracy*100))
Output:
After 0 training step(s), validation accuracy using average model is 7.4%
After 1000 training step(s), validation accuracy using average model is 97.82%
After 2000 training step(s), validation accuracy using average model is 98.1%
After 3000 training step(s), validation accuracy using average model is 98.36%
After 4000 training step(s), validation accuracy using average model is 98.38%
After 5000 training step(s), validation accuracy using average model is 98.48%
After 6000 training step(s), validation accuracy using average model is 98.36%
After 7000 training step(s), validation accuracy using average model is 98.5%
After 8000 training step(s), validation accuracy using average model is 98.4%
After 9000 training step(s), validation accuracy using average model is 98.52%
After 10000 training step(s), validation accuracy using average model is 98.5%
After 11000 training step(s), validation accuracy using average model is 98.6%
After 12000 training step(s), validation accuracy using average model is 98.48%
After 13000 training step(s), validation accuracy using average model is 98.56%
After 14000 training step(s), validation accuracy using average model is 98.54%
After 15000 training step(s), validation accuracy using average model is 98.6%
After 16000 training step(s), validation accuracy using average model is 98.6%
After 17000 training step(s), validation accuracy using average model is 98.62%
After 18000 training step(s), validation accuracy using average model is 98.56%
After 19000 training step(s), validation accuracy using average model is 98.66%
After 20000 training step(s), validation accuracy using average model is 98.6%
After 21000 training step(s), validation accuracy using average model is 98.7%
After 22000 training step(s), validation accuracy using average model is 98.6%
After 23000 training step(s), validation accuracy using average model is 98.54%
After 24000 training step(s), validation accuracy using average model is 98.6%
After 25000 training step(s), validation accuracy using average model is 98.64%
After 26000 training step(s), validation accuracy using average model is 98.64%
After 27000 training step(s), validation accuracy using average model is 98.6%
After 28000 training step(s), validation accuracy using average model is 98.56%
After 29000 training step(s), validation accuracy using average model is 98.52%
After 30000 training step(s), test accuracy using average model is 98.4%
