This experiment builds a BP (backpropagation) neural network with two hidden layers to fit an equation with a quadratic nonlinear relationship. It visualizes the learned fitted curve, feeds randomly chosen input values through the network to produce predictions, and closes with several key tips.

The source code is as follows:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

plotdata = {"batchsize": [], "loss": []}

def moving_average(a, w=11):
    if len(a) < w:
        return a[:]
    return [val if idx < w else sum(a[(idx - w):idx]) / w for idx, val in enumerate(a)]

# Generate simulated data with a quadratic relationship
train_X = np.linspace(-1, 1, 100)[:, np.newaxis]
train_Y = train_X * train_X + 5 * train_X + np.random.randn(*train_X.shape) * 0.3

# Subplot 1 shows the simulated data points
plt.figure(12)
plt.subplot(221)
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.legend()

# Build the model
# Placeholders
X = tf.placeholder("float", [None, 1])
Y = tf.placeholder("float", [None, 1])

# Model parameters
W1 = tf.Variable(tf.random_normal([1, 10]), name="weight1")
b1 = tf.Variable(tf.zeros([1, 10]), name="bias1")
W2 = tf.Variable(tf.random_normal([10, 6]), name="weight2")
b2 = tf.Variable(tf.zeros([1, 6]), name="bias2")
W3 = tf.Variable(tf.random_normal([6, 1]), name="weight3")
b3 = tf.Variable(tf.zeros([1]), name="bias3")

# Forward pass: two hidden ReLU layers, linear output
z1 = tf.matmul(X, W1) + b1
z2 = tf.nn.relu(z1)
z3 = tf.matmul(z2, W2) + b2
z4 = tf.nn.relu(z3)
z5 = tf.matmul(z4, W3) + b3

# Backward optimization
cost = tf.reduce_mean(tf.square(Y - z5))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)  # gradient descent

# Initialize variables
init = tf.global_variables_initializer()

# Training parameters
training_epochs = 5000
display_step = 2

# Launch the session
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs + 1):
        sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})
        # Show progress during training
        if epoch % display_step == 0:
            loss = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", epoch, "cost=", loss)
            if not np.isnan(loss):  # skip NaN losses (the original checked loss == "NA")
                plotdata["batchsize"].append(epoch)
                plotdata["loss"].append(loss)
    print(" Finish")

    # Plot the fitted curve
    plt.subplot(222)
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(z5, feed_dict={X: train_X}), label='Fitted line')
    plt.legend()

    plotdata["avgloss"] = moving_average(plotdata["loss"])
    plt.subplot(212)
    plt.plot(plotdata["batchsize"], plotdata["avgloss"], 'b--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Loss')
    plt.title('Minibatch run vs Training loss')
    plt.show()

    # Prediction
    a = [[0.2], [0.3]]
    print("x=[[0.2],[0.3]],z5=", sess.run(z5, feed_dict={X: a}))
```
The output of running the code is as follows:
The result is excellent: the relationship is fitted very well. A few points about this example deserve further explanation:
- Input nodes can be defined in a dictionary and then accessed through the dictionary:
```python
input = {'X': tf.placeholder("float", [None, 1]),
         'Y': tf.placeholder("float", [None, 1])}

sess.run(optimizer, feed_dict={input['X']: train_X, input['Y']: train_Y})
```
  Defining each input node directly as a separate variable is not the recommended approach.
- Variables can likewise be defined in a dictionary; for example, the code above can be rewritten as:
```python
parameter = {'W1': tf.Variable(tf.random_normal([1, 10]), name="weight1"),
             'b1': tf.Variable(tf.zeros([1, 10]), name="bias1"),
             'W2': tf.Variable(tf.random_normal([10, 6]), name="weight2"),
             'b2': tf.Variable(tf.zeros([1, 6]), name="bias2"),
             'W3': tf.Variable(tf.random_normal([6, 1]), name="weight3"),
             'b3': tf.Variable(tf.zeros([1]), name="bias3")}

z1 = tf.matmul(X, parameter['W1']) + parameter['b1']
```
Next, we extend the code above to practice saving and loading the model:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

plotdata = {"batchsize": [], "loss": []}

def moving_average(a, w=11):
    if len(a) < w:
        return a[:]
    return [val if idx < w else sum(a[(idx - w):idx]) / w for idx, val in enumerate(a)]

# Generate simulated data with a quadratic relationship
train_X = np.linspace(-1, 1, 100)[:, np.newaxis]
train_Y = train_X * train_X + 5 * train_X + np.random.randn(*train_X.shape) * 0.3

# Subplot 1 shows the simulated data points
plt.figure(12)
plt.subplot(221)
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.legend()

# Build the model
# Dictionary-style placeholders (note: 'input' shadows the Python built-in of the same name)
input = {'X': tf.placeholder("float", [None, 1]),
         'Y': tf.placeholder("float", [None, 1])}
# X = tf.placeholder("float", [None, 1])
# Y = tf.placeholder("float", [None, 1])

# Model parameters
parameter = {'W1': tf.Variable(tf.random_normal([1, 10]), name="weight1"),
             'b1': tf.Variable(tf.zeros([1, 10]), name="bias1"),
             'W2': tf.Variable(tf.random_normal([10, 6]), name="weight2"),
             'b2': tf.Variable(tf.zeros([1, 6]), name="bias2"),
             'W3': tf.Variable(tf.random_normal([6, 1]), name="weight3"),
             'b3': tf.Variable(tf.zeros([1]), name="bias3")}
# W1 = tf.Variable(tf.random_normal([1,10]), name="weight1")
# b1 = tf.Variable(tf.zeros([1,10]), name="bias1")
# W2 = tf.Variable(tf.random_normal([10,6]), name="weight2")
# b2 = tf.Variable(tf.zeros([1,6]), name="bias2")
# W3 = tf.Variable(tf.random_normal([6,1]), name="weight3")
# b3 = tf.Variable(tf.zeros([1]), name="bias3")

# Forward pass
z1 = tf.matmul(input['X'], parameter['W1']) + parameter['b1']
z2 = tf.nn.relu(z1)
z3 = tf.matmul(z2, parameter['W2']) + parameter['b2']
z4 = tf.nn.relu(z3)
z5 = tf.matmul(z4, parameter['W3']) + parameter['b3']

# Backward optimization
cost = tf.reduce_mean(tf.square(input['Y'] - z5))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)  # gradient descent

# Initialize variables
init = tf.global_variables_initializer()

# Training parameters
training_epochs = 5000
display_step = 2

# Create the saver
saver = tf.train.Saver()
savedir = "model/"

# Launch the session
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs + 1):
        sess.run(optimizer, feed_dict={input['X']: train_X, input['Y']: train_Y})
        # Show progress during training
        if epoch % display_step == 0:
            loss = sess.run(cost, feed_dict={input['X']: train_X, input['Y']: train_Y})
            print("Epoch:", epoch, "cost=", loss)
            if not np.isnan(loss):  # skip NaN losses (the original checked loss == "NA")
                plotdata["batchsize"].append(epoch)
                plotdata["loss"].append(loss)
    print(" Finish")

    # Save the model
    saver.save(sess, savedir + "mymodel.cpkt")

    # Plot the fitted curve
    plt.subplot(222)
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(z5, feed_dict={input['X']: train_X}), label='Fitted line')
    plt.legend()

    plotdata["avgloss"] = moving_average(plotdata["loss"])
    plt.subplot(212)
    plt.plot(plotdata["batchsize"], plotdata["avgloss"], 'b--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Loss')
    plt.title('Minibatch run vs Training loss')
    plt.show()

# Prediction: load the saved model in another session and test again
a = [[0.2], [0.3]]
with tf.Session() as sess2:
    # sess2.run(tf.global_variables_initializer()) is optional here, because restore below
    # loads the saved parameters, which serves as this session's initialization
    saver.restore(sess2, "model/mymodel.cpkt")
    print("x=[[0.2],[0.3]],z5=", sess2.run(z5, feed_dict={input['X']: a}))
```
Running it produces the following directory:
The model loading above does not make use of the checkpoint file, which is not very smart: the user still has to locate and specify a particular model file. In most real projects the user should not have to search by hand; the saved model can instead be found through the checkpoint. For example:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

plotdata = {"batchsize": [], "loss": []}

def moving_average(a, w=11):
    if len(a) < w:
        return a[:]
    return [val if idx < w else sum(a[(idx - w):idx]) / w for idx, val in enumerate(a)]

# Generate simulated data with a quadratic relationship
train_X = np.linspace(-1, 1, 100)[:, np.newaxis]
train_Y = train_X * train_X + 5 * train_X + np.random.randn(*train_X.shape) * 0.3

# Subplot 1 shows the simulated data points
plt.figure(12)
plt.subplot(221)
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.legend()

# Build the model
# Dictionary-style placeholders (note: 'input' shadows the Python built-in of the same name)
input = {'X': tf.placeholder("float", [None, 1]),
         'Y': tf.placeholder("float", [None, 1])}

# Model parameters
parameter = {'W1': tf.Variable(tf.random_normal([1, 10]), name="weight1"),
             'b1': tf.Variable(tf.zeros([1, 10]), name="bias1"),
             'W2': tf.Variable(tf.random_normal([10, 6]), name="weight2"),
             'b2': tf.Variable(tf.zeros([1, 6]), name="bias2"),
             'W3': tf.Variable(tf.random_normal([6, 1]), name="weight3"),
             'b3': tf.Variable(tf.zeros([1]), name="bias3")}

# Forward pass
z1 = tf.matmul(input['X'], parameter['W1']) + parameter['b1']
z2 = tf.nn.relu(z1)
z3 = tf.matmul(z2, parameter['W2']) + parameter['b2']
z4 = tf.nn.relu(z3)
z5 = tf.matmul(z4, parameter['W3']) + parameter['b3']

# Backward optimization
cost = tf.reduce_mean(tf.square(input['Y'] - z5))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)  # gradient descent

# Initialize variables
init = tf.global_variables_initializer()

# Training parameters
training_epochs = 5000
display_step = 2

# Create the saver, keeping only the most recent model
saver = tf.train.Saver(max_to_keep=1)
savedir = "model/"

# Launch the session
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs + 1):
        sess.run(optimizer, feed_dict={input['X']: train_X, input['Y']: train_Y})
        # Save a model after every epoch, tagged with the epoch number
        saver.save(sess, savedir + "mymodel.cpkt", global_step=epoch)
        # Show progress during training
        if epoch % display_step == 0:
            loss = sess.run(cost, feed_dict={input['X']: train_X, input['Y']: train_Y})
            print("Epoch:", epoch, "cost=", loss)
            if not np.isnan(loss):  # skip NaN losses (the original checked loss == "NA")
                plotdata["batchsize"].append(epoch)
                plotdata["loss"].append(loss)
    print(" Finish")

    # Plot the fitted curve
    plt.subplot(222)
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(z5, feed_dict={input['X']: train_X}), label='Fitted line')
    plt.legend()

    plotdata["avgloss"] = moving_average(plotdata["loss"])
    plt.subplot(212)
    plt.plot(plotdata["batchsize"], plotdata["avgloss"], 'b--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Loss')
    plt.title('Minibatch run vs Training loss')
    plt.show()

# Prediction: load the saved model in another session and test again
a = [[0.2], [0.3]]
load = 5000
with tf.Session() as sess2:
    # sess2.run(tf.global_variables_initializer()) is optional here, because restore below
    # loads the saved parameters, which serves as this session's initialization
    # saver.restore(sess2, "model/mymodel.cpkt")
    saver.restore(sess2, "model/mymodel.cpkt-" + str(load))
    print("x=[[0.2],[0.3]],z5=", sess2.run(z5, feed_dict={input['X']: a}))

# Load the saved model through the checkpoint file
with tf.Session() as sess3:
    ckpt = tf.train.get_checkpoint_state(savedir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess3, ckpt.model_checkpoint_path)
        print("x=[[0.2],[0.3]],z5=", sess3.run(z5, feed_dict={input['X']: a}))

# Load the most recently saved model through the checkpoint file
with tf.Session() as sess4:
    ckpt = tf.train.latest_checkpoint(savedir)
    if ckpt is not None:
        saver.restore(sess4, ckpt)
        print("x=[[0.2],[0.3]],z5=", sess4.run(z5, feed_dict={input['X']: a}))
```
Normally, these two checkpoint-based ways of loading model parameters give the same result. The reason is that no matter how many model files the user saves, they are all recorded in a single checkpoint file; the parameter that sets how many models to keep is max_to_keep, for example:
```python
saver = tf.train.Saver(max_to_keep=3)
```
By default the checkpoint loads the newest model and ignores the earlier ones, so the two checkpoint-based loads above restore the same model, and the final test outputs are naturally identical. The three saved models are shown in the figure:
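Rather than reading the directory listing by eye, you can also print exactly what the checkpoint file records. Here is a minimal sketch (assuming the `model/` directory produced by the code above) that uses the `all_model_checkpoint_paths` field of the state returned by `tf.train.get_checkpoint_state`:

```python
import tensorflow as tf

savedir = "model/"  # the save directory used above
ckpt = tf.train.get_checkpoint_state(savedir)
if ckpt:
    # The newest model; this is what checkpoint-based restore defaults to
    print("latest:", ckpt.model_checkpoint_path)
    # Every model still recorded in the checkpoint file (at most max_to_keep entries)
    for path in ckpt.all_model_checkpoint_paths:
        print("kept:", path)
```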
Next, why do the variables above each need a distinct name for their corresponding ops, such as weight1, bias1, and so on? Names matter a great deal: through a name we can reach the variable we want and operate on it. For example:
- Displaying the contents of a model
  This function differs slightly across versions; the version tested in this article is 1.7.0. For example:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.python.tools import inspect_checkpoint as chkp

# Show the names and values of all variables
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-5000", all_tensor_names='',
                                      tensor_name='', all_tensors=True)

# Show the value of the variable with a given name
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-5000", all_tensor_names='',
                                      tensor_name='weight1', all_tensors=False)
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-5000", all_tensor_names='',
                                      tensor_name='bias1', all_tensors=False)
```
The output is shown in the figure below:
Conversely, if the ops of different variables are given the same name, the system automatically numbers the same-named ops in order, for example:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.python.tools import inspect_checkpoint as chkp

# Show the names and values of all variables
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='', all_tensors=True)

# Show the value of the variable with a given name
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='weight', all_tensors=False)
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='bias', all_tensors=False)
```
The result is:
Note that once all the same-named variables have been renumbered, their actual variable names have changed. So when you ask to inspect a particular variable's value by name, what is printed is in fact the first variable's value, since only its name remains unchanged. You can also look up a variable's op name through its name attribute, as the sketch below illustrates.
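Here is a minimal sketch of the automatic renumbering and of the name attribute (the variables are illustrative, not part of the model above):

```python
import tensorflow as tf

w_a = tf.Variable(tf.random_normal([1, 10]), name="weight")
w_b = tf.Variable(tf.random_normal([10, 6]), name="weight")  # same name on purpose

# TensorFlow appends an index to de-duplicate the second op's name
print(w_a.name)  # weight:0
print(w_b.name)  # weight_1:0
```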
- Saving variables by name
  Variables can be saved under explicitly specified names. Be careful: if the names get mixed up, the values stored under those names get mixed up as well. For example:
```python
# Save only these two variables, with their names deliberately mixed up
saver = tf.train.Saver({'weight': parameter['b2'], 'bias': parameter['W1']})
```

Inspecting the checkpoint file again with the same script:

```python
# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.python.tools import inspect_checkpoint as chkp

# Show the names and values of all variables
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='', all_tensors=True)

# Show the value of the variable with a given name
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='weight', all_tensors=False)
chkp.print_tensors_in_checkpoint_file("model/mymodel.cpkt-50", all_tensor_names='',
                                      tensor_name='bias', all_tensors=False)
```
This time the result is:
In this way the model saves its parameters exactly as we intended; just take care not to mix up the variables and their corresponding names.
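As a closing sketch (assuming the swapped name map and the model/mymodel.cpkt-50 checkpoint written above), restoring goes through the same dictionary, so the name swap carries over on load as well:

```python
# The same name map used at save time: checkpoint key -> variable
saver_by_name = tf.train.Saver({'weight': parameter['b2'], 'bias': parameter['W1']})

with tf.Session() as sess5:
    saver_by_name.restore(sess5, "model/mymodel.cpkt-50")
    # parameter['b2'] now holds the tensor stored under the key 'weight',
    # and parameter['W1'] the tensor stored under 'bias'
    print(sess5.run(parameter['b2']))
```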