This experiment uses the mnist.npz dataset. It can be imported online, but my downloads kept being interrupted by network problems, so I import it offline instead. The offline package has been uploaded to GitHub for easy download:
https://github.com/guangfuhao/Deeplearning/blob/master/mnist.npz (mnist.npz download)
The full code is below:
#1.Import the necessary libraries needed
import numpy as np
import tensorflow as tf
import matplotlib
from matplotlib import pyplot as plt
########################################################################
#2.Set default parameters for plots
matplotlib.rcParams['font.size'] = 20
matplotlib.rcParams['figure.titlesize'] = 20
matplotlib.rcParams['figure.figsize'] = [9, 7]
matplotlib.rcParams['font.family'] = ['STKaiTi']
matplotlib.rcParams['axes.unicode_minus'] = False
########################################################################
#3.Initialize Parameters
#Initialize learning rate
lr = 1e-3
#Initialize loss array
losses = []
#Initialize the weights layers and the bias layers
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
########################################################################
#4.Import the MNIST dataset by numpy offline
def load_mnist():
    #define the directory where mnist.npz is (please watch the '\'!)
    path = r'F:\learning\machineLearning\forward_progression\mnist.npz'
    f = np.load(path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    return (x_train, y_train), (x_test, y_test)

(train_image, train_label), _ = load_mnist()
x = tf.convert_to_tensor(train_image, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(train_label, dtype=tf.int32)
#Reshape x from [60k, 28, 28] to [60k, 28*28]
x = tf.reshape(x, [-1, 28*28])
########################################################################
#5.Combine x and y as a tuple and batch them
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
'''
#Encapsulate train_db as an iterator object
train_iter = iter(train_db)
sample = next(train_iter)
'''
########################################################################
#6.Iterate over the dataset 20 times
for epoch in range(20):
    #For every batch: x: [128, 28*28], y: [128]
    for step, (x, y) in enumerate(train_db):
        with tf.GradientTape() as tape:  # tf.Variable
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)
            # [b, 256] => [b, 128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2 @ w3 + b3

            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)

            # compute loss
            # mse = mean(sum(y-out)^2)
            # [b, 10]
            loss = tf.square(y_onehot - out)
            # mean: scalar
            loss = tf.reduce_mean(loss)

        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        #Update the weights and the bias
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(epoch, step, 'loss:', float(loss))

    losses.append(float(loss))
########################################################################
#7.Show the change of losses via matplotlib
plt.figure()
plt.plot(losses, color='C0', marker='s', label='訓練')
plt.xlabel('Epoch')
plt.legend()
plt.ylabel('MSE')
#Save figure as '.svg' file
#plt.savefig('forward.svg')
plt.show()
There is not much to say about Part 1: it imports the numpy, tensorflow, matplotlib, and pyplot libraries.
import numpy as np
import tensorflow as tf
import matplotlib
from matplotlib import pyplot as plt
Part 2 sets some default parameters for matplotlib plots.
pyplot uses an rc configuration file to customize the default properties of figures, known as the rc configuration or rc parameters. Through rc parameters you can change the defaults, including window size, dots per inch, line width, color, style, axes, tick and grid properties, text, fonts, and so on.
font.size is the font size, figure.titlesize is the title size, figure.figsize is the displayed figure size, font.family is set to STKaiTi so that Chinese characters render correctly, and axes.unicode_minus makes the minus sign display properly.
matplotlib.rcParams['font.size'] = 20
matplotlib.rcParams['figure.titlesize'] = 20
matplotlib.rcParams['figure.figsize'] = [9, 7]
matplotlib.rcParams['font.family'] = ['STKaiTi']
matplotlib.rcParams['axes.unicode_minus'] = False
Part 3 initializes some parameters. lr is the learning rate (when I changed lr to 1e-2 the final losses became smaller, but I do not yet know how this value affects the network's final performance); it controls how far the parameters move in each gradient-descent step. losses stores the loss at the end of each epoch. The three weight layers are initialized with a truncated normal distribution (in tf.random.truncated_normal, if a sampled value falls outside the interval (μ-2σ, μ+2σ) it is re-sampled, which guarantees that all generated values stay near the mean), and the bias layers are initialized with zeros. A small sanity check of this truncation claim follows the code below.
#Initialize learning rate
lr = 1e-3
#Initialize loss array
losses = []
#Initialize the weights layers and the bias layers
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
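A quick way to convince yourself of the truncation claim above (a small sketch, not part of the original code) is to sample a large tensor with stddev 0.1 and check its extremes; everything stays inside (-0.2, 0.2), i.e. within two standard deviations of the zero mean:

import tensorflow as tf

# sample 1,000,000 values with mean 0 and stddev 0.1
t = tf.random.truncated_normal([1000, 1000], stddev=0.1)
print(float(tf.reduce_min(t)), float(tf.reduce_max(t)))  # both within (-0.2, 0.2)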
Part 4 imports the MNIST dataset and preprocesses the shape of x. Here path is the location of the mnist.npz file you downloaded locally; note that the path uses backslashes!
def load_mnist():
    #define the directory where mnist.npz is (please watch the '\'!)
    path = r'F:\learning\machineLearning\forward_progression\mnist.npz'
    f = np.load(path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    return (x_train, y_train), (x_test, y_test)

(train_image, train_label), _ = load_mnist()
x = tf.convert_to_tensor(train_image, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(train_label, dtype=tf.int32)
#Reshape x from [60k, 28, 28] to [60k, 28*28]
x = tf.reshape(x, [-1, 28*28])
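As an aside, the online import mentioned at the beginning would typically go through Keras' built-in loader, which downloads mnist.npz on first use (exactly the step that kept failing for me); a minimal sketch:

import tensorflow as tf

# online alternative to load_mnist(); needs a working network connection the first time
(train_image, train_label), (test_image, test_label) = tf.keras.datasets.mnist.load_data()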
Part 5 splits the dataset into batches of 128 samples each (why the batch size should be 128 is an open question; I tried 200 and 100 and did not notice any difference). For what Batch and Epoch mean, skip down to the end of this post.
train_db = tf.data.Dataset.from_tensor_slices((x,y)).batch(128)
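The commented-out iterator lines in the full listing show how to peek at a single batch; a small sketch of that check, assuming train_db has been built as above:

# take one batch and look at its shapes
train_iter = iter(train_db)
sample_x, sample_y = next(train_iter)
print(sample_x.shape, sample_y.shape)  # (128, 784) (128,)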
Part 6 iterates over the dataset for 20 epochs and computes the loss with MSE. The MSE loss is written out just below, followed by a note on tf.GradientTape:
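Written out, the loss the code computes for a batch of b samples is the mean squared error between the one-hot label and the 10-dimensional network output, averaged over all b × 10 entries:

MSE = (1 / (b * 10)) * Σ_i Σ_j (y_onehot[i, j] - out[i, j])²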
tf.GradientTape (gradient tape)
__init__(persistent=False, watch_accessed_variables=True)
Purpose: creates a new GradientTape.
Parameters:
persistent: a boolean specifying whether the newly created gradient tape is persistent. The default is False, which means gradient() can be called only once on the tape.
watch_accessed_variables: a boolean indicating whether the tape automatically watches (tracks) any trainable variables it accesses. The default is True; if set to False, you have to manually specify the variables you want to watch.
The entire forward computation below must be wrapped in the with tf.GradientTape() as tape context, so that the computation-graph information is recorded during the forward pass and the backward differentiation can be carried out conveniently. assign_sub() subtracts the given value in place (in-place), implementing the parameter's self-update.
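As a standalone illustration of these two APIs (a minimal sketch with made-up values w and x, not part of the experiment's code):

import tensorflow as tf

# a single trainable variable and a constant input, just for illustration
w = tf.Variable(2.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    # the forward computation is recorded on the tape
    loss = tf.square(w * x - 1.0)

# dloss/dw = 2 * (w*x - 1) * x = 2 * 5 * 3 = 30
grad = tape.gradient(loss, w)

# in-place update: w <- w - lr * grad
w.assign_sub(0.1 * grad)
print(float(grad), float(w))  # 30.0, -1.0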
for epoch in range(20):
    #For every batch: x: [128, 28*28], y: [128]
    for step, (x, y) in enumerate(train_db):
        with tf.GradientTape() as tape:  # tf.Variable
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)
            # [b, 256] => [b, 128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2 @ w3 + b3

            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)

            # compute loss
            # mse = mean(sum(y-out)^2)
            # [b, 10]
            loss = tf.square(y_onehot - out)
            # mean: scalar
            loss = tf.reduce_mean(loss)

        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        #Update the weights and the bias
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(epoch, step, 'loss:', float(loss))

    losses.append(float(loss))
Part 7 plots how the losses change as training progresses.
plt.figure()
plt.plot(losses, color='C0', marker='s', label='訓練')
plt.xlabel('Epoch')
plt.legend()
plt.ylabel('MSE')
#Save figure as '.svg' file
#plt.savefig('forward.svg')
plt.show()
The figure below shows the final loss curve:
A plain-language explanation of Batch and Epoch (adapted from https://blog.csdn.net/weixin_42137700/article/details/84302045):
Suppose you have a dataset with 200 samples (rows of data), and you choose a batch size of 5 and 1,000 epochs.
This means the dataset will be divided into 40 batches of 5 samples each, and the model weights are updated after every batch of 5 samples.
It also means that one epoch involves 40 batches, i.e. 40 model updates.
With 1,000 epochs, the model is exposed to (passes over) the entire dataset 1,000 times, for a total of 40,000 batches over the whole training process.
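Applying the same arithmetic to this experiment (60,000 training images, a batch size of 128, and 20 epochs), a quick sketch of the numbers:

samples, batch_size, epochs = 60000, 128, 20

batches_per_epoch = -(-samples // batch_size)   # ceiling division: 469 batches per epoch
total_updates = batches_per_epoch * epochs      # 9,380 weight updates in total
print(batches_per_epoch, total_updates)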