Through researchers' continual experimentation with neural-network training, many useful techniques have accumulated; applying them sensibly can help your own models achieve a better fit.
I. Demonstrating overfitting with an XOR dataset
Fully connected networks are powerful fitters, but that very power brings its own trouble: overfitting.
Let's start with an example. This time the original 4 XOR data points are expanded into a dataset of several hundred samples with the XOR characteristic, which we then classify with a fully connected network.
Example description: build a simulated XOR dataset, then build a simple multi-layer neural network to fit its features. First observe the underfitting that occurs, then increase the network's complexity to fix the underfitting, which in turn produces overfitting.
1. Building the XOR dataset
'''
Generate random data
'''
np.random.seed(10)
# number of features
num_features = 2
# number of samples
num_samples = 320
# mean: array of length num_features, drawn from a standard normal distribution
mean = np.random.randn(num_features)
print('mean', mean)
cov = np.eye(num_features)
print('cov', cov)
# generate() is defined in the complete code at the end of this post
X, Y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
# collapse the 4 classes into 2
Y = Y % 2
xr = []
xb = []
for (l, k) in zip(Y[:], X[:]):
    if l == 0.0:
        xr.append([k[0], k[1]])
    else:
        xb.append([k[0], k[1]])
xr = np.array(xr)
xb = np.array(xb)
plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')
The figure shows the data falling into two classes arranged in an XOR pattern: the lower-left and upper-right clusters form one class, while the upper-left and lower-right clusters form the other.
2. Defining the network model
'''
Define variables
'''
# learning rate
learning_rate = 1e-4
# number of input-layer nodes
n_input = 2
# number of hidden-layer nodes
n_hidden = 2
# number of output nodes
n_label = 1

input_x = tf.placeholder(tf.float32, [None, n_input])
input_y = tf.placeholder(tf.float32, [None, n_label])

'''
Define the learnable parameters
h1: hidden layer
h2: output layer
'''
weights = {
    'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
    'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
}
biases = {
    'h1': tf.Variable(tf.zeros([n_hidden])),
    'h2': tf.Variable(tf.zeros([n_label]))
}

'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
3. Training the network and visualizing the result
'''
Start training
'''
training_epochs = 30000
sess = tf.InteractiveSession()
# initialize variables
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    _, lo = sess.run([train, loss], feed_dict={input_x: X, input_y: np.reshape(Y, [-1, 1])})
    if epoch % 1000 == 0:
        print('Epoch {0} loss {1}'.format(epoch, lo))

'''
Visualize the decision boundary
'''
nb_of_xs = 200
xs1 = np.linspace(-1, 8, num=nb_of_xs)
xs2 = np.linspace(-1, 8, num=nb_of_xs)
# build a grid
xx, yy = np.meshgrid(xs1, xs2)
# initialize and fill the classification plane
classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        # predicted label for each grid point
        classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
# color map used for display
cmap = ListedColormap([
    colorConverter.to_rgba('r', alpha=0.30),
    colorConverter.to_rgba('b', alpha=0.30),
])
# show the decision regions
plt.contourf(xx, yy, classfication_plane, cmap=cmap)
plt.show()
As you can see, after about 20,000 iterations the gradient updates slow down, the loss settles at roughly 0.16, accuracy is poor, and the visualized decision boundary does not fully separate the data.
What the figure shows is called underfitting: the model has not fully fit the true structure of the data.
4. Modifying the model to improve the fit
The underfitting is not because this kind of model is inherently incapable, but because our training procedure cannot learn suitable parameters precisely enough with so little capacity; the weaker the model, the more it demands of the training. By adding nodes or adding layers we can give the model more fitting power and thereby make it easier to train.
Change the number of hidden-layer nodes to 200 (a sketch of the add-a-layer alternative follows the snippet below):
# number of hidden-layer nodes
n_hidden = 200
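The other option mentioned above is to add a layer rather than widen the existing one. The following is only a hedged sketch, not part of the original code, of what a two-hidden-layer variant of the same model could look like; the sizes n_hidden_1 and n_hidden_2 are illustrative values and are not used in the rest of this post.

# Sketch only: a deeper variant with two hidden layers (illustrative sizes).
n_hidden_1 = 32
n_hidden_2 = 32

weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.01)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.01)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_label], stddev=0.01))
}
biases = {
    'h1': tf.Variable(tf.zeros([n_hidden_1])),
    'h2': tf.Variable(tf.zeros([n_hidden_2])),
    'out': tf.Variable(tf.zeros([n_label]))
}

# two ReLU hidden layers followed by the same sigmoid output
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['out']), biases['out']))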
The figure shows how powerful a fully connected network is: with just one hidden layer of 200 neurons it carves up the data very finely. The loss also keeps shrinking, reaching 0.056 after 30,000 iterations.
5. Verifying overfitting
So is this model actually good? Let's generate a small amount of new data, feed it to the model for validation, and visualize it in the same coordinate system as before.
'''
Test: the test-set loss is much larger than the training-set loss
because the model has overfit
'''
test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
# collapse the 4 classes into 2
test_y = test_y % 2
xr = []
xb = []
for (l, k) in zip(test_y[:], test_x[:]):
    if l == 0.0:
        xr.append([k[0], k[1]])
    else:
        xb.append([k[0], k[1]])
xr = np.array(xr)
xb = np.array(xb)

plt.figure()
plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
print('Test data loss {0}'.format(lo))

nb_of_xs = 200
xs1 = np.linspace(-1, 8, num=nb_of_xs)
xs2 = np.linspace(-1, 8, num=nb_of_xs)
# build a grid
xx, yy = np.meshgrid(xs1, xs2)
# initialize and fill the classification plane
classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        # predicted label for each grid point
        classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
# color map used for display
cmap = ListedColormap([
    colorConverter.to_rgba('r', alpha=0.30),
    colorConverter.to_rgba('b', alpha=0.30),
])
# show the decision regions
plt.contourf(xx, yy, classfication_plane, cmap=cmap)
plt.show()
This run shows the test-set loss climbing to 0.21, nowhere near the 0.056 achieved during training. The model is unchanged, yet this time it only captures a fraction of the samples correctly. This is overfitting. Like underfitting, it is something we do not want to see while training: what we want is genuine fitting, where the good performance seen during training carries over to the test case.
There are many ways to combat overfitting. Common ones include early stopping, enlarging the dataset, regularization, and dropout; the rest of this section applies these methods to the example.
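Of these, early stopping is the only one the later subsections do not demonstrate, so here is a minimal, hedged sketch of how it could be bolted onto the training loop above. It is not part of the original code; the patience value and the size of the held-out validation set are illustrative assumptions.

# Sketch only: early stopping on a held-out validation set (illustrative).
val_x, val_y = generate(64, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
val_y = np.reshape(val_y % 2, [-1, 1])

best_val = float('inf')
patience, bad_checks = 5, 0        # stop after 5 checks without improvement

for epoch in range(training_epochs):
    sess.run(train, feed_dict={input_x: X, input_y: np.reshape(Y, [-1, 1])})
    if epoch % 1000 == 0:
        val_loss = sess.run(loss, feed_dict={input_x: val_x, input_y: val_y})
        if val_loss < best_val:
            best_val, bad_checks = val_loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                print('Early stopping at epoch', epoch)
                break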
6. Mitigating overfitting with regularization
TensorFlow ships with an L2-regularization function that can be used directly:
tf.nn.l2_loss(t,name=None)
Its definition looks like this:
def l2_loss(t, name=None):
    r"""L2 Loss.

    Computes half the L2 norm of a tensor without the `sqrt`:

        output = sum(t ** 2) / 2

    Args:
      t: A `Tensor`. Must be one of the following types: `half`, `float32`, `float64`.
        Typically 2-D, but may have any dimensions.
      name: A name for the operation (optional).

    Returns:
      A `Tensor`. Has the same type as `t`. 0-D.
    """
    result = _op_def_lib.apply_op("L2Loss", t=t, name=name)
    return result
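As a quick sanity check of the formula above, here is a tiny hedged example (not from the original post) evaluating tf.nn.l2_loss on a small constant tensor:

import tensorflow as tf

w = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as s:
    # sum(t ** 2) / 2 = (1 + 4 + 9) / 2 = 7.0
    print(s.run(tf.nn.l2_loss(w)))   # 7.0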
There is no ready-made L1-regularization function, but one is easy to compose yourself:
tf.reduce_sum(tf.abs(w))
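For comparison with the L2 version used below, this is a hedged sketch of what an L1-regularized cost could look like for this model. The names lamda, num_samples, weights, y_pred and input_y follow the code in this post; the way the penalty is scaled here mirrors the L2 version and is an illustrative choice, not the author's code.

# Sketch only: L1 penalty composed from tf.reduce_sum(tf.abs(...)).
l1_penalty = tf.reduce_sum(tf.abs(weights['h1'])) + tf.reduce_sum(tf.abs(weights['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y)) + lamda * l1_penalty / num_samples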
Now add L2 regularization to the code, with the regularization coefficient lamda = 1.6, and change the cost function as follows:
loss = tf.reduce_mean(tf.square(y_pred - input_y)) \
       + lamda * tf.nn.l2_loss(weights['h1']) / num_samples \
       + lamda * tf.nn.l2_loss(weights['h2']) / num_samples
The training-set cost rises from 0.056 to 0.106, but the test-set cost only drops from 0.21 to about 0.197, so the improvement is not very noticeable.
7. Mitigating overfitting by enlarging the dataset
Next, let's try improving the overfitting by enlarging the dataset. Instead of generating a single fixed random sample set, each iteration of the training loop now generates 1000 fresh data points. Partial code:
for epoch in range(training_epochs):
    # generate a fresh batch of samples every iteration
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2
    _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
    if epoch % 1000 == 0:
        print('Epoch {0} loss {1}'.format(epoch, lo))
This time the test-set cost drops to 0.04, even lower than on the training set: generalization has clearly improved.
8. Mitigating overfitting with dropout
The prototype of TensorFlow's dropout function is:
def dropout(x,keep_prob,noise_shape=None,seed=None,name=None)
Its parameters mean the following:
- x: the input tensor (the layer's nodes) to which dropout is applied.
- keep_prob: the keep probability. 1 means every node participates in learning; 0.8 means 20% of the nodes are dropped and only 80% take part in learning.
- noise_shape: the shape of the random keep/drop mask, which controls along which dimensions of x dropout is applied independently.
- seed: random seed used when choosing which nodes to drop.
Dropout changes the structure of the network and is purely a training-time technique, so at test time keep_prob should normally be set to 1, meaning nothing is dropped; otherwise it would distort the model's normal output.
Dropout is added to the program, with keep_prob set to 0.5 during training (the corresponding feed_dict usage is sketched after the snippet below):
'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
keep_prob = tf.placeholder(dtype=tf.float32)
layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))
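Because keep_prob is now a placeholder, it has to be fed on every sess.run call. The following lines, taken in spirit from the complete code at the end of this post, show the intended usage: 0.5 while training, 1.0 whenever the model is evaluated.

# training step: drop half of the hidden nodes
_, lo = sess.run([train, loss],
                 feed_dict={input_x: train_x,
                            input_y: np.reshape(train_y, [-1, 1]),
                            keep_prob: 0.5})

# evaluation: keep every node
lo = sess.run(loss, feed_dict={input_x: test_x,
                               input_y: np.reshape(test_y, [-1, 1]),
                               keep_prob: 1.0})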
From the output, the improvement from adding dropout is also not large, roughly comparable to L2 regularization.
9. Combining dropout with a decaying learning rate
The results above show the cost value oscillating back and forth. This jitter late in training suggests the learning rate is a little too large, so we can add a decaying learning rate.
In the optimizer part of the code, add a decayed learning_rate: the total number of steps is 30,000, and every 1000 steps the learning rate is multiplied by 0.9. Partial code:
'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
keep_prob = tf.placeholder(dtype=tf.float32)
layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))

global_step = tf.Variable(0, trainable=False)
decaylearning_rate = tf.train.exponential_decay(learning_rate, global_step, 1000, 0.9)
train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss, global_step=global_step)

'''
Start training
'''
training_epochs = 30000
sess = tf.InteractiveSession()
# initialize variables
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    # each run of train increments global_step by 1
    rate, _, lo = sess.run([decaylearning_rate, train, loss],
                           feed_dict={input_x: train_x,
                                      input_y: np.reshape(train_y, [-1, 1]),
                                      keep_prob: 0.5})
    if epoch % 1000 == 0:
        print('Epoch {0} learning_rate {1} loss {2} '.format(epoch, rate, lo))
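For reference, with these arguments tf.train.exponential_decay follows the documented formula decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps). The tiny check below is a hedged illustration of the expected value, not output from the program:

# decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps)
# e.g. after 10,000 steps: 1e-4 * 0.9 ** (10000 / 1000)
decayed_lr = 1e-4 * 0.9 ** (10000 / 1000)
print(decayed_lr)   # ~3.49e-05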
We can see that the learning rate does decay, but the effect is not dramatic and the cost value still oscillates. We could keep adjusting parameters to get a better result, which is a matter of patience.
Complete code:

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 26 15:02:16 2018

@author: zy
"""

'''
Use an overfitting case study to learn optimization tricks for training
fully connected networks, e.g. regularization and dropout
'''

import tensorflow as tf
import numpy as np
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
import random
from matplotlib.colors import colorConverter, ListedColormap

'''
Dataset generation
'''
def get_one_hot(labels, num_classes):
    '''
    one-hot encoding
    args:
        labels: input class labels
        num_classes: number of classes
    '''
    m = np.zeros([labels.shape[0], num_classes])
    for i in range(labels.shape[0]):
        m[i][labels[i]] = 1
    return m


def generate(sample_size, mean, cov, diff, num_classes=2, one_hot=False):
    '''
    We have no real hospital case data, so simulate some samples:
    generate a fixed number of samples with the given mean and covariance.
    args:
        sample_size: number of samples
        mean: 1-D ndarray or list of length M, the mean of each feature
        cov: N x N ndarray or list, the (symmetric) covariance matrix
        diff: list of length num_classes-1; element i is the offset of class i+1's
              mean from class 0's mean [feature-1 offset, feature-2 offset, ...];
              if it is too short, the last element is reused
        num_classes: number of classes
        one_hot: whether to one-hot encode the labels
    '''
    # samples per class, e.g. 1000 samples and 2 classes -> 500 per class
    sample_per_class = int(sample_size / num_classes)

    '''
    np.random.multivariate_normal:
    mean: 1-D array_like, of length N. Mean of the N-dimensional distribution.
    cov: 2-D array_like, of shape (N, N). Covariance matrix of the distribution.
         It must be symmetric and positive-semidefinite for proper sampling.
    size: shape. Given a shape of, for example, (m, n, k), m*n*k samples are
          generated and packed in an m-by-n-by-k arrangement. Because each sample
          is N-dimensional, the output shape is (m, n, k, N). If no shape is
          specified, a single (N-D) sample is returned.
    '''
    # class 0: sample_per_class samples with mean `mean` and covariance `cov`
    X0 = np.random.multivariate_normal(mean, cov, sample_per_class)
    Y0 = np.zeros(sample_per_class, dtype=np.int32)

    # pad diff if it is too short
    if len(diff) != num_classes - 1:
        tmp = np.zeros(num_classes - 1)
        tmp[0:len(diff)] = diff
        tmp[len(diff):] = diff[-1]
    else:
        tmp = diff

    for ci, d in enumerate(tmp):
        '''
        enumerate turns the list into (index, element) pairs so we can iterate
        over both the index and the element itself
        '''
        # class ci+1: sample_per_class samples with mean `mean + d` and covariance `cov`
        X1 = np.random.multivariate_normal(mean + d, cov, sample_per_class)
        Y1 = (ci + 1) * np.ones(sample_per_class, dtype=np.int32)
        # concatenate with the previous classes
        X0 = np.concatenate((X0, X1))
        Y0 = np.concatenate((Y0, Y1))

    if one_hot:
        Y0 = get_one_hot(Y0, num_classes)

    # shuffle the samples
    X, Y = shuffle(X0, Y0)
    return X, Y


def example_overfit():
    '''
    Demonstrate an overfitting case
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: the test-set loss is much larger than the training-set loss
    because the model has overfit
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_l2_norm():
    '''
    Use the L2 norm to mitigate overfitting
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1
    # regularization coefficient
    lamda = 1.6

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and L2-regularized cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y)) \
           + lamda * tf.nn.l2_loss(weights['h1']) / num_samples \
           + lamda * tf.nn.l2_loss(weights['h2']) / num_samples
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: the test-set loss is much larger than the training-set loss
    because the model has overfit
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_add_trainset():
    '''
    Mitigate overfitting by enlarging the training set
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 1000
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        # generate a fresh batch of samples every iteration
        train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
        # collapse the 4 classes into 2
        train_y = train_y % 2
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_dropout():
    '''
    Mitigate overfitting with dropout
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer with dropout
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1]), keep_prob: 0.5})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1]), keep_prob: 1.0})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_dropout_learningrate_decay():
    '''
    Mitigate overfitting with dropout and use a decaying learning rate
    to speed up learning
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer with dropout
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))

    global_step = tf.Variable(0, trainable=False)
    decaylearning_rate = tf.train.exponential_decay(learning_rate, global_step, 1000, 0.9)
    train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss, global_step=global_step)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        # each run of train increments global_step by 1
        rate, _, lo = sess.run([decaylearning_rate, train, loss],
                               feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1]), keep_prob: 0.5})
        if epoch % 1000 == 0:
            print('Epoch {0} learning_rate {1} loss {2} '.format(epoch, rate, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1]), keep_prob: 1.0})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


if __name__ == '__main__':
    #example_overfit()
    #example_l2_norm()
    #example_add_trainset()
    #example_dropout()
    example_dropout_learningrate_decay()