Section 11: Optimization techniques for fully connected networks (overfitting, regularization, dropout, learning rate decay)


As researchers have experimented with training neural networks, they have accumulated many useful techniques. Applying these techniques sensibly can help your model achieve a better fit.

I. Demonstrating overfitting with an XOR dataset

Although a fully connected network is powerful at fitting, an overly strong fit brings its own trouble: overfitting.

First let's look at an example. This time the original four XOR data points are expanded into a few hundred data points with the XOR characteristic, and a fully connected network is then used to classify them.

Example description: build a simulated XOR dataset, then build a simple multi-layer neural network to fit its features. First observe the underfitting that appears, then increase the network's complexity to fix the underfitting, pushing the model until it overfits.

1. Build the XOR dataset

    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    X,Y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes: classes 0 and 2 become label 0, classes 1 and 3 become label 1 (an XOR layout)
    Y = Y % 2        
    
    xr = []
    xb = []
    
    for (l,k) in zip(Y[:],X[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')

As the figure shows, the data fall into two classes: the lower-left and upper-right clusters belong to one class, while the upper-left and lower-right clusters belong to the other.

2. Define the network model

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 2
    # number of output nodes
    n_label = 1
    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    
        
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

3. Train the network and visualize the results

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):
        _,lo = sess.run([train,loss],feed_dict={input_x:X,input_y:np.reshape(Y,[-1,1])})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))
        
        
    '''
    Visualize the data
    '''      
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()

  

As you can see, after about 20,000 training iterations the gradient updates slow down; the loss settles around 0.16 and the accuracy is poor, and the visualization shows that the data are not fully separated.

This phenomenon is called underfitting: the model has not fully fitted the true structure of the data we want it to capture.

4. Fix the model to improve the fit

The cause of underfitting is not that the model family is incapable, but that our learning method cannot learn suitable model parameters precisely enough. The weaker the model, the more it demands from training. We can instead add nodes or add layers to give the model more fitting capacity, thereby lowering the difficulty of training it.

Change the number of hidden-layer nodes to 200, as follows:

    # number of hidden-layer nodes
    n_hidden = 200

The figure shows the power of a fully connected network: with just one hidden layer of 200 neurons it partitions the data very finely. The loss also keeps shrinking; after 30,000 iterations it is down to 0.056.

5. Verify the overfitting

那么對於上面的模型好不好呢?我們再取少量的數據放到模型中驗證一下,然后用同樣的方式在坐標系中可視化。

    '''
    Test: the test loss is much larger than the training loss
    because the model has overfitted
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1])})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()

From this run we can see that the test loss rises to 0.21, nowhere near the 0.056 reached during training. The model is exactly the same, yet this time it correctly captures only a small share of the samples. This phenomenon is overfitting. Like underfitting, it is something we never want to see while training a model; what we want is a genuine fit, one that reproduces the good training-time behavior at test time.

There are many ways to avoid overfitting. Common ones include early stopping, dataset expansion, regularization, and dropout. Below, these methods are used to optimize this example.
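Among these, early stopping is the only technique not demonstrated in the steps below. As a minimal sketch of the idea, assuming a held-out validation split val_x/val_y (those names, the 100-epoch check interval, and the patience value are illustrative assumptions, not part of the original code), it could look like this:

    # minimal early-stopping sketch: stop once the validation loss stops improving
    best_val_loss = float('inf')
    patience = 2000      # number of epochs tolerated without improvement
    bad_epochs = 0
    
    for epoch in range(training_epochs):
        sess.run(train,feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1])})
        if epoch % 100 == 0:
            val_loss = sess.run(loss,feed_dict={input_x:val_x,input_y:np.reshape(val_y,[-1,1])})
            if val_loss < best_val_loss:
                best_val_loss,bad_epochs = val_loss,0
            else:
                bad_epochs += 100
                if bad_epochs >= patience:
                    print('Early stopping at epoch {0}'.format(epoch))
                    break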

6. Reduce overfitting with regularization

TensorFlow provides a wrapped L2 regularization function that can be used directly:

tf.nn.l2_loss(t,name=None)

Its source is as follows:

def l2_loss(t, name=None):
  r"""L2 Loss.

  Computes half the L2 norm of a tensor without the `sqrt`:

      output = sum(t ** 2) / 2

  Args:
    t: A `Tensor`. Must be one of the following types: `half`, `float32`, `float64`.
      Typically 2-D, but may have any dimensions.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `t`. 0-D.
  """
  result = _op_def_lib.apply_op("L2Loss", t=t, name=name)
  return result
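As a quick sanity check of the formula output = sum(t ** 2) / 2, the following minimal snippet (the input values are arbitrary) prints 7.0:

    t = tf.constant([1.0,2.0,3.0])
    with tf.Session() as s:
        print(s.run(tf.nn.l2_loss(t)))   # (1 + 4 + 9) / 2 = 7.0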

However, TensorFlow does not provide an L1 regularization function; you can compose one yourself:

tf.reduce_sum(tf.abs(w))
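For example, an L1 penalty on this model's two weight matrices, scaled the same way as the L2 term used below (lamda here is whatever regularization strength you choose), could be composed like this:

    l1_penalty = tf.reduce_sum(tf.abs(weights['h1'])) + tf.reduce_sum(tf.abs(weights['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y)) + lamda * l1_penalty / num_samples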

We now add L2 regularization to the code, setting the regularization parameter lamda = 1.6 and modifying the cost function as follows:

loss = tf.reduce_mean(tf.square(y_pred - input_y)) + lamda * tf.nn.l2_loss(weights['h1'])/ num_samples + tf.nn.l2_loss(weights['h2']) *lamda/ num_samples

  

The training loss rises from 0.056 to 0.106, while the test loss falls only from 0.21 to roughly 0.197, so the improvement is not very pronounced.

7. Reduce overfitting by enlarging the dataset

Next, let's try improving the overfitting by enlarging the dataset. Instead of generating a single fixed random sample set, we now generate 1,000 fresh data points in each training iteration. Part of the code is as follows:

    for epoch in range(training_epochs):
        # generate a fresh batch of samples every iteration
        train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
        # collapse to two classes
        train_y = train_y % 2      
        _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1])})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))

  

This time the test loss drops to 0.04, even lower than the training loss, so generalization is much better.

8. Reduce overfitting with dropout

The prototype of TensorFlow's dropout function is:

def dropout(x,keep_prob,noise_shape=None,seed=None,name=None)

Its parameters mean the following:

  • x: the input tensor (the model node to apply dropout to).
  • keep_prob: the keep probability. 1 means every node participates in learning; 0.8 means 20% of the nodes are dropped and only 80% participate.
  • noise_shape: specifies which dimensions of x dropout is applied to.
  • seed: the random seed used when randomly selecting nodes to drop.

Dropout changes the structure of the neural network, and it is purely a training-time technique. At test time, keep_prob should generally be set to 1, meaning nothing is dropped; otherwise it would distort the model's normal output.
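Note also that tf.nn.dropout rescales the kept activations by 1/keep_prob during training, so the expected magnitude of the layer output already matches what the next layer will see at test time. A minimal snippet demonstrating this behavior (the input values are arbitrary):

    x = tf.ones([1,4])
    drop = tf.nn.dropout(x,keep_prob=0.5)
    with tf.Session() as s:
        # kept entries are scaled to 1/0.5 = 2.0, dropped entries become 0
        print(s.run(drop))   # e.g. [[2. 0. 2. 2.]]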

We add dropout to the program and set keep_prob to 0.5:

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1,keep_prob)
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))

The output shows that adding dropout does not bring much improvement either; its effect is roughly the same as L2 regularization.

9. Fit the dataset with a decaying learning rate plus dropout

The results above show the loss oscillating back and forth, mainly because of jitter late in training, which suggests the learning rate is a bit too large. Here we can add a decaying learning rate.

In the optimizer part of the code, add the learning_rate setup: set the total number of steps to 30,000 and decay the learning rate by a factor of 0.9 every 1,000 steps. Part of the code is as follows:

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1,keep_prob)
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    
        
    global_step = tf.Variable(0,trainable=False)
    decaylearning_rate = tf.train.exponential_decay(learning_rate,global_step,1000,0.9)
    train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss,global_step = global_step)

    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):        
        # each run of train increments global_step by 1
        rate,_,lo = sess.run([decaylearning_rate,train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1]),keep_prob:0.5})
        if epoch % 1000 == 0:
            print('Epoch {0}   learning_rate  {1} loss {2} '.format(epoch,rate,lo))

We can see that the learning rate does decay, but the effect is not very noticeable and the loss still oscillates. We can try adjusting some of the parameters to get a better result; it is work that requires patience.
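For reference, tf.train.exponential_decay computes learning_rate * decay_rate ** (global_step / decay_steps), and the exponent is continuous unless staircase=True is passed. So with the parameters above, the rates printed during training should roughly follow this pure-Python check:

    def expected_rate(step,lr=1e-4,decay_steps=1000,decay_rate=0.9):
        return lr * decay_rate ** (step / decay_steps)
    
    print(expected_rate(0))        # 1e-04
    print(expected_rate(1000))     # 9e-05
    print(expected_rate(30000))    # about 4.2e-06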

Complete code:

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 26 15:02:16 2018

@author: zy
"""

'''
Use an overfitting case study to learn optimization techniques for training
fully connected networks, such as regularization and dropout
'''

import tensorflow as tf
import numpy as np
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
import random
from matplotlib.colors import colorConverter, ListedColormap

'''
Generate the dataset
'''

def get_one_hot(labels,num_classes):
    '''
    one-hot encoding
    
    args:
        labels : input class labels
        num_classes: number of classes
    '''    
    m = np.zeros([labels.shape[0],num_classes])
    for i in range(labels.shape[0]):  
        m[i][labels[i]] = 1
    return m
        
    

def generate(sample_size,mean,cov,diff,num_classes=2,one_hot = False):
    '''
    We have no real (hospital) case data, so we simulate some samples:
    generate a fixed number of samples with the specified mean and covariance.
    
    args:
        sample_size: number of samples
        mean: 1-D ndarray or list of length M, the mean of each feature
        cov: N x N ndarray or list, the covariance (a symmetric matrix)
        diff: list of length num_classes-1; element i is the offset between the means
              of class i+1 and class 0: [feature-1 offset, feature-2 offset, ...].
              If it is too short, the remaining entries reuse the last element of diff.
        num_classes: number of classes
        one_hot: whether to one-hot encode the labels
    '''
        
    # number of samples per class; e.g. 1000 samples split into two classes gives 500 per class    
    sample_per_class = int(sample_size/num_classes)
    
    
    '''
    多變量正態分布
    mean : 1-D array_like, of length N . Mean of the N-dimensional distribution.    數組類型,每一個元素對應一維的平均值
    cov : 2-D array_like, of shape (N, N) .Covariance matrix of the distribution. It must be symmetric and positive-semidefinite 
        for proper sampling.
    size:shape. Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement.
        Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is 
        returned.
    '''
    # generate sample_per_class samples of dimension len(mean) with mean `mean` and covariance `cov`, labeled class 0
    X0 = np.random.multivariate_normal(mean,cov,sample_per_class)    
    Y0 = np.zeros(sample_per_class,dtype=np.int32)
    
    
    # pad diff if it is shorter than num_classes-1
    if len(diff) != num_classes-1:
        tmp = np.zeros(num_classes-1)
        tmp[0:len(diff)] = diff  
        tmp[len(diff):] = diff[-1]
    else:
        tmp = diff
    
    
    for ci,d  in enumerate(tmp):
        '''
        enumerate turns the list into index-element pairs, iterating over the
        indices and the elements together
        '''
        
        # generate sample_per_class samples with mean mean+d and covariance cov, labeled class ci+1
        X1 = np.random.multivariate_normal(mean+d,cov,sample_per_class)                
        Y1 = (ci+1)*np.ones(sample_per_class,dtype=np.int32)
                
        # stack X1 onto X0 and Y1 onto Y0 along axis 0
        X0 = np.concatenate((X0,X1))
        Y0 = np.concatenate((Y0,Y1))
        
        
    if one_hot:           
        Y0 = get_one_hot(Y0,num_classes)
        
    # shuffle the order
    X,Y  =  shuffle(X0,Y0)
    
    return X,Y


def example_overfit():
    '''
    Show an overfitting example
    '''
    
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes: 0 and 2 -> label 0, 1 and 3 -> label 1 (an XOR layout)
    train_y = train_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(train_y[:],train_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')

    
    
    
    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200   # setting this to 2 causes underfitting
    # number of output nodes
    n_label = 1
    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    
        
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    
    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):
        _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1])})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))
        
        
    '''
    Visualize the data
    '''      
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()
    
    
    
    '''
    Test: the test loss is much larger than the training loss
    because the model has overfitted
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1])})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()




def example_l2_norm():
    '''
    Use L2 regularization (the L2 norm) to mitigate overfitting
    '''
    
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    train_y = train_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(train_y[:],train_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')

    
    
    
    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200   # setting this to 2 causes underfitting
    # number of output nodes
    n_label = 1
    # regularization parameter
    lamda = 1.6
    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    
    
    # cost function with L2 regularization on both weight matrices
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y)) + lamda * tf.nn.l2_loss(weights['h1'])/ num_samples + tf.nn.l2_loss(weights['h2']) *lamda/ num_samples
    
        
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    
    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):
        _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1])})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))
        
        
    '''
    Visualize the data
    '''      
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()
    
    
    
    '''
    Test: the test loss is still clearly larger than the training loss;
    the model still overfits
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1])})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()


def example_add_trainset():
    '''
    Mitigate overfitting by enlarging the training set
    '''
    
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 1000
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    
    


    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200   # setting this to 2 causes underfitting
    # number of output nodes
    n_label = 1

    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y)) 
    
        
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    
    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):
        # generate a fresh batch of num_samples samples every iteration
        train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
        # collapse to two classes
        train_y = train_y % 2      
        _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1])})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))
        
        
    
    '''
    Test: after enlarging the training set, the gap between the test loss
    and the training loss shrinks noticeably
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1])})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]]})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()



def example_dropout():
    '''
    Mitigate overfitting with dropout
    '''
    
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    train_y = train_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(train_y[:],train_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')

    
    
    
    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200   # setting this to 2 causes underfitting
    # number of output nodes
    n_label = 1

    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1,keep_prob)
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    
        
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    
    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):
        _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1]),keep_prob:0.5})
        if epoch % 1000 == 0:
            print('Epoch {0}  loss {1}'.format(epoch,lo))
        
        
    '''
    Visualize the data
    '''      
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point (keep_prob = 1 at test time)
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]],keep_prob:1.0})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()
    
    
    
    '''
    Test: there is still a gap between the test loss and the training loss;
    the model still overfits to some degree
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1]),keep_prob:1.0})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point (keep_prob = 1 at test time)
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]],keep_prob:1.0})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()


def example_dropout_learningrate_decay():
    '''
    Mitigate overfitting with dropout, using a decaying learning rate
    to speed up learning
    '''
    
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # returns an array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean',mean)
    cov = np.eye(num_features)
    print('cov',cov)
    train_x,train_y = generate(num_samples,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    train_y = train_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(train_y[:],train_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')

    
    
    
    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200   # setting this to 2 causes underfitting
    # number of output nodes
    n_label = 1

    
    input_x = tf.placeholder(tf.float32,[None,n_input])
    input_y = tf.placeholder(tf.float32,[None,n_label])
    
    '''
    Define the learnable parameters
    
    h1 is the hidden layer
    h2 is the output layer
    '''
    weights = {
            'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev = 0.01)),     # standard deviation 0.01
            'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
            }
    
    
    biases = {
            'h1':tf.Variable(tf.zeros([n_hidden])),    
            'h2':tf.Variable(tf.zeros([n_label]))
            }
    
    
    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1,keep_prob)
    
    # cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']),biases['h2'])) 
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    
        
    global_step = tf.Variable(0,trainable=False)
    decaylearning_rate = tf.train.exponential_decay(learning_rate,global_step,1000,0.9)
    train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss,global_step = global_step)

    
    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    
    # initialize variables
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(training_epochs):        
        # each run of train increments global_step by 1
        rate,_,lo = sess.run([decaylearning_rate,train,loss],feed_dict={input_x:train_x,input_y:np.reshape(train_y,[-1,1]),keep_prob:0.5})
        if epoch % 1000 == 0:
            print('Epoch {0}   learning_rate  {1} loss {2} '.format(epoch,rate,lo))
        
        
    '''
    Visualize the data
    '''      
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point (keep_prob = 1 at test time)
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]],keep_prob:1.0})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()
    
    
    
    '''
    Test: the test loss is larger than the training loss
    because the model has overfitted
    '''
    test_x,test_y = generate(12,mean,cov,[[3.0,0.0],[3.0,3.0],[0.0,3.0]],num_classes=4)
    # collapse to two classes
    test_y = test_y % 2      


    
    xr = []
    xb = []
    
    for (l,k) in zip(test_y[:],test_x[:]):
        if l == 0.0:
            xr.append([k[0],k[1]])
        else:
            xb.append([k[0],k[1]])
        
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:,0],xr[:,1],c='r',marker='+')
    plt.scatter(xb[:,0],xb[:,1],c='b',marker='o')
    
    lo = sess.run(loss,feed_dict={input_x:test_x,input_y:np.reshape(test_y,[-1,1]),keep_prob:1.0})
    print('Test data  loss {0}'.format(lo))
        
    nb_of_xs = 200
    xs1 = np.linspace(-1,8,num = nb_of_xs)
    xs2 = np.linspace(-1,8,num = nb_of_xs)
    
    # create the grid
    xx,yy = np.meshgrid(xs1,xs2)
    
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs,nb_of_xs])
    
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # compute the predicted label for each grid point (keep_prob = 1 at test time)
            classfication_plane[i,j] = sess.run(y_pred,feed_dict={input_x:[[xx[i,j],yy[i,j]]],keep_prob:1.0})
            
    # create a color map for display
    cmap = ListedColormap([
                colorConverter.to_rgba('r',alpha = 0.30),
                colorConverter.to_rgba('b',alpha = 0.30),      
            ])
    # plot the decision regions
    plt.contourf(xx,yy,classfication_plane,cmap = cmap)
    plt.show()


if __name__== '__main__':
    #example_overfit()
    #example_l2_norm()
    #example_add_trainset()
    #example_dropout()
    example_dropout_learningrate_decay()