Through researchers' continual experimentation with neural-network training, many useful techniques have accumulated; applying them sensibly can help your own models achieve a better fit.
I. Demonstrating overfitting with an XOR dataset
Fully connected networks are powerful fitters, but that very power brings its own trouble: overfitting.
Let's start with an example. This time the original 4 XOR data points are expanded into a dataset of several hundred samples with the XOR characteristic, which we then classify with a fully connected network.
Example description: build a simulated XOR dataset, then build a simple multi-layer neural network to fit its features. First observe the underfitting that occurs, then increase the network's complexity to fix the underfitting, which in turn produces overfitting.
1. Building the XOR dataset
'''
Generate random data
'''
np.random.seed(10)
# number of features
num_features = 2
# number of samples
num_samples = 320
# mean: array of length num_features, drawn from a standard normal distribution
mean = np.random.randn(num_features)
print('mean', mean)
cov = np.eye(num_features)
print('cov', cov)
# generate() is defined in the complete code at the end of this post
X, Y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
# collapse the 4 classes into 2
Y = Y % 2
xr = []
xb = []
for (l, k) in zip(Y[:], X[:]):
    if l == 0.0:
        xr.append([k[0], k[1]])
    else:
        xb.append([k[0], k[1]])
xr = np.array(xr)
xb = np.array(xb)
plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')
The figure shows the data falling into two classes arranged in an XOR pattern: the lower-left and upper-right clusters form one class, while the upper-left and lower-right clusters form the other.
2. Defining the network model
'''
Define variables
'''
# learning rate
learning_rate = 1e-4
# number of input-layer nodes
n_input = 2
# number of hidden-layer nodes
n_hidden = 2
# number of output nodes
n_label = 1

input_x = tf.placeholder(tf.float32, [None, n_input])
input_y = tf.placeholder(tf.float32, [None, n_label])

'''
Define the learnable parameters
h1: hidden layer
h2: output layer
'''
weights = {
    'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
    'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
}
biases = {
    'h1': tf.Variable(tf.zeros([n_hidden])),
    'h2': tf.Variable(tf.zeros([n_label]))
}

'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
3. Training the network and visualizing the result
'''
Start training
'''
training_epochs = 30000
sess = tf.InteractiveSession()
# initialize variables
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    _, lo = sess.run([train, loss], feed_dict={input_x: X, input_y: np.reshape(Y, [-1, 1])})
    if epoch % 1000 == 0:
        print('Epoch {0} loss {1}'.format(epoch, lo))

'''
Visualize the decision boundary
'''
nb_of_xs = 200
xs1 = np.linspace(-1, 8, num=nb_of_xs)
xs2 = np.linspace(-1, 8, num=nb_of_xs)
# build a grid
xx, yy = np.meshgrid(xs1, xs2)
# initialize and fill the classification plane
classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        # predicted label for each grid point
        classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
# color map used for display
cmap = ListedColormap([
    colorConverter.to_rgba('r', alpha=0.30),
    colorConverter.to_rgba('b', alpha=0.30),
])
# show the decision regions
plt.contourf(xx, yy, classfication_plane, cmap=cmap)
plt.show()
As you can see, after about 20,000 iterations the gradient updates slow down, the loss settles at roughly 0.16, accuracy is poor, and the visualized decision boundary does not fully separate the data.
What the figure shows is called underfitting: the model has not fully fit the true structure of the data.
4. Modifying the model to improve the fit
The underfitting is not because this kind of model is inherently incapable, but because our training procedure cannot learn suitable parameters precisely enough with so little capacity; the weaker the model, the more it demands of the training. By adding nodes or adding layers we can give the model more fitting power and thereby make it easier to train.
Change the number of hidden-layer nodes to 200 (a sketch of the add-a-layer alternative follows the snippet below):
# number of hidden-layer nodes
n_hidden = 200
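The other option mentioned above is to add a layer rather than widen the existing one. The following is only a hedged sketch, not part of the original code, of what a two-hidden-layer variant of the same model could look like; the sizes n_hidden_1 and n_hidden_2 are illustrative values and are not used in the rest of this post.

# Sketch only: a deeper variant with two hidden layers (illustrative sizes).
n_hidden_1 = 32
n_hidden_2 = 32

weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.01)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.01)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_label], stddev=0.01))
}
biases = {
    'h1': tf.Variable(tf.zeros([n_hidden_1])),
    'h2': tf.Variable(tf.zeros([n_hidden_2])),
    'out': tf.Variable(tf.zeros([n_label]))
}

# two ReLU hidden layers followed by the same sigmoid output
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['out']), biases['out']))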
The figure shows how powerful a fully connected network is: with just one hidden layer of 200 neurons it carves up the data very finely. The loss also keeps shrinking, reaching 0.056 after 30,000 iterations.
5. Verifying overfitting
So is this model actually good? Let's generate a small amount of new data, feed it to the model for validation, and visualize it in the same coordinate system as before.
'''
Test: the test-set loss is much larger than the training-set loss
because the model has overfit
'''
test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
# collapse the 4 classes into 2
test_y = test_y % 2
xr = []
xb = []
for (l, k) in zip(test_y[:], test_x[:]):
    if l == 0.0:
        xr.append([k[0], k[1]])
    else:
        xb.append([k[0], k[1]])
xr = np.array(xr)
xb = np.array(xb)

plt.figure()
plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
print('Test data loss {0}'.format(lo))

nb_of_xs = 200
xs1 = np.linspace(-1, 8, num=nb_of_xs)
xs2 = np.linspace(-1, 8, num=nb_of_xs)
# build a grid
xx, yy = np.meshgrid(xs1, xs2)
# initialize and fill the classification plane
classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        # predicted label for each grid point
        classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
# color map used for display
cmap = ListedColormap([
    colorConverter.to_rgba('r', alpha=0.30),
    colorConverter.to_rgba('b', alpha=0.30),
])
# show the decision regions
plt.contourf(xx, yy, classfication_plane, cmap=cmap)
plt.show()
This run shows the test-set loss climbing to 0.21, nowhere near the 0.056 achieved during training. The model is unchanged, yet this time it only captures a fraction of the samples correctly. This is overfitting. Like underfitting, it is something we do not want to see while training: what we want is genuine fitting, where the good performance seen during training carries over to the test case.
There are many ways to combat overfitting. Common ones include early stopping, enlarging the dataset, regularization, and dropout; the rest of this section applies these methods to the example.
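Of these, early stopping is the only one the later subsections do not demonstrate, so here is a minimal, hedged sketch of how it could be bolted onto the training loop above. It is not part of the original code; the patience value and the size of the held-out validation set are illustrative assumptions.

# Sketch only: early stopping on a held-out validation set (illustrative).
val_x, val_y = generate(64, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
val_y = np.reshape(val_y % 2, [-1, 1])

best_val = float('inf')
patience, bad_checks = 5, 0        # stop after 5 checks without improvement

for epoch in range(training_epochs):
    sess.run(train, feed_dict={input_x: X, input_y: np.reshape(Y, [-1, 1])})
    if epoch % 1000 == 0:
        val_loss = sess.run(loss, feed_dict={input_x: val_x, input_y: val_y})
        if val_loss < best_val:
            best_val, bad_checks = val_loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                print('Early stopping at epoch', epoch)
                break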
6. Mitigating overfitting with regularization
TensorFlow ships with an L2-regularization function that can be used directly:
tf.nn.l2_loss(t,name=None)
Its definition looks like this:
def l2_loss(t, name=None):
    r"""L2 Loss.

    Computes half the L2 norm of a tensor without the `sqrt`:

        output = sum(t ** 2) / 2

    Args:
      t: A `Tensor`. Must be one of the following types: `half`, `float32`, `float64`.
        Typically 2-D, but may have any dimensions.
      name: A name for the operation (optional).

    Returns:
      A `Tensor`. Has the same type as `t`. 0-D.
    """
    result = _op_def_lib.apply_op("L2Loss", t=t, name=name)
    return result
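As a quick sanity check of the formula above, here is a tiny hedged example (not from the original post) evaluating tf.nn.l2_loss on a small constant tensor:

import tensorflow as tf

w = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as s:
    # sum(t ** 2) / 2 = (1 + 4 + 9) / 2 = 7.0
    print(s.run(tf.nn.l2_loss(w)))   # 7.0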
There is no ready-made L1-regularization function, but one is easy to compose yourself:
tf.reduce_sum(tf.abs(w))
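For comparison with the L2 version used below, this is a hedged sketch of what an L1-regularized cost could look like for this model. The names lamda, num_samples, weights, y_pred and input_y follow the code in this post; the way the penalty is scaled here mirrors the L2 version and is an illustrative choice, not the author's code.

# Sketch only: L1 penalty composed from tf.reduce_sum(tf.abs(...)).
l1_penalty = tf.reduce_sum(tf.abs(weights['h1'])) + tf.reduce_sum(tf.abs(weights['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y)) + lamda * l1_penalty / num_samples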
Now add L2 regularization to the code, with the regularization coefficient lamda = 1.6, and change the cost function as follows:
loss = tf.reduce_mean(tf.square(y_pred - input_y)) \
       + lamda * tf.nn.l2_loss(weights['h1']) / num_samples \
       + lamda * tf.nn.l2_loss(weights['h2']) / num_samples
The training-set cost rises from 0.056 to 0.106, but the test-set cost only drops from 0.21 to about 0.197, so the improvement is not very noticeable.
7. Mitigating overfitting by enlarging the dataset
Next, let's try improving the overfitting by enlarging the dataset. Instead of generating a single fixed random sample set, each iteration of the training loop now generates 1000 fresh data points. Partial code:
for epoch in range(training_epochs):
    # generate a fresh batch of samples every iteration
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2
    _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
    if epoch % 1000 == 0:
        print('Epoch {0} loss {1}'.format(epoch, lo))
This time the test-set cost drops to 0.04, even lower than on the training set: generalization has clearly improved.
8. Mitigating overfitting with dropout
The prototype of TensorFlow's dropout function is:
def dropout(x,keep_prob,noise_shape=None,seed=None,name=None)
Its parameters mean the following:
- x: the input tensor (the layer's nodes) to which dropout is applied.
- keep_prob: the keep probability. 1 means every node participates in learning; 0.8 means 20% of the nodes are dropped and only 80% take part in learning.
- noise_shape: the shape of the random keep/drop mask, which controls along which dimensions of x dropout is applied independently.
- seed: random seed used when choosing which nodes to drop.
Dropout changes the structure of the network and is purely a training-time technique, so at test time keep_prob should normally be set to 1, meaning nothing is dropped; otherwise it would distort the model's normal output.
Dropout is added to the program, with keep_prob set to 0.5 during training (the corresponding feed_dict usage is sketched after the snippet below):
'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
keep_prob = tf.placeholder(dtype=tf.float32)
layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))
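Because keep_prob is now a placeholder, it has to be fed on every sess.run call. The following lines, taken in spirit from the complete code at the end of this post, show the intended usage: 0.5 while training, 1.0 whenever the model is evaluated.

# training step: drop half of the hidden nodes
_, lo = sess.run([train, loss],
                 feed_dict={input_x: train_x,
                            input_y: np.reshape(train_y, [-1, 1]),
                            keep_prob: 0.5})

# evaluation: keep every node
lo = sess.run(loss, feed_dict={input_x: test_x,
                               input_y: np.reshape(test_y, [-1, 1]),
                               keep_prob: 1.0})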
From the output, the improvement from adding dropout is also not large, roughly comparable to L2 regularization.
9. Combining dropout with a decaying learning rate
The results above show the cost value oscillating back and forth. This jitter late in training suggests the learning rate is a little too large, so we can add a decaying learning rate.
In the optimizer part of the code, add a decayed learning_rate: the total number of steps is 30,000, and every 1000 steps the learning rate is multiplied by 0.9. Partial code:
'''
Define the network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
keep_prob = tf.placeholder(dtype=tf.float32)
layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
# output and cost function
y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
loss = tf.reduce_mean(tf.square(y_pred - input_y))

global_step = tf.Variable(0, trainable=False)
decaylearning_rate = tf.train.exponential_decay(learning_rate, global_step, 1000, 0.9)
train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss, global_step=global_step)

'''
Start training
'''
training_epochs = 30000
sess = tf.InteractiveSession()
# initialize variables
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    # each run of train increments global_step by 1
    rate, _, lo = sess.run([decaylearning_rate, train, loss],
                           feed_dict={input_x: train_x,
                                      input_y: np.reshape(train_y, [-1, 1]),
                                      keep_prob: 0.5})
    if epoch % 1000 == 0:
        print('Epoch {0} learning_rate {1} loss {2} '.format(epoch, rate, lo))
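For reference, with these arguments tf.train.exponential_decay follows the documented formula decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps). The tiny check below is a hedged illustration of the expected value, not output from the program:

# decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps)
# e.g. after 10,000 steps: 1e-4 * 0.9 ** (10000 / 1000)
decayed_lr = 1e-4 * 0.9 ** (10000 / 1000)
print(decayed_lr)   # ~3.49e-05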
We can see that the learning rate does decay, but the effect is not dramatic and the cost value still oscillates. We could keep adjusting parameters to get a better result, which is a matter of patience.
Complete code:

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 26 15:02:16 2018

@author: zy
"""

'''
Use an overfitting case study to learn optimization tricks for training
fully connected networks, e.g. regularization and dropout
'''

import tensorflow as tf
import numpy as np
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
import random
from matplotlib.colors import colorConverter, ListedColormap

'''
Dataset generation
'''
def get_one_hot(labels, num_classes):
    '''
    one-hot encoding
    args:
        labels: input class labels
        num_classes: number of classes
    '''
    m = np.zeros([labels.shape[0], num_classes])
    for i in range(labels.shape[0]):
        m[i][labels[i]] = 1
    return m


def generate(sample_size, mean, cov, diff, num_classes=2, one_hot=False):
    '''
    We have no real hospital case data, so simulate some samples:
    generate a fixed number of samples with the given mean and covariance.
    args:
        sample_size: number of samples
        mean: 1-D ndarray or list of length M, the mean of each feature
        cov: N x N ndarray or list, the (symmetric) covariance matrix
        diff: list of length num_classes-1; element i is the offset of class i+1's
              mean from class 0's mean [feature-1 offset, feature-2 offset, ...];
              if it is too short, the last element is reused
        num_classes: number of classes
        one_hot: whether to one-hot encode the labels
    '''
    # samples per class, e.g. 1000 samples and 2 classes -> 500 per class
    sample_per_class = int(sample_size / num_classes)

    '''
    np.random.multivariate_normal:
    mean: 1-D array_like, of length N. Mean of the N-dimensional distribution.
    cov: 2-D array_like, of shape (N, N). Covariance matrix of the distribution.
         It must be symmetric and positive-semidefinite for proper sampling.
    size: shape. Given a shape of, for example, (m, n, k), m*n*k samples are
          generated and packed in an m-by-n-by-k arrangement. Because each sample
          is N-dimensional, the output shape is (m, n, k, N). If no shape is
          specified, a single (N-D) sample is returned.
    '''
    # class 0: sample_per_class samples with mean `mean` and covariance `cov`
    X0 = np.random.multivariate_normal(mean, cov, sample_per_class)
    Y0 = np.zeros(sample_per_class, dtype=np.int32)

    # pad diff if it is too short
    if len(diff) != num_classes - 1:
        tmp = np.zeros(num_classes - 1)
        tmp[0:len(diff)] = diff
        tmp[len(diff):] = diff[-1]
    else:
        tmp = diff

    for ci, d in enumerate(tmp):
        '''
        enumerate turns the list into (index, element) pairs so we can iterate
        over both the index and the element itself
        '''
        # class ci+1: sample_per_class samples with mean `mean + d` and covariance `cov`
        X1 = np.random.multivariate_normal(mean + d, cov, sample_per_class)
        Y1 = (ci + 1) * np.ones(sample_per_class, dtype=np.int32)
        # concatenate with the previous classes
        X0 = np.concatenate((X0, X1))
        Y0 = np.concatenate((Y0, Y1))

    if one_hot:
        Y0 = get_one_hot(Y0, num_classes)

    # shuffle the samples
    X, Y = shuffle(X0, Y0)
    return X, Y


def example_overfit():
    '''
    Demonstrate an overfitting case
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: the test-set loss is much larger than the training-set loss
    because the model has overfit
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_l2_norm():
    '''
    Use the L2 norm to mitigate overfitting
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1
    # regularization coefficient
    lamda = 1.6

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and L2-regularized cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y)) \
           + lamda * tf.nn.l2_loss(weights['h1']) / num_samples \
           + lamda * tf.nn.l2_loss(weights['h2']) / num_samples
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: the test-set loss is much larger than the training-set loss
    because the model has overfit
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_add_trainset():
    '''
    Mitigate overfitting by enlarging the training set
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 1000
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        # generate a fresh batch of samples every iteration
        train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
        # collapse the 4 classes into 2
        train_y = train_y % 2
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1])})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1])})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]]})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_dropout():
    '''
    Mitigate overfitting with dropout
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer with dropout
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))
    train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1]), keep_prob: 0.5})
        if epoch % 1000 == 0:
            print('Epoch {0} loss {1}'.format(epoch, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1]), keep_prob: 1.0})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


def example_dropout_learningrate_decay():
    '''
    Mitigate overfitting with dropout and use a decaying learning rate
    to speed up learning
    '''
    '''
    Generate random data
    '''
    np.random.seed(10)
    # number of features
    num_features = 2
    # number of samples
    num_samples = 320
    # mean: array of length num_features, drawn from a standard normal distribution
    mean = np.random.randn(num_features)
    print('mean', mean)
    cov = np.eye(num_features)
    print('cov', cov)
    train_x, train_y = generate(num_samples, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    train_y = train_y % 2

    xr = []
    xb = []
    for (l, k) in zip(train_y[:], train_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    '''
    Define variables
    '''
    # learning rate
    learning_rate = 1e-4
    # number of input-layer nodes
    n_input = 2
    # number of hidden-layer nodes
    n_hidden = 200      # set to 2 to see underfitting
    # number of output nodes
    n_label = 1

    input_x = tf.placeholder(tf.float32, [None, n_input])
    input_y = tf.placeholder(tf.float32, [None, n_label])

    '''
    Define the learnable parameters
    h1: hidden layer
    h2: output layer
    '''
    weights = {
        'h1': tf.Variable(tf.truncated_normal(shape=[n_input, n_hidden], stddev=0.01)),   # standard deviation 0.01
        'h2': tf.Variable(tf.truncated_normal(shape=[n_hidden, n_label], stddev=0.01))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([n_hidden])),
        'h2': tf.Variable(tf.zeros([n_label]))
    }

    '''
    Define the network model
    '''
    # hidden layer with dropout
    layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x, weights['h1']), biases['h1']))
    keep_prob = tf.placeholder(dtype=tf.float32)
    layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
    # output and cost function
    y_pred = tf.nn.sigmoid(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['h2']))
    loss = tf.reduce_mean(tf.square(y_pred - input_y))

    global_step = tf.Variable(0, trainable=False)
    decaylearning_rate = tf.train.exponential_decay(learning_rate, global_step, 1000, 0.9)
    train = tf.train.AdamOptimizer(decaylearning_rate).minimize(loss, global_step=global_step)

    '''
    Start training
    '''
    training_epochs = 30000
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        # each run of train increments global_step by 1
        rate, _, lo = sess.run([decaylearning_rate, train, loss],
                               feed_dict={input_x: train_x, input_y: np.reshape(train_y, [-1, 1]), keep_prob: 0.5})
        if epoch % 1000 == 0:
            print('Epoch {0} learning_rate {1} loss {2} '.format(epoch, rate, lo))

    '''
    Visualize the decision boundary
    '''
    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()

    '''
    Test: compare the test-set loss with the training-set loss
    '''
    test_x, test_y = generate(12, mean, cov, [[3.0, 0.0], [3.0, 3.0], [0.0, 3.0]], num_classes=4)
    # collapse the 4 classes into 2
    test_y = test_y % 2
    xr = []
    xb = []
    for (l, k) in zip(test_y[:], test_x[:]):
        if l == 0.0:
            xr.append([k[0], k[1]])
        else:
            xb.append([k[0], k[1]])
    xr = np.array(xr)
    xb = np.array(xb)
    plt.figure()
    plt.scatter(xr[:, 0], xr[:, 1], c='r', marker='+')
    plt.scatter(xb[:, 0], xb[:, 1], c='b', marker='o')

    lo = sess.run(loss, feed_dict={input_x: test_x, input_y: np.reshape(test_y, [-1, 1]), keep_prob: 1.0})
    print('Test data loss {0}'.format(lo))

    nb_of_xs = 200
    xs1 = np.linspace(-1, 8, num=nb_of_xs)
    xs2 = np.linspace(-1, 8, num=nb_of_xs)
    # build a grid
    xx, yy = np.meshgrid(xs1, xs2)
    # initialize and fill the classification plane
    classfication_plane = np.zeros([nb_of_xs, nb_of_xs])
    for i in range(nb_of_xs):
        for j in range(nb_of_xs):
            # predicted label for each grid point
            classfication_plane[i, j] = sess.run(y_pred, feed_dict={input_x: [[xx[i, j], yy[i, j]]], keep_prob: 1.0})
    # color map used for display
    cmap = ListedColormap([
        colorConverter.to_rgba('r', alpha=0.30),
        colorConverter.to_rgba('b', alpha=0.30),
    ])
    # show the decision regions
    plt.contourf(xx, yy, classfication_plane, cmap=cmap)
    plt.show()


if __name__ == '__main__':
    #example_overfit()
    #example_l2_norm()
    #example_add_trainset()
    #example_dropout()
    example_dropout_learningrate_decay()