Section 18: Using Batch Normalization (BN) in TensorFlow

Reposted article (see the original). 2018-05-05 23:26 · 14364 · tensorflow

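The concept of batch normalization was already introduced in the deep learning chapters. For details, see Section 9, Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Part 3).

During training, a neural network mainly learns the distribution of the data. If the training data and the test data follow different distributions, the network generalizes poorly. The distribution also differs from one training batch to the next, so the network has to keep adapting to new distributions as it iterates, which slows training considerably. The distribution of the data matters for the activation function as well: when the values span too wide a range, the nonlinearity of the activation function cannot be exploited well. With a Sigmoid activation, for example, the units saturate during training. Batch normalization was proposed to mitigate these problems. In practice, networks with batch normalization converge quickly and generalize well, and in some cases batch normalization can replace regularization and dropout entirely.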

I. Batch normalization functions

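The normalization algorithm can be described as:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

$$y_i = \gamma \hat{x}_i + \beta$$

where x_i is the i-th of the m samples in the batch (m = batch_size), μ_B is the mean computed for each feature, σ_B² is the variance computed for each feature, ε is a small constant that keeps the denominator away from zero, and γ and β are the scale and offset applied after normalization.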
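As a rough illustration of the normalization described here, the following NumPy sketch normalizes a small batch feature by feature. The function name batch_norm_forward, the sample data, and eps=1e-3 are assumptions made for this example; it is not TensorFlow's implementation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalize each feature (column) of x over the batch dimension."""
    mu = x.mean(axis=0)                      # per-feature mean
    var = x.var(axis=0)                      # per-feature (population) variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # roughly zero mean, unit variance
    return gamma * x_hat + beta              # scale by gamma, shift by beta

# Hypothetical batch: 4 samples, 2 features on very different scales.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
y = batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2))
```

With gamma = 1 and beta = 0, both columns of y end up on the same scale even though the raw features differ by a factor of ten.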

1. Definition of TensorFlow's built-in BN function:

def batch_normalization(x,
                        mean,
                        variance,
                        offset,
                        scale,
                        variance_epsilon,
                        name=None):

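The parameters are:

x: the input tensor, of arbitrary dimensionality.
mean: the mean of the samples.
variance: the variance of the samples.
offset: the shift, i.e. a value added after normalization; this is beta in the formula.
scale: the scaling, i.e. a value multiplied in after normalization; this is gamma in the formula.
variance_epsilon: a small value added to the denominator to avoid division by zero. This signature gives it no default, so pass a small constant explicitly.
name: the name of the op.

To use this function, you need a second function, tf.nn.moments, to compute the mean and variance; BN can then be applied.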

2. The tf.nn.moments() function is defined as follows:

def moments(x, axes, shift=None, name=None, keep_dims=False):

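x: the input tensor.
axes: the axes along which to compute the mean and variance.
shift: A `Tensor` containing the value by which to shift the data for numerical stability, or `None` in which case the true mean of the data is used as shift. A shift close to the true mean provides the most numerically stable results.
name: the name of the op.
keep_dims: whether to keep the reduced dimensions (with length 1), so that the result has the same rank as the input.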
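To make axes and keep_dims concrete, here is a NumPy sketch of what a moments-style function computes. The moments helper below is a hypothetical stand-in written for this example, not tf.nn.moments itself, and the random NHWC batch is made up.

```python
import numpy as np

def moments(x, axes, keep_dims=False):
    """Return (mean, variance) of x computed over the given axes."""
    mean = x.mean(axis=axes, keepdims=keep_dims)
    variance = x.var(axis=axes, keepdims=keep_dims)
    return mean, variance

# A made-up batch of feature maps in NHWC layout: [batch, height, width, channels].
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 6, 6, 3))

# Reduce over batch, height and width: one mean/variance per channel.
mean, variance = moments(x, axes=(0, 1, 2))
# Same values, but the reduced axes are kept with length 1.
mean_k, variance_k = moments(x, axes=(0, 1, 2), keep_dims=True)
```

For convolutional feature maps, reducing over axes (0, 1, 2) gives one statistic per channel, which is the usual choice for batch normalization after a convolution.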

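These two functions alone are not enough. tf.nn.moments() only gives the mean and variance of the current batch, so we would have to store the values from every training batch and then average them. For better results, we usually smooth the per-batch mean and variance with an exponentially weighted average, which is what the tf.train.ExponentialMovingAverage class is for: the previous value influences the current one with a decayed weight, so the sequence of values becomes smoother. For details, see Section 8, Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Part 2).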

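What this class does can be written as a single expression:

shadow_variable = decay * shadow_variable + (1 - decay) * variable

The terms are:

decay: the decay rate, specified in ExponentialMovingAverage(); typically 0.9.
variable: the value from the current batch.
shadow_variable on the right-hand side: the accumulated value up to the previous batch.
shadow_variable on the left-hand side: the accumulated value including the current batch.

You can think of shadow_variable as roughly an average over the last 1/(1-decay) values: the current batch has weight (1-decay), the previous batch (1-decay)·decay, the one before that (1-decay)·decay^2, and so on.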
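The weighting can be checked with a few lines of plain Python. This is only a sketch: the per-batch values are made up, and the shadow value is assumed to start at 0 so that the iterative form and the weighted sum agree exactly.

```python
def ema_update(shadow, value, decay=0.9):
    """One step of shadow = decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value

decay = 0.9
values = [4.0, 8.0, 6.0, 10.0]   # made-up per-batch means, oldest first

# Iterative form, starting from shadow = 0 (an assumption of this sketch).
shadow = 0.0
for v in values:
    shadow = ema_update(shadow, v, decay)

# Weighted-sum form: the newest value has weight (1 - decay), the one before
# it (1 - decay) * decay, then (1 - decay) * decay**2, and so on.
weights = [(1.0 - decay) * decay ** k for k in range(len(values))]
closed_form = sum(w * v for w, v in zip(weights, reversed(values)))
```

Both computations give the same number, which confirms that older batches contribute with geometrically shrinking weights.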

3. The tf.train.ExponentialMovingAverage class is defined as follows:

  def __init__(self, decay, num_updates=None, zero_debias=False,
               name="ExponentialMovingAverage"):
  def apply(self, var_list=None):

The parameters are:

decay: Float. The decay to use.
num_updates: Optional count of number of updates applied to variables. The actual decay rate used is: `min(decay, (1 + num_updates) / (10 + num_updates))`
zero_debias: If `True`, zero debias moving-averages that are initialized with tensors.
name: String. Optional prefix name to use for the name of ops added in apply().
var_list: A list of Variable or Tensor objects. The variables and Tensors must be of types float16, float32, or float64.

Calling apply() returns the op that updates the exponentially weighted averages.
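The effect of num_updates is easiest to see numerically. The helper below is a hypothetical plain-Python restatement of the formula quoted in the parameter list, not a TensorFlow function; the step counts are chosen only for illustration.

```python
def effective_decay(decay, num_updates=None):
    """Decay rate actually applied, per min(decay, (1 + n) / (10 + n))."""
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))

early = effective_decay(0.999, num_updates=0)       # (1 + 0) / (10 + 0)
later = effective_decay(0.999, num_updates=90)      # (1 + 90) / (10 + 90)
late = effective_decay(0.999, num_updates=100000)   # capped at decay
```

Early in training the effective decay is small, so the average follows the first batches quickly; as num_updates grows, it rises until it is capped at the configured decay.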

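II. A simpler way to use batch normalization

The functions above do not take many parameters, but several of them have to be combined. TensorFlow's layers module therefore implements BN again as a single function that bundles them together and is easier to use. To use it, import:

from tensorflow.contrib.layers.python.layers import batch_norm

or call tf.contrib.layers.batch_norm() directly. The function is defined as follows: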

def batch_norm(inputs,
               decay=0.999,
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               param_initializers=None,
               param_regularizers=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               batch_weights=None,
               fused=False,
               data_format=DATA_FORMAT_NHWC,
               zero_debias_moving_mean=False,
               scope=None,
               renorm=False,
               renorm_clipping=None,
               renorm_decay=0.99):

The parameters are:

inputs: A tensor with 2 or more dimensions, where the first dimension has `batch_size`. The normalization is over all but the last dimension if `data_format` is `NHWC` and the second dimension if `data_format` is `NCHW`.
decay: Decay for the moving average. Reasonable values for `decay` are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower `decay` value (recommend trying `decay`=0.9) if model experiences reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability. In other words, this is the decay rate of the exponentially weighted average used to update the mean and variance. It is commonly set to 0.9: too small a value makes the mean and variance change too quickly, while too large a value leaves them almost unchanged from step to step and tends to cause overfitting, in which case the value should be lowered.
center: If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer. BN is usually followed by such a layer (for example ReLU), so scale is normally left at False: the following layer already transforms the data, so no scaling is needed here.
epsilon: Small float added to variance to avoid dividing by zero. The default is fine.
activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
param_regularizers: Optional regularizer for beta and gamma.
updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place. The default is tf.GraphKeys.UPDATE_OPS: the ops that update the moving mean and variance are only placed in that collection and must be run together with train_op, otherwise the moving statistics are never updated. For this reason it is usually set to None, so that the mean and variance are updated in place as part of the forward pass. This is slightly slower than the default, but it keeps the statistics up to date without any extra wiring.
is_training: Whether or not the layer is in training mode. In training mode it would accumulate the statistics of the moments into `moving_mean` and `moving_variance` using an exponential moving average with the given `decay`. When it is not in training mode then it would use the values of the `moving_mean` and the `moving_variance`. Set it to True during training and to False at test time, so that the statistics accumulated on the training set are used.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given. This enables variable sharing and is used together with the scope parameter below.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
batch_weights: An optional tensor of shape `[batch_size]`, containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
fused: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
scope: Optional scope for `variable_scope`.
renorm: Whether to use Batch Renormalization (https://arxiv.org/abs/1702.03275). This adds extra variables during training. The inference is the same for either value of this parameter.
renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to scalar `Tensors` used to clip the renorm correction. The correction `(r, d)` is used as `corrected_value = normalized_value * r + d`, with `r` clipped to [rmin, rmax], and `d` to [-dmax, dmax]. Missing rmax, rmin, dmax are set to inf, 0, inf, respectively.
renorm_decay: Momentum used to update the moving means and standard deviations with renorm. Unlike `momentum`, this affects training and should be neither too small (which would add noise) nor too large (which would give stale estimates). Note that `decay` is still applied to get the means and variances for inference.
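The interplay of is_training and decay can be sketched in NumPy. The SimpleBatchNorm class below is a toy written for this example (no gamma or beta, 2-D inputs only) and is not how tf.contrib.layers.batch_norm is implemented; the data distribution and the number of steps are arbitrary assumptions.

```python
import numpy as np

class SimpleBatchNorm:
    """Toy batch-norm layer that tracks moving statistics (illustration only)."""

    def __init__(self, num_features, decay=0.9, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = np.zeros(num_features)
        self.moving_variance = np.ones(num_features)

    def __call__(self, x, is_training):
        if is_training:
            # Training: normalize with the current batch statistics and
            # fold them into the moving averages.
            mean, variance = x.mean(axis=0), x.var(axis=0)
            self.moving_mean = (self.decay * self.moving_mean
                                + (1 - self.decay) * mean)
            self.moving_variance = (self.decay * self.moving_variance
                                    + (1 - self.decay) * variance)
        else:
            # Inference: reuse the accumulated statistics and leave them untouched.
            mean, variance = self.moving_mean, self.moving_variance
        return (x - mean) / np.sqrt(variance + self.epsilon)

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(num_features=2, decay=0.9)
for _ in range(200):
    bn(rng.normal(loc=5.0, scale=2.0, size=(64, 2)), is_training=True)

before = bn.moving_mean.copy()
single = bn(np.array([[5.0, 5.0]]), is_training=False)  # a batch of one sample
```

In inference mode the layer can normalize even a single sample, because it no longer depends on the statistics of the batch it is given.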

III. Adding BN to CIFAR10 classification

We continue from the CIFAR10 classification code of Section 13 and modify it:

1. Add the BN function

def avg_pool_6x6(x):
    '''
    Global average pooling layer: pools with a filter the same size as the input, using 'SAME' padding. After pooling:
         out_height = in_height / strides_height (rounded up)
         out_width = in_width / strides_width (rounded up)

    args:
        x: input images with shape [batch,in_height,in_width,in_channels]
    '''
    return tf.nn.avg_pool(x,ksize=[1,6,6,1],strides=[1,6,6,1],padding='SAME')


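def batch_norm_layer(value,is_training=False,name='batch_norm'):
    '''
    Batch normalization; returns the normalized result.

    args:
        value: the input; its first dimension is batch_size.
        is_training: True during training, when the moving mean and variance keep being updated;
              False at test time, when the statistics accumulated on the training set are used.
              May be a Python bool or a tf.bool tensor. Defaults to test mode.
        name: name (currently unused).
    '''
    # Pass is_training straight through. It is fed as a placeholder (a tensor), so an
    # identity check such as `is_training is True` would always be False and the layer
    # would silently stay in test mode.
    return tf.contrib.layers.batch_norm(inputs=value,decay=0.9,updates_collections=None,is_training=is_training)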

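2. Add a placeholder argument for the BN function

The BN function needs to know whether it is in training mode, so we define a placeholder through which we pass whether we are training or testing.

# Define the placeholders
input_x = tf.placeholder(dtype=tf.float32,shape=[None,24,24,3])   # image size 24x24x3
input_y = tf.placeholder(dtype=tf.float32,shape=[None,10])        # classes 0-9
is_training = tf.placeholder(dtype=tf.bool)                       # True means training, False means testing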

3. Modify the network structure to add BN layers

Add a BN layer before the activation function of the first layer h_conv1 and of the second layer h_conv2.

x_image = tf.reshape(input_x,[batch_size,24,24,3])

# 1. Convolution layer -> pooling layer, with a BN layer inserted before the activation function
W_conv1 = weight_variable([5,5,3,64])
b_conv1 = bias_variable([64])


h_conv1 = tf.nn.relu(batch_norm_layer(conv2d(x_image,W_conv1) + b_conv1,is_training=is_training))     # output shape [-1,24,24,64]
print_op_shape(h_conv1)
h_pool1 = max_pool_2x2(h_conv1)                                                                       # output shape [-1,12,12,64]
print_op_shape(h_pool1)


# 2. Convolution layer -> pooling layer, with a BN layer inserted before the activation function
W_conv2 = weight_variable([5,5,64,64])
b_conv2 = bias_variable([64])


h_conv2 = tf.nn.relu(batch_norm_layer(conv2d(h_pool1,W_conv2) + b_conv2,is_training=is_training))    # output shape [-1,12,12,64]
print_op_shape(h_conv2)
h_pool2 = max_pool_2x2(h_conv2)                                              # output shape [-1,6,6,64]
print_op_shape(h_pool2)



# 3. Convolution layer -> global average pooling layer
W_conv3 = weight_variable([5,5,64,10])
b_conv3 = bias_variable([10])

h_conv3 = tf.nn.relu(conv2d(h_pool2,W_conv3) + b_conv3)   # output shape [-1,6,6,10]
print_op_shape(h_conv3)

nt_hpool3 = avg_pool_6x6(h_conv3)                         # output shape [-1,1,1,10]
print_op_shape(nt_hpool3)
nt_hpool3_flat = tf.reshape(nt_hpool3,[-1,10])

y_conv = tf.nn.softmax(nt_hpool3_flat)

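4. Add a decaying learning rate

Replace the original fixed learning rate with a decaying one that is multiplied by 0.9 every 1000 steps.

# Softmax cross-entropy cost function
cost = tf.reduce_mean(-tf.reduce_sum(input_y * tf.log(y_conv),axis=1))

# Decaying learning rate: starts at learning_rate and is multiplied by 0.9 every 1000 steps,
# i.e. rate = learning_rate*0.9^(global_step/1000)
global_step = tf.Variable(0,trainable=False)
decaylearning_rate = tf.train.exponential_decay(learning_rate,global_step,1000,0.9)

# Optimizer: each run of train increments global_step by 1.
# The decayed rate is passed to the optimizer; otherwise it would be computed but never used.
train = tf.train.AdamOptimizer(decaylearning_rate).minimize(cost,global_step = global_step)

# Whether each prediction matches its label
correct_prediction = tf.equal(tf.argmax(y_conv,1),tf.argmax(input_y,1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction,dtype=tf.float32))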
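The decay schedule learning_rate*0.9^(global_step/1000) can be reproduced in plain Python. This is a sketch of the formula only: the helper name mirrors tf.train.exponential_decay for readability but is written here from scratch, and the initial rate of 0.001 is an assumed value, since the tutorial does not state it.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the rate drops in discrete steps.
        exponent = global_step // decay_steps
    else:
        # True division: the rate decays smoothly at every step.
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent

initial_rate = 0.001   # assumed value for illustration
rate_at_0 = exponential_decay(initial_rate, 0, 1000, 0.9)
rate_at_1000 = exponential_decay(initial_rate, 1000, 1000, 0.9)
rate_at_2000 = exponential_decay(initial_rate, 2000, 1000, 0.9)
rate_staircase = exponential_decay(initial_rate, 1500, 1000, 0.9, staircase=True)
```

Without staircase the rate shrinks a little on every step; with staircase it stays constant within each block of 1000 steps.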

5. Start training

Here we need to pass the is_training argument to the BN function to indicate whether we are currently in training mode or in test mode.

'''
IV. Start training
'''
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Start all queue-runner threads in the graph. tf.train.start_queue_runners fills the filename queue; without it, the read op blocks until the filename queue has a value.
tf.train.start_queue_runners(sess=sess)

for step in range(training_step):
    # Fetch one batch of batch_size samples
    image_batch,label_batch = sess.run([images_train,labels_train])

    # One-hot encoding
    label_b = np.eye(10,dtype=np.float32)[label_batch]

    # Train: each run of train increments global_step by 1, so the value of decaylearning_rate changes
    rate,_ = sess.run([decaylearning_rate,train],feed_dict={input_x:image_batch,input_y:label_b,is_training:True})

    if step % display_step == 0:
        train_accuracy = sess.run(accuracy,feed_dict={input_x:image_batch,input_y:label_b,is_training:False})
        print('Step {0} training accuracy {1}'.format(step,train_accuracy))

image_batch, label_batch = sess.run([images_test, labels_test])
label_b = np.eye(10,dtype=np.float32)[label_batch]
test_accuracy = sess.run(accuracy,feed_dict={input_x:image_batch,input_y:label_b,is_training:False})
print('finished! test accuracy {0}'.format(test_accuracy))

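The accuracy improves considerably over the 70% of Section 13: training accuracy reaches about 80%. At test time, however, the model's accuracy drops noticeably. To improve further, we can use the distorted_inputs() function in cifar10_input to augment the dataset, or apply techniques that counter overfitting.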
