Deep Learning: Understanding BN (Part 2)

This article is a repost. 2018-05-28 10:43. Category: Deep Learning

The order of the operation layers in a neural network:

1. sigmoid, tanh activation functions: conv -> bn -> sigmoid -> pooling

2. ReLU activation function: conv -> bn -> relu -> pooling

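In general, the activation function comes first and pooling second. For ReLU followed by max pooling, however, swapping the two makes no difference: ReLU is monotonically non-decreasing, so it commutes with taking the maximum. This does not hold for average pooling.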
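The claim that ReLU and pooling can be swapped can be checked numerically. The following is a minimal NumPy sketch, not code from the original post; the helper names `relu`, `max_pool_2x2` and `avg_pool_2x2` are made up for illustration, and the input is a small hand-written array.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    # x: (H, W) with even H and W; non-overlapping 2x2 windows
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def avg_pool_2x2(x):
    # Same windows as above, but averaged instead of maximized
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.array([[  1.0,  -2.0,   3.0,  -4.0],
              [ -5.0,   6.0,  -7.0,   8.0],
              [  9.0, -10.0,  11.0, -12.0],
              [-13.0,  14.0, -15.0,  16.0]])

# ReLU and max pooling commute; ReLU and average pooling do not
same_for_max = np.array_equal(relu(max_pool_2x2(x)), max_pool_2x2(relu(x)))
same_for_avg = np.allclose(relu(avg_pool_2x2(x)), avg_pool_2x2(relu(x)))
print(same_for_max, same_for_avg)
```

When the two orders are equivalent, pooling first means the activation is applied to fewer values.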

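The original paper uses the order "weights -> batchnorm -> activation -> maxpooling -> weights -> batchnorm -> activation -> dropout". The paper argues that this lets the network exploit the different regions of the activation function (the two saturated regions of sigmoid, the roughly linear region in the middle, and so on) to obtain different nonlinear effects, and that in particular cases the batchnorm layer can also learn an identity transform. This order is generally a good default.

So that the activation can make more effective use of its input, BN is generally placed before the activation function.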

TensorFlow (1.x) has four main functions related to BN (Batch Normalization):

tf.nn.moments
tf.nn.batch_normalization
tf.layers.batch_normalization
tf.contrib.layers.batch_norm

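In practice, tf.layers.batch_normalization is generally used for normalization, because it is a higher-level API and you do not need to compute the mean and variance yourself.

1. Which mean and variance does tf.nn.moments compute?

Example:

tf.nn.moments(x, axes, name=None, keep_dims=False), where x is the input and axes specifies the dimensions to reduce over; the function returns the computed mean and variance.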

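import tensorflow as tf  # TensorFlow 1.x

img = tf.Variable(tf.random_normal([128, 32, 32, 64]))  # (batch, height, width, channels)
axis = list(range(len(img.get_shape()) - 1))  # [0, 1, 2]: every axis except channels
mean, variance = tf.nn.moments(img, axis)  # each has shape (64,)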

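A batch of 128 images passes through a convolutional layer with 64 kernels, producing 128×64 feature maps. For each kernel, the mean and variance are computed over all pixels of the 128 feature maps that belong to it (128×32×32 values). Since there are 64 kernels, each result is a one-dimensional array of length 64, that is, a vector of shape (64,).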
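The same reduction can be reproduced outside TensorFlow. The following NumPy sketch is an illustration added here, not part of the original post; it mirrors the shape used above and reduces over every axis except the channel axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# (batch, height, width, channels), same shape as in the example above
img = rng.standard_normal((128, 32, 32, 64), dtype=np.float32)

# Reduce over every axis except the last (channel) axis
axes = tuple(range(img.ndim - 1))
mean = img.mean(axis=axes)
# Population variance (divides by N), which is also what tf.nn.moments returns
variance = img.var(axis=axes)

print(axes, mean.shape, variance.shape)
```

Each of the 64 entries summarizes 128×32×32 = 131072 values.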

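2. tf.layers.batch_normalization

In TensorFlow, a batch normalization layer can be built with tf.layers.batch_normalization or tf.contrib.layers.batch_norm. If these two APIs are used directly, a common symptom is that the network performs very well during training but very poorly at test time, once the training flag (training for tf.layers.batch_normalization, is_training for tf.contrib.layers.batch_norm) is set to False. This usually happens because one detail was overlooked during training.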

(1) Method 1:
The documentation for tf.contrib.layers.batch_norm contains the following note:

　　Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op.
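In other words, moving_mean and moving_variance are not updated automatically: we have to make the training step depend on their update ops explicitly. The code is as follows: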

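# Collect the ops that update moving_mean and moving_variance
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# Make train_op depend on them so that they run on every training step
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)  # optimizer and loss are defined elsewhere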

This step is very important. Many people overlook it during training, which leads to a huge gap between training and test results.
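Why the missing update hurts can be seen in a small simulation. The following NumPy sketch is an illustration under stated assumptions, not TensorFlow's implementation: it tracks a single feature, the momentum and epsilon values are chosen arbitrarily, and the moving statistics start at 0 and 1. If the update never runs, inference normalizes with those initial values instead of the real statistics of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
momentum, eps = 0.99, 1e-3          # illustrative values, assumed for this sketch
moving_mean, moving_var = 0.0, 1.0  # initial values of the moving statistics

def normalize(x, mean, var):
    return (x - mean) / np.sqrt(var + eps)

# "Training": activations of one feature with mean 5 and standard deviation 2
for _ in range(1000):
    batch = rng.normal(loc=5.0, scale=2.0, size=64)
    # The kind of update that the ops in UPDATE_OPS are responsible for
    moving_mean = momentum * moving_mean + (1 - momentum) * batch.mean()
    moving_var = momentum * moving_var + (1 - momentum) * batch.var()

test_batch = rng.normal(loc=5.0, scale=2.0, size=10000)
with_updates = normalize(test_batch, moving_mean, moving_var)
without_updates = normalize(test_batch, 0.0, 1.0)  # statistics never left their initial values

print(with_updates.mean(), with_updates.std())
print(without_updates.mean(), without_updates.std())
```

With the updates, the test activations come out roughly zero-mean and unit-variance, as the following layers saw during training. Without them, the activations keep their raw offset and scale.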

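(2) Another approach (a workaround, not a fix): keep is_training set to True at test time.
Note that if is_training is set to False at test time, test accuracy turns out very low, while setting is_training back to True restores it. Although this yields high accuracy, it is clearly unsound: the output for each test sample then depends on the other samples in the same batch.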
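The dependence on the rest of the batch is easy to demonstrate. The following NumPy sketch uses made-up numbers and hypothetical helper names (`bn_with_batch_stats`, `bn_with_population_stats`) for a single feature; it is not taken from the original post.

```python
import numpy as np

eps = 1e-3

def bn_with_batch_stats(batch):
    # Training-mode behaviour: statistics come from the batch itself
    return (batch - batch.mean()) / np.sqrt(batch.var() + eps)

def bn_with_population_stats(batch, pop_mean, pop_var):
    # Inference-mode behaviour: fixed statistics accumulated during training
    return (batch - pop_mean) / np.sqrt(pop_var + eps)

sample = 5.0
batch_a = np.array([sample, 4.0, 6.0, 5.5])
batch_b = np.array([sample, 9.0, 10.0, 11.0])

# Same sample, different neighbours: the training-mode output changes
out_a = bn_with_batch_stats(batch_a)[0]
out_b = bn_with_batch_stats(batch_b)[0]

# With fixed statistics the output for the sample is identical in both batches
fix_a = bn_with_population_stats(batch_a, 5.0, 4.0)[0]
fix_b = bn_with_population_stats(batch_b, 5.0, 4.0)[0]

print(out_a, out_b, fix_a, fix_b)
```

In the extreme case of a test batch containing one sample, the batch variance is zero and the normalized value collapses to the offset, whatever the input was.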

3. tf.nn.batch_normalization

You can also write the layer yourself using tf.nn.batch_normalization.
TensorFlow implementation:

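def batchNorm_layer(inputs, is_training, decay=0.999, epsilon=1e-3):
    # is_training must be a Python bool: the branch is chosen when the graph is built.
    # All parameters have the shape of one sample, so each activation is normalized
    # over the batch axis only (this suits fully connected layers).
    shape = inputs.get_shape()[1:].as_list()
    scale = tf.Variable(tf.ones(shape))                       # learned scale (gamma)
    beta = tf.Variable(tf.zeros(shape))                       # learned offset
    pop_mean = tf.Variable(tf.zeros(shape), trainable=False)  # moving mean
    pop_var = tf.Variable(tf.ones(shape), trainable=False)    # moving variance

    if is_training:
        batch_mean, batch_var = tf.nn.moments(inputs, [0])
        # Exponential moving averages; decay must be close to 1 to keep a long history
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
        # Force the moving averages to update whenever the layer output is computed
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(inputs, batch_mean, batch_var, beta, scale, epsilon)
    else:
        # Inference: normalize with the accumulated population statistics
        return tf.nn.batch_normalization(inputs, pop_mean, pop_var, beta, scale, epsilon)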
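For reference, the arithmetic behind tf.nn.batch_normalization is y = scale * (x - mean) / sqrt(variance + epsilon) + offset. The following NumPy sketch is a stand-in written for this explanation, not TensorFlow code; it mirrors the argument order (x, mean, variance, offset, scale, variance_epsilon) and applies the formula to a random batch of fully connected activations.

```python
import numpy as np

def batch_normalization(x, mean, variance, offset, scale, variance_epsilon):
    # y = scale * (x - mean) / sqrt(variance + epsilon) + offset
    return scale * (x - mean) / np.sqrt(variance + variance_epsilon) + offset

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(256, 10))  # (batch, features)

batch_mean = x.mean(axis=0)  # per-feature mean over the batch axis
batch_var = x.var(axis=0)    # per-feature population variance
gamma = np.ones(10)          # plays the role of "scale" in the layer above
beta = np.zeros(10)          # plays the role of "beta" (the offset)

y = batch_normalization(x, batch_mean, batch_var, beta, gamma, 1e-3)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))
```

With scale fixed at 1 and the offset at 0, every feature of the output has mean close to 0 and standard deviation close to 1; during training these two parameters are learned.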

Reference: https://www.jianshu.com/p/0312e04e4e83
