Batch Normalization

Reposted article. 2018-03-08 14:56. Tag: tensorflow

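An example from tflearn: https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_mnist.py

In this example, LRN (local response normalization) is placed after each pooling layer and before the fully connected layers.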

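# Imports for the layer functions used below
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Building convolutional network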
network = input_data(shape=[None, 28, 28, 1], name='input')
network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 128, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 256, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.01,
                     loss='categorical_crossentropy', name='target')

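Presumably Batch Normalization should be placed the same way? In https://github.com/tflearn/tflearn/blob/master/tflearn/layers/normalization.py LRN and BN are defined side by side. The official documentation is at http://tflearn.org/layers/normalization/.

https://gist.github.com/daiwei89/a0d9600050003249e7c30f8e63742985 is one attempt at an example, although it ran into some problems: https://github.com/tflearn/tflearn/issues/530

https://github.com/tflearn/tflearn/issues/398 has a related question and answer, but I did not fully understand it.

https://www.zhihu.com/question/53133249 is a Zhihu discussion about using BN in TensorFlow: the low-level op takes the mean and variance as arguments, and you have to compute them yourself. There is also a higher-level API, though; see http://ruishu.io/2016/12/27/batchnorm/: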

Batch Normalization The Easy Way

Perhaps the easiest way to use batch normalization would be to simply use the tf.contrib.layers.batch_norm layer. So let’s give that a go! Let’s get some imports and data loading out of the way first.

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from utils import show_graph
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Next, we define our typical fully-connected + batch normalization + nonlinearity set-up.

def dense(x, size, scope):
    return tf.contrib.layers.fully_connected(x, size, 
                                             activation_fn=None,
                                             scope=scope)

def dense_batch_relu(x, phase, scope):
    with tf.variable_scope(scope):
        h1 = tf.contrib.layers.fully_connected(x, 100, 
                                               activation_fn=None,
                                               scope='dense')
        h2 = tf.contrib.layers.batch_norm(h1, 
                                          center=True, scale=True, 
                                          is_training=phase,
                                          scope='bn')
        return tf.nn.relu(h2, 'relu')

One thing that might stand out is the phase term. We are going to use it as a placeholder for a boolean, which we will insert into feed_dict. It will serve as a binary indicator for whether we are in training (phase=True) or testing (phase=False) mode.

A Stack Overflow answer adds:

Just to add to the list, there're several more ways to do batch-norm in tensorflow:

tf.nn.batch_normalization is a low-level op. The caller is responsible to handle mean and variance tensors themselves.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you.

See https://stackoverflow.com/questions/48001759/what-is-right-batch-normalization-function-in-tensorflow/48006315#48006315

Now the earlier question can be answered.

An example:

#for NeuralNetwork model code is below
#We will use SGD for training to save our time. Code is from Assignment 2
#beta is the new parameter - controls level of regularization.
#Feel free to play with it - the best one I found is 0.001
#notice, we introduce L2 for both biases and weights of all layers

batch_size = 128
beta = 0.001

#building tensorflow graph
graph = tf.Graph()
with graph.as_default():
  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #introduce batchnorm
  tf_train_dataset_bn = tf.contrib.layers.batch_norm(tf_train_dataset)

  #now let's build our new hidden layer
  #that's how many hidden neurons we want
  num_hidden_neurons = 1024
  #its weights
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  #now the layer itself. It multiplies data by weights, adds biases
  #and takes ReLU over result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset_bn, hidden_weights) + hidden_biases)

  #adding the batch normalization layer
  hidden_layer_bn = tf.contrib.layers.batch_norm(hidden_layer)

  #time to go for output linear layer
  #out weights connect hidden neurons to output labels
  #biases are added to output labels  
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))  

  out_biases = tf.Variable(tf.zeros([num_labels]))  

  #compute output  
  out_layer = tf.matmul(hidden_layer_bn,out_weights) + out_biases
  #our real output is a softmax of prior result
  #and we also compute its cross-entropy to get our loss
  #Notice - we introduce our L2 here
  loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=out_layer, labels=tf_train_labels) +
    beta*tf.nn.l2_loss(hidden_weights) +
    beta*tf.nn.l2_loss(hidden_biases) +
    beta*tf.nn.l2_loss(out_weights) +
    beta*tf.nn.l2_loss(out_biases)))

  #now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  #nice, now let's calculate the predictions on each dataset for evaluating the
  #performance so far
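  # Predictions for the training, validation, and test data.
  # NOTE: the validation and test paths below skip both batch_norm layers,
  # so they do not match the graph that is being trained.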
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(  tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax( tf.matmul(valid_relu, out_weights) + out_biases) 

  test_relu = tf.nn.relu( tf.matmul( tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)



#now is the actual training on the ANN we built
#we will run it for some number of steps and evaluate the progress after 
#every 500 steps

#number of steps we will train our ANN
num_steps = 3001

#actual training
with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
      print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Reposted from: https://www.jianshu.com/p/06216581c7ef

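Batch Normalization makes the hyperparameter search problem much easier and makes the neural network more robust to the choice of hyperparameters: a much wider range of hyperparameters works well. It also makes training easier, even for deep networks.

When training a model such as logistic regression, you may remember that normalizing the input features can speed up learning. You compute the mean and subtract it from the training set, then compute the variance and normalize the data set by it. As we saw in an earlier video, this turns the contours of the learning problem from something very elongated into something rounder that is easier for the optimization algorithm. So normalizing the input features works for logistic regression and for neural networks.
What about deeper models? You have not only the input features x, but also activations a^[1] in one layer, a^[2] in the next, and so on. If you want to train the parameters w^[3], b^[3], wouldn't it be nice to normalize the mean and variance of a^[2], so that training w^[3], b^[3] is more efficient?
Given some intermediate values in a neural network, say the hidden unit values z^(1) through z^(m) of some hidden layer, it is more precise to write them as z^[l](i), where l is the hidden layer and i runs from 1 to m.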
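
The normalization of hidden-unit values described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture: the function name batch_norm_forward, the eps value, the synthetic data, and the learnable scale gamma and shift beta are assumptions introduced here.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize hidden-unit values z of shape (m, units) over the batch axis."""
    mu = z.mean(axis=0)                      # per-unit mean over the m examples
    var = z.var(axis=0)                      # per-unit (biased) variance
    z_norm = (z - mu) / np.sqrt(var + eps)   # zero mean, unit variance per unit
    return gamma * z_norm + beta             # learnable scale and shift

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=5.0, size=(64, 4))   # m = 64 examples, 4 hidden units
out = batch_norm_forward(z, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every unit
print(out.std(axis=0))   # close to 1 for every unit
```

With gamma = 1 and beta = 0 the output is simply the standardized input; during training the network is free to learn other values, so the normalized layer can still represent any mean and variance it needs.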

Here we introduce and use two functions in turn: the high-level wrapper tf.layers.batch_normalization from tf.layers, and the low-level tf.nn.batch_normalization from tf.nn.

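How to add batch normalization

We consider two cases:

Fully connected layers
Convolutional layers

Using tf.layers.batch_normalization

First, fully connected layers, in four steps:

Add the is_training parameter
Remove the activation function and the bias from the fully connected layer
Normalize the layer's output with tf.layers.batch_normalization
Pass the normalized values to the activation function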

def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, training=is_training)
    layer = tf.nn.relu(layer)
    return layer

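Next, adding batch normalization to a convolutional layer:

Add the is_training parameter
Remove the activation function and the bias from the convolutional layer
Normalize the layer's output with tf.layers.batch_normalization
Pass the normalized values to the activation function

Comparing the two: with tf.layers there is essentially no difference between fully connected and convolutional layers; with tf.nn there are some differences.
In general, people agree on removing the layer's bias (because batch normalization already has its own scale and shift terms) and on adding batch normalization before the layer's nonlinear activation function. For some networks, however, other arrangements also work well.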
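
The claim that the layer's bias is redundant can be checked numerically. The NumPy sketch below is an illustration under assumed names (the normalize helper and the random data are hypothetical): a constant per-unit bias added before normalization shifts each unit's batch mean by the same amount, so the mean subtraction cancels it and the shift is left to batch normalization's own beta.

```python
import numpy as np

def normalize(z, eps=1e-3):
    """Batch-normalize z of shape (batch, units) without scale or shift."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))          # a batch of 32 inputs with 8 features
w = rng.normal(size=(8, 5))           # weights of a layer with 5 units
b = rng.normal(size=5)                # a per-unit bias

without_bias = normalize(x @ w)
with_bias = normalize(x @ w + b)

# The bias shifts each unit's batch mean by b and is removed by the mean subtraction.
print(np.allclose(without_bias, with_bias))
```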

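On the training side, the following changes are needed:

Add is_training, a placeholder holding a boolean that indicates whether the network is training.
Pass is_training to the convolutional and fully connected layers.
Pass the appropriate value in feed_dict on every call to session.run().
Define train_opt inside a with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): block.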

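Using tf.nn.batch_normalization

Add the is_training parameter
Remove the bias and the activation function
Add the gamma, beta, pop_mean and pop_variance variables
Use tf.cond to handle the difference between training and testing
During training, compute the mean and variance with tf.nn.moments, update the population statistics inside with tf.control_dependencies..., and normalize the layer's output with tf.nn.batch_normalization
During testing, normalize the layer's output with tf.nn.batch_normalization, using the population statistics gathered during training
Add the activation function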

def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))
    beta = tf.Variable(tf.zeros([num_units]))

    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
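
To see what the population statistics do without running TensorFlow, the same logic can be mirrored in NumPy. This is a hedged sketch rather than part of the original code: the class name BatchNorm1D and the synthetic data are assumptions, while decay = 0.99, epsilon = 1e-3 and the initial values of pop_mean and pop_variance follow the function above.

```python
import numpy as np

class BatchNorm1D:
    """NumPy mirror of the manual batch norm above (hypothetical helper)."""

    def __init__(self, num_units, decay=0.99, epsilon=1e-3):
        self.gamma = np.ones(num_units)
        self.beta = np.zeros(num_units)
        self.pop_mean = np.zeros(num_units)      # same initial values as the TF variables
        self.pop_variance = np.ones(num_units)
        self.decay = decay
        self.epsilon = epsilon

    def __call__(self, layer, is_training):
        if is_training:
            # Training: normalize with batch statistics and update the moving averages.
            mean, variance = layer.mean(axis=0), layer.var(axis=0)
            self.pop_mean = self.pop_mean * self.decay + mean * (1 - self.decay)
            self.pop_variance = self.pop_variance * self.decay + variance * (1 - self.decay)
        else:
            # Inference: reuse the statistics accumulated during training.
            mean, variance = self.pop_mean, self.pop_variance
        return self.gamma * (layer - mean) / np.sqrt(variance + self.epsilon) + self.beta

rng = np.random.default_rng(0)
bn = BatchNorm1D(num_units=3)
for _ in range(1000):                             # 1000 training batches
    bn(rng.normal(loc=2.0, scale=4.0, size=(64, 3)), is_training=True)

single = bn(np.full((1, 3), 2.0), is_training=False)   # one sample works at inference
print(bn.pop_mean, bn.pop_variance, single)
```

After enough training batches the moving averages settle near the data's mean (2) and variance (about 16), which is why a single example can be normalized at inference time even though it has no meaningful batch statistics of its own.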

def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1

    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4

    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))

    layer = tf.nn.conv2d(prev_layer, weights, strides=[1,strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))

    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        # Important to use the correct dimensions here to ensure the mean and variance are calculated
        # per feature map instead of for the entire layer
        batch_mean, batch_variance = tf.nn.moments(layer, [0,1,2], keep_dims=False)

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
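
The comment about using the correct dimensions can be illustrated with NumPy. In this sketch (the array shapes and variable names are assumptions) the input has the NHWC layout that tf.nn.conv2d uses by default, and reducing over axes 0, 1 and 2 leaves one mean and one variance per feature map, which is the same reduction that tf.nn.moments(layer, [0,1,2]) performs.

```python
import numpy as np

rng = np.random.default_rng(0)
# A batch of 16 feature maps, 7x7 spatial size, 12 channels (NHWC layout).
layer = rng.normal(size=(16, 7, 7, 12))

# Reduce over batch, height and width: one statistic per channel.
batch_mean = layer.mean(axis=(0, 1, 2))
batch_variance = layer.var(axis=(0, 1, 2))
print(batch_mean.shape, batch_variance.shape)   # (12,) (12,)

# Broadcasting applies each channel's statistics to every position in that channel.
epsilon = 1e-3
normalized = (layer - batch_mean) / np.sqrt(batch_variance + epsilon)

# Reducing over the batch axis only would instead give a statistic per pixel per channel.
per_position_mean = layer.mean(axis=0)
print(per_position_mean.shape)                  # (7, 7, 12)
```

This is also why gamma, beta, pop_mean and pop_variance in conv_layer all have shape [out_channels]: every spatial position of a feature map shares the same normalization parameters.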

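In train we do not need to wrap the training op in with tf.control_dependencies..., because the fully connected and convolutional layers above already update the population statistics manually.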

def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Add placeholder to indicate whether or not we're training the model
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 19 convolutional layers (layer_i = 1..19)
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each
    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually, just to make sure batch normalization really worked
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]],
                                                    is_training: False})

        print("Accuracy on 100 samples:", correct/100)


num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
