variance_scaling_initializer(
    factor=2.0,
    mode='FAN_IN',
    uniform=False,
    seed=None,
    dtype=tf.float32
)
Returns an initializer that generates tensors without scaling variance.
When initializing a deep network, it is in principle advantageous to keep the scale of the input variance constant, so that it does not explode or diminish by the time it reaches the final layer. This initializer uses the following formula:
if mode='FAN_IN':    # Count only number of input connections.
    n = fan_in
elif mode='FAN_OUT': # Count only number of output connections.
    n = fan_out
elif mode='FAN_AVG': # Average number of input and output connections.
    n = (fan_in + fan_out) / 2.0

truncated_normal(shape, 0.0, stddev=sqrt(factor / n))
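For concreteness, here is a minimal NumPy sketch of this rule (not TensorFlow's implementation; the fan_in/fan_out arguments are supplied by hand for illustration, and the resampling loop mirrors the cut-off at two standard deviations that tf.truncated_normal applies):

import numpy as np

def variance_scaling_sample(shape, fan_in, fan_out,
                            factor=2.0, mode='FAN_IN', seed=None):
    if mode == 'FAN_IN':       # count only input connections
        n = fan_in
    elif mode == 'FAN_OUT':    # count only output connections
        n = fan_out
    else:                      # 'FAN_AVG': average of inputs and outputs
        n = (fan_in + fan_out) / 2.0
    stddev = np.sqrt(factor / n)
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, stddev, size=shape)
    # Resample any value beyond two standard deviations, mimicking
    # the truncation behavior of tf.truncated_normal.
    while True:
        mask = np.abs(samples) > 2 * stddev
        if not mask.any():
            return samples
        samples[mask] = rng.normal(0.0, stddev, size=mask.sum())

w = variance_scaling_sample((256, 128), fan_in=256, fan_out=128)
print(w.std())  # close to sqrt(2.0 / 256), slightly smaller due to truncation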
To get Delving Deep into Rectifiers (the default), use: factor=2.0, mode='FAN_IN', uniform=False.
To get Convolutional Architecture for Fast Feature Embedding, use: factor=1.0, mode='FAN_IN', uniform=True.
To get Understanding the difficulty of training deep feedforward neural networks, use: factor=1.0, mode='FAN_AVG', uniform=True.
To get xavier_initializer, use either: factor=1.0, mode='FAN_AVG', uniform=True, or factor=1.0, mode='FAN_AVG', uniform=False.
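Written out as calls against this API (TensorFlow 1.x; the variable names are just illustrative labels for the papers above):

import tensorflow as tf

# "Delving Deep into Rectifiers" (He et al.) -- the default settings.
he_init = tf.contrib.layers.variance_scaling_initializer(
    factor=2.0, mode='FAN_IN', uniform=False)

# "Convolutional Architecture for Fast Feature Embedding" (Caffe).
caffe_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_IN', uniform=True)

# "Understanding the difficulty of training deep feedforward neural networks".
glorot_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_AVG', uniform=True)

# Equivalent to xavier_initializer (normal variant).
xavier_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_AVG', uniform=False)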
Args:
factor: Float. A multiplicative factor.
mode: String. 'FAN_IN', 'FAN_OUT', 'FAN_AVG'.
uniform: Whether to use uniform or normal distributed random initialization.
seed: A Python integer. Used to create random seeds. See tf.set_random_seed for behavior.
dtype: The data type. Only floating point types are supported.
Returns:
An initializer that generates tensors with unit variance.
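A minimal usage sketch, assuming a TF 1.x graph-mode session (the [784, 256] layer size is a hypothetical example):

import tensorflow as tf

init = tf.contrib.layers.variance_scaling_initializer(
    factor=2.0, mode='FAN_IN', uniform=False)

with tf.variable_scope("dense"):
    w = tf.get_variable("w", shape=[784, 256], initializer=init)
    b = tf.get_variable("b", shape=[256], initializer=tf.zeros_initializer())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Empirical stddev is close to sqrt(2.0 / 784), slightly smaller
    # because the distribution is truncated at two standard deviations.
    print(sess.run(w).std())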
Variance scaling initialization. In TensorFlow this method is written tf.contrib.layers.variance_scaling_initializer(). In our experiments, this initialization generalized and scaled better than plain Gaussian initialization, truncated Gaussian initialization, and Xavier initialization. Roughly speaking, variance scaling initialization adjusts the variance of the initial random weights according to the number of inputs or outputs of each layer (in TensorFlow, the default is the number of inputs), which helps the signal propagate deeper through the network without resorting to extra tricks such as gradient clipping or batch normalization. Xavier initialization is similar to variance scaling, except that in Xavier the variance of every layer is nearly the same; if the layers of a network differ greatly in scale (as is common in convolutional neural networks), the network may not cope well with having the same variance in every layer.
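A small NumPy sketch of that contrast, using hypothetical convolutional layer shapes: with mode='FAN_IN' the initial stddev tracks each layer's input count, while the Xavier-style FAN_AVG setting folds in the output count as well.

import numpy as np

# Hypothetical (fan_in, fan_out) pairs for three conv layers.
layers = [(3 * 3 * 3, 64), (3 * 3 * 64, 128), (3 * 3 * 128, 256)]
for fan_in, fan_out in layers:
    he     = np.sqrt(2.0 / fan_in)                      # factor=2.0, mode='FAN_IN'
    xavier = np.sqrt(1.0 / ((fan_in + fan_out) / 2.0))  # factor=1.0, mode='FAN_AVG'
    print(fan_in, fan_out, round(he, 4), round(xavier, 4))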
References:
https://cloud.tencent.com/info/b0f4706388d38ea3b86f257bee403f24.html
https://tensorflow.google.cn/versions/r1.2/api_docs/python/tf/contrib/layers/variance_scaling_initializer