variance_scaling_initializer(
    factor=2.0,
    mode='FAN_IN',
    uniform=False,
    seed=None,
    dtype=tf.float32
)
Returns an initializer that generates tensors without scaling variance (i.e., the variance of the signal is preserved rather than scaled as it passes through the layer).
When initializing a deep network, it is in principle advantageous to keep the scale of the input variance constant, so it does not explode or diminish by the time it reaches the final layer. This initializer uses the following formula:
if mode == 'FAN_IN':    # Count only number of input connections.
  n = fan_in
elif mode == 'FAN_OUT': # Count only number of output connections.
  n = fan_out
elif mode == 'FAN_AVG': # Average number of input and output connections.
  n = (fan_in + fan_out) / 2.0

truncated_normal(shape, 0.0, stddev=sqrt(factor / n))
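The rule above can be sketched in plain NumPy, assuming dense-layer fans (fan_in and fan_out read from a 2-D shape) and substituting a plain normal for TF's truncated normal; sample_variance_scaled is a hypothetical helper, not part of TensorFlow. For uniform=True, the TF implementation draws from [-limit, limit] with limit = sqrt(3 * factor / n), which has the same variance factor / n:

import numpy as np

def sample_variance_scaled(shape, factor=2.0, mode='FAN_IN', uniform=False, rng=None):
    # Hypothetical helper: samples weights following the variance-scaling
    # rule documented above (not the TF implementation).
    rng = rng or np.random.default_rng()
    fan_in, fan_out = shape[0], shape[1]  # simplified: dense-layer fans only
    if mode == 'FAN_IN':
        n = fan_in
    elif mode == 'FAN_OUT':
        n = fan_out
    elif mode == 'FAN_AVG':
        n = (fan_in + fan_out) / 2.0
    else:
        raise ValueError('unknown mode: %s' % mode)
    if uniform:
        # The uniform branch draws from [-limit, limit]; limit = sqrt(3*factor/n)
        # yields the same variance factor/n as the normal branch.
        limit = np.sqrt(3.0 * factor / n)
        return rng.uniform(-limit, limit, size=shape)
    # TF uses a truncated normal here; a plain normal with the same stddev
    # stands in for it in this sketch.
    return rng.normal(0.0, np.sqrt(factor / n), size=shape)

w = sample_variance_scaled((256, 128))  # e.g. a 256 -> 128 dense layer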
To get Delving Deep into Rectifiers (the default), use: factor=2.0, mode='FAN_IN', uniform=False.
To get Convolutional Architecture for Fast Feature Embedding, use: factor=1.0, mode='FAN_IN', uniform=True.
To get Understanding the difficulty of training deep feedforward neural networks, use: factor=1.0, mode='FAN_AVG', uniform=True.
To get xavier_initializer, use either: factor=1.0, mode='FAN_AVG', uniform=True, or factor=1.0, mode='FAN_AVG', uniform=False (a usage sketch for these presets follows below).
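As a usage sketch, assuming the TF 1.x contrib API documented here, the presets map to the following calls; the variable names and layer shape are illustrative:

import tensorflow as tf

# He initialization: Delving Deep into Rectifiers (the default settings).
he_init = tf.contrib.layers.variance_scaling_initializer(
    factor=2.0, mode='FAN_IN', uniform=False)
# Caffe-style: Convolutional Architecture for Fast Feature Embedding.
caffe_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_IN', uniform=True)
# Xavier/Glorot: Understanding the difficulty of training deep feedforward
# neural networks (equivalent to one form of xavier_initializer).
xavier_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_AVG', uniform=True)

# Apply one of them to a variable (shape chosen for illustration).
w = tf.get_variable('w', shape=[784, 256], initializer=he_init)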
Args:
factor: Float. A multiplicative factor.
mode: String. 'FAN_IN', 'FAN_OUT', 'FAN_AVG'.
uniform: Whether to use uniform or normal distributed random initialization.
seed: A Python integer. Used to create random seeds. See tf.set_random_seed for behavior.
dtype: The data type. Only floating point types are supported.
Returns:
An initializer that generates tensors with unit variance.
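A short sketch of the seed and dtype arguments, assuming the same TF 1.x API; the seed values and kernel shape are illustrative:

import tensorflow as tf

init = tf.contrib.layers.variance_scaling_initializer(
    factor=2.0, mode='FAN_IN', uniform=False,
    seed=42,             # per-op seed; interacts with the graph-level seed
    dtype=tf.float32)    # only floating point types are supported

tf.set_random_seed(1234)  # graph-level seed; see tf.set_random_seed for behavior
w = tf.get_variable('conv_kernel', shape=[3, 3, 64, 128], initializer=init)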
Variance scaling initialization. In TensorFlow, this method is written as tf.contrib.layers.variance_scaling_initializer(). In our experiments, this initialization generalized/scaled better than plain Gaussian initialization, truncated Gaussian initialization, and Xavier initialization. Roughly speaking, variance scaling initialization adjusts the variance of the initial random weights according to the number of inputs or outputs at each layer (the number of inputs by default in TensorFlow), which helps the signal propagate deeper through the network without other tricks such as gradient clipping or batch normalization. Xavier initialization is similar to variance scaling, except that in Xavier the variance of each layer is kept nearly the same; if the layers of a network differ greatly in scale (common in convolutional neural networks), however, the network may not cope well with the same variance in every layer.
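As a numeric illustration of the point above (the layer shapes are hypothetical), the FAN_IN rule gives each layer its own standard deviation, while FAN_AVG (Xavier-style) balances input and output counts:

import math

# (fan_in, fan_out) for two layers of very different scale, e.g. a small
# conv layer (3x3 kernel, 64 input channels) vs. a wide dense layer.
layers = [(3 * 3 * 64, 128), (4096, 4096)]
for fan_in, fan_out in layers:
    he_std = math.sqrt(2.0 / fan_in)                        # factor=2.0, mode='FAN_IN'
    xavier_std = math.sqrt(1.0 / ((fan_in + fan_out) / 2))  # factor=1.0, mode='FAN_AVG'
    print(fan_in, fan_out, round(he_std, 4), round(xavier_std, 4))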
References:
https://cloud.tencent.com/info/b0f4706388d38ea3b86f257bee403f24.html
https://tensorflow.google.cn/versions/r1.2/api_docs/python/tf/contrib/layers/variance_scaling_initializer