https://blog.csdn.net/liuxiao214/article/details/81037416
http://www.dataguru.cn/article-13032-1.html
1. BatchNormalization
In the implementation, the mean and variance are computed along axis = 0, i.e. over the batch dimension: each feature is normalized using statistics gathered across all examples in the batch.
(NumPy-style code)
import numpy as np

def Batchnorm_simple_for_train(x, gamma, beta, bn_param):
    """
    param x        : input data, shape (B, L)
    param gamma    : scale factor γ
    param beta     : shift factor β
    param bn_param : dict of parameters batchnorm needs
        eps          : small constant that keeps the denominator away from 0
        momentum     : momentum, typically 0.9, 0.99, or 0.999
        running_mean : running mean, updated during training for use at test time
        running_var  : running variance, updated during training for use at test time
    """
    eps = bn_param['eps']
    momentum = bn_param['momentum']
    running_mean = bn_param['running_mean']   # shape = [L]
    running_var = bn_param['running_var']     # shape = [L]

    x_mean = x.mean(axis=0)                               # per-feature mean over the batch
    x_var = x.var(axis=0)                                 # per-feature variance over the batch
    x_normalized = (x - x_mean) / np.sqrt(x_var + eps)    # normalize
    results = gamma * x_normalized + beta                 # scale and shift

    # exponential moving averages of the batch statistics, for use at test time
    running_mean = momentum * running_mean + (1 - momentum) * x_mean
    running_var = momentum * running_var + (1 - momentum) * x_var

    # store the updated statistics
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var

    return results, bn_param
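A minimal usage sketch (the shapes and the initial bn_param values here are illustrative assumptions, not from the original post). After normalization, every feature column of the output has mean ≈ 0 and std ≈ 1:

B, L = 4, 3
x = np.random.randn(B, L)
gamma = np.ones(L)
beta = np.zeros(L)
bn_param = {
    'eps': 1e-5,
    'momentum': 0.9,
    'running_mean': np.zeros(L),
    'running_var': np.ones(L),
}
out, bn_param = Batchnorm_simple_for_train(x, gamma, beta, bn_param)
print(out.mean(axis=0))   # ≈ [0, 0, 0]: each feature is centered over the batch
print(out.std(axis=0))    # ≈ [1, 1, 1]: each feature has unit variance over the batch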
2. LayerNormalization
In the implementation, the mean and variance are computed along axis = 1, i.e. over all the feature values of a single example.
(PyTorch code, from The Annotated Transformer)
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    "Construct a layernorm module (See citation for details)."
    def __init__(self, features, eps=1e-6):
        super(LayerNorm, self).__init__()
        # learnable per-feature scale (a_2) and shift (b_2)
        self.a_2 = nn.Parameter(torch.ones(features))
        self.b_2 = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        # statistics over the last (feature) dimension, computed separately per position
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
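A minimal usage sketch (the batch, sequence, and feature sizes are assumptions made for illustration). Note that this hand-rolled version uses the unbiased std and adds eps outside the square root, so its output differs slightly from torch.nn.LayerNorm, which puts the biased variance inside the square root:

features = 8
ln = LayerNorm(features)
x = torch.randn(2, 5, features)   # (batch, seq_len, features)
y = ln(x)
print(y.shape)       # torch.Size([2, 5, 8])
print(y.mean(-1))    # ≈ 0 at every (batch, position): normalized over features only
print(y.std(-1))     # ≈ 1 at every (batch, position)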