TensorFlow Activation Functions and Normalization Functions

Reposted article. 2017-08-09 16:22. Category: Machine Learning

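The role of activation functions, quoting the book 《TensorFlow實踐》:

These functions are used together with the outputs of other layers to produce feature maps. They smooth or differentiate the results of certain operations. Their goal is to introduce nonlinearity into the neural network, because a curve can capture complex variations in the input. TensorFlow provides many activation functions. In CNNs, tf.nn.relu is the usual choice because, although relu loses some information, it performs very well. When first designing a model, relu is a reasonable default. Advanced users can also create their own activation functions. The main factors for judging whether an activation function is useful are:

1) The function is monotonic: its output increases as the input increases and decreases as the input decreases, which makes it possible to find local extrema with gradient descent.

2) The function is differentiable, so that a derivative exists at every point of its domain, which lets gradient descent work properly with the outputs of such activation functions.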
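The monotonicity criterion can be checked numerically on a grid of inputs. The following is a minimal sketch in plain Python; the helper name `is_non_decreasing` and the grid are illustrative assumptions, and `math.sin` is included only as a counterexample that would not qualify.

```python
import math

def is_non_decreasing(fn, xs):
    """Return True if fn never decreases over the sorted inputs xs."""
    ys = [fn(x) for x in xs]
    return all(a <= b for a, b in zip(ys, ys[1:]))

# Evenly spaced inputs from -5.0 to 5.0 (illustrative choice).
grid = [i / 10.0 for i in range(-50, 51)]

checks = {
    "relu": is_non_decreasing(lambda x: max(x, 0.0), grid),
    "tanh": is_non_decreasing(math.tanh, grid),
    "sin": is_non_decreasing(math.sin, grid),  # counterexample
}
print(checks)
```

A grid check like this can only suggest monotonicity on the sampled range; it is not a proof.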

Common activation functions provided by TensorFlow are listed below (for details see http://www.tensorfly.cn/tfdoc/api_docs/python/nn.html):

1.tf.nn.relu(features, name=None)

Computes rectified linear: max(features, 0).

features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8.
name: A name for the operation (optional).

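Note:

Advantage: relu is not affected by the "vanishing gradient" problem for positive inputs, and its range is [0, +∞).

Disadvantage: when a large learning rate is used, it is prone to saturated neurons.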
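To make the note concrete, here is a minimal plain-Python sketch of relu and its derivative. It is not TensorFlow's implementation; the function names are illustrative, and setting the derivative to 0 at x = 0 is a convention assumed here.

```python
def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return max(x, 0.0)

def relu_grad(x):
    """Derivative of relu; taken as 0 at x == 0 by convention in this sketch."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]
print(outputs)  # negative inputs are clipped to 0
print(grads)    # gradient is 1 for positive inputs, 0 otherwise
```

The gradient stays at 1 for every positive input, so it does not shrink as signals pass through many layers. A unit whose inputs are always negative receives a zero gradient and stops updating, which is the failure mode the disadvantage above refers to.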

2.tf.nn.relu6(features, name=None)

Computes Rectified Linear 6: min(max(features, 0), 6).

features: A Tensor with type float, double, int32, int64, uint8, int16, or int8.
name: A name for the operation (optional).
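The formula min(max(features, 0), 6) can be sketched in plain Python as follows. This is an illustration of the formula only, with an assumed function name, not the TensorFlow kernel.

```python
def relu6(x):
    """Rectified Linear 6: clip x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)

values = [relu6(x) for x in [-3.0, 2.5, 6.0, 10.0]]
print(values)
```

Unlike relu, the output is bounded above, so inputs greater than 6 all map to 6.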

3.tf.sigmoid(x, name=None)

Computes sigmoid of x element-wise.

Specifically, y = 1 / (1 + exp(-x)).

x: A Tensor with type float, double, int32, complex64, int64, or qint32.
name: A name for the operation (optional).

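Note:

Advantage: in neural networks trained on samples, sigmoid's ability to keep the output within [0.0, 1.0] is very useful.

Disadvantage: when the output is close to saturation or the input changes sharply, this compression of the output range tends to have adverse effects.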
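The saturation effect can be seen by evaluating the derivative of sigmoid, which equals s * (1 - s) for s = sigmoid(x). The sketch below is plain Python with assumed function names. It uses the direct formula, so it is only meant for moderate inputs (a large negative x would overflow `math.exp`).

```python
import math

def sigmoid(x):
    """y = 1 / (1 + exp(-x)); direct formula, intended for moderate x."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid(x), sigmoid_grad(x))
```

The derivative peaks at 0.25 for x = 0 and falls toward 0 as the output approaches 0 or 1, so a saturated unit passes back almost no gradient.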

4.tf.nn.softplus(features, name=None)

Computes softplus: log(exp(features) + 1).

features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8.
name: A name for the operation (optional).
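Evaluating log(exp(features) + 1) literally overflows for large inputs. The sketch below uses an algebraically equivalent form, max(x, 0) + log(1 + exp(-|x|)), which avoids that. It is a plain-Python illustration with an assumed function name, and it makes no claim about how TensorFlow computes softplus internally.

```python
import math

def softplus(x):
    """log(exp(x) + 1), rewritten so that exp never receives a large argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for x in [-1000.0, -1.0, 0.0, 1.0, 1000.0]:
    print(x, softplus(x))
```

For large positive x the result approaches x, and for large negative x it approaches 0, which is why softplus is often described as a smooth version of relu.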

5.tf.tanh(x, name=None)

Computes hyperbolic tangent of x element-wise.

x: A Tensor with type float, double, int32, complex64, int64, or qint32.
name: A name for the operation (optional).

Note:

Advantage: the hyperbolic tangent is quite similar to sigmoid and shares its advantages, while also being able to output negative values. The range of tanh is [-1.0, 1.0].
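The similarity between the two functions is exact up to scaling and shifting: tanh(x) = 2 * sigmoid(2x) - 1. The plain-Python sketch below checks this identity numerically; the helper name and the sample points are illustrative assumptions.

```python
import math

def sigmoid(x):
    """y = 1 / (1 + exp(-x)); direct formula, intended for moderate x."""
    return 1.0 / (1.0 + math.exp(-x))

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
# Largest absolute difference between tanh(x) and the rescaled sigmoid.
max_diff = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in samples)
print(max_diff)
```

In other words, tanh is a sigmoid stretched to the range from -1 to 1 and centered at zero.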

MATLAB code that plots the shape of each function:

clear all
close all
clc
% ACTIVATION FUNCTIONS %
X = linspace(-5,5,100);
plot(X)
title('feature = X')
% tf.nn.relu(features, name=None): max(features, 0) %
Y_relu = max(X,0);
figure,plot(X,Y_relu)
title('tf.nn.relu(features, name=None)')
% tf.nn.relu6(features, name=None): min(max(features, 0), 6) %
% Note: the cap at 6 is not reached for X in [-5, 5]. %
Y_relu6 = min(max(X,0),6);
figure,plot(X,Y_relu6)
title('tf.nn.relu6(features, name=None)')
% tf.sigmoid(x, name=None): y = 1 / (1 + exp(-x)) %
Y_sigmoid = 1./(1+exp(-1.*X));
figure,plot(X,Y_sigmoid)
title('tf.sigmoid(x, name=None)')
% tf.nn.softplus(features, name=None): log(exp(features) + 1) %
Y_softplus = log(exp(X) + 1);
figure,plot(X,Y_softplus)
title('tf.nn.softplus(features, name=None)')
% tf.tanh(x, name=None): tanh(x) %
Y_tanh = tanh(X);
figure,plot(X,Y_tanh)
title('tf.tanh(x, name=None)')

[Figures: plots of X = feature, tf.nn.relu(features, name=None), tf.nn.relu6(features, name=None), tf.sigmoid(x, name=None), tf.nn.softplus(features, name=None), and tf.tanh(x, name=None)]

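The role of normalization functions, quoting the book 《TensorFlow實踐》:

Normalization layers are not unique to CNNs. When using tf.nn.relu, it is worth considering normalizing the output (for details see http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf). Because relu is unbounded, some form of normalization is often useful for identifying high-frequency features. Local response normalization is a data normalization method first used by Krizhevsky and Hinton in their ImageNet paper, and even now quite a few CNNs still use this technique.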

`tf.nn.local_response_normalization(input, depth_radius=None, bias=None, alpha=None, beta=None, name=None)`

Local Response Normalization.

The 4-D input tensor is treated as a 3-D array of 1-D vectors (along the last dimension), and each vector is normalized independently. Within a given vector, each component is divided by the weighted, squared sum of inputs within depth_radius. In detail,

sqr_sum[a, b, c, d] =
    sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
output = input / (bias + alpha * sqr_sum) ** beta

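First parameter, input: the feature map. Being a feature map, it has the shape [batch, height, width, channels].
Second parameter, depth_radius: must be specified by the user; it is n/2 in the normalization formula from the ImageNet paper.
Third parameter, bias: k in that formula.
Fourth parameter, alpha: α in that formula.
Fifth parameter, beta: β in that formula.
Sixth parameter, name: the name of the operation.
The return value is a new feature map, which should have the same shape as the original feature map.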
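The effect of these parameters on a single 1-D channel vector can be sketched in plain Python. This is an illustrative reference, not TensorFlow's implementation: the function name `lrn_vector` and the sample vector are assumptions, and the exponent beta is applied to the whole denominator, as in the ImageNet paper.

```python
def lrn_vector(vec, depth_radius, bias, alpha, beta):
    """Apply local response normalization along one channel vector."""
    n = len(vec)
    out = []
    for d in range(n):
        # Window of neighbouring channels, clipped at the vector boundaries.
        lo = max(0, d - depth_radius)
        hi = min(n, d + depth_radius + 1)
        sqr_sum = sum(v * v for v in vec[lo:hi])
        out.append(vec[d] / (bias + alpha * sqr_sum) ** beta)
    return out

# Hypothetical 8-channel vector for a single pixel.
channels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
normalized = lrn_vector(channels, depth_radius=2, bias=0.0, alpha=1.0, beta=1.0)
print(normalized)
```

With depth_radius=2, the first channel is divided by 1^2 + 2^2 + 3^2 = 14 because its window is clipped at the boundary, while the fourth channel is divided by 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 90.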

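The formula for this normalization, as given in the ImageNet paper, is

$$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}$$

Here the superscript of a is the index of the feature map within the layer, the subscripts x, y give the pixel position in the feature map, and N is the total number of feature maps. The remaining parameters (k, n, α, β) are hyperparameters that you must choose yourself.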

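This method is inspired by neuroscience: an activated neuron suppresses the activity of its neighboring neurons (lateral inhibition). As for why this technique is used and why it works, I looked through a lot of literature and did not find a detailed explanation. Perhaps batch normalization, which was proposed later, became so popular that it gradually overshadowed local response normalization.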

import tensorflow as tf

# Written against the TensorFlow 1.x graph API (tf.Session).
# Two 4x4 blocks, 32 values in total.
a = tf.constant([
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [8.0, 7.0, 6.0, 5.0],
     [4.0, 3.0, 2.0, 1.0]],
    [[4.0, 3.0, 2.0, 1.0],
     [8.0, 7.0, 6.0, 5.0],
     [1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
])
# Reshape a into a feature map of shape [batch=1, height=2, width=2, channels=8].
a = tf.reshape(a, [1, 2, 2, 8])

# depth_radius (n/2) = 2, bias (k) = 0, alpha = 1, beta = 1
normal_a = tf.nn.local_response_normalization(a, depth_radius=2, bias=0, alpha=1, beta=1)
with tf.Session() as sess:
    print("feature map:")
    image = sess.run(a)
    print(image)
    print("normalized feature map:")
    normal = sess.run(normal_a)
    print(normal)

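Run result:

[Figure: console output showing the feature map and the normalized feature map]

Explanation:

Here I chose n/2=2, k=0, α=1, β=1. N in the formula is the total number of channels of the input tensor: from a = tf.reshape(a, [1, 2, 2, 8]) we get N=8. The variable i indexes the channels and runs from 0 to 7.

For example, for the first pixel "1" in channel one (i = 0), substituting the parameters into the formula gives 1/(1^2+2^2+3^2)=0.07142857. For the first pixel "4" in channel four (i = 3), the formula gives 4/(2^2+3^2+4^2+5^2+6^2)=0.04444445, and so on.

Reposted from: http://blog.csdn.net/mao_xiao_feng/article/details/53488271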
