The form of the activation function used in Caffe has a direct impact on training speed and on how SGD solves the network. Different activation functions lead to different gradient computations during SGD, so the activation layer's type is specified in the network configuration file; at the moment the most commonly used activation function in Caffe is ReLU.
The activation functions currently implemented in Caffe are AbsVal, BNLL, Power, ReLU, Sigmoid, and TanH, each provided as its own layer. Rather than restating the formulas one by one here, the relevant sections of the Caffe tutorial are reproduced below.
ReLU / Rectified-Linear and Leaky-ReLU
- LayerType: RELU
- CPU implementation: ./src/caffe/layers/relu_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
- Parameters (ReLUParameter relu_param)
  - Optional
    - negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
- Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

      layers {
        name: "relu1"
        type: RELU
        bottom: "conv1"
        top: "conv1"
      }
Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, this is equivalent to the standard ReLU function max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob may be the same, which conserves memory.
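To make the formula concrete, here is a minimal, self-contained sketch of the element-wise computation on a flat array. It is not Caffe's actual relu_layer.cpp; the function name and interface are just for illustration, and passing the same pointer for in and out mimics the in-place mode mentioned above.

    #include <algorithm>
    #include <cstddef>

    // Leaky-ReLU forward pass over a flat array:
    //   out[i] = in[i]                   if in[i] > 0
    //   out[i] = negative_slope * in[i]  otherwise
    // With negative_slope == 0 this reduces to the standard ReLU max(x, 0).
    void relu_forward(const float* in, float* out, std::size_t n,
                      float negative_slope) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::max(in[i], 0.0f) +
                 negative_slope * std::min(in[i], 0.0f);
      }
    }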
Sigmoid
- LayerType: SIGMOID
- CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
- Sample (as seen in ./examples/mnist/mnist_autoencoder.prototxt)

      layers {
        name: "encode1neuron"
        bottom: "encode1"
        top: "encode1neuron"
        type: SIGMOID
      }
The SIGMOID layer computes the output as sigmoid(x) = 1 / (1 + exp(-x)) for each input element x.
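For illustration only (this is not Caffe's sigmoid_layer.cpp), the same element-wise computation over a flat array could look like this:

    #include <cmath>
    #include <cstddef>

    // Element-wise sigmoid forward pass: out[i] = 1 / (1 + exp(-in[i])).
    void sigmoid_forward(const float* in, float* out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = 1.0f / (1.0f + std::exp(-in[i]));
      }
    }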
TanH / Hyperbolic Tangent
- LayerType: TANH
- CPU implementation: ./src/caffe/layers/tanh_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
- Sample

      layers {
        name: "layer"
        bottom: "in"
        top: "out"
        type: TANH
      }
The TANH layer computes the output as tanh(x) for each input element x.
Absolute Value
- LayerType: ABSVAL
- CPU implementation: ./src/caffe/layers/absval_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
- Sample

      layers {
        name: "layer"
        bottom: "in"
        top: "out"
        type: ABSVAL
      }
The ABSVAL layer computes the output as abs(x) for each input element x.
Power
- LayerType: POWER
- CPU implementation: ./src/caffe/layers/power_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
- Parameters (PowerParameter power_param)
  - Optional
    - power [default 1]
    - scale [default 1]
    - shift [default 0]
- Sample

      layers {
        name: "layer"
        bottom: "in"
        top: "out"
        type: POWER
        power_param {
          power: 1
          scale: 1
          shift: 0
        }
      }
The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
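A minimal sketch of that element-wise computation, written as a standalone function for illustration (not Caffe's power_layer.cpp):

    #include <cmath>
    #include <cstddef>

    // Element-wise power forward pass: out[i] = (shift + scale * in[i]) ^ power.
    // With the defaults power = 1, scale = 1, shift = 0 this is the identity.
    void power_forward(const float* in, float* out, std::size_t n,
                       float power, float scale, float shift) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::pow(shift + scale * in[i], power);
      }
    }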
BNLL
- LayerType: BNLL
- CPU implementation: ./src/caffe/layers/bnll_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
- Sample

      layers {
        name: "layer"
        bottom: "in"
        top: "out"
        type: BNLL
      }
The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
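Note that evaluating log(1 + exp(x)) directly overflows for large positive x, so implementations typically split on the sign of x. The sketch below illustrates that idea; it is a standalone example, not Caffe's bnll_layer.cpp.

    #include <cmath>
    #include <cstddef>

    // Element-wise BNLL forward pass, out[i] = log(1 + exp(in[i])), written in
    // a numerically stable form: for x > 0 use the identity
    // log(1 + exp(x)) = x + log(1 + exp(-x)), so exp() is only ever called
    // with a non-positive argument and cannot overflow.
    void bnll_forward(const float* in, float* out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        const float x = in[i];
        out[i] = (x > 0.0f) ? x + std::log(1.0f + std::exp(-x))
                            : std::log(1.0f + std::exp(x));
      }
    }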
