Caffe Source Code Analysis 6: Neuron_Layer

Reposted article, 2016-02-19 14:16

Please credit the source when reposting: Lou Yihang's blog, http://home.cnblogs.com/louyihang-loves-baiyan/

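NeuronLayer, as the name suggests, is the layer type for neurons, that is, for activation functions. A blob has the same size before and after it passes through an activation function, and each activation value, the output \(y\), depends only on the corresponding input \(x\). In Caffe, all layer implementations live in the layers folder under src (src/caffe/layers), and most of the layer types used in the literature have both a CPU and a CUDA implementation.
Caffe has quite a few NeuronLayers. They are listed here: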

AbsValLayer
BNLLLayer
DropoutLayer
ExpLayer
LogLayer
PowerLayer
ReLULayer
CuDNNReLULayer
SigmoidLayer
CuDNNSigmoidLayer
TanHLayer
CuDNNTanHLayer
ThresholdLayer
PReLULayer

Caffe offers this many neuron types for convenience. Here we focus on a few of the main Neuron_Layers.

ReLULayer

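ReLU is now very widely used as an activation function. Lecture notes and reference material usually introduce the Sigmoid function first, but ReLU converges faster: with sigmoid, convergence slows down the closer you get to the target, which follows from the shape of its curve (the gradient shrinks as the function saturates). ReLULayer converges comparatively faster; Krizhevsky's 2012 ImageNet CNN paper covers this in more detail.
Its formula is: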

\[y = \max(0, x) \]

With a negative slope \(\nu\), the formula becomes:

\[y = \max(0, x) + \nu \min(0, x) \]

The backpropagation formula is:

\[ \frac{\partial E}{\partial x} = \left\{ \begin{array}{lr} \nu \frac{\partial E}{\partial y} & \mathrm{if} \; x \le 0 \\ \frac{\partial E}{\partial y} & \mathrm{if} \; x > 0 \end{array} \right. \]

Its forward and backward functions in Caffe are:

template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}

template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
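    for (int i = 0; i < count; ++i) {
      // The two comparisons evaluate to 0 or 1, so exactly one term is active:
      // gradient 1 where x > 0, negative_slope where x <= 0.
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }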
  }
}
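
To try the two ReLU formulas outside of Caffe, here is a minimal standalone sketch. It swaps Caffe's `Blob` for `std::vector<float>`, and the names `relu_forward` and `relu_backward` are hypothetical, not part of Caffe's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass: y = max(0, x) + negative_slope * min(0, x), element-wise.
// (Hypothetical helper; std::vector stands in for Caffe's Blob.)
std::vector<float> relu_forward(const std::vector<float>& bottom,
                                float negative_slope) {
  std::vector<float> top(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    top[i] = std::max(bottom[i], 0.0f) +
             negative_slope * std::min(bottom[i], 0.0f);
  }
  return top;
}

// Backward pass: dE/dx = dE/dy where x > 0, negative_slope * dE/dy elsewhere.
std::vector<float> relu_backward(const std::vector<float>& bottom,
                                 const std::vector<float>& top_diff,
                                 float negative_slope) {
  assert(bottom.size() == top_diff.size());
  std::vector<float> bottom_diff(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    const float local_grad = bottom[i] > 0.0f ? 1.0f : negative_slope;
    bottom_diff[i] = top_diff[i] * local_grad;
  }
  return bottom_diff;
}
```

With `negative_slope` set to 0 this reduces to the plain \(y = \max(0, x)\) case. Note that the backward pass needs the input `bottom`, not the output, to decide which branch applies.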

SigmoidLayer

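The Sigmoid function, also known as the logistic function, has a smooth S-shaped curve and can be seen as a smooth approximation of the step function. It is used less often nowadays, with ReLU replacing it in most cases. The corresponding activation function is: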

\[y = (1 + \exp(-x))^{-1} \]

During backpropagation:

\[\frac{\partial E}{\partial x} = \frac{\partial E}{\partial y} y (1 - y)\]
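
For completeness, the factor \(y (1 - y)\) comes from differentiating the activation function directly and then applying the chain rule:

\[ \frac{\partial y}{\partial x} = \frac{\exp(-x)}{(1 + \exp(-x))^{2}} = \frac{1}{1 + \exp(-x)} \cdot \frac{\exp(-x)}{1 + \exp(-x)} = y (1 - y) \]

\[ \frac{\partial E}{\partial x} = \frac{\partial E}{\partial y} \frac{\partial y}{\partial x} = \frac{\partial E}{\partial y} y (1 - y) \]

Because the derivative depends only on the output \(y\), the backward pass can reuse the value computed in the forward pass without evaluating the exponential again.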

The corresponding forward and backward functions are:

template <typename Dtype>
void SigmoidLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  for (int i = 0; i < count; ++i) {
    top_data[i] = sigmoid(bottom_data[i]);
  }
}

template <typename Dtype>
void SigmoidLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* top_data = top[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    for (int i = 0; i < count; ++i) {
      // top_data already holds sigmoid(x) from the forward pass.
      const Dtype sigmoid_x = top_data[i];
      bottom_diff[i] = top_diff[i] * sigmoid_x * (1. - sigmoid_x);
    }
  }
}
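
The listing above calls a `sigmoid` helper that is defined elsewhere in the Caffe source. The following standalone sketch is one way to reproduce the same forward and backward logic on `std::vector<double>`. The helper here is a stand-in written straight from the formula \(y = (1 + \exp(-x))^{-1}\), and `sigmoid_forward` and `sigmoid_backward` are hypothetical names, not Caffe functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the helper used by the layer: y = 1 / (1 + exp(-x)).
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward pass: apply the sigmoid element-wise.
std::vector<double> sigmoid_forward(const std::vector<double>& bottom) {
  std::vector<double> top(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    top[i] = sigmoid(bottom[i]);
  }
  return top;
}

// Backward pass: dE/dx = dE/dy * y * (1 - y), using the stored output y.
std::vector<double> sigmoid_backward(const std::vector<double>& top,
                                     const std::vector<double>& top_diff) {
  assert(top.size() == top_diff.size());
  std::vector<double> bottom_diff(top.size());
  for (std::size_t i = 0; i < top.size(); ++i) {
    bottom_diff[i] = top_diff[i] * top[i] * (1.0 - top[i]);
  }
  return bottom_diff;
}
```

A quick sanity check is to compare `sigmoid_backward` against a central finite difference of `sigmoid` at a few points; the two should agree to several decimal places.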

DropoutLayer

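DropoutLayer is a very commonly used layer type today. It is only active during the training phase, is usually applied to the fully connected layers of a network, and helps reduce overfitting. The idea is to randomly set a portion of the inputs \(x\) to 0 during training; the values that are kept are scaled by \(1/(1 - p)\), which is why the test-phase branch in the code below is a plain copy.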

\[y_{\mbox{train}} = \left\{ \begin{array}{ll} \frac{x}{1 - p} & \mbox{if } u > p \\ 0 & \mbox{otherwise} \end{array} \right. \]
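
The reason for dividing by \(1 - p\) can be seen from the expected output. Assuming \(u\) is drawn uniformly from \([0, 1]\) (an assumption; the text does not define \(u\)), so that an input is kept with probability \(1 - p\):

\[ \mathrm{E}[y_{\mbox{train}}] = (1 - p) \cdot \frac{x}{1 - p} + p \cdot 0 = x \]

The expected activation during training therefore matches the unscaled input, and no extra scaling is needed at test time.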

Its Forward_cpu and Backward_cpu are:

template <typename Dtype>
void DropoutLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  unsigned int* mask = rand_vec_.mutable_cpu_data();
  const int count = bottom[0]->count();
  if (this->phase_ == TRAIN) {
    // Create random numbers: a Bernoulli mask is multiplied element-wise with
    // the bottom data; scale_ = 1 / (1 - threshold_) rescales the kept values.
    caffe_rng_bernoulli(count, 1. - threshold_, mask);
    for (int i = 0; i < count; ++i) {
      top_data[i] = bottom_data[i] * mask[i] * scale_;
    }
  } else {
    caffe_copy(bottom[0]->count(), bottom_data, top_data);
  }
}

template <typename Dtype>
void DropoutLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    if (this->phase_ == TRAIN) {
      const unsigned int* mask = rand_vec_.cpu_data();
      const int count = bottom[0]->count();
      for (int i = 0; i < count; ++i) {
        bottom_diff[i] = top_diff[i] * mask[i] * scale_;
      }
    } else {
      caffe_copy(top[0]->count(), top_diff, bottom_diff);
    }
  }
}
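
The training-phase logic can be sketched without Caffe as follows. This is a minimal illustration under stated assumptions: `DropoutSketch` and its members are hypothetical names, `std::bernoulli_distribution` takes the place of `caffe_rng_bernoulli`, and `std::vector` takes the place of `Blob`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical standalone version of the training-phase dropout logic.
struct DropoutSketch {
  float threshold;                 // p: probability of dropping an input
  float scale;                     // 1 / (1 - p): applied to kept inputs
  std::vector<unsigned int> mask;  // 1 = kept, 0 = dropped (from last forward)
  std::mt19937 rng;

  explicit DropoutSketch(float p, unsigned int seed = 0)
      : threshold(p), scale(1.0f / (1.0f - p)), rng(seed) {
    assert(p >= 0.0f && p < 1.0f);
  }

  // Forward: draw a fresh mask, zero the dropped inputs, rescale the rest.
  std::vector<float> forward_train(const std::vector<float>& bottom) {
    std::bernoulli_distribution keep(1.0 - threshold);
    mask.resize(bottom.size());
    std::vector<float> top(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i) {
      mask[i] = keep(rng) ? 1u : 0u;
      top[i] = bottom[i] * mask[i] * scale;
    }
    return top;
  }

  // Backward: reuse the mask from the forward pass so that gradients flow
  // only through the inputs that were kept.
  std::vector<float> backward_train(const std::vector<float>& top_diff) const {
    assert(top_diff.size() == mask.size());
    std::vector<float> bottom_diff(top_diff.size());
    for (std::size_t i = 0; i < top_diff.size(); ++i) {
      bottom_diff[i] = top_diff[i] * mask[i] * scale;
    }
    return bottom_diff;
  }
};
```

The key design point, which the Caffe code shares, is that the mask is stored between the forward and backward passes: the backward pass must apply exactly the same mask and scale that the forward pass used.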
