Weight decay and regularization in Caffe

This article is a repost. 2018-06-09 21:14

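Regularization is used to prevent overfitting: it penalizes large weights and so pushes the weights toward smaller values.

Caffe uses L2 regularization by default.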

Code walkthrough: http://alanse7en.github.io/caffedai-ma-jie-xi-4/

An important answer: https://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate

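According to that answer, regularization adds a penalty term to the loss function. The regularized loss function is:

$\widetilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{λ}{2} \mathbf{w}^{2}$

Taking the partial derivative of this loss with respect to $w_{i}$ gives two terms: the term before the plus sign is the partial derivative of the original loss, and the term after it becomes $λ w_{i}$. The gradient update therefore becomes: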

$w_{i} \leftarrow w_{i} - η \frac{\partial E}{\partial w_{i}} - η λ w_{i} .$

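This is the gradient update for L2 regularization. Compared with the update without regularization, every parameter has one extra term subtracted at each step: on top of the usual gradient step, the parameter is reduced by weight_decay*w (scaled by the learning rate η), where w is its current value and weight_decay plays the role of λ.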
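The update rule above can be sketched in a few lines of C++. This is only an illustration of the formula, not Caffe code: the function name `sgd_step_l2` and the use of `std::vector<double>` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): one plain SGD step with L2 weight decay.
// Implements w_i <- w_i - lr * dE/dw_i - lr * lambda * w_i.
void sgd_step_l2(std::vector<double>& w,
                 const std::vector<double>& grad,
                 double lr,
                 double lambda) {
  assert(w.size() == grad.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    // The regularizer contributes lambda * w_i on top of the original gradient.
    const double total_grad = grad[i] + lambda * w[i];
    w[i] -= lr * total_grad;
  }
}
```

With a zero gradient, each call simply multiplies every weight by (1 - lr * lambda), which is where the name "weight decay" comes from.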

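The formula for L1 regularization can also be inferred from the Caffe code:

Replace the L2 penalty term $\frac{λ}{2} w^{2}$ with $λ |w|$.

The partial derivative of the penalty then becomes $λ$ when $w > 0$ and $-λ$ when $w < 0$.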
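Written out in the same notation as the L2 update, and assuming the same symbols η (learning rate) and λ (weight decay), the L1 case can be summarized as follows. Taking $\mathrm{sign}(0) = 0$ at $w_{i} = 0$ is a subgradient choice made here for the sketch.

$\widetilde{E}(\mathbf{w}) = E(\mathbf{w}) + λ \sum_{i} |w_{i}|, \qquad \frac{\partial \widetilde{E}}{\partial w_{i}} = \frac{\partial E}{\partial w_{i}} + λ \, \mathrm{sign}(w_{i})$

$w_{i} \leftarrow w_{i} - η \frac{\partial E}{\partial w_{i}} - η λ \, \mathrm{sign}(w_{i}) .$

Unlike the L2 case, the extra step has the same size ηλ for every nonzero weight, regardless of how large the weight is.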

template <typename Dtype>
void SGDSolver<Dtype>::Regularize(int param_id) {
  const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
  const vector<float>& net_params_weight_decay =
      this->net_->params_weight_decay();
  // Global weight_decay from the solver, scaled by this parameter's own multiplier.
  Dtype weight_decay = this->param_.weight_decay();
  string regularization_type = this->param_.regularization_type();
  Dtype local_decay = weight_decay * net_params_weight_decay[param_id];
  switch (Caffe::mode()) {
  case Caffe::CPU: {
    if (local_decay) {
      if (regularization_type == "L2") {
        // add weight decay: diff = local_decay * data + diff
        caffe_axpy(net_params[param_id]->count(),
            local_decay,
            net_params[param_id]->cpu_data(),
            net_params[param_id]->mutable_cpu_diff());
      } else if (regularization_type == "L1") {
        // temp = sign(data)
        caffe_cpu_sign(net_params[param_id]->count(),
            net_params[param_id]->cpu_data(),
            temp_[param_id]->mutable_cpu_data());
        // diff = local_decay * temp + diff
        caffe_axpy(net_params[param_id]->count(),
            local_decay,
            temp_[param_id]->cpu_data(),
            net_params[param_id]->mutable_cpu_diff());
      } else {
        LOG(FATAL) << "Unknown regularization type: " << regularization_type;
      }
    }
    break;
  }
  // The remaining Caffe::mode() cases are not shown in this excerpt.
  }
}

caffe_axpy is implemented in math_functions.cpp under util. It computes y = a*x + y, so in the L2 branch it adds weight_decay*w to the gradient already stored in the parameter's diff.
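To make the y = a*x + y step concrete, here is a minimal sketch using a plain loop over `std::vector<double>`. The names `axpy_sketch` and `regularize_l2_sketch` are made up for this example; they are not Caffe functions and are not how Caffe itself implements caffe_axpy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the axpy operation: y = a * x + y, element by element.
void axpy_sketch(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] += a * x[i];
  }
}

// Mirrors the L2 branch of Regularize: diff = local_decay * data + diff.
void regularize_l2_sketch(double local_decay,
                          const std::vector<double>& data,
                          std::vector<double>& diff) {
  axpy_sketch(local_decay, data, diff);
}
```

Here `data` stands for the weights and `diff` for their gradients, so after the call each gradient entry has grown by local_decay times the matching weight.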

caffe_sign is implemented in math_functions.hpp under util, and a macro there generates the caffe_cpu_sign function. It returns 1 when value > 0, -1 when value < 0, and 0 when value is 0.
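The L1 branch can be sketched the same way: take the sign of every weight, then add local_decay times that sign to the gradient. The names `sign_sketch` and `regularize_l1_sketch` are assumptions for this illustration and do not exist in Caffe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sign function: 1 for positive, -1 for negative, 0 for zero.
int sign_sketch(double v) {
  return (0.0 < v) - (v < 0.0);
}

// Mirrors the L1 branch of Regularize: diff = local_decay * sign(data) + diff.
void regularize_l1_sketch(double local_decay,
                          const std::vector<double>& data,
                          std::vector<double>& diff) {
  assert(data.size() == diff.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    diff[i] += local_decay * sign_sketch(data[i]);
  }
}
```

A weight that is exactly zero gets no extra gradient from the L1 term in this sketch, because its sign is 0.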
