Weight Initialization Methods in Caffe

Reposted article, originally published 2016-12-12 19:39. Category: Caffe study notes.

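A note first: the source for everything below lives in filler.hpp under caffe/include/caffe. Read it if you like; for now I would rather not dig into the code details, since a high-level picture is enough for me. If you do want a walk-through of the code, see http://blog.csdn.net/xizero00/article/details/50921692, which is well written (although some of its comments are wrong, and I do not know whether they have been corrected since).

filler.hpp provides 7 weight initialization methods: constant initialization (constant), Gaussian initialization (gaussian), positive_unitball initialization, uniform initialization (uniform), xavier initialization, msra initialization, and bilinear initialization (bilinear).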

template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
  // Dispatch on the "type" string of the filler parameter.
  const std::string& type = param.type();
  if (type == "constant") {
    return new ConstantFiller<Dtype>(param);
  } else if (type == "gaussian") {
    return new GaussianFiller<Dtype>(param);
  } else if (type == "positive_unitball") {
    return new PositiveUnitballFiller<Dtype>(param);
  } else if (type == "uniform") {
    return new UniformFiller<Dtype>(param);
  } else if (type == "xavier") {
    return new XavierFiller<Dtype>(param);
  } else if (type == "msra") {
    return new MSRAFiller<Dtype>(param);
  } else if (type == "bilinear") {
    return new BilinearFiller<Dtype>(param);
  } else {
    CHECK(false) << "Unknown filler name: " << param.type();
  }
  return (Filler<Dtype>*)(NULL);
}

Read this together with the FillerParameter message from the .proto file to see how each filler is configured:

message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
}

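constant initialization

It sets the weights or biases to a constant of your choosing. The constant is the value field of the FillerParameter message above, and it defaults to 0.

Below are the relevant definitions from the .proto file; you may need these parameters when defining a network.

  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler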

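uniform initialization

It initializes the weights and biases from a uniform distribution. min and max set the lower and upper bounds; the default range is (0, 1).

Below are the relevant definitions from the .proto file; you may need these parameters when defining a network.

  optional string type = 1 [default = 'constant'];
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler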

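Gaussian initialization

Given the mean and standard deviation of a Gaussian, it simply draws the weights from that distribution.

One thing to note: the gaussian filler can also sparsify, meaning it can set some of the weights to 0. This is controlled by the sparse parameter. sparse is the expected number of non-zero weights relative to num_output; in the implementation, sparse / num_output is used as the probability of a Bernoulli distribution. The Bernoulli samples (each 0 or 1) are multiplied element-wise with the Gaussian weights, so part of the weights end up as 0. Given that, one thing puzzles me: why not define sparse directly as a probability? That would be simpler and easier to understand. As for num_output, it is the field you set on the layer in your network's .prototxt.

Below are the relevant definitions from the .proto file; you may need these parameters when defining a network.

  optional string type = 1 [default = 'constant'];
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];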
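
To make the sparse mechanism concrete, here is a minimal standalone sketch of it, not the Caffe source. The function name GaussianFill, the flat std::vector storage, and the use of std::mt19937 are assumptions of this example; only the idea (Gaussian samples multiplied by a Bernoulli mask with probability sparse / num_output) comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper illustrating the gaussian filler with sparsification.
// `num_output` and `sparse` mirror the fields discussed in the text; the
// rest of the signature is made up for this sketch.
std::vector<float> GaussianFill(std::size_t count, int num_output, float mean,
                                float stddev, int sparse, unsigned seed = 0) {
  assert(num_output > 0 && stddev > 0.0f);
  std::mt19937 rng(seed);
  std::normal_distribution<float> gauss(mean, stddev);

  // Step 1: draw every weight from N(mean, stddev^2).
  std::vector<float> weights(count);
  for (float& w : weights) w = gauss(rng);

  // Step 2 (only if sparse >= 0): keep each weight with probability
  // sparse / num_output and set it to zero otherwise.
  if (sparse >= 0) {
    const double keep_prob = static_cast<double>(sparse) / num_output;
    assert(keep_prob <= 1.0);
    std::bernoulli_distribution keep(keep_prob);
    for (float& w : weights) {
      if (!keep(rng)) w = 0.0f;
    }
  }
  return weights;
}
```

With sparse = 25 and num_output = 100, for instance, roughly a quarter of the returned weights stay non-zero; with the default sparse = -1 the mask step is skipped entirely.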

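positive_unitball initialization

Put simply, what does it do? It makes the incoming weights of each unit sum to 1. For example, if a neuron has 100 inputs, the 100 weights on those inputs sum to 1. How does the source do this? It first draws the 100 weights from a uniform distribution on (0, 1), and then divides each weight by their sum.

My feeling is that this helps prevent the initial weights from being too large and pushing the activation function (sigmoid) into its saturation region. So it should be a reasonably good fit for sigmoid-shaped activation functions.

It takes no parameters.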
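
The two steps just described (uniform draw, then divide by the per-unit sum) can be sketched as follows. This is an illustrative standalone version, not the Caffe code: PositiveUnitballFill is a hypothetical name, and the row-major layout of num_output rows with dim incoming weights each is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: weights are stored row-major, one row of `dim`
// incoming weights per output unit.
std::vector<float> PositiveUnitballFill(std::size_t num_output,
                                        std::size_t dim, unsigned seed = 0) {
  assert(num_output > 0 && dim > 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);

  // Step 1: every weight is drawn uniformly from [0, 1).
  std::vector<float> weights(num_output * dim);
  for (float& w : weights) w = uniform01(rng);

  // Step 2: normalize each unit's incoming weights so they sum to 1.
  for (std::size_t i = 0; i < num_output; ++i) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) sum += weights[i * dim + j];
    for (std::size_t j = 0; j < dim; ++j) weights[i * dim + j] /= sum;
  }
  return weights;
}
```

Calling PositiveUnitballFill(10, 100) gives 10 units whose 100 incoming weights are all non-negative and sum to 1 (up to floating-point rounding), which matches the 100-input example above.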

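XavierFiller initialization

This method has theory behind it. It comes from the paper "Understanding the difficulty of training deep feedforward neural networks". The derivation assumes the units operate in the linear region of the tanh activation function. It is often used with ReLU as well, but note that the derivation does not account for ReLU zeroing out half of its inputs; the MSRA filler described further down was derived specifically for ReLU.

If you would rather not read the paper, have a look at https://zhuanlan.zhihu.com/p/22028079, which I think is excellent; http://blog.csdn.net/shuzfan/article/details/51338178 works well as a supplement.

The idea is to make the variance of a neuron's input weights (which act as the outputs during back propagation) equal to 1 / (number of inputs). The goal is to let the signal spread evenly through the network instead of shrinking or blowing up from layer to layer.

As for the weight distribution: it is a uniform distribution with mean 0 and variance 1 / n, where n is the number of inputs. That is the uniform distribution on (-sqrt(3 / n), sqrt(3 / n)), since a uniform distribution on (-a, a) has variance a^2 / 3.

If forward propagation matters more to you, choose fan_in, the number of inputs in the forward pass. If backward propagation matters more, choose fan_out, because during back propagation fan_out is the neuron's number of inputs. If you want to take both into account, choose average = (fan_in + fan_out) / 2.

Below are the relevant definitions from the .proto file; you may need these parameters when defining a network.

  optional string type = 1 [default = 'constant'];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];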
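
As a rough illustration of how variance_norm and the uniform range fit together, here is a small standalone sketch. It is not the Caffe implementation: the names FanNormalizer and XavierFill, the flat vector, and the random engine are assumptions made for this example. The only facts relied on are the ones stated above: n is picked from fan_in, fan_out, or their average, and the weights are uniform with variance 1 / n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Mirrors the three choices of the VarianceNorm enum in the .proto file.
enum class VarianceNorm { FAN_IN, FAN_OUT, AVERAGE };

// Hypothetical helper: returns the n used to scale the variance.
double FanNormalizer(std::size_t fan_in, std::size_t fan_out,
                     VarianceNorm norm) {
  switch (norm) {
    case VarianceNorm::FAN_OUT:
      return static_cast<double>(fan_out);
    case VarianceNorm::AVERAGE:
      return (static_cast<double>(fan_in) + fan_out) / 2.0;
    case VarianceNorm::FAN_IN:
      break;
  }
  return static_cast<double>(fan_in);
}

// Hypothetical helper: uniform weights on (-sqrt(3/n), sqrt(3/n)),
// which have mean 0 and variance 1/n.
std::vector<float> XavierFill(std::size_t count, std::size_t fan_in,
                              std::size_t fan_out,
                              VarianceNorm norm = VarianceNorm::FAN_IN,
                              unsigned seed = 0) {
  const double n = FanNormalizer(fan_in, fan_out, norm);
  assert(n > 0.0);
  const float scale = static_cast<float>(std::sqrt(3.0 / n));
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> uniform(-scale, scale);
  std::vector<float> weights(count);
  for (float& w : weights) w = uniform(rng);
  return weights;
}
```

For example, with fan_in = 300 and the default FAN_IN, the weights fall in (-0.1, 0.1), because sqrt(3 / 300) = 0.1.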

MSRAFiller initialization

This one is much like the previous method. It is derived in "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", and the derivation assumes the activation function is ReLU.

As for the weight distribution: it is a Gaussian with mean 0 and variance 2 / (number of inputs), which is where it differs from the Xavier filler above. It is particularly well suited to networks with ReLU activations.

Below are the relevant definitions from the .proto file; you may need these parameters when defining a network.

  optional string type = 1 [default = 'constant'];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
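
A minimal sketch of the MSRA rule, under the same caveats as any illustration here: MsraFill is a hypothetical name, n stands for whichever of fan_in, fan_out, or their average variance_norm selects, and the random engine is an arbitrary choice. The point is only that the standard deviation is sqrt(2 / n).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: Gaussian weights with mean 0 and variance 2/n,
// where n is the fan value chosen through variance_norm.
std::vector<float> MsraFill(std::size_t count, double n, unsigned seed = 0) {
  assert(n > 0.0);
  const float stddev = static_cast<float>(std::sqrt(2.0 / n));
  std::mt19937 rng(seed);
  std::normal_distribution<float> gauss(0.0f, stddev);
  std::vector<float> weights(count);
  for (float& w : weights) w = gauss(rng);
  return weights;
}
```

With n = 200, for instance, the standard deviation is sqrt(2 / 200) = 0.1, so the sampled weights have a variance of about 0.01.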

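BilinearFiller initialization

I have not really used this one yet. It is commonly used to initialize the weights of deconvolution layers.

Here is the source directly, so you can read it for yourself: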

/*!
@brief Fills a Blob with coefficients for bilinear interpolation.

A common use case is with the DeconvolutionLayer acting as upsampling.
You can upsample a feature map with shape of (B, C, H, W) by any integer factor
using the following proto.
\code
layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
\endcode
Please use this by replacing `{{}}` with your values. By specifying
`num_output: {{C}} group: {{C}}`, it behaves as
channel-wise convolution. The filter shape of this deconvolution layer will be
(C, 1, K, K) where K is `kernel_size`, and this filler will set a (K, K)
interpolation kernel for every channel of the filter identically. The resulting
shape of the top feature map will be (B, C, factor * H, factor * W).
Note that the learning rate and the
weight decay are set to 0 in order to keep coefficient values of bilinear
interpolation unchanged during training. If you apply this to an image, this
operation is equivalent to the following call in Python with Scikit.Image.
\code{.py}
out = skimage.transform.rescale(img, factor, mode='constant', cval=0)
\endcode
 */
template <typename Dtype>
class BilinearFiller : public Filler<Dtype> {
 public:
  explicit BilinearFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    CHECK_EQ(blob->num_axes(), 4) << "Blob must be 4 dim.";
    CHECK_EQ(blob->width(), blob->height()) << "Filter must be square";
    Dtype* data = blob->mutable_cpu_data();
    int f = ceil(blob->width() / 2.);
    float c = (2 * f - 1 - f % 2) / (2. * f);
    for (int i = 0; i < blob->count(); ++i) {
      float x = i % blob->width();
      float y = (i / blob->width()) % blob->height();
      data[i] = (1 - fabs(x / f - c)) * (1 - fabs(y / f - c));
    }
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};
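
To see what numbers the Fill loop above actually produces, the same formula can be pulled out into a standalone function that does not depend on Caffe. BilinearKernel is a hypothetical name introduced for this sketch; it computes one (K, K) kernel in row-major order, which is what the filler writes into every channel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone version of the kernel computed in
// BilinearFiller::Fill, for a single K x K filter (row-major).
std::vector<float> BilinearKernel(int kernel_size) {
  assert(kernel_size > 0);
  // Same f and c as in the Caffe source quoted above.
  const int f = static_cast<int>(std::ceil(kernel_size / 2.0));
  const float c = (2 * f - 1 - f % 2) / (2.0f * f);
  std::vector<float> kernel(static_cast<std::size_t>(kernel_size) *
                            kernel_size);
  for (int y = 0; y < kernel_size; ++y) {
    for (int x = 0; x < kernel_size; ++x) {
      // Product of two 1-D triangular (linear interpolation) profiles.
      const float wx = 1 - std::fabs(x / static_cast<float>(f) - c);
      const float wy = 1 - std::fabs(y / static_cast<float>(f) - c);
      kernel[static_cast<std::size_t>(y) * kernel_size + x] = wx * wy;
    }
  }
  return kernel;
}
```

For kernel_size = 4 (the value the proto comment gives for factor = 2), f = 2 and c = 0.75, so the 1-D profile along each axis is 0.25, 0.75, 0.75, 0.25, and each kernel entry is the product of the two profile values for its row and column.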
