Fine-tuning in Caffe with a pretrained model

This article is a repost. Posted 2016-12-29 16:47. Tags: caffe, DL learning

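First, be clear that the pretrained model and your own network structure differ. So how do the pretrained model's parameters get matched to your own network?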

See the official tutorial: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

--If we provide the weights argument to the caffe train command, the pretrained weights will be loaded into our model, matching layers by name.

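In other words, the pretrained parameters are matched to your current network by layer name. Suppose the first convolution layer of the original pretrained network is named conv1, while the first convolution layer of your own network is named Convolution1. That layer's pretrained parameters will then not be matched or loaded, which defeats the purpose of fine-tuning.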

A layer with no match is handled as the tutorial describes: "Since there is no layer named that in the bvlc_reference_caffenet, that layer will begin training with random weights." That is, it is randomly initialized.
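The matching rule can be illustrated with a small toy model. This is a sketch in plain Python, not Caffe's actual loader: the function name copy_weights_by_name, the dictionary layout, and the Gaussian initializer are all assumptions made for illustration. The shape check reflects the understanding that Caffe refuses to copy weights between layers that share a name but differ in shape.

```python
import random


def copy_weights_by_name(pretrained, target_shapes, seed=0):
    """Toy model of name-based weight loading (hypothetical helper, not Caffe API).

    pretrained:    dict mapping layer name -> (shape tuple, flat list of weights)
    target_shapes: dict mapping layer name -> shape tuple of the new network
    Returns (weights, sources), where sources records how each layer was filled.
    """
    rng = random.Random(seed)
    weights, sources = {}, {}
    for name, shape in target_shapes.items():
        if name in pretrained:
            src_shape, values = pretrained[name]
            if src_shape != shape:
                # Same name but different shape: copying is impossible.
                raise ValueError(
                    f"shape mismatch for layer '{name}': {src_shape} vs {shape}; "
                    "rename the layer to train it from scratch"
                )
            weights[name] = list(values)
            sources[name] = "pretrained"
        else:
            # No layer with this name in the pretrained model: random init.
            count = 1
            for dim in shape:
                count *= dim
            weights[name] = [rng.gauss(0.0, 0.01) for _ in range(count)]
            sources[name] = "random"
    return weights, sources


# Illustrative data: the pretrained model only knows a layer called "conv1".
pretrained = {"conv1": ((2, 2), [0.1, 0.2, 0.3, 0.4])}
target = {"conv1": (2, 2), "Convolution1": (2, 2)}

weights, sources = copy_weights_by_name(pretrained, target)
print(sources)
```

In this toy run the layer named conv1 receives the pretrained values, while Convolution1 is filled with random numbers even though its shape is identical, because only the name is compared.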

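The fully connected layer fc8 in the original network structure needs to be renamed; I renamed mine to "re-fc8". The reason is that we are fine-tuning. Fine-tuning means training on another dataset first and then using the trained weights to initialize the weights for our current dataset, so random initialization is no longer needed. The current data differs from the data used for the original training, so the settings of some layers will differ. In this example the model was trained on ImageNet, which has 1000 classes, while our dataset has only 5 classes, so num_output of fc8 is different. The pretrained weights for this layer therefore cannot be used, and the layer must be given a name different from the one in the original network structure.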
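As a rough sketch of what the renamed layer might look like, the snippet below generates the prototxt text for it. The helper inner_product_layer is hypothetical, and the bottom name "fc7", the lr_mult values 10 and 20, and the filler settings are assumptions modeled on the CaffeNet-style definitions used in the official example; adjust them to your own network.

```python
def inner_product_layer(name, bottom, num_output, weight_lr_mult=10, bias_lr_mult=20):
    """Return prototxt text for a fully connected layer that is trained from scratch.

    Hypothetical helper for illustration; it only builds a string.
    """
    lines = [
        "layer {",
        f'  name: "{name}"',
        '  type: "InnerProduct"',
        f'  bottom: "{bottom}"',
        f'  top: "{name}"',
        # First param block is for the weights, second for the biases.
        f"  param {{ lr_mult: {weight_lr_mult} decay_mult: 1 }}",
        f"  param {{ lr_mult: {bias_lr_mult} decay_mult: 0 }}",
        "  inner_product_param {",
        f"    num_output: {num_output}",
        '    weight_filler { type: "gaussian" std: 0.01 }',
        '    bias_filler { type: "constant" value: 0 }',
        "  }",
        "}",
    ]
    return "\n".join(lines)


# The new name differs from "fc8", so no pretrained weights are matched to it,
# and num_output is 5 for the 5-class dataset.
new_layer = inner_product_layer("re-fc8", bottom="fc7", num_output=5)
print(new_layer)
```

Any layer that previously read from fc8, such as the loss or accuracy layer, would also need its bottom changed to the new name.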

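So when fine-tuning, we generally use the model together with the training network definition that corresponds to it, which ensures that all parameters are loaded and used correctly.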
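One way to pass both the solver (which points at the network definition) and the pretrained weights to caffe train is sketched below. The helper finetune_command is hypothetical and only assembles the argument list; the file paths follow the layout of the official Flickr Style example and are assumptions about where your files live.

```python
def finetune_command(solver, weights, caffe_bin="./build/tools/caffe", gpu=None):
    """Build the argument list for a fine-tuning run (hypothetical helper).

    solver:  path to the solver prototxt, which references the network definition
    weights: path to the pretrained .caffemodel whose layers are matched by name
    """
    cmd = [caffe_bin, "train", "-solver", solver, "-weights", weights]
    if gpu is not None:
        cmd += ["-gpu", str(gpu)]
    return cmd


# Assumed paths, following the official example's directory layout.
cmd = finetune_command(
    solver="models/finetune_flickr_style/solver.prototxt",
    weights="models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel",
    gpu=0,
)
print(" ".join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) from the Caffe root directory once the paths have been verified.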

The common basic approach to fine-tuning, in the tutorial's words: "We will also decrease the overall learning rate base_lr in the solver prototxt, but boost the lr_mult on the newly introduced layer. The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast. Additionally, we set stepsize in the solver to a lower value than if we were training from scratch, since we're virtually far along in training and therefore want the learning rate to go down faster. Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their lr_mult to 0."
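To make the effect of these settings concrete, the sketch below computes the learning rate a layer actually sees under Caffe's "step" policy, where the global rate is base_lr * gamma^floor(iter / stepsize) and each parameter scales it by its own lr_mult. The function name effective_lr and the numeric values (base_lr 0.001, gamma 0.1, stepsize 20000, lr_mult 10) are illustrative assumptions, not values taken from this article.

```python
def effective_lr(iteration, base_lr, gamma, stepsize, lr_mult=1.0):
    """Learning rate seen by one parameter under the "step" policy.

    The global rate drops by a factor of gamma every stepsize iterations,
    and the per-layer lr_mult scales it. lr_mult = 0 freezes the layer.
    """
    return base_lr * gamma ** (iteration // stepsize) * lr_mult


# Illustrative fine-tuning settings (assumed values).
base_lr, gamma, stepsize = 0.001, 0.1, 20000

for it in (0, 20000, 40000):
    old_layers = effective_lr(it, base_lr, gamma, stepsize, lr_mult=1)
    new_layer = effective_lr(it, base_lr, gamma, stepsize, lr_mult=10)
    frozen = effective_lr(it, base_lr, gamma, stepsize, lr_mult=0)
    print(it, old_layers, new_layer, frozen)
```

With these numbers the new layer always learns ten times faster than the pretrained layers, and a smaller stepsize simply makes each tenfold drop arrive sooner.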

Download location for commonly used pretrained models: https://github.com/BVLC/caffe/wiki/Model-Zoo

For more, see: http://www.cnblogs.com/denny402/p/5137534.html

Some practical fine-tuning advice: http://blog.csdn.net/nongfu_spring/article/details/51514040
