1:
In a neural network we train by minimizing a loss, so during training the last layer is a loss layer (Loss);
during testing we evaluate the network by its accuracy, so the last layer is an accuracy layer (Accuracy).
But when we actually want to use the trained model, what we need is the network's output: for a classification problem we need the classification result. As the right-hand figure below shows, the last layer then gives us probabilities,
and the Loss and Accuracy layers of the training and testing phases are no longer needed.
The figures below were drawn with $CAFFE_ROOT/python/draw_net.py from $CAFFE_ROOT/models/bvlc_reference_caffenet/train_val.prototxt and $CAFFE_ROOT/models/bvlc_reference_caffenet/deploy.prototxt; they show the network structure used for training and the one used at deployment time, respectively.
We usually put the train and test phases in the same .prototxt, whose data layer must specify the source of the input data;
the .prototxt used at deployment time only needs to define the size and number of channels of the input image. The figures below show
the data layers of $CAFFE_ROOT/models/bvlc_reference_caffenet/train_val.prototxt and $CAFFE_ROOT/models/bvlc_reference_caffenet/deploy.prototxt, respectively.
During training, solver.prototxt points to train_val.prototxt:
./build/tools/caffe train -solver ./models/bvlc_reference_caffenet/solver.prototxt
To extract features with the network trained above, the model definition used is deploy.prototxt:
./build/tools/extract_features.bin models/bvlc_reference_caffenet.caffemodel models/bvlc_reference_caffenet/deploy.prototxt
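The same kind of feature extraction can also be done through pycaffe. The following is a minimal sketch, assuming the standard BVLC reference CaffeNet file layout and its 'fc7' blob name; adjust the paths and blob name for your own network.

# Minimal pycaffe sketch: load deploy.prototxt together with the trained .caffemodel
# and read features from an intermediate blob.
# The paths and the 'fc7' blob name assume the BVLC reference CaffeNet layout.
import numpy as np
import caffe

model_def     = 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

caffe.set_mode_cpu()
net = caffe.Net(model_def, model_weights, caffe.TEST)   # deploy net + trained weights

# Fill the 'data' blob with a preprocessed image (random data used as a placeholder here).
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
net.forward()

fc7 = net.blobs['fc7'].data.copy()   # one feature vector per image in the batch
print(fc7.shape)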
2:
(1) An introduction to the differences between the *_train_test.prototxt file and the *_deploy.prototxt file: http://blog.csdn.net/sunshine_in_moon/article/details/49472901
(2) Python code for generating the deploy file: http://www.cnblogs.com/denny402/p/5685818.html
*_train_test.prototxt file: this is the network configuration file for training and testing.
The blog post http://www.cnblogs.com/denny402/p/5685818.html gives Python source code for generating the deploy.prototxt file, but every network is different and adapting the code takes some work. Below is the code from that post, which generates the deploy file for MNIST; adjust it to match your own network. (The code below has not been tested.)
# -*- coding: utf-8 -*-
from caffe import layers as L, params as P, to_proto

root = '/home/xxx/'
deploy = root + 'mnist/deploy.prototxt'    # path where the file will be saved

def create_deploy():
    # the first (data) layer is omitted
    conv1 = L.Convolution(bottom='data', kernel_size=5, stride=1, num_output=20,
                          pad=0, weight_filler=dict(type='xavier'))
    pool1 = L.Pooling(conv1, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    conv2 = L.Convolution(pool1, kernel_size=5, stride=1, num_output=50,
                          pad=0, weight_filler=dict(type='xavier'))
    pool2 = L.Pooling(conv2, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    fc3   = L.InnerProduct(pool2, num_output=500, weight_filler=dict(type='xavier'))
    relu3 = L.ReLU(fc3, in_place=True)
    fc4   = L.InnerProduct(relu3, num_output=10, weight_filler=dict(type='xavier'))
    # no Accuracy layer at the end, but a Softmax layer instead
    prob = L.Softmax(fc4)
    return to_proto(prob)

def write_deploy():
    with open(deploy, 'w') as f:
        f.write('name:"Lenet"\n')
        f.write('input:"data"\n')
        f.write('input_dim:1\n')
        f.write('input_dim:3\n')
        f.write('input_dim:28\n')
        f.write('input_dim:28\n')
        f.write(str(create_deploy()))

if __name__ == '__main__':
    write_deploy()
Generating the deploy file from code is still rather cumbersome. When building a deep network we will in any case define the training/testing configuration, the *_train_test.prototxt file, first, so the deploy file can instead be produced by editing that file directly. Taking CIFAR-10 as an example, here is a brief overview of the differences between the two.
(1) The data layer of the deploy file is simpler: the two layers that read the training lmdb and the test lmdb in the *_train_test.prototxt file are deleted and replaced by the following.
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
shape: {
  dim: 1    # num; set as you like
  dim: 3    # number of channels, i.e. the three RGB channels
  dim: 32   # image height and width, taken from the crop_size of the data layer in the *_train_test.prototxt file
  dim: 32
}
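The dim values above only set the default input shape. In pycaffe the 'data' blob of the deploy net can be reshaped at run time, as the following sketch shows ('deploy.prototxt' is a placeholder path for the file described above):

import caffe

net = caffe.Net('deploy.prototxt', caffe.TEST)
net.blobs['data'].reshape(1, 3, 32, 32)   # num, channels, height, width
net.reshape()                             # propagate the new shape through the net
print(net.blobs['data'].data.shape)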
(2) The weight_filler{} and bias_filler{} parameters of the convolution and fully connected layers no longer need to be filled in, because their values are supplied by the trained model, the *.caffemodel file (a pycaffe sketch after the two listings below illustrates this). As shown in the code below, all weight_filler and bias_filler entries are removed from the *_train_test.prototxt file.
layer {    # the weight_filler and bias_filler below are to be deleted
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1    # learning-rate multiplier for the weights w
  }
  param {
    lr_mult: 2    # learning-rate multiplier for the bias b
  }
  inner_product_param {
    num_output: 10
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
After deletion it becomes:
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
  }
}
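As noted above, the fillers can be dropped because the parameters come from the *.caffemodel. A minimal pycaffe sketch of this, assuming a CIFAR-10 quick deploy file and a trained snapshot (the file names are only examples):

import caffe

# example file names: the deploy definition and a trained snapshot
net = caffe.Net('cifar10_quick.prototxt',
                'cifar10_quick_iter_5000.caffemodel',
                caffe.TEST)

# The ip2 parameters are filled from the .caffemodel, so no filler is needed.
w, b = net.params['ip2'][0].data, net.params['ip2'][1].data
print(w.shape)   # (10, 64) for this network
print(b.shape)   # (10,)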
The Accuracy layer of the test phase is deleted as well:
layer {    # delete this layer
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
2) The output layer
In the *_train_test.prototxt file:
layer {
  name: "loss"              # note that the layer name differs from the one below
  type: "SoftmaxWithLoss"   # note that the type differs from the one below
  bottom: "ip2"
  bottom: "label"           # the label bottom disappears below: at deployment time the label is what we predict, so it cannot be supplied
  top: "loss"
}
In the deploy file:
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
Note that the type of the output layer changes between the two files: one is SoftmaxWithLoss, the other is Softmax. In addition, to distinguish the training output from the deployment output, the output is named loss during training and prob at deployment time.
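In practice the difference looks like this in pycaffe: the deploy net's forward pass ends at prob, and the predicted class is the argmax of that blob. A minimal sketch, assuming the CIFAR-10 quick files used below and skipping image preprocessing:

import numpy as np
import caffe

net = caffe.Net('cifar10_quick.prototxt', 'cifar10_quick_iter_5000.caffemodel', caffe.TEST)
net.blobs['data'].data[...] = np.random.rand(1, 3, 32, 32)   # placeholder for a real image
out = net.forward()                  # dict keyed by the output blob name
prob = out['prob'][0]                # 10 class probabilities
print('predicted class: %d' % prob.argmax())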
Below, the CIFAR-10 configuration file cifar10_quick_train_test.prototxt and its deployment definition cifar10_quick.prototxt are shown in full to illustrate the differences directly.
cifar10_quick_train_test.prototxt:
name: "CIFAR10_quick"
layer {    # remove this layer
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { mean_file: "examples/cifar10/mean.binaryproto" }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {    # remove this layer
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { mean_file: "examples/cifar10/mean.binaryproto" }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {    # remove all of the weight_filler / bias_filler entries below
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler { type: "gaussian" std: 0.0001 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {    # remove weight_filler / bias_filler
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }
}
layer {    # remove weight_filler / bias_filler
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }
}
layer {    # remove weight_filler / bias_filler
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 64
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {    # remove weight_filler / bias_filler
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {    # remove this layer
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {    # modify this layer
  name: "loss"              # rename loss to prob
  type: "SoftmaxWithLoss"   # change SoftmaxWithLoss to Softmax
  bottom: "ip2"
  bottom: "label"           # remove this bottom
  top: "loss"
}

Below is cifar10_quick.prototxt:

layer {    # the two input layers above are replaced by this single layer
  name: "data"
  type: "Input"
  top: "data"
  # note the values in shape; the CIFAR-10 *_train_test.prototxt file has no crop_size
  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }    # learning-rate multiplier for the weights W
  param { lr_mult: 2 }    # learning-rate multiplier for the bias b
  convolution_param {
    num_output: 32
    pad: 2          # padding of 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }    # max pooling
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }    # average pooling
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"    # ReLU activation; note that both bottom and top of this layer are conv3
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param { num_output: 64 }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param { num_output: 10 }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
3:
Converting train_val.prototxt to deploy.prototxt
1. Delete the input data layers (e.g. the layer with type: "Data" ... include { phase: TRAIN }) and add a description of the input dimensions in their place:
- input: "data"
- input_dim: 1
- input_dim: 3
- input_dim: 224
- input_dim: 224
- force_backward: true
2. Remove the final "loss" and "accuracy" layers and add a "prob" layer. (The snippet below uses the old V1 prototxt syntax, layers / SOFTMAX; in the current format it would be layer / type: "Softmax".)
layers {
  name: "prob"
  type: SOFTMAX
  bottom: "fc8"
  top: "prob"
}
If the train_val file also contains other preprocessing layers, things are slightly more complicated. For example, suppose a layer that computes the mean of the input data has been inserted between the 'data' layer and the 'conv1' layer (which has bottom: "data" / top: "conv1"):
layer {
  name: "mean"
  type: "Convolution"
  bottom: "data"
  top: "data"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  ...
}
In the deploy.prototxt file the "mean" layer must be kept, but its output container changes: its top becomes "mean", and 'conv1' has to be changed accordingly (bottom: "mean" / top: "conv1").
layer {
  name: "mean"
  type: "Convolution"
  bottom: "data"
  top: "mean"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  ...
}
4:
Generating the deploy file
To test new images with a trained model you need a deploy.prototxt file. This file is almost identical to the test.prototxt file; only the beginning and the end differ. The deploy file has no data input layer at the start and no Accuracy layer at the end, but it gains a Softmax probability layer.
Here we generate the file automatically with code, again taking MNIST as the example.
deploy.py
# -*- coding: utf-8 -*-
from caffe import layers as L, params as P, to_proto

root = '/home/xxx/'
deploy = root + 'mnist/deploy.prototxt'    # path where the file will be saved

def create_deploy():
    # the first (data) layer is omitted
    conv1 = L.Convolution(bottom='data', kernel_size=5, stride=1, num_output=20,
                          pad=0, weight_filler=dict(type='xavier'))
    pool1 = L.Pooling(conv1, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    conv2 = L.Convolution(pool1, kernel_size=5, stride=1, num_output=50,
                          pad=0, weight_filler=dict(type='xavier'))
    pool2 = L.Pooling(conv2, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    fc3   = L.InnerProduct(pool2, num_output=500, weight_filler=dict(type='xavier'))
    relu3 = L.ReLU(fc3, in_place=True)
    fc4   = L.InnerProduct(relu3, num_output=10, weight_filler=dict(type='xavier'))
    # no Accuracy layer at the end, but a Softmax layer instead
    prob = L.Softmax(fc4)
    return to_proto(prob)

def write_deploy():
    with open(deploy, 'w') as f:
        f.write('name:"Lenet"\n')
        f.write('input:"data"\n')
        f.write('input_dim:1\n')
        f.write('input_dim:3\n')
        f.write('input_dim:28\n')
        f.write('input_dim:28\n')
        f.write(str(create_deploy()))

if __name__ == '__main__':
    write_deploy()
After running this file, a deploy.prototxt file is generated in the mnist directory.
Generating this file with code is not really recommended; it is more trouble than it is worth. Once you are familiar with the format, you can simply copy test.prototxt and modify the relevant parts, which is more convenient.
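Whichever way the file is produced, it is easy to check that it parses: load it in pycaffe without weights and print the blob shapes. A short sketch, assuming the mnist/deploy.prototxt path used above:

import caffe

net = caffe.Net('mnist/deploy.prototxt', caffe.TEST)   # no weights needed just to parse
for name, blob in net.blobs.items():
    print('%-8s %s' % (name, blob.data.shape))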
Convert train_val.prototxt to deploy.prototxt
- Remove the input data layer and insert a description of the input data dimensions
- Remove the “loss” and “accuracy” layers and insert a “prob” layer at the end
If you have preprocessing layers, things get a bit more tricky.
For example, in train_val.prototxt, which includes the “data” layer, I insert a layer to calculate the mean over the channels of input data,
layer {
  name: "mean"
  type: "Convolution"
  bottom: "data"
  top: "data"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  ...
}
between the “data” layer and the “conv1” layer (which has bottom: "data" / top: "conv1").
In deploy.prototxt, the “mean” layer has to be retained, but its output container needs to be changed, i.e.
layer {
  name: "mean"
  type: "Convolution"
  bottom: "data"
  top: "mean"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  ...
}
and the “conv1” layer needs to be changed accordingly (bottom: "mean" / top: "conv1").
It is fine to use train_val.prototxt, whose “mean” layer writes to the “data” container, in the training phase, and deploy.prototxt, whose “mean” layer writes to the “mean” container, in the testing phase in Python; the learned caffemodel can be loaded correctly either way.
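A quick way to convince yourself of this: Caffe copies weights by layer name, not by blob (top/bottom) name, so the same .caffemodel loads into both prototxt variants. A minimal sketch with placeholder file names:

import caffe

# Placeholder file names: the deploy definition (with the "mean" layer writing to
# top: "mean") and the snapshot trained with the train_val definition.
net = caffe.Net('deploy.prototxt', 'snapshot_iter_10000.caffemodel', caffe.TEST)
print(list(net.params.keys()))   # "mean", "conv1", ... are all populated from the snapshot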