Bugs and debugging from the Intel Cup competition [Part 2]: using Intel Caffe and its major pitfalls

Reposted article, 2018-02-03 17:12. Categories: Projects / Bug&Debug / Machine Learning

Giving up on PyTorch and learning Caffe.
This post only records my personal views and inevitably contains mistakes.

Learning Caffe

Generating a Caffe model takes the following steps:

Write network.prototxt
Write solver.prototxt
Run caffe train -solver=solver.prototxt

Writing network.prototxt

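In Caffe, a Net is made of Layers, and data is passed between them in Blobs.
Writing a network means arranging layers.
For how to write a layer, see caffe.proto.
The general form of a layer is: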

layer{
    name: "layer name"
    type: "layer type"
    bottom: "bottom blob"
    top: "top blob"
    param{
        ...
    }
    include{ phase: ... }
    exclude{ phase: ... }
    # parameters specific to one layer type; an inner product layer is used as the example
    inner_product_param{
        num_output: 64
        weight_filler{
            type: "xavier"
        }
        bias_filler{
            type: "constant"
            value: 0
        }
        axis: 1
    }
}
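As an illustration of the text format used above, here is a minimal sketch that renders a nested Python structure into the same `key: value` / `name { ... }` layout. The `render` helper and the `ip_layer` structure are hypothetical and not part of Caffe; the field values are copied from the ip1 layer of the network further down.

```python
def render(name, fields, depth=0):
    """Render one protobuf text-format message as a list of lines."""
    pad = "    " * depth
    lines = [pad + name + " {"]
    for key, value in fields:
        if isinstance(value, list):      # nested message, e.g. weight_filler
            lines += render(key, value, depth + 1)
        elif isinstance(value, str):     # strings are quoted
            lines.append('%s    %s: "%s"' % (pad, key, value))
        else:                            # numbers are written bare
            lines.append("%s    %s: %s" % (pad, key, value))
    lines.append(pad + "}")
    return lines


# Hypothetical description of the ip1 layer as (key, value) pairs.
ip_layer = [
    ("name", "ip1"),
    ("type", "InnerProduct"),
    ("bottom", "lstm-drop"),
    ("top", "ip1"),
    ("inner_product_param", [
        ("num_output", 64),
        ("weight_filler", [("type", "xavier")]),
        ("bias_filler", [("type", "constant"), ("value", 0)]),
        ("axis", 2),
    ]),
]

text = "\n".join(render("layer", ip_layer))
print(text)
```

Every field is written as `key: value`; a form such as `axis=2` is not valid in this text format.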

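Briefly, our goal here is
to build a spam-filtering feature from a Chinese word-segmentation RNN and a Bayes classifier.
I'll just attach my network here; nobody reads it anyway :(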

# project for chinese segmentation
#	T: 64,	batch: 64
#	label[T*batch, 1, 1, 1]	cont[T*batch, 1, 1, 1]=0 or 1
#	data[T*batch, 1, 1, 1] ->
#	embed[T*batch, 2000, 1, 1](drop&reshape) -> [T, batch, 2000, 1]
#	lstm[T, batch, 256, 1](drop) ->
#	ip[T, batch, 64, 1](relu) ->
#	ip[T, batch, 5, 1] ->
#	Accuracy & SoftmaxWithLoss
# for output: 0-none, 1-Single, 2-Begin, 3-Middle, 4-End


name: "Segment"

# train data
layer{
	name: "train_data"
	type: "HDF5Data"
	top: "data"
	top: "label"
	top: "cont"
	include{ phase: TRAIN }
	hdf5_data_param{
		source: "/home/tanglizi/caffe/projects/data_segment/h5_test.txt"
		batch_size: 4096
		shuffle: true
	}
}
# test data
layer{
	name: "test_data"
	type: "HDF5Data"
	top: "data"
	top: "label"
	top: "cont"
	include{ phase: TEST }
	hdf5_data_param{
		source: "/home/tanglizi/caffe/projects/data_segment/h5_test.txt"
		batch_size: 4096
		shuffle: true
	}
}

# embed
layer{
	name: "embedding"
	type: "Embed"
	bottom: "data"
	top: "embedding"
	param{
		lr_mult: 1
	}
	embed_param{
		input_dim: 14000
		num_output: 2000
		weight_filler {
			type: "uniform"
			min: -0.08
			max: 0.08
		}
	}
}
# embed-drop
layer{
	name: "embed-drop"
	type: "Dropout"
	bottom: "embedding"
	top: "embed-drop"
	dropout_param{
		dropout_ratio: 0.05
	}
}


# reshape
# embed
# [T*batch, 2000, 1, 1] ->
# [T, batch, 2000, 1]
layer{
	name: "embed-reshape"
	type: "Reshape"
	bottom: "embed-drop"
	top: "embed-reshaped"
	reshape_param{
		shape{
			dim: 64
			dim: 64
			dim: 2000
		}
	}
}

# label
layer{
	name: "label-reshape"
	type: "Reshape"
	bottom: "label"
	top: "label-reshaped"
	reshape_param{
		shape{
			dim: 64
			dim: 64
			dim: 1
		}
	}
}

# cont
layer{
	name: "cont-reshape"
	type: "Reshape"
	bottom: "cont"
	top: "cont-reshaped"
	reshape_param{
		shape{
			dim: 64
			dim: 64
		}
	}
}


# lstm
layer{
	name: "lstm"
	type: "LSTM"
	bottom: "embed-reshaped"
	bottom: "cont-reshaped"
	top: "lstm"
	recurrent_param{
		num_output: 256
		weight_filler{
			# type: "xavier"
			type: "uniform"
			min: -0.08
			max: 0.08
		}
		bias_filler{
			type: "constant"
			value: 0
		}
	}
}

# lstm-drop
layer{
	name: "lstm1-drop"
	type: "Dropout"
	bottom: "lstm"
	top: "lstm-drop"
	dropout_param{
		dropout_ratio: 0.05
	}
}

# connect
# ip1
layer{
	name: "ip1"
	type: "InnerProduct"
	bottom: "lstm-drop"
	top: "ip1"
	param{
		lr_mult: 1
		decay_mult: 1
	}
	param{
		lr_mult: 2
		decay_mult: 0
	}
	inner_product_param{
		num_output: 64
		weight_filler{
			type: "xavier"
		}
		bias_filler{
			type: "constant"
			value: 0
		}
		axis: 2
	}
}
# relu
layer{
	name: "relu1"
	type: "ReLU"
	bottom: "ip1"
	top: "relu1"
	relu_param{
		negative_slope: 0
	}
}

# ip2
layer{
	name: "ip2"
	type: "InnerProduct"
	bottom: "relu1"
	top: "ip2"
	param{
		lr_mult: 1
	}
	param{
		lr_mult: 2
	}
	inner_product_param{
		num_output: 5
		weight_filler{
			type: "xavier"
		}
		bias_filler{
			type: "constant"
			value: 0
		}
		axis: 2
	}
}


# loss
layer{
	name: "loss"
	type: "SoftmaxWithLoss"
	bottom: "ip2"
	bottom: "label-reshaped"
	top: "loss"
	softmax_param{
		axis: 2
	}
}

# accuracy
layer{
	name: "accuracy"
	type: "Accuracy"
	bottom: "ip2"
	bottom: "label-reshaped"
	top: "accuracy"
	accuracy_param{
		axis: 2
	}
}
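The shapes in the header comment of this network can be checked by hand. The sketch below is plain Python that does not call Caffe and follows the blob shapes through the layers listed above. The helper names are hypothetical, trailing singleton dimensions are omitted, and the rule that Embed appends `num_output` to its input shape is my understanding of upstream Caffe rather than something verified against Intel Caffe.

```python
T, BATCH = 64, 64   # time steps and sequences per batch, from the header comment
ROWS = 4096         # batch_size of the HDF5Data layers


def trace_shapes(t=T, batch=BATCH, rows=ROWS, embed_dim=2000, lstm_dim=256):
    """Follow blob shapes through the layers of the network above."""
    if t * batch != rows:
        raise ValueError("Reshape needs T * batch == HDF5 batch_size")
    return [
        ("data", (rows,)),
        ("embedding", (rows, embed_dim)),           # Embed appends num_output
        ("embed-reshaped", (t, batch, embed_dim)),
        ("label-reshaped", (t, batch, 1)),
        ("cont-reshaped", (t, batch)),
        ("lstm", (t, batch, lstm_dim)),
        ("ip1", (t, batch, 64)),                    # InnerProduct with axis: 2
        ("ip2", (t, batch, 5)),                     # one score per output tag
    ]


def row_to_time_batch(row, batch=BATCH):
    """Map a flat HDF5 row index to (time step, sequence index) after Reshape."""
    return divmod(row, batch)


for name, shape in trace_shapes():
    print(name, shape)
```

Because Reshape keeps row-major order, row r ends up at time step r // 64 of sequence r % 64, so the HDF5 file would have to be written time-major. As far as I understand upstream Caffe, `shuffle: true` permutes individual rows, which would scramble that layout; this may be worth checking when accuracy looks wrong.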

Writing solver.prototxt

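The solver sets the hyperparameters that control Caffe's training and related operations.
For how to write a solver, see caffe.proto.
A typical solver looks like this: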

net: "network.prototxt"

test_iter: 100
test_interval: 500

type: "Adam"
base_lr: 0.01
weight_decay: 0.0005

lr_policy: "inv"

display: 100
max_iter: 10000

snapshot: 5000
snapshot_prefix: "/home/tanglizi/caffe/projects/segment/"

solver_mode: CPU
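To make the `lr_policy: "inv"` line concrete, here is a small sketch of the schedule as documented in upstream caffe.proto: base_lr * (1 + gamma * iter) ^ (-power). The function name is hypothetical. The defaults of 0 for `gamma` and `power` are an assumption based on upstream Caffe, where these fields have no explicit default; I have not checked Intel Caffe.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0, power=0.0):
    """Learning rate under the "inv" policy: base_lr * (1 + gamma * iter) ** (-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# With gamma and power left unset, as in the solver above, the rate never decays.
print(inv_lr(0), inv_lr(10000))

# Illustrative values only (not taken from the solver above).
print(inv_lr(10000, gamma=1e-4, power=0.75))
```

Under that assumption the solver above trains at a constant 0.01 for all 10000 iterations, so setting `gamma` and `power` explicitly would be needed for any decay.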

Training the model

caffe train -solver=solver.prototxt

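At this point you may get the error:
Message type "caffe.MultiPhaseSolverParameter" has no field named "net".
Note that this does not mean net is missing; some other parameter is set incorrectly.
This error is specific to Intel Caffe.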

Using the caffemodel

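The training results were clearly off: accuracy was very low, so I suspected the network was written incorrectly again.
So I wanted to see what was happening inside.
A caffemodel can be used through the C++, Python, or MATLAB interface.
Next we step into the big pit that is Intel Caffe and Intel devcloud.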

Using pycaffe

Note: the Python code below was run on devcloud.
First, a Caffe model is simply a trained neural network,
so we need caffe.Net() to read the caffemodel and net.prototxt, and caffe.io to read the data.
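For reference, a minimal sketch of the kind of inspection intended here, assuming a working pycaffe build. The helper names and the stand-in object are hypothetical. `caffe.Net(prototxt, caffemodel, caffe.TEST)` is the older three-argument constructor form from upstream pycaffe, and I am assuming Intel Caffe accepts it too.

```python
from types import SimpleNamespace


def blob_shapes(net):
    """List (blob name, shape) for an object with a pycaffe-style `blobs` mapping."""
    return [(name, tuple(blob.data.shape)) for name, blob in net.blobs.items()]


def load_net(prototxt_path, caffemodel_path):
    """Load a trained net in TEST phase; needs a working pycaffe build."""
    import caffe  # imported lazily so blob_shapes can be tried without Caffe
    return caffe.Net(prototxt_path, caffemodel_path, caffe.TEST)


# Stand-in object (not a real Caffe net) so the helper can run without pycaffe.
fake_blob = SimpleNamespace(data=SimpleNamespace(shape=(64, 64, 5)))
fake_net = SimpleNamespace(blobs={"ip2": fake_blob})
shapes = blob_shapes(fake_net)
print(shapes)
```

With a real install, `blob_shapes(load_net(...))` would list every intermediate blob, which is a quick way to check whether the Reshape and axis settings produce the shapes the network comment promises.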

import caffe
from caffe import io
# This fails with:
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#ImportError: cannot import name 'io'

I quickly checked what the caffe module actually contained:

dir(caffe)
# prints ['__doc__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
# a working install prints ['AdaDeltaSolver', 'AdaGradSolver', 'AdamSolver', 'Classifier', 'Detector', 'Layer', 'NesterovSolver',
#  'Net', 'NetSpec', 'RMSPropSolver', 'SGDSolver', 'TEST', 'TRAIN', '__builtins__', '__doc__', '__file__', '__name__',
#  '__package__', '__path__', '__version__', '_caffe', 'classifier', 'detector', 'get_solver', 'init_log', 'io', 'layer_type_list',
#  'layers', 'log', 'net_spec', 'params', 'proto', 'pycaffe', 'set_device', 'set_mode_cpu', 'set_mode_gpu', 'set_random_seed', 'to_proto']

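Damn, there is nothing in it at all.
The project has to run on the server, so running on a local machine is not an option.
That leaves two paths: rebuild Caffe, or do it in C++.
I did not want the hassle, so I chose C++.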
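A module whose `dir()` shows `__path__` and `__spec__` but no `__file__`, as in the output above, is what Python 3 produces for a namespace package, that is, a plain directory without `__init__.py`. The sketch below checks for that case; the helper name and the demo package name are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def describe_import(module_name):
    """Report where a module was imported from and whether it is a namespace package."""
    module = importlib.import_module(module_name)
    file = getattr(module, "__file__", None)
    return {
        "file": file,
        "path": list(getattr(module, "__path__", [])),
        "is_namespace_package": file is None and hasattr(module, "__path__"),
    }


# Demo: an empty directory on sys.path imports "successfully" but exposes nothing.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "caffe_demo_pkg"))  # hypothetical name, no __init__.py
sys.path.insert(0, demo_root)
demo = describe_import("caffe_demo_pkg")
print(demo)
```

If `describe_import("caffe")` reports a namespace package, the `path` entry shows which directory Python picked up. One common cause is starting Python from a directory that itself contains a folder named caffe, such as the parent of the source checkout, rather than a missing build.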

Using the caffemodel from C++

Note: the steps below use Intel Caffe.
First, a Caffe model is simply a trained neural network,
so we need caffe::Net to read the caffemodel and net.prototxt.

// predict.cpp
#include <caffe/caffe.hpp>
int main() {
    // "net.prototxt" is a placeholder path to the network definition
    boost::shared_ptr<caffe::Net<float> > net(new caffe::Net<float>("net.prototxt", caffe::TEST));
}

Compiling by hand

    # caffe.hpp lives under the include directory, so add that path
    clang++ -I <caffe path>/include -lboost_system predict.cpp -o predict
    # Unexpectedly, this fails:
    #/tmp/predict-fea879.o: In function 'main':
    #predict.cpp:(.text+0x35b): undefined reference to 'caffe::Net<int>::Net(std::__cxx11::basic_string<char, std::char_traits<char>, 
    #std::allocator<char> > const&, caffe::Phase, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
    #std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*,
    # caffe::Net<int> const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
    #clang: error: linker command failed with exit code 1 (use -v to see invocation)
  
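    # It looks like libcaffe cannot be found, so add its path
    clang++ -I <caffe path>/include -lboost_system predict.cpp -o predict -L <caffe path>/build/lib -lcaffe
    # Unexpectedly, the same error again
    # (The log names caffe::Net<int>; upstream Caffe only instantiates Net for float and double,
    #  so a Net<int> in the source would fail to link even with -lcaffe.)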

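Gave up on compiling by hand; put the file under examples/ and rebuilt Caffe.
Same error again.
Put it under tools/ (where caffe.cpp lives) and rebuilt Caffe.
The build simply skipped predict.cpp.
Fed up, I gave up on using C++ locally.
Compiled by hand on devcloud.
Same error again.
If it will not even compile in the cloud, what am I supposed to do?
Rebuilt Intel Caffe,
reconfiguring Makefile.config to match the environment.
The build failed: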

In file included from .build_release/src/caffe/proto/caffe.pb.cc:5:0:  
.build_release/src/caffe/proto/caffe.pb.h:12:2: error: #error This file was generated by a newer version of protoc which is  
 #error This file was generated by a newer version of protoc which is  
.build_release/src/caffe/proto/caffe.pb.h:13:2: error: #error incompatible with your Protocol Buffer headers. Please update  
 #error incompatible with your Protocol Buffer headers.  Please update  
.build_release/src/caffe/proto/caffe.pb.h:14:2: error: #error your headers.  
 #error your headers.  
.build_release/src/caffe/proto/caffe.pb.h:22:35: fatal error: google/protobuf/arena.h: No such file or directory
 #include <google/protobuf/arena.h>

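I looked it up: this build needs libprotoc 2.6.1, but devcloud has libprotoc 3.2.0, so the generated caffe.pb.h does not match the protobuf headers being included.
So annoying.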
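A mismatch like this can be confirmed before a long build. The sketch below compares the version printed by `protoc --version` with the `GOOGLE_PROTOBUF_VERSION` macro from the installed headers. The function names are hypothetical, and I am assuming the macro lives in google/protobuf/stubs/common.h and encodes the version as major * 1000000 + minor * 1000 + patch, as in the protobuf releases of that period that I am aware of.

```python
import re


def protoc_version(version_output):
    """Parse the output of `protoc --version`, e.g. "libprotoc 3.2.0"."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised protoc output: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def header_version(header_text):
    """Parse GOOGLE_PROTOBUF_VERSION (major * 1000000 + minor * 1000 + patch)."""
    match = re.search(r"#define\s+GOOGLE_PROTOBUF_VERSION\s+(\d+)", header_text)
    if match is None:
        raise ValueError("GOOGLE_PROTOBUF_VERSION not found")
    number = int(match.group(1))
    return (number // 1000000, number // 1000 % 1000, number % 1000)


def versions_match(version_output, header_text):
    """True when the protoc compiler and the protobuf headers report the same version."""
    return protoc_version(version_output) == header_version(header_text)


print(versions_match("libprotoc 3.2.0", "#define GOOGLE_PROTOBUF_VERSION 2006001"))
```

In practice the two inputs would come from running `protoc --version` and from reading the header that the compiler's include path actually resolves to.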
Then I found this article; many thanks to @大黃老鼠!!!
That's it, I am giving up on Caffe completely!
Moving on to Chainer!
