Tutorial: Building the Distributed Parallel Version of Caffe (Open MPI)


Caffe version: https://github.com/yjxiong/caffe

Environment:

CentOS release 6.6 (Final)
CUDA 8.0
cuDNN 6.0
Open MPI 3.1.3
OpenCV 3.1.0

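CUDA 8.0, cuDNN 6.0, OpenCV 3.1.0, and the other dependencies Caffe needs are already installed, so only Open MPI 3.1.3 has to be installed here. The steps are as follows:

Installing Open MPI 3.1.3

1. Extract openmpi-3.1.3, enter the extracted folder (openmpi-3.1.3), and run the following commands in a terminal: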

./configure --prefix=/storage/student5/usr/local/openmpi --with-cuda --enable-mpi-thread-multiple
# The path after --prefix is the Open MPI installation path.
sudo make all install
# Run make all install with sudo; otherwise the installation may run into problems.

2. Test whether the installation succeeded

cd openmpi-3.1.3/examples
make
mpirun -np 4 hello_c
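
To check the result of this test from a script, one option is to confirm that every rank reported in. The sketch below assumes hello_c prints one line per process containing "I am <rank> of <size>"; that line format, the function names, and the sample output are assumptions made for illustration, not something the tutorial states.

```python
import re

# Assumed line shape: "... I am <rank> of <size> ..." (one line per process).
RANK_LINE = re.compile(r"I am (\d+) of (\d+)")


def ranks_reported(output):
    """Return (sorted ranks seen, declared world size) from mpirun output."""
    ranks = set()
    size = None
    for line in output.splitlines():
        match = RANK_LINE.search(line)
        if match:
            ranks.add(int(match.group(1)))
            size = int(match.group(2))
    return sorted(ranks), size


def all_ranks_present(output):
    """True when ranks 0..size-1 each appear at least once."""
    ranks, size = ranks_reported(output)
    return size is not None and ranks == list(range(size))


# Hypothetical output for "mpirun -np 4 hello_c"; lines may arrive in any order.
sample = "\n".join(
    "Hello, world, I am %d of 4" % rank for rank in (2, 0, 3, 1)
)
print(all_ranks_present(sample))
```

In practice the text passed in would be the captured standard output of the mpirun command; a missing rank suggests that a process failed to start.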

Installing Caffe

1. Download Caffe, save Makefile.config.example as Makefile.config, and edit it so that it looks like this:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
USE_OPENCV := 1
USE_LEVELDB := 1
USE_LMDB := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local/MATLAB/R2014a
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        # $(ANACONDA_HOME)/include/python2.7 \
        # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
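
Because most of Makefile.config is commented out, it can be hard to see which switches are actually active. The following sketch lists the uncommented assignments from the file's text. The function names and the sample text are illustrative assumptions, and the parser is deliberately simplified: it joins backslash continuations and skips comment lines, but it does not expand $(...) references or strip trailing inline comments.

```python
import re

ASSIGNMENT = re.compile(
    r"^([A-Za-z_][A-Za-z0-9_]*)\s*(:=|\?=|\+=|=)\s*(.*)$"
)


def logical_lines(text):
    """Yield lines with backslash continuations joined into one line."""
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        yield pending + line
        pending = ""
    if pending:
        yield pending


def active_settings(text):
    """Map variable name -> value for every uncommented assignment."""
    settings = {}
    for line in logical_lines(text):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = ASSIGNMENT.match(stripped)
        if not match:
            continue
        name, operator, value = match.groups()
        value = " ".join(value.split())  # collapse runs of whitespace
        if operator == "+=" and name in settings:
            settings[name] = (settings[name] + " " + value).strip()
        elif operator == "?=" and name in settings:
            continue  # "?=" only applies when the variable is unset
        else:
            settings[name] = value
    return settings


# Hypothetical excerpt in the style of Makefile.config.
sample = """\
# CPU_ONLY := 1
USE_CUDNN := 1
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \\
        -gencode arch=compute_35,code=sm_35
Q ?= @
"""

settings = active_settings(sample)
for name in ("USE_CUDNN", "CUDA_ARCH", "CPU_ONLY"):
    print(name, "=", settings.get(name, "<not set>"))
```

To inspect a real file, read its contents and pass the text to active_settings; a switch that appears as "<not set>" is still commented out.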

2. In the Caffe directory, run:

mkdir build && cd build

3. Build Caffe

  To enable the MATLAB interface, first edit line 24 of CMakeLists.txt in the Caffe root directory:

caffe_option(BUILD_matlab "Build Matlab wrapper" OFF IF UNIX OR APPLE)

  Change it to:

caffe_option(BUILD_matlab "Build Matlab wrapper" ON IF UNIX OR APPLE)

  Then run the following in caffe/build (go straight to this step if the MATLAB interface is not needed):

cmake -DUSE_MPI=ON -DMPI_CXX_COMPILER=/path/to/your/openmpi/bin/mpicxx ..
# USE_MPI=ON enables Open MPI.
# The path after -DMPI_CXX_COMPILER must point to the mpicxx in the bin directory of your Open MPI installation. There is also an mpicxx under /usr/bin, so be careful not to give the wrong path.
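
Since more than one mpicxx can exist on the same machine, it may help to list every copy on PATH before choosing the value for -DMPI_CXX_COMPILER. The sketch below does that with the standard library only; the function name find_all_on_path is an illustrative assumption.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in hits:
            hits.append(candidate)
    return hits


# The first entry is the one a bare "mpicxx" would resolve to.
for location in find_all_on_path("mpicxx"):
    print(location)
```

If the copy under your Open MPI prefix is not listed, or is not first, passing its full path explicitly to cmake, as in the command above, avoids picking up the system one.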

4. Install Caffe. In the Caffe root directory, run:

make all -j8
make install
# In my case, make install was not needed after make all.
make runtest
# As in the reference tutorial, two tests failed.
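
When make runtest reports failures, the names of the failing tests can be pulled out of the log. Caffe's unit tests use Google Test, which marks each failing test with a "[  FAILED  ]" line; the sketch below collects those names. The sample log and its test names are hypothetical and are not the two failures mentioned above.

```python
import re

# Matches "[  FAILED  ] Suite.Test"; the summary line
# "[  FAILED  ] 2 tests, listed below:" has no "Suite.Test" part and is skipped.
FAILED_LINE = re.compile(r"^\[\s*FAILED\s*\]\s+([\w/]+\.[\w/]+)")


def failed_tests(log):
    """Return the unique failing test names in order of first appearance."""
    names = []
    for line in log.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Hypothetical excerpt of a runtest log.
sample_log = """\
[       OK ] ExampleSuite.TestPasses (1 ms)
[  FAILED  ] ExampleSuite.TestA (3 ms)
[  FAILED  ] OtherSuite/0.TestB, where TypeParam = float (2 ms)
[  PASSED  ] 1 test.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] ExampleSuite.TestA
[  FAILED  ] OtherSuite/0.TestB, where TypeParam = float
 2 FAILED TESTS
"""

print(failed_tests(sample_log))
```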

5. Build the Python interface:

  a. Open the shell configuration file to add an environment variable:

gedit ~/.bashrc

  b. Add the following line to it:

export PYTHONPATH=$PYTHONPATH:/path/to/your/caffe/python

  c. Apply the environment variable:

source ~/.bashrc

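  d. In the Caffe root directory, run:

make pycaffe
# The tutorial uses sudo here, but omitting it made no difference for me.

  e. Test the Python interface by entering the following in a terminal:

python
import caffe
# If no error appears, the Python interface was built successfully.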
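
If import caffe fails, a quick way to tell a PYTHONPATH problem apart from a build problem is to ask Python where it would load the module from, without importing it. This is a minimal sketch using importlib from the standard library; the function name module_location is an illustrative assumption.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


location = module_location("caffe")
if location is None:
    print("caffe is not on the module search path; check PYTHONPATH")
else:
    print("caffe would be imported from", location)
```

Note that this only locates the package. It does not load the compiled extension, so an import error caused by a missing shared library would still appear only when import caffe is actually run.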

 

Problems encountered:

1. While installing Caffe, the following command failed at the configuration step:

cmake -DUSE_MPI=ON -DMPI_CXX_COMPILER=/path/to/your/openmpi/bin/mpicxx ..

  Problem 1:

CMake Warning at /usr/local/opencv-3.1.0/cmake/OpenCVConfig.cmake:166 (message):
  Found OpenCV Windows Pack but it has no binaries compatible with your
  configuration.

  You should manually point CMake variable OpenCV_DIR to your build of OpenCV
  library.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:62 (find_package)
  CMakeLists.txt:31 (include)


CMake Error at cmake/Dependencies.cmake:62 (find_package):
  Found package configuration file:

    /usr/local/opencv-3.1.0/cmake/OpenCVConfig.cmake

  but it set OpenCV_FOUND to FALSE so package "OpenCV" is considered to be
  NOT FOUND.
Call Stack (most recent call first):
  CMakeLists.txt:31 (include)


-- Configuring incomplete, errors occurred!
See also "/storage/student5/usr/local/caffe/build/CMakeFiles/CMakeOutput.log".
See also "/storage/student5/usr/local/caffe/build/CMakeFiles/CMakeError.log".

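  Solution:

    Attempt 1: add set(OpenCV_DIR /path/to/your/OpenCV/build) to CMakeLists.txt. This did not work.

    Attempt 2: go back to the Caffe root directory, run make clean, temporarily set the environment variable below, and start again from mkdir build && cd build. This worked.

export OpenCV_DIR=/path/to/your/opencv/build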
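
Before rerunning cmake, it can be worth confirming that OpenCV_DIR points at a directory that actually holds an OpenCV package configuration file, since that is the file find_package looks for (OpenCVConfig.cmake, or the lowercase form opencv-config.cmake). A minimal sketch follows; the function name has_opencv_config is an illustrative assumption.

```python
import os

# File names CMake's find_package accepts as a package configuration file.
CONFIG_NAMES = ("OpenCVConfig.cmake", "opencv-config.cmake")


def has_opencv_config(directory):
    """True if the directory contains an OpenCV package configuration file."""
    return any(
        os.path.isfile(os.path.join(directory, name)) for name in CONFIG_NAMES
    )


opencv_dir = os.environ.get("OpenCV_DIR", "")
if opencv_dir and has_opencv_config(opencv_dir):
    print("OpenCV_DIR looks usable:", opencv_dir)
else:
    print("OpenCV_DIR is unset or contains no OpenCV config file")
```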

  Problem 2:

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  Could NOT find Atlas (missing: Atlas_LAPACK_LIBRARY)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
  cmake/Modules/FindAtlas.cmake:43 (find_package_handle_standard_args)
  cmake/Dependencies.cmake:74 (find_package)
  CMakeLists.txt:31 (include)

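  Solution:

    Attempt 1: specify the Atlas path. Go back to the Caffe root directory, run make clean, temporarily set the environment variable export Atlas_ROOT_DIR=/your/Atlas/Root, and start again from mkdir build && cd build. This did not work.

    Attempt 2: go back to the Caffe root directory, run make clean, start again from mkdir build && cd build, run the command below in the terminal, and then continue. This worked.

cmake -DBLAS=open .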

 

2. When running make all -j8:

  Problem 1:

/usr/bin/ld: .build_release/examples/cpp_classification/classification.o: undefined reference to symbol '_ZN2cv6imreadERKNS_6StringEi'
/usr/local/lib/libopencv_imgcodecs.so.3.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [.build_release/examples/cpp_classification/classification.bin] Error 1
make: *** Waiting for unfinished jobs....

  Solution: because opencv-3.x is used, libopencv_imgcodecs.so must be linked as well. In the Makefile, edit line 172 as follows:

LIBRARIES += glog gflags protobuf leveldb snappy \
    lmdb boost_system hdf5_hl hdf5 m \
    opencv_core opencv_highgui opencv_imgproc

  Change it to:

LIBRARIES += glog gflags protobuf leveldb snappy \
    lmdb boost_system hdf5_hl hdf5 m \
    opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
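
If the same change has to be repeated on several checkouts, it can be scripted instead of edited by hand. The sketch below applies the edit to the Makefile's text and leaves the text unchanged when opencv_imgcodecs is already present; the function name add_imgcodecs is an illustrative assumption, and it simply patches the first occurrence of opencv_imgproc.

```python
import re


def add_imgcodecs(makefile_text):
    """Append opencv_imgcodecs after opencv_imgproc unless already present."""
    if re.search(r"\bopencv_imgcodecs\b", makefile_text):
        return makefile_text
    return re.sub(
        r"\bopencv_imgproc\b",
        "opencv_imgproc opencv_imgcodecs",
        makefile_text,
        count=1,
    )


# The LIBRARIES line from the Makefile, before the change.
sample = (
    "LIBRARIES += glog gflags protobuf leveldb snappy \\\n"
    "    lmdb boost_system hdf5_hl hdf5 m \\\n"
    "    opencv_core opencv_highgui opencv_imgproc\n"
)

patched = add_imgcodecs(sample)
print(patched)
```

Running the function a second time on its own output is harmless, which makes it safe to include in a setup script.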

  Problem 2:

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

  Solution: delete the following lines from Makefile.config:

-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
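
The same cleanup can be expressed as a filter over the CUDA_ARCH flags. The sketch below treats the value as a sequence of "-gencode <spec>" pairs and drops any pair whose spec mentions a deprecated architecture; working on pairs rather than whole lines leaves the rest of the value intact wherever the deprecated entries sit. The function name and the default deprecated list are illustrative assumptions.

```python
def drop_gencodes(flags, deprecated=("compute_20",)):
    """Remove "-gencode <spec>" pairs whose spec names a deprecated arch."""
    # Backslashes are only line continuations here, so treat them as spaces.
    tokens = flags.replace("\\", " ").split()
    kept = []
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if token == "-gencode" and index + 1 < len(tokens):
            spec = tokens[index + 1]
            if not any(arch in spec for arch in deprecated):
                kept.extend([token, spec])
            index += 2
        else:
            kept.append(token)
            index += 1
    return " ".join(kept)


# A CUDA_ARCH value that still contains the deprecated entries.
flags = (
    "-gencode arch=compute_20,code=sm_20 \\\n"
    "    -gencode arch=compute_20,code=sm_21 \\\n"
    "    -gencode arch=compute_30,code=sm_30 \\\n"
    "    -gencode arch=compute_35,code=sm_35"
)

print(drop_gencodes(flags))
```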

 

Reference tutorials:

1. https://blog.csdn.net/whyerdiku/article/details/78842498 (Python + MATLAB interfaces)

2. http://www.cnblogs.com/beihaidao/p/6866342.html (Python + MATLAB interfaces)

3. https://blog.csdn.net/qq_21368481/article/details/81257265?tdsourcetag=s_pctim_aiomsg (MATLAB interface)

 

