(Repost) Deep Learning Machine Setup: Ubuntu 16.04 + Nvidia GTX 1080 + CUDA 8.0

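Following on from the previous post, "Notes on Building a Deep Learning Machine": once the GTX 1080 machine was assembled, the next step was to set up the deep learning environment. I chose Ubuntu, which I know well, but in its latest release, 16.04. On top of that I installed the GPU driver for the Nvidia GTX 1080 plus CUDA 8.0. Because all of these are so new, I ran into a lot of pitfalls.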

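1. Installing Ubuntu 16.04

I did not bother with dual boot and installed Ubuntu 16.04 directly, downloading the 64-bit image from the official Ubuntu site: ubuntu-16.04-desktop-amd64.iso.

I made the Ubuntu USB installer on a Mac (for the method, see "Making a Linux installation USB drive from an ISO on a Mac"), then booted from the USB drive via the BIOS to install Ubuntu: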

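1) The very first step of the installation was a pitfall: after selecting "Install Ubuntu" and pressing Enter, the screen soon showed "Input Not Supported". After trying many suggestions found on Google, the cause turned out to be Ubuntu's support for the graphics card: the nomodeset boot option has to be added manually so that the system works with Nvidia cards. See "Fixing the black screen when installing Ubuntu" or "How do I set 'nomodeset' after I've already installed Ubuntu?".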
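To make the nomodeset option persistent on an installed system, it is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The sketch below is not from the original post: add_nomodeset is a hypothetical helper that only prints the modified file contents, and the demo runs on a scratch file standing in for /etc/default/grub.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the contents of a
# grub defaults file with "nomodeset" appended to GRUB_CMDLINE_LINUX_DEFAULT.
# The input file itself is never modified.
add_nomodeset() {
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$1"; then
        cat "$1"    # option already present, print the file unchanged
    else
        sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nomodeset"/' "$1"
    fi
}

# Demo on a scratch file that stands in for /etc/default/grub.
demo=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo"
add_nomodeset "$demo"
rm -f "$demo"
```

On a real system you would review the printed result, write it back to /etc/default/grub with root privileges, and run sudo update-grub. During installation itself, the option is normally typed into the boot entry by hand instead.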

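2) Disk partitioning: I wiped the Windows 10 system that came with the machine and created partitions for /boot, / and /home. I also mounted the second 4 TB hard disk as a data disk.

3) After installation the resolution of Ubuntu 16.04 is very low. Until the graphics driver is installed, you can set it manually in the grub configuration file: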

sudo vim /etc/default/grub

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
# Set the resolution you want here
GRUB_GFXMODE=1024x768

sudo update-grub

4) Install the SSH server so that the GTX 1080 machine can be reached remotely over ssh:

sudo apt-get install openssh-server

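5) Switch the Ubuntu 16.04 package sources to a mirror. I used the USTC (University of Science and Technology of China) mirror: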

cd /etc/apt/
sudo cp sources.list sources.list.bak
sudo vi sources.list

Add the following sources at the top of the sources.list file:

deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse

Finally, refresh the package index and upgrade the installed packages:

sudo apt-get update
sudo apt-get upgrade
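The ten mirror entries above follow one pattern: two entry types (deb and deb-src) times five pockets. As a rough sketch, the list can be generated rather than typed by hand. gen_sources is a hypothetical helper name, not something from the original post; it only prints to standard output and does not touch /etc/apt/sources.list.

```shell
#!/bin/sh
# Hypothetical helper: print deb and deb-src lines for each pocket of a
# given mirror URL and release codename.
gen_sources() {
    mirror="$1"      # e.g. http://mirrors.ustc.edu.cn/ubuntu/
    codename="$2"    # e.g. xenial
    for kind in deb deb-src; do
        for pocket in "" -security -updates -proposed -backports; do
            echo "$kind $mirror $codename$pocket main restricted universe multiverse"
        done
    done
}

# Prints the same ten lines as the list in this post.
gen_sources http://mirrors.ustc.edu.cn/ubuntu/ xenial
```

Note that xenial-proposed is Ubuntu's pre-release testing pocket, so you may prefer to drop "-proposed" from the loop on a machine that should stay stable.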

2. Installing the GTX 1080 Driver

Install Nvidia driver 367.27:

sudo add-apt-repository ppa:graphics-drivers/ppa

The first time this runs, it prints the following notice:

Fresh drivers from upstream, currently shipping Nvidia.

## Current Status

We currently recommend: `nvidia-361`, Nvidia’s current long lived branch.
For GeForce 8 and 9 series GPUs use `nvidia-340`
For GeForce 6 and 7 series GPUs use `nvidia-304`

## What we’re working on right now:

- Normal driver updates
- Investigating how to bring this goodness to distro on a cadence.

## WARNINGS:

This PPA is currently in testing, you should be experienced with packaging before you dive in here. Give us a few days to sort out the kinks.

Volunteers welcome! See also: https://github.com/mamarley/nvidia-graphics-drivers/

http://www.ubuntu.com/download/desktop/contribute
More info: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
Press Enter to continue, or Ctrl+C to cancel adding it

Press Enter, then continue:

sudo apt-get update
sudo apt-get install nvidia-367
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev

Then reboot so that the GTX 1080 driver takes effect.
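After the reboot, a quick check that the driver is actually loaded can save time before moving on to CUDA. The following is a minimal sketch, assuming nvidia-smi was installed together with the driver package. check_driver is a hypothetical helper name; it degrades to a plain message when the tool is missing.

```shell
#!/bin/sh
# Hypothetical helper: report the GPU name and driver version, or explain
# why that was not possible.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver does not appear to be installed"
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || { echo "nvidia-smi failed: the driver may not be loaded"; return 1; }
}

check_driver || true
```

On the machine described here, the driver version column would be expected to start with 367 if the PPA package installed correctly.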

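3. Downloading and Installing CUDA

Before installing CUDA I searched around and found that installing CUDA 7.5 on Ubuntu 16.04 causes many problems. Fortunately CUDA 8 has been released, and it supports the GTX 1080: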

New in CUDA 8

Pascal Architecture Support
Out of box performance improvements on Tesla P100, supports GeForce GTX 1080
Simplify programming using Unified memory on Pascal including support for large datasets, concurrent data access and atomics*
Optimize Unified Memory performance using new data migration APIs*
Faster Deep Learning using optimized cuBLAS routines for native FP16 computation
Developer Tools
Quickly identify latent system-level bottlenecks using the new critical path analysis feature
Improve productivity with up to 2x faster NVCC compilation speed
Tune OpenACC applications and overall host code using new profiling extensions
Libraries
Accelerate graph analytics algorithms with nvGRAPH
New cuBLAS matrix multiply optimizations for matrices with sizes smaller than 512 and for batched operation

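Downloading CUDA requires registering and logging in with an NVIDIA developer account. The CUDA 8 download page offers detailed platform selection and installation instructions.

Here I chose the runfile installer for Ubuntu 16.04. Whatever you do, avoid the deb option; it leads to endless pitfalls.

[Screenshot, 2016-07-15 8:25:37 AM]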

The downloaded "cuda_8.0.27_linux.run" is 1.4 GB. Install CUDA 8 following the method given by Nvidia:

sudo sh cuda_8.0.27_linux.run --tmpdir=/opt/temp/

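The --tmpdir option is there because running "sudo sh cuda_8.0.27_linux.run" directly fails with an out-of-space error, even though this is a brand-new machine with plenty of disk space. A Google search showed that adding --tmpdir is enough to get past it. The error was: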

Not enough space on parition mounted at /.
Need 5091561472 bytes.

Disk space check has failed. Installation cannot continue.
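Before launching the installer, it can help to check how much space the chosen temporary directory really has. The sketch below is not from the original post: has_space is a hypothetical helper, and the byte count is simply the figure from the error message above. It also assumes the directory passed to --tmpdir must exist beforehand (for example, created with sudo mkdir -p /opt/temp), which the original does not state.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_BYTES bytes available.
has_space() {
    dir="$1"
    need_bytes="$2"
    # df -P guarantees one line per filesystem; column 4 is available KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$((avail_kb * 1024))" -ge "$need_bytes" ]
}

# 5091561472 is the size the installer asked for in the error above.
if has_space /tmp 5091561472; then
    echo "/tmp has enough room"
else
    echo "/tmp is too small; point --tmpdir at a larger location"
fi
```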

Running it brings up a series of prompts to confirm. The absolutely critical one is whether to install the older 361 driver:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.62?

The answer must be n. Otherwise the GTX 1080 driver installed earlier is wasted, and many problems follow.

Logging to /opt/temp//cuda_install_6583.log
Using more to view the EULA.
End User License Agreement
--------------------------

Preface
-------

The following contains specific license terms and conditions
for four separate NVIDIA products. By accepting this
agreement, you agree to comply with all the terms and
conditions applicable to the specific product(s) included
herein.

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.62?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/textminer ]:

Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
Installing the CUDA Samples in /home/textminer …
Copying samples to /home/textminer/NVIDIA_CUDA-8.0_Samples now…
Finished copying samples.

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/textminer

Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /opt/temp//cuda_install_6583.log

After installation, set the environment variables and append them to the end of ~/.bashrc:

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
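If you script this step, it is worth making the append idempotent so that re-running the setup does not duplicate the lines. The sketch below is an illustration, not the author's method: append_once is a hypothetical helper, and the demo writes to a scratch file rather than to the real ~/.bashrc.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line.
append_once() {
    file="$1"
    line="$2"
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file; substitute "$HOME/.bashrc" for real use.
rc=$(mktemp)
# Single quotes keep ${PATH...} literal so it expands when .bashrc is read.
append_once "$rc" 'export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}'
append_once "$rc" 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}'
append_once "$rc" 'export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}'   # no-op
cat "$rc"
rm -f "$rc"
```

After the lines are in ~/.bashrc, open a new shell (or run source ~/.bashrc) so that nvcc is found on the PATH.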

Finally, test CUDA. Run:

nvidia-smi

The result is shown in the screenshot:

[Screenshot, 2016-07-15 8:44:43 AM]

Next, try a few of the CUDA samples (from the NVIDIA_CUDA-8.0_Samples directory):

cd 1_Utilities/deviceQuery
make

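If this complains that the gcc version is too new, install an older gcc and point the gcc symlink at it (search online for the details). I replaced the gcc 5.x that ships with Ubuntu 16.04 with the older gcc 4.9.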
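The original leaves the gcc switch as an exercise. As a hedged sketch, the snippet below first reports which major gcc version is active; gcc_major is a hypothetical helper name. The commented commands show one common way to switch using update-alternatives. That is an assumption for illustration and not necessarily the symlink method used here.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a string such as
# "5.4.0" by stripping everything from the first dot onward.
gcc_major() {
    echo "${1%%.*}"
}

if command -v gcc >/dev/null 2>&1; then
    echo "active gcc major version: $(gcc_major "$(gcc -dumpversion)")"
else
    echo "gcc is not installed"
fi

# One common way to switch (assumption, run manually if needed):
#   sudo apt-get install gcc-4.9 g++-4.9
#   sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50 \
#       --slave /usr/bin/g++ g++ /usr/bin/g++-4.9
```

Registering the compilers as alternatives keeps the distribution's gcc installed, so the change can be reverted later with update-alternatives --config gcc.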

"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Run ./deviceQuery to get:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce GTX 1080”
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8112 MBytes (8506179584 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1835 MHz (1.84 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS

Next, test nbody:

cd ../../5_Simulations/nbody/
make

Run:

./nbody -benchmark -numbodies=256000 -device=0

This gives:

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: “GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX 1080]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2291.469 ms
= 286.000 billion interactions per second
= 5719.998 single-precision GFLOP/s at 20 flops per interaction
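The benchmark figures can be cross-checked from the numbers in the output itself: an all-pairs simulation evaluates n × n interactions per iteration, and the sample counts 20 flops per interaction. The sketch below redoes that arithmetic with awk; nbody_rate is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: recompute the nbody throughput from the body count,
# the iteration count, and the total time in milliseconds.
nbody_rate() {
    awk -v n="$1" -v iters="$2" -v ms="$3" 'BEGIN {
        giga = n * n * iters / (ms / 1000) / 1e9   # billions of interactions per second
        printf "%.1f billion interactions/s, %.1f GFLOP/s\n", giga, giga * 20
    }'
}

nbody_rate 256000 10 2291.469
```

Rounded to one decimal place this gives 286.0 billion interactions per second and 5720.0 GFLOP/s, which agrees with the reported output up to the rounding of the printed time.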


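Note: this is an original article. When reposting, please credit the source and keep the link to "我愛自然語言處理" (I Love Natural Language Processing): http://www.52nlp.cn

Permalink: Deep Learning Machine Setup: Ubuntu 16.04 + Nvidia GTX 1080 + CUDA 8.0 http://www.52nlp.cn/?p=9226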
