I. Environment
1. Operating system: CentOS Linux release 7.4.1708 (Core).
2. GPU: NVIDIA GTX 1080 Ti, 11 GB.
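II. Install the NVIDIA GPU driver
1. On the official site (http://www.geforce.cn/drivers), find and download the driver for your card model. The download is a file with the .run extension (for example, NVIDIA-Linux-x86_64-384.98.run).
2. Install the gcc build environment and the kernel-related packages: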
yum install kernel-devel kernel-doc kernel-headers gcc\* glibc\* glibc-\*
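Note: before installing the kernel packages, check that the running kernel version matches the version of the kernel-devel/kernel-doc/kernel-headers packages to be installed. The two must match, otherwise the driver build in the later steps will fail.
# Check the running kernel version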
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# yum list | grep kernel-
kernel-devel.x86_64 3.10.0-693.11.1.el7 @updates
kernel-doc.noarch 3.10.0-693.11.1.el7 @updates
kernel-headers.x86_64 3.10.0-693.11.1.el7 @updates
kernel-tools.x86_64 3.10.0-693.11.1.el7 @updates
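There are two ways to resolve a version mismatch:
Method 1: upgrade the kernel to the packaged version (search online for the upgrade procedure).
Method 2: install the kernel-devel/kernel-doc/kernel-headers packages that match the running kernel, for example: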
yum install "kernel-devel-uname-r == $(uname -r)"
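The version check described above can be sketched as a small shell helper. This is only an illustration: the function name kernel_devel_matches is hypothetical, and it assumes that the installed kernel-devel versions are formatted the same way as the output of uname -r (version, release, and architecture).

```shell
# Hypothetical helper (not part of any package).
# $1:    running kernel release, e.g. the output of `uname -r`
# stdin: installed kernel-devel versions, one per line
# Succeeds only if one installed version equals the running kernel.
kernel_devel_matches() {
    if grep -qxF -- "$1"; then
        echo "OK: kernel-devel matches running kernel $1"
    else
        echo "MISMATCH: no kernel-devel package for running kernel $1"
        return 1
    fi
}

# Typical call on the target machine (commented out so the block stays inert):
# rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel \
#     | kernel_devel_matches "$(uname -r)"
```

If the helper reports a mismatch, fall back to one of the two methods above before running the driver installer.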
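3. Disable the nouveau driver that the system installs by default by editing /etc/modprobe.d/blacklist.conf:
# Write the blacklist configuration (this overwrites the existing file)
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
# Back up the current initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Rebuild the initramfs image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Reboot
reboot
# Check whether nouveau is loaded; empty output means it was disabled successfully
lsmod | grep nouveau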
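For scripted setups, the final lsmod check can be turned into a yes/no test. The sketch below is an assumption-laden illustration: nouveau_loaded is a hypothetical name, and it relies only on lsmod printing the module name in the first column.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and succeeds
# if a module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit(found ? 0 : 1) }'
}

# Typical call after the reboot (commented out so the block stays inert):
# if lsmod | nouveau_loaded; then
#     echo "nouveau is still loaded; recheck blacklist.conf and the initramfs"
# else
#     echo "nouveau is disabled"
# fi
```

Matching the first column exactly avoids false positives from other modules whose dependency list merely mentions nouveau.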
4. Install DKMS
DKMS stands for Dynamic Kernel Module Support. It maintains driver modules that live outside the kernel tree and automatically rebuilds them after the kernel version changes.
# Download the package
wget http://rpmfind.net/linux/fedora-secondary/releases/25/Everything/aarch64/os/Packages/d/dkms-2.2.0.3-34.git.9e0394d.fc25.noarch.rpm
# Install it
rpm -ivh dkms-2.2.0.3-34.git.9e0394d.fc25.noarch.rpm
5. Run the driver installer (if the kernel versions match, --kernel-source-path and -k do not need to be specified):
./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/ -k $(uname -r) --dkms -s
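6. If step 5 completes without errors, the installation succeeded. Reboot, then run nvidia-smi to view the GPU and driver information.
7. Problems encountered: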
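Beyond reading the nvidia-smi table by eye, the installed driver version can be compared against the expected one. The following is a rough sketch: driver_version_is is a hypothetical helper, and it assumes the query prints one bare version string per GPU with no surrounding whitespace.

```shell
# Hypothetical helper.
# $1:    expected driver version, e.g. 384.98
# stdin: output of
#        nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Succeeds if at least one GPU is listed and every GPU reports $1.
driver_version_is() {
    expected="$1"
    seen=0
    while IFS= read -r reported; do
        seen=$((seen + 1))
        if [ "$reported" != "$expected" ]; then
            echo "GPU $seen reports driver $reported, expected $expected"
            return 1
        fi
    done
    [ "$seen" -gt 0 ]
}

# Typical call after the reboot (commented out so the block stays inert):
# nvidia-smi --query-gpu=driver_version --format=csv,noheader \
#     | driver_version_is 384.98 && echo "driver 384.98 is active"
```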
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.
# Solution
Specify the --kernel-source-path option, for example:
./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
# Solution
Pass the running kernel to the -k option as $(uname -r), for example:
./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/ -k $(uname -r)

ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
# Solution
Disable Nouveau; see step 3.

ERROR: Failed to find dkms on the system! ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.
# Solution
Install DKMS; see step 4.
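The four errors above can be matched mechanically against the installer's log. The sketch below is illustrative only: suggest_fix is a hypothetical helper, the hint wording is mine, and the log path in the usage comment assumes the installer's default location.

```shell
# Hypothetical helper: reads installer output or log text on stdin and
# prints one hint per known error, following the fixes listed above.
suggest_fix() {
    log="$(cat)"
    case "$log" in
        *"Unable to find the kernel source tree"*)
            echo "kernel source not found: pass --kernel-source-path" ;;
    esac
    case "$log" in
        *"Unable to load the kernel module 'nvidia.ko'"*)
            echo 'nvidia.ko failed to load: pass -k $(uname -r)' ;;
    esac
    case "$log" in
        *"The Nouveau kernel driver is currently in use"*)
            echo "nouveau in use: disable it (step 3)" ;;
    esac
    case "$log" in
        *"Failed to find dkms on the system"*)
            echo "dkms missing: install DKMS (step 4)" ;;
    esac
}

# Typical call (commented out; assumes the default log location):
# suggest_fix < /var/log/nvidia-installer.log
```

Separate case statements are used so that a log containing several of these errors yields a hint for each one.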
III. Install CUDA 8.0
1. Download CUDA from the official site (https://developer.nvidia.com/cuda-80-ga2-download-archive). Any of the three installer types works; I chose the rpm package:

2. Install:
rpm -i cuda-repo-rhel7-8-0-local-ga2-8.0.61-1.x86_64.rpm
yum clean all
yum install cuda
3. Configure the environment variables:
vi ~/.bash_profile
# Add the following line
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
# Apply the change
source ~/.bash_profile
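When this step is scripted rather than done in vi, it helps to avoid adding the same export line twice on repeated runs. A minimal sketch, assuming a hypothetical helper named append_once:

```shell
# Hypothetical helper: append a line to a file only if the exact
# line is not already present.
# $1: file to modify, $2: line to add
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Typical call (commented out so the block stays inert). Single quotes
# keep $LD_LIBRARY_PATH unexpanded until the profile is sourced:
# append_once ~/.bash_profile \
#     'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64'
```

After the call, source ~/.bash_profile is still needed for the current shell to pick up the change.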
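4. Verify the installation:
# Enter the CUDA samples directory
cd /usr/local/cuda-8.0/samples/
# Build the samples
make
# Run the sample program
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
./deviceQuery
If CUDA is installed and configured correctly, deviceQuery prints information about the GPU, as shown in the figure below: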
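The deviceQuery run can also be checked without reading the full report, since the sample ends its output with a result line. A minimal sketch, assuming a hypothetical helper name and that a successful run prints a line starting with "Result = PASS":

```shell
# Hypothetical helper: reads deviceQuery output on stdin and succeeds
# only if it contains the final "Result = PASS" line.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Typical call from the deviceQuery directory (commented out):
# ./deviceQuery | device_query_passed && echo "CUDA runtime can see the GPU"
```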

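IV. Install cuDNN 5.1
cuDNN (CUDA Deep Neural Network library) is a GPU acceleration library designed for deep learning frameworks. Compared with plain CUDA, it provides performance-tuned implementations of common neural network operations such as convolution, pooling, normalization, and activation layers. See the official site for details.
1. Download the appropriate version of cuDNN from the official site (https://developer.nvidia.com/cudnn). An account must be registered before downloading:
Note: choose the build that matches your CUDA version.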

2. Extract the archive and copy the files into the system directories:
tar xzvf cudnn-8.0-linux-x64-v5.1.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
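A quick way to confirm which cuDNN version was copied is to read the version macros from the header. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines, as the 5.x headers do; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h file.
# $1: path to the header
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Typical call (commented out so the block stays inert):
# cudnn_version /usr/local/cuda/include/cudnn.h
```

The printed major and minor numbers should agree with the version in the archive name (v5.1 here).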
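3. Verify the installation:
Download the Code Samples from the official site and extract them:
# Enter the sample directory
cd /usr/src/cudnn_samples_v5/mnistCUDNN
# Build the sample program
make clean && make
# Run it
./mnistCUDNN
If the installation succeeded, the program prints a fair amount of information (too long to paste here) and ends with Test passed!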
