Installing CUDA, cuDNN, and nvidia-docker on Ubuntu



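This article is based on "Installing CUDA 10.1 and cuDNN v7.6.5 on Ubuntu 18.04" (Ubuntu18.04安裝CUDA10.1和cuDNN v7.6.5).

Before installing

Run lspci | grep -i nvidia to list the available NVIDIA devices: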
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
Run uname -m && cat /etc/*release to get the operating system information. Here it is a 64-bit Ubuntu 20.04 system:

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Run gcc --version to check whether gcc is installed. Version here: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Run uname -r to get the Linux kernel version. Here: 5.8.0-50-generic
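
The checks above can be gathered into one small script. This is only a sketch: the helper name report is made up for illustration, and it prints just the first line of each command's output.

```shell
# report LABEL CMD [ARGS...]: print the first line of CMD's output,
# or a note if CMD is not installed (hypothetical helper, not part of CUDA).
report() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$label" "$("$@" 2>/dev/null | head -n 1)"
    else
        printf '%s: %s not found\n' "$label" "$1"
    fi
}

report "architecture" uname -m
report "kernel" uname -r
report "gcc" gcc --version
# Prints an empty value when lspci is missing or no NVIDIA device is listed.
report "nvidia device" sh -c 'lspci 2>/dev/null | grep -i nvidia'
```

If the gcc line says "not found", install gcc before continuing.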

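Notes on the CUDA and cuDNN versions to install

Based on trial and error on Windows, the versions that work with the GTX 1060 are CUDA 10.1.105_418 and cuDNN v7.6.5.32 for CUDA 10.1.

Installing CUDA

Download CUDA 10.1.105_418. Since there is no build for Ubuntu 20.04, I chose the 18.10 package. Run the following commands as shown on the download page: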

sudo dpkg -i cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
/* Output printed by the first command:
Selecting previously unselected package cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39.
(Reading database ... 186150 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...
Setting up cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
*/
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

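Then reboot.

Checking the CUDA installation

After rebooting, run nvidia-smi to get the GPU information. Running nvcc -V at this point prints a suggestion to run "sudo apt install nvidia-cuda-toolkit". Do not do that: the nvcc that matches this CUDA install is already on the machine, and installing nvidia-cuda-toolkit from the online repository may cause a version conflict between the toolkit and CUDA and break the CUDA environment. (I once carelessly installed nvidia-cuda-toolkit on a host, after which nvidia-smi stopped working and the host could not use its NVIDIA GPU at all. I had to reinstall the CUDA environment.)
Next, add nvcc to the PATH environment variable: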

vim ~/.bashrc
# Add this line: export PATH="/usr/local/cuda-10.1/bin:$PATH"
source ~/.bashrc
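
Editing ~/.bashrc by hand works, but repeating the step can leave duplicate lines. A minimal sketch of a repeatable alternative follows. The helper name append_once is hypothetical, and the path assumes the toolkit was installed to /usr/local/cuda-10.1 as above.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -F: match literally, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (uncomment to apply; single quotes keep $PATH unexpanded in the file):
# append_once "$HOME/.bashrc" 'export PATH="/usr/local/cuda-10.1/bin:$PATH"'
```

Running the usage line twice still leaves a single export line in the file.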

Running nvcc -V then gives:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
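
To guard against the version conflict described earlier, the release number can be pulled out of the nvcc -V output and compared with the expected one. This is a sketch only. The function name nvcc_release is made up, and the pattern assumes the "release X.Y" wording shown in the output above.

```shell
# nvcc_release: read `nvcc -V` output on stdin and print the release, e.g. 10.1
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

expected="10.1"
if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc -V | nvcc_release)
    if [ "$found" = "$expected" ]; then
        echo "nvcc release $found matches"
    else
        echo "nvcc release is '$found', expected $expected; check which nvcc comes first in PATH"
    fi
else
    echo "nvcc not found in PATH"
fi
```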

Installing cuDNN

Download cudnn-10.1-linux-x64-v7.6.5.32.tgz (cuDNN for Linux) from the NVIDIA website.

tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* # grant read permission to all users
vim ~/.bashrc
# Add this line: export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
source ~/.bashrc
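
One way to confirm the copy worked is to read the version macros from the installed header. The sketch below assumes cuDNN 7.x, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h. The function name cudnn_version is hypothetical.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCH from the #define lines in HEADER.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

cudnn_header=/usr/local/cuda/include/cudnn.h
if [ -r "$cudnn_header" ]; then
    echo "cuDNN version: $(cudnn_version "$cudnn_header")"
else
    echo "$cudnn_header not found or not readable"
fi
```

For the archive used here, the expected version would be 7.6.5.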

Installing nvidia-docker

Following the "Docker - Getting Started - Installing on Ubuntu and Debian" documentation, run the following commands:

curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
sudo docker images
/*
REPOSITORY    TAG         IMAGE ID       CREATED        SIZE
nvidia/cuda   11.0-base   2ec708416bb8   8 months ago   122MB
*/
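
The distribution variable in the commands above decides which nvidia-docker.list URL is fetched, so it can help to print it before adding the repository. A small sketch, with a hypothetical helper name distribution_id:

```shell
# distribution_id FILE: print $ID$VERSION_ID from an os-release style file.
# The function body is a subshell, so sourced variables do not leak out.
distribution_id() (
    . "$1"
    printf '%s%s\n' "$ID" "$VERSION_ID"
)

if [ -r /etc/os-release ]; then
    distribution_id /etc/os-release
else
    echo "/etc/os-release not found"
fi
```

On the system described in this article, the /etc/*release output shown earlier (ID=ubuntu, VERSION_ID="20.04") gives ubuntu20.04.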

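Trying it on a RedmiBook 14

I also consulted "Win10+MX250+CUDA10.1+cuDNN+Pytorch1.4安裝+測試全過程(吐血)" (the full Win10 + MX250 + CUDA 10.1 + cuDNN + PyTorch 1.4 install and test walkthrough). The CUDA and cuDNN packages are the same ones used in this post. Following the steps in this article gave the correct result, with one problem along the way: nvidia-smi failed with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." Setting an administrator password in the BIOS and then disabling Secure Boot solved it.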
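
A likely explanation is that, with Secure Boot enabled, the kernel refuses to load an unsigned NVIDIA driver module. The sketch below reports the Secure Boot state from the OS before going into the BIOS. It assumes the mokutil tool is installed and that it prints "SecureBoot enabled" or "SecureBoot disabled". The helper name sb_state is made up.

```shell
# sb_state TEXT: classify the text printed by `mokutil --sb-state`.
sb_state() {
    case $1 in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    echo "Secure Boot: $(sb_state "$(mokutil --sb-state 2>&1)")"
else
    echo "mokutil not installed; check the firmware setup screen instead"
fi
```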
This article was created on Wednesday, May 5, 2021, 19:41:19 CST, and last modified on July 19, 2021, at 14:44.

