Installing CUDA and cuDNN in Docker on Ubuntu 18.04
In practice this means installing nvidia-docker2 and then pulling an Ubuntu 18.04 image that already has CUDA and cuDNN installed.
Environment: Ubuntu 18.04, NVIDIA driver, build-essential
-
Install the NVIDIA driver on the host (see the official documentation)
-
Installing from the executable (.run) file
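Set the driver version (choose the one that suits your GPU), then download the installer:
BASE_URL=https://us.download.nvidia.com/tesla
DRIVER_VERSION=450.80.02
curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
Wait for the download to finish, then run the installer:
sudo sh NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
-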
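The download URL in the runfile steps is just the base URL and the driver version joined in a fixed pattern. As a minimal sketch, the helper below builds that URL so the version only has to be typed once. The name `driver_url` is hypothetical and not part of any NVIDIA tool; the function only prints the URL and downloads nothing.

```shell
#!/bin/sh
# driver_url is a hypothetical helper name; the URL layout follows the
# runfile download command used in this post.
driver_url() {
    base_url="https://us.download.nvidia.com/tesla"
    version="$1"
    # Pattern: <base>/<version>/NVIDIA-Linux-x86_64-<version>.run
    printf '%s/%s/NVIDIA-Linux-x86_64-%s.run\n' "$base_url" "$version" "$version"
}

# Print the URL for the driver version used in this post.
driver_url 450.80.02
```

The printed URL can then be handed to curl, for example `curl -fSL -O "$(driver_url 450.80.02)"`.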
Installing via apt
Install the Linux kernel headers:
sudo apt-get install linux-headers-$(uname -r)
Make sure packages from the CUDA network repository take priority over those from the Canonical repository:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
Install the public GPG key:
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
Add the CUDA network repository:
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
Update the package index and install the driver:
sudo apt-get update
sudo apt-get -y install cuda-drivers
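The `distribution` variable used in the repository URLs is derived from `/etc/os-release`: the `ID` and `VERSION_ID` fields are concatenated and the dot is stripped, so Ubuntu 18.04 becomes `ubuntu1804`. The sketch below makes that step visible. The function name `cuda_distribution` and its optional file argument are assumptions added for illustration, so that the logic can be tried on a sample file instead of the real system file.

```shell
#!/bin/sh
# cuda_distribution is a hypothetical helper. It reads an os-release style
# file (default: /etc/os-release) and prints ID + VERSION_ID without dots.
cuda_distribution() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak to the caller.
    ( . "$os_release"; echo "$ID$VERSION_ID" | sed -e 's/\.//g' )
}

# Demonstrate with a sample file that mimics Ubuntu 18.04.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="18.04"\n' > "$sample"
cuda_distribution "$sample"   # prints ubuntu1804
rm -f "$sample"
```

If the printed value does not match a directory under the CUDA repository URL, the later wget and apt steps will fail, so it is worth checking this value first.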
-
-
Installing CUDA and cuDNN in Docker
-
Setting up NVIDIA Container Toolkit
Install nvidia-docker2 on top of the existing Docker; Docker itself does not need to be uninstalled.
Almost every problem in this step is a network problem. Removing -s from the curl commands shows the cause. Most often DNS cannot resolve the host, or the connection to NVIDIA is refused. A proxy usually avoids this; without one, try changing your DNS server. If a browser can open https://nvidia.github.io/nvidia-docker, the commands should generally work as well.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
To get access to experimental features such as CUDA on WSL or the new MIG capability on A100, you may want to add the experimental branch to the repository listing:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
Install the nvidia-docker2 package (and dependencies) after updating the package listing:
sudo apt-get update
sudo apt-get install -y nvidia-docker2
When prompted, press y to switch the configuration, so that the existing Docker setup is replaced by the nvidia-docker2 one, which can use the GPU.
Restart the Docker daemon to complete the installation after setting the default runtime:
sudo systemctl restart docker
Go to Docker Hub and pull the image with the CUDA and cuDNN versions you need. For example, the environment I need is CUDA 10.0, cuDNN 7, Ubuntu 18.04:
docker pull nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
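The tag in the pull command encodes the CUDA version, the cuDNN major version, the image flavor, and the base OS. As a minimal sketch, the helper below assembles such a tag from its parts. `cuda_image` is a hypothetical name, and a tag built this way is only useful if it actually exists on Docker Hub, so check the tag list there before pulling.

```shell
#!/bin/sh
# cuda_image is a hypothetical helper; the tag layout is inferred from the
# example tag 10.0-cudnn7-devel-ubuntu18.04 used in this post.
cuda_image() {
    cuda="$1"; cudnn="$2"; flavor="$3"; ubuntu="$4"
    printf 'nvidia/cuda:%s-cudnn%s-%s-ubuntu%s\n' "$cuda" "$cudnn" "$flavor" "$ubuntu"
}

# Rebuild the image name used in this post.
cuda_image 10.0 7 devel 18.04   # prints nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
```

The result can be passed straight to Docker, for example `docker pull "$(cuda_image 10.0 7 devel 18.04)"`.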
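Once inside a container started from the image, run nvidia-smi; if the output looks like the table below, the setup works.
If you are not sure which version you need and only want to test, just run the command below.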
At this point, a working setup can be tested by running a base CUDA container:
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
This should result in a console output shown below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
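When checking more than one machine, it can be easier to pull just the driver and CUDA versions out of the header line than to compare the whole table by eye. The sketch below does that with sed on text read from standard input. `smi_versions` is a hypothetical helper, and it assumes the header keeps the `Driver Version: ... CUDA Version: ...` layout of the sample output.

```shell
#!/bin/sh
# smi_versions is a hypothetical helper. It reads nvidia-smi output on stdin
# and prints the driver and CUDA versions found in the header line.
smi_versions() {
    # -n plus the p flag prints only the line where the substitution matched.
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Feed it the header line from the sample output.
echo '| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |' | smi_versions
# prints driver=450.51.06 cuda=11.0
```

On a real host the same function can be used as `nvidia-smi | smi_versions`, or with the container test command in place of `nvidia-smi`.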
-
