I have set up CUDA/cuDNN runtime environments on Linux hosts many times, so I am recording the script commands I used here for reuse.
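Important: first confirm the exact GPU model, then look up the driver, CUDA, and cuDNN releases that support it and are compatible with the host's Linux kernel. The commands below pin driver 384.145, CUDA 9.0.176, and cuDNN 7.0.5 for CUDA 9.0.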
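Part of that compatibility check can be scripted. Below is a minimal bash sketch that compares two dotted version strings with GNU `sort -V`. It assumes CUDA 9.0 needs driver 384.81 or newer, which is the driver version bundled in the CUDA 9.0 runfile name used later (`cuda_9.0.176_384.81_linux-run`); `version_ge` and `CUDA90_MIN_DRIVER` are hypothetical names, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a driver version meets the minimum a CUDA release needs.
# Assumes GNU coreutils `sort -V` (version sort) and `-C` (quiet sortedness check).

# version_ge A B: succeeds (exit 0) when version A >= version B.
version_ge() {
  # Print B then A; if that pair is already in version order, A >= B.
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Assumption: CUDA 9.0 requires driver >= 384.81 (the driver bundled in its runfile).
CUDA90_MIN_DRIVER=384.81

# Usage on a host where the driver is already installed:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
#   version_ge "$drv" "$CUDA90_MIN_DRIVER" || echo "driver $drv is too old for CUDA 9.0"
```

Version sort compares each numeric component as a number, so `384.145` correctly sorts after `384.81`, which a plain string comparison would get wrong.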
Run every bash command in this article as root.
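If these commands are collected into a script, a small guard at the top avoids half-finished installs when it is started as a normal user. A minimal bash sketch; `is_root_uid` and `require_root` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minimal sketch of a root check for the top of an install script.

# is_root_uid [UID]: succeeds when UID (default: the current user's UID) is 0.
is_root_uid() {
  [ "${1:-$(id -u)}" -eq 0 ]
}

# require_root: print a message and exit unless running as root.
require_root() {
  if ! is_root_uid; then
    echo "please run this script as root (e.g. via sudo)" >&2
    exit 1
  fi
}

# Usage: call `require_root` before the driver, CUDA, and cuDNN steps.
```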
1. Install the NVIDIA driver
# WIKI: https://download.nvidia.com/XFree86/Linux-x86_64/375.20/README/installdriver.html
wget http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run && \
  chmod u+x NVIDIA-Linux-x86_64-384.145.run && \
  ./NVIDIA-Linux-x86_64-384.145.run --silent --dkms --accept-license
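Since these notes are meant for reuse, the driver step can be parameterized by version instead of hard-coding 384.145. This sketch assumes other Tesla driver releases are published under the same `http://us.download.nvidia.com/tesla/<version>/` URL pattern; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the driver install step with the version pulled out into a parameter.
# Assumption: other Tesla driver releases follow the same URL pattern as 384.145.

driver_runfile() {   # runfile name for driver version $1
  printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

driver_url() {       # download URL for driver version $1
  printf 'http://us.download.nvidia.com/tesla/%s/%s\n' "$1" "$(driver_runfile "$1")"
}

install_driver() {   # download and silently install driver version $1 (run as root)
  local f
  f=$(driver_runfile "$1")
  wget "$(driver_url "$1")" &&
    chmod u+x "$f" &&
    "./$f" --silent --dkms --accept-license
}

# Usage: install_driver 384.145
```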
2. Enable persistence mode
nvidia-smi -pm ENABLED # WIKI https://docs.nvidia.com/deploy/driver-persistence/index.html
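To confirm the setting took effect on every GPU, the `persistence_mode` field of `nvidia-smi --query-gpu` can be filtered. The sketch below only parses the CSV text (lines such as `0, Enabled`), so it can be tried without a GPU; `gpus_without_persistence` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list GPUs whose persistence mode is still off.
# Expected input: output of
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader
# i.e. lines such as "0, Enabled" or "1, Disabled".

gpus_without_persistence() {
  # Split fields on a comma plus optional spaces; print the index of every
  # GPU whose second field is not "Enabled" (blank lines are skipped).
  awk -F', *' 'NF >= 2 && $2 != "Enabled" { print $1 }'
}

# Usage:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader |
#     gpus_without_persistence
```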
3. View GPU device information
nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 384.145                Driver Version: 384.145                   |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  Tesla V100-PCIE...  Off  | 00000000:1A:00.0 Off |                    0 |
# | N/A   34C    P0    37W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+
# |   1  Tesla V100-PCIE...  Off  | 00000000:1F:00.0 Off |                    0 |
# | N/A   36C    P0    36W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+
nvidia-smi topo --matrix  # show the GPU/NIC topology
# PIX: via a single PCIe switch; NODE: within one NUMA node; SYS: across NUMA nodes
#         GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  mlx5_1  mlx5_0  CPU Affinity
# GPU0    X     PIX   PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU1    PIX   X     PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU2    PIX   PIX   X     PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU3    PIX   PIX   PIX   X     SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU4    SYS   SYS   SYS   SYS   X     PIX   PIX   PIX   NODE    NODE    16-31,48-63
# GPU5    SYS   SYS   SYS   SYS   PIX   X     PIX   PIX   NODE    NODE    16-31,48-63
# GPU6    SYS   SYS   SYS   SYS   PIX   PIX   X     PIX   NODE    NODE    16-31,48-63
# GPU7    SYS   SYS   SYS   SYS   PIX   PIX   PIX   X     NODE    NODE    16-31,48-63
# mlx5_1  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  X       PIX
# mlx5_0  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  PIX     X
nvidia-smi --id=0 --format=csv --query-gpu=utilization.gpu,memory.used
# utilization.gpu [%], memory.used [MiB]
# 0 %, 0 MiB
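The CSV query form is the easiest one to script against. As a sketch, the function below reads `index,utilization.gpu,memory.used` rows (queried with `--format=csv,noheader,nounits`, so the `%` and `MiB` units are dropped) and prints a comma-separated list of GPUs that look idle, for example to use as `CUDA_VISIBLE_DEVICES`. The 100 MiB default threshold and the name `idle_gpus` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a comma-separated list of idle GPU indices.
# Expected input: output of
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# i.e. lines such as "0, 0, 0". Optional $1: max memory in MiB (default 100, arbitrary).

idle_gpus() {
  local max_mem_mib=${1:-100}
  awk -F', *' -v max="$max_mem_mib" '
    # A GPU counts as idle when utilization is 0 and memory use is below the limit.
    NF >= 3 && $2 + 0 == 0 && $3 + 0 < max + 0 { ids = ids (ids == "" ? "" : ",") $1 }
    END { print ids }'
}

# Usage:
#   export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#     --format=csv,noheader,nounits | idle_gpus)
```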
4. Install the CUDA Toolkit
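wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run && \
  chmod u+x cuda_9.0.176_384.81_linux-run && \
  ./cuda_9.0.176_384.81_linux-run --toolkit --silent --verbose
# Register the CUDA libraries with the dynamic linker, one directory per line.
# ">" (not ">>") overwrites the file, so re-running this step adds no duplicates.
cat << EOF > /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
EOF
ldconfig
# Put nvcc and the other CUDA tools on PATH for login shells.
cat << EOF > /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:\$PATH
EOF
source /etc/profile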
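After opening a new login shell (so that `/etc/profile.d/cuda.sh` is picked up), a quick sanity check is that `nvcc --version` reports the expected release. This sketch assumes the last line of that output has the form `Cuda compilation tools, release 9.0, V9.0.176`; `nvcc_release` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: extract the CUDA release ("X.Y") from `nvcc --version` output on stdin.
# Assumed line format: "Cuda compilation tools, release 9.0, V9.0.176"

nvcc_release() {
  # Print the number that follows "release " (prints nothing if no line matches).
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage:
#   [ "$(nvcc --version | nvcc_release)" = 9.0 ] && echo "CUDA 9.0 toolkit found"
#   ldconfig -p | grep libcudart   # the runtime library should be in the linker cache
```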
5. Install cuDNN
# Downloading cuDNN requires an NVIDIA account; opening the URL below directly redirects to the login page.
# https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/Ubuntu16_04-x64/libcudnn7_7.0.5.15-1+cuda9.0_amd64
dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb  # installs into /usr/lib/x86_64-linux-gnu
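The Debian package version of `libcudnn7` encodes both the cuDNN version and the CUDA release it was built for (`7.0.5.15-1+cuda9.0` here), so it can be checked against the installed toolkit. A sketch using plain bash parameter expansion, with hypothetical function names; on the host the version string would come from `dpkg-query -W -f='${Version}' libcudnn7`.

```shell
#!/usr/bin/env bash
# Sketch: split a libcudnn7 package version such as "7.0.5.15-1+cuda9.0".

cudnn_version() {        # "7.0.5.15-1+cuda9.0" -> "7.0.5.15" (drop the Debian revision)
  printf '%s\n' "${1%%-*}"
}

cudnn_cuda_release() {   # "7.0.5.15-1+cuda9.0" -> "9.0" (empty if no "+cuda" tag)
  case $1 in
    *+cuda*) printf '%s\n' "${1##*+cuda}" ;;
    *) printf '\n' ;;
  esac
}

# Usage:
#   v=$(dpkg-query -W -f='${Version}' libcudnn7)
#   echo "cuDNN $(cudnn_version "$v") built for CUDA $(cudnn_cuda_release "$v")"
```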