Configuring the Runtime Environment on an Nvidia GPU Host


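I have set up the Cuda/CuDNN runtime environment on Linux hosts many times, so I am recording the scripts and commands I used here for reuse.

An important reminder: first find out the exact GPU model, then check which releases of the driver, Cuda, and CuDNN are compatible with the host's Linux kernel.

Run all bash commands in this article with root privileges.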
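As a starting point for that check, the sketch below lists the NVIDIA devices and the running kernel version. It assumes `lspci` (from pciutils) is installed; `nvidia_devices` is a hypothetical helper name, not something from the original scripts.

```shell
# Hypothetical helper: keep only the lspci lines that mention NVIDIA.
nvidia_devices() {
  grep -i 'nvidia'
}

# Only run the real commands when lspci is available on this host.
if command -v lspci >/dev/null 2>&1; then
  lspci | nvidia_devices || echo "no NVIDIA device found"
  uname -r   # kernel version to compare against the driver's supported kernels
fi
```

The device names printed here are what you look up in the driver release notes before downloading anything.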

1. NVIDIA driver installation

# WIKI: https://download.nvidia.com/XFree86/Linux-x86_64/375.20/README/installdriver.html 
wget http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run && \
chmod u+x NVIDIA-Linux-x86_64-384.145.run && \
./NVIDIA-Linux-x86_64-384.145.run --silent --dkms --accept-license
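After the installer finishes, it may be worth confirming that every GPU reports the driver version you just installed. This is only a sketch: `all_match_version` is a hypothetical helper, and it assumes `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints one bare version string per GPU.

```shell
# Hypothetical helper: succeed only if stdin is non-empty and
# every line equals the expected driver version.
all_match_version() {
  local expected="$1"
  local line
  local seen=0
  while IFS= read -r line; do
    seen=1
    [ "$line" = "$expected" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=driver_version --format=csv,noheader \
    | all_match_version 384.145 && echo "driver OK" || echo "driver version mismatch"
fi
```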

2. Enabling persistence mode

nvidia-smi -pm ENABLED # WIKI https://docs.nvidia.com/deploy/driver-persistence/index.html
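To confirm that the setting took effect on every card, the per-GPU state can be queried. A minimal sketch, assuming the `persistence_mode` query field prints `Enabled` or `Disabled`; `gpus_without_persistence` is a hypothetical helper name.

```shell
# Hypothetical helper: read "index, mode" rows and print the index
# of every GPU whose persistence mode is not "Enabled".
gpus_without_persistence() {
  awk -F', ' '$2 != "Enabled" { print $1 }'
}

# Only query when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
    | gpus_without_persistence
fi
```

Empty output means all GPUs have persistence mode enabled.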

3. Viewing GPU device information

nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 384.145                Driver Version: 384.145                   |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  Tesla V100-PCIE...  Off  | 00000000:1A:00.0 Off |                    0 |
# | N/A   34C    P0    37W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+
# |   1  Tesla V100-PCIE...  Off  | 00000000:1F:00.0 Off |                    0 |
# | N/A   36C    P0    36W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+

nvidia-smi topo --matrix # show the topology information
#         GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_1  mlx5_0  CPU Affinity
# GPU0     X      PIX     PIX     PIX     SYS     SYS     SYS     SYS     SYS     SYS     0-15,32-47
# GPU1    PIX      X      PIX     PIX     SYS     SYS     SYS     SYS     SYS     SYS     0-15,32-47
# GPU2    PIX     PIX      X      PIX     SYS     SYS     SYS     SYS     SYS     SYS     0-15,32-47
# GPU3    PIX     PIX     PIX      X      SYS     SYS     SYS     SYS     SYS     SYS     0-15,32-47
# GPU4    SYS     SYS     SYS     SYS      X      PIX     PIX     PIX     NODE    NODE    16-31,48-63
# GPU5    SYS     SYS     SYS     SYS     PIX      X      PIX     PIX     NODE    NODE    16-31,48-63
# GPU6    SYS     SYS     SYS     SYS     PIX     PIX      X      PIX     NODE    NODE    16-31,48-63
# GPU7    SYS     SYS     SYS     SYS     PIX     PIX     PIX      X      NODE    NODE    16-31,48-63
# mlx5_1  SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE     X      PIX
# mlx5_0  SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    PIX      X

nvidia-smi --id=0 --format=csv --query-gpu=utilization.gpu,memory.used
# utilization.gpu [%], memory.used [MiB]
# 0 %, 0 MiB
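The CSV query format is convenient for scripting. As an illustration, the sketch below picks the GPU with the least memory in use. `least_busy_gpu` is a hypothetical helper; it assumes the `noheader,nounits` options, so each row looks like `index, utilization, memory`.

```shell
# Hypothetical helper: read "index, utilization, memory" rows and print
# the index of the GPU with the least memory in use.
least_busy_gpu() {
  awk -F', *' '
    NR == 1 || ($3 + 0) < min { min = $3 + 0; idx = $1 }
    END { if (NR > 0) print idx }
  '
}

# Only query when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
    --format=csv,noheader,nounits | least_busy_gpu
fi
```

The printed index could then be exported as `CUDA_VISIBLE_DEVICES` before launching a job.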

4. CUDA Toolkit installation

wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run && \
chmod u+x cuda_9.0.176_384.81_linux-run && \
./cuda_9.0.176_384.81_linux-run --toolkit --silent --verbose
cat << EOF >> /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
EOF
ldconfig
cat << EOF >> /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:\$PATH
EOF
source /etc/profile
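A quick sanity check after the install is to ask `nvcc` which release it belongs to. The sketch below assumes that `nvcc --version` prints a line of the form `Cuda compilation tools, release 9.0, V9.0.176`; `cuda_release` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the release number (e.g. 9.0)
# from the output of `nvcc --version` read on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Only run nvcc when it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | cuda_release
fi
```

If `nvcc` is not found, the `PATH` change from `/etc/profile.d/cuda.sh` has probably not been picked up by the current shell yet.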

5. CuDNN installation

# Downloading CuDNN requires an Nvidia account. Visiting the URL below directly redirects to the login page.
# https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/Ubuntu16_04-x64/libcudnn7_7.0.5.15-1+cuda9.0_amd64
dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb # installs into /usr/lib/x86_64-linux-gnu
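To check that the dynamic linker can see the library, one option is to look for it in the linker cache. A minimal sketch, assuming the usual `ldconfig -p` line layout of `name (arch) => path`; `cudnn_paths` is a hypothetical helper name.

```shell
# Hypothetical helper: from `ldconfig -p` output on stdin,
# print the resolved path of every libcudnn entry.
cudnn_paths() {
  awk '/libcudnn/ { print $NF }'
}

# Only query the linker cache when ldconfig is available.
if command -v ldconfig >/dev/null 2>&1; then
  ldconfig -p | cudnn_paths
fi
```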
