AI Model Operations: Installing the NVIDIA Driver, CUDA, cuDNN, and NCCL


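Most AI models that run on GPUs today rely on the NVIDIA software stack.

Note that the driver, CUDA, and cuDNN versions have to match: each CUDA release requires a minimum driver version, and each cuDNN build targets a specific CUDA version, so arbitrary combinations of newer and older versions will not work together.

Driver and CUDA compatibility table: https://docs.nvidia.com/deploy/cuda-compatibility/index.html

Driver downloads: https://www.nvidia.cn/Download/index.aspx?lang=cn

CUDA downloads: https://developer.nvidia.com/cuda-downloads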
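
Driver versions such as 384.183 and 384.81 cannot be compared as plain strings or decimals, so a version-aware comparison helps when checking a driver against the minimum that a CUDA release needs. The sketch below assumes GNU `sort -V` is available. `version_ge` is a hypothetical helper name, and the 384.81 minimum for CUDA 9.0 is an assumption taken from NVIDIA's compatibility table that should be confirmed there.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed=384.183   # driver version used in this article
required=384.81     # assumed minimum for CUDA 9.0; verify against the table

if version_ge "$installed" "$required"; then
    echo "driver $installed satisfies CUDA 9.0 (needs >= $required)"
else
    echo "driver $installed is too old (needs >= $required)"
fi
```

A plain string comparison would rank 384.81 above 384.183, which is why the version sort matters here.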

 

1. NVIDIA driver installation

Check whether the nvidia-smi command is available. If it is not, the driver needs to be installed.
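
That check can be scripted roughly as follows. This is a minimal sketch: `check_driver` is a hypothetical helper name, and it assumes a driver recent enough that `nvidia-smi` supports the `--query-gpu` option.

```shell
# Print the driver version reported by nvidia-smi, or say that the
# command is missing and return a non-zero status.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver needs to be installed"
        return 1
    fi
    # nvidia-smi prints one line per GPU; keep only the first.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

check_driver || true
```

If the command exists but prints nothing, the kernel module is probably not loaded, and a reboot may be needed.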

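# Remove any existing driver (installing over it without removing probably works too)
yum remove 'xorg-x11-drv-nvidia*' nvidia-kmod

# Install the local repository package, then the driver from it
rpm -ivh nvidia-diag-driver-local-repo-rhel7-384.183-1.0-1.x86_64.rpm
yum install cuda-drivers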

 

2. CUDA installation

CUDA

# Register the local CUDA 9.0 repository and its cuBLAS update and patch repositories
rpm -ivh cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-2-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-3-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-176-local-patch-4-1.0-1.x86_64.rpm

# Install the toolkit and confirm the installed version
yum install cuda
cat /usr/local/cuda/version.txt
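
For scripted checks it can be handy to extract just the version number. The sketch below assumes version.txt contains a line of the form "CUDA Version 9.0.176" and ignores any other lines. `cuda_version` is a hypothetical helper name.

```shell
# Print the version number from a CUDA version.txt file.
# Defaults to the standard install location when no path is given.
cuda_version() {
    awk '/^CUDA Version/ { print $3; exit }' "${1:-/usr/local/cuda/version.txt}"
}

if [ -r /usr/local/cuda/version.txt ]; then
    echo "CUDA toolkit version: $(cuda_version)"
else
    echo "version.txt not found; is the toolkit installed?"
fi
```

Once /usr/local/cuda/bin is on PATH, `nvcc --version` reports the same release.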

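cuDNN

# Unpack the archive and copy the header and libraries into the CUDA tree
tar -xzf cudnn-9.0-linux-x64-v7.4.1.5.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
# -P keeps the libcudnn.so symlinks as symlinks instead of duplicating the library
cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
# Confirm the cuDNN version (major, minor, patch level)
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h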
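
The three macros can also be combined into a single version string. This is a sketch that assumes the cuDNN 7 header layout, where `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` are defined in cudnn.h. `cudnn_version` is a hypothetical helper name.

```shell
# Print "major.minor.patch" from the #define lines in a cuDNN header.
# Defaults to the location used in this article when no path is given.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' \
        "${1:-/usr/local/cuda/include/cudnn.h}"
}

if [ -r /usr/local/cuda/include/cudnn.h ]; then
    echo "cuDNN version: $(cudnn_version)"
fi
```

Newer cuDNN releases (8 and later) keep these macros in cudnn_version.h, so the path argument would need to point there instead.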

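Environment variables (add to .bashrc)

export PATH=/usr/local/cuda/bin:$PATH
# ${VAR:+:$VAR} avoids a trailing colon when LD_LIBRARY_PATH is unset
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export CUDA_HOME=/usr/local/cuda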
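
Because .bashrc is read by every new interactive shell, including nested ones, plain prepending makes these variables grow with duplicate entries. One possible alternative is sketched below. `prepend_path` is a hypothetical helper, not part of CUDA or bash.

```shell
# Print LIST with DIR prepended, unless DIR is already one of its entries.
# Usage: prepend_path DIR LIST
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # prepend, no stray colon
    esac
}

PATH=$(prepend_path /usr/local/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
```

After opening a new shell, `command -v nvcc` should resolve to /usr/local/cuda/bin/nvcc if the toolkit is installed there.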

 

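3. NCCL installation

# Register the local NCCL repository, then install the runtime, headers, and static library
rpm -ivh nccl-repo-rhel7-2.4.8-ga-cuda9.0-1-1.x86_64.rpm
# yum update   # optional; note that this upgrades every installed package
yum install libnccl libnccl-devel libnccl-static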
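
To confirm that the NCCL shared library is visible to the dynamic linker, the linker cache can be inspected. This is a sketch: `lib_in_cache` is a hypothetical helper that reads `ldconfig -p` style output on standard input, and it assumes the packages install libnccl into a directory the linker already searches.

```shell
# Succeed if the listing on stdin contains a shared library named $1
# (for example "libnccl" matches "libnccl.so.2").
lib_in_cache() {
    grep -q "^[[:space:]]*$1\.so"
}

if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | lib_in_cache libnccl; then
        echo "NCCL is in the linker cache"
    else
        echo "NCCL not found in the linker cache"
    fi
else
    echo "ldconfig is not on PATH (it often lives in /sbin)"
fi
```

`rpm -q libnccl libnccl-devel libnccl-static` is another way to check which of the packages are installed.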

 

