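At present, most AI models that run on GPUs use NVIDIA's software stack.
Note that the driver, CUDA, and cuDNN versions must match one another: a driver that is too old will not run a newer CUDA release, and a cuDNN build made for one CUDA version will not work with another.
Driver and CUDA compatibility table: https://docs.nvidia.com/deploy/cuda-compatibility/index.html
Driver download: https://www.nvidia.cn/Download/index.aspx?lang=cn
CUDA download: https://developer.nvidia.com/cuda-downloads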
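The version-matching rule above can be checked mechanically. The following is a minimal sketch, not an official tool: the helper names `version_ge` and `check_driver` are made up for illustration, and the minimum of 384.81 for CUDA 9.0 is an assumption that should be confirmed against the compatibility table linked above. It relies on GNU `sort -V` for version ordering.

```shell
#!/bin/sh
# Sketch: compare an installed driver version against a required minimum.
# Both helper names are hypothetical; the minimum version passed in is an
# assumption to be looked up in NVIDIA's compatibility table.

# version_ge A B: succeeds if dotted version A is >= B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver INSTALLED MINIMUM: report whether the driver is new enough.
check_driver() {
    if version_ge "$1" "$2"; then
        echo "OK: driver $1 >= $2"
    else
        echo "TOO OLD: driver $1 < $2"
        return 1
    fi
}

# Example: the 384.183 driver package used in this post, checked against
# an assumed minimum of 384.81 for CUDA 9.0.
check_driver 384.183 384.81
```

On a machine that already has a driver, the installed version can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed as the first argument.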
1. NVIDIA driver installation
Check whether the nvidia-smi command is available; if it is not, the driver needs to be installed.
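The check can be scripted along these lines. This is only a sketch: `have_cmd` is a hypothetical helper name, and the query fields are one reasonable choice among those `nvidia-smi` offers.

```shell
#!/bin/sh
# Sketch: detect whether the NVIDIA driver tools are present.

# have_cmd NAME: succeeds if NAME resolves to an executable, builtin, or function.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd nvidia-smi; then
    # Print GPU name and driver version. A failure here suggests the tool is
    # present but cannot communicate with the kernel driver.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi is installed but could not query the driver"
else
    echo "nvidia-smi not found: the driver needs to be installed"
fi
```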
# Remove the existing driver (installing over it without removing it should probably work too)
yum remove 'xorg-x11-drv-nvidia*' nvidia-kmod
# Install
rpm -ivh nvidia-diag-driver-local-repo-rhel7-384.183-1.0-1.x86_64.rpm
yum install cuda-drivers
2. CUDA installation
CUDA
# Register the local repositories: base release first, then the cuBLAS updates and patch
rpm -ivh cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-2-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-local-cublas-performance-update-3-1.0-1.x86_64.rpm
rpm -ivh cuda-repo-rhel7-9-0-176-local-patch-4-1.0-1.x86_64.rpm
yum install cuda
# Verify the installed version
cat /usr/local/cuda/version.txt
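For scripted checks, the version string can be extracted from version.txt. The sketch below assumes the file contains a line of the form "CUDA Version X.Y.Z", as CUDA 9.0 writes it; `cuda_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the toolkit version recorded in version.txt.
# Assumes a line of the form "CUDA Version X.Y.Z".

# cuda_version [FILE]: print X.Y.Z, or fail if no such line is found.
cuda_version() {
    awk '$1 == "CUDA" && $2 == "Version" { print $3; found = 1; exit }
         END { if (!found) exit 1 }' "${1:-/usr/local/cuda/version.txt}"
}

# Only query the default location if the toolkit is actually installed.
if [ -f /usr/local/cuda/version.txt ]; then
    echo "CUDA toolkit: $(cuda_version)"
fi
```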
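cuDNN
# Unpack the archive and copy the header and libraries into the CUDA tree
tar -xzf cudnn-9.0-linux-x64-v7.4.1.5.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
# Verify the installed version (prints CUDNN_MAJOR and the two lines after it)
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h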
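The same header can be parsed into a single version string, which is easier to compare in scripts than the raw grep output. This is a sketch under the assumption that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain `#define` lines, as the cuDNN 7 header does; `cudnn_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the cuDNN version as MAJOR.MINOR.PATCH from cudnn.h.
# Assumes lines such as "#define CUDNN_MAJOR 7" in the header.

# cudnn_version [HEADER]: print the version, or fail if CUDNN_MAJOR is absent.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "${1:-/usr/local/cuda/include/cudnn.h}"
}

# Only query the default location if the header has been copied there.
if [ -f /usr/local/cuda/include/cudnn.h ]; then
    echo "cuDNN: $(cudnn_version)"
fi
```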
Environment variables (add to .bashrc)
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
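If the setup is scripted, appending these lines blindly adds duplicates on every run. One possible approach, sketched here with the hypothetical helpers `append_once` and `add_cuda_env`, is to append each line only when it is not already present.

```shell
#!/bin/sh
# Sketch: add the CUDA environment lines to a shell startup file exactly once.

# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# add_cuda_env FILE: ensure the three export lines from this post are in FILE.
# Single quotes keep $PATH and $LD_LIBRARY_PATH literal in the file.
add_cuda_env() {
    append_once "$1" 'export PATH=/usr/local/cuda/bin:$PATH'
    append_once "$1" 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
    append_once "$1" 'export CUDA_HOME=/usr/local/cuda'
}
```

Calling `add_cuda_env "$HOME/.bashrc"` and then opening a new shell (or sourcing the file) makes the variables take effect. The function is deliberately not invoked in the block itself, so defining it does not modify any file.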
3. NCCL installation
rpm -ivh nccl-repo-rhel7-2.4.8-ga-cuda9.0-1-1.x86_64.rpm
# yum update
yum install libnccl libnccl-devel libnccl-static