Installing the NVIDIA GPU Driver and CUDA Toolkit on a GPU Server

Reposted article, published 2021-04-26 16:33, category: System | Operations


Environment
GPU: 8 × GeForce RTX 2080 Ti
OS: CentOS Linux release 7.8.2003 (Core)
Docker version: 20.10.6 (Docker 18.x has no native GPU support; the --gpus flag arrived in 19.03)
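Because GPU support depends on the Docker version, it can help to check it before going further. The sketch below is a minimal helper, assuming GNU sort with -V (available in CentOS 7 coreutils); the function names version_ge and docker_supports_gpus are hypothetical, and 19.03 is used as the cutoff because that release introduced the native --gpus flag.

```shell
#!/usr/bin/env bash
# Succeed if dotted version $1 >= $2. sort -V orders version strings naturally,
# so the smaller of the two comes first; if that is $2, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Succeed if a Docker engine version has the native --gpus flag (added in 19.03).
docker_supports_gpus() {
  version_ge "$1" "19.03"
}
```

On the server itself it could be called as: docker_supports_gpus "$(docker version --format '{{.Server.Version}}')" && echo "GPU-capable Docker".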
Software downloads
NVIDIA driver
Official site: https://www.nvidia.com/en-us/drivers/unix/
Pick the "Latest Long Lived Branch Version" (the long-term support branch).

CUDA Toolkit
Official site: https://developer.nvidia.com/cuda-downloads

Upgrade the kernel

# Install the ELRepo yum repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

# List the available kernel packages
yum --disablerepo=* --enablerepo=elrepo-kernel repolist
yum --disablerepo=* --enablerepo=elrepo-kernel list kernel*


# Install the mainline kernel and its development headers
yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml -y


# Make the first (newest) boot entry the default and regenerate the GRUB config
grub2-set-default 0
grub2-mkconfig -o /etc/grub2.cfg


# Remove the old kernel tools packages
yum remove kernel-tools-libs.x86_64 kernel-tools.x86_64 -y

# Install the matching mainline tools
yum --disablerepo=* --enablerepo=elrepo-kernel install -y kernel-ml-tools.x86_64


# Reboot
reboot

# Check the running kernel version
uname -sr
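After the reboot it is worth confirming that the new kernel is actually running and that its headers are installed, since the NVIDIA installer builds its kernel module against them. This is a minimal sketch, assuming ELRepo release strings contain .el7.elrepo (for example 5.12.0-1.el7.elrepo.x86_64, an illustrative version) and that kernel-ml-devel places headers under /usr/src/kernels/<release> like the stock kernel-devel package; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if a kernel release string (as printed by `uname -r`) looks like an
# ELRepo el7 build, e.g. 5.12.0-1.el7.elrepo.x86_64 (illustrative version).
is_elrepo_kernel() {
  case "$1" in
    *.el7.elrepo.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if headers for kernel release $1 exist under $2
# (default /usr/src/kernels, where kernel-devel style packages install them).
headers_present() {
  local base="${2:-/usr/src/kernels}"
  [ -d "$base/$1" ]
}
```

Typical use: r=$(uname -r); is_elrepo_kernel "$r" && headers_present "$r" || echo "check the kernel or headers for $r".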

Install the NVIDIA driver and CUDA Toolkit

- Dependencies
shell> wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
shell> yum install -y gcc dkms

- Disable nouveau (the open-source driver, which conflicts with the NVIDIA driver)
shell> echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist.conf
shell> mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
shell> dracut /boot/initramfs-$(uname -r).img $(uname -r)

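- Edit /etc/default/grub to add rdblacklist=nouveau to GRUB_CMDLINE_LINUX (the sed below inserts it after "quiet", which the default CentOS 7 line contains), regenerate the GRUB config, and reboot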
shell> sed -i 's/quiet/& rdblacklist=nouveau/' /etc/default/grub
shell> grub2-mkconfig -o /boot/grub2/grub.cfg
shell> reboot
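After this reboot, nouveau should no longer be loaded. The helpers below are a hedged sketch for checking that: one reads lsmod output on stdin, the other takes the kernel command line (the contents of /proc/cmdline) as an argument. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the nouveau module appears in lsmod-style text read from stdin
# (module name is the first column).
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Succeed if a kernel command line contains the rdblacklist=nouveau token.
cmdline_blacklists_nouveau() {
  case " $1 " in
    *" rdblacklist=nouveau "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use: lsmod | nouveau_loaded && echo "nouveau is still loaded"; cmdline_blacklists_nouveau "$(cat /proc/cmdline)" || echo "rdblacklist=nouveau is missing".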

- Install the NVIDIA driver (first attempt)
shell> bash NVIDIA-Linux-x86_64-450.66.run

Installer prompts:

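1. Prompt: Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later?
Answer No.
2. Prompt: CC version check failed

Choose Abort installation. The check fails because the system gcc (4.8.5) is not the compiler the running kernel was built with, so fix the compiler first.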

Fix the gcc version problem

shell> gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
shell> yum -y install centos-release-scl
shell> yum list |grep gcc |grep sclo
shell> yum install -y 'devtoolset-9-gcc*'
 
shell> scl enable devtoolset-9 bash
shell> gcc --version
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
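The "CC version check failed" prompt means the gcc in PATH differs from the compiler that built the running kernel, whose version is recorded in /proc/version. The sketch below compares only the major versions, which is a simplifying assumption (the installer's own check may be stricter); gcc_major and same_gcc_major are hypothetical names. It is meant to be run inside the devtoolset-9 shell before retrying the installer.

```shell
#!/usr/bin/env bash
# Print the major version of the first "gcc ... X.Y" found in the given text,
# e.g. 9 for "gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)".
gcc_major() {
  printf '%s\n' "$1" \
    | grep -oE 'gcc[^0-9]*[0-9]+\.[0-9]+' \
    | head -n1 \
    | grep -oE '[0-9]+\.[0-9]+$' \
    | cut -d. -f1
}

# Succeed if the gcc major version found in $1 (e.g. /proc/version)
# equals the one found in $2 (e.g. `gcc --version` output).
same_gcc_major() {
  local a b
  a=$(gcc_major "$1")
  b=$(gcc_major "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Typical use: same_gcc_major "$(cat /proc/version)" "$(gcc --version)" && echo "compiler matches the kernel".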

Install the NVIDIA driver again (from the devtoolset-9 shell)

shell> bash NVIDIA-Linux-x86_64-450.66.run
shell> exit    # leave the shell opened by scl enable devtoolset-9

Installer prompts:

1. Prompt: Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later?
Answer No.
2. Prompt: Install NVIDIA's 32-bit compatibility libraries?
Answer No.
3. Prompt: The distribution-provided pre-install script failed! Are you sure you want to continue?
Answer Yes.
4. Prompt: Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
Answer Yes.

5. Warning: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.

Choose OK.
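Once the installer finishes, the loaded kernel module reports its version in /proc/driver/nvidia/version, in a line such as "NVRM version: NVIDIA UNIX x86_64 Kernel Module  450.66 ...". A minimal sketch for extracting that number, assuming this line format (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Read /proc/driver/nvidia/version-style text on stdin and print the
# driver version that follows "Kernel Module" (e.g. 450.66).
nvidia_driver_version() {
  sed -nE 's/.*Kernel Module +([0-9]+(\.[0-9]+)+).*/\1/p' | head -n1
}
```

Typical use: nvidia_driver_version < /proc/driver/nvidia/version, which should print 450.66 for the driver installed here; nvidia-smi --query-gpu=driver_version --format=csv,noheader reports the same value per GPU.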

Install CUDA

shell> bash cuda_11.0.3_450.51.06_linux.run
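Two details of this runfile matter here. It bundles driver 450.51.06, so with 450.66 already installed the driver component should be deselected in the installer menu (or only the toolkit installed with --silent --toolkit). By default it installs to /usr/local/cuda-11.0 with a /usr/local/cuda symlink, but it does not put nvcc on PATH. The helper below is a hedged sketch for adding those directories without duplicates; path_prepend is a hypothetical name, and the /usr/local/cuda paths assume the default install location.

```shell
#!/usr/bin/env bash
# Print the colon-separated list $1 with directory $2 prepended,
# unless $2 is already one of its entries.
path_prepend() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "$2${1:+:$1}" ;;
  esac
}

# Example, e.g. in a shell profile (assumes the default /usr/local/cuda symlink):
#   export PATH=$(path_prepend "$PATH" /usr/local/cuda/bin)
#   export LD_LIBRARY_PATH=$(path_prepend "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
```

Checking for the entry first keeps repeated logins or re-sourced profiles from growing PATH with duplicate CUDA entries.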

Enable persistence mode

shell> /usr/bin/nvidia-persistenced --persistence-mode
shell> echo "/usr/bin/nvidia-persistenced --persistence-mode" >> /etc/rc.d/rc.local
shell> chmod +x /etc/rc.d/rc.local    # CentOS 7 only runs rc.local at boot if it is executable
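Appending with >> adds another copy of the line every time this step is re-run. A hedged sketch of an idempotent append (append_once is a hypothetical helper built on grep -qxF, which matches a whole line as a fixed string):

```shell
#!/usr/bin/env bash
# Append line $1 to file $2 only if the file does not already contain
# exactly that line; a missing file is created by the append.
append_once() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Typical use: append_once "/usr/bin/nvidia-persistenced --persistence-mode" /etc/rc.d/rc.local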

Check GPU usage

shell> nvidia-smi
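For scripted checks, nvidia-smi can print one CSV line per GPU with --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits. The sketch below parses that output from stdin to count the GPUs (8 are expected on this server) and to list those above a utilization threshold; the function names and the 50% default threshold are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Input: CSV lines "index, name, utilization.gpu, memory.used, memory.total"
# as printed by nvidia-smi with --format=csv,noheader,nounits.

# Print the number of GPU lines read from stdin.
gpu_count() {
  awk -F', *' 'NF >= 2 { n++ } END { print n + 0 }'
}

# Print the index of every GPU whose utilization (%) exceeds $1 (default 50).
busy_gpus() {
  local threshold=${1:-50}
  awk -F', *' -v t="$threshold" 'NF >= 5 && $3 + 0 > t { print $1 }'
}
```

Typical use: nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits | gpu_count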

Set up the NVIDIA Container Toolkit

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
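The $distribution variable in the command above is just ID and VERSION_ID from /etc/os-release concatenated, which gives centos7 on this system. A small sketch that computes it from a given file and builds the same repo URL, handy for checking what will be fetched (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Print "$ID$VERSION_ID" from an os-release file (default /etc/os-release).
# The file is sourced in a subshell so its variables do not leak.
distro_id() {
  ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Print the nvidia-docker repo URL used above for a given distribution id.
nvidia_docker_repo_url() {
  printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.repo\n' "$1"
}
```

Typical use: nvidia_docker_repo_url "$(distro_id)"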

# Refresh the package metadata, then install the package (and its dependencies):
yum clean expire-cache

yum install -y nvidia-docker2

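# Set nvidia as Docker's default runtime in /etc/docker/daemon.json
# (insecure-registries is specific to the author's environment and optional):
cat /etc/docker/daemon.json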
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "insecure-registries": ["xxxxxxxxxxxxx"]
}

# After setting the default runtime, restart the Docker daemon to complete the installation:
systemctl restart docker

# Test the setup by running a base CUDA container:
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
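To confirm that the daemon.json change took effect, docker info lists the default runtime in a "Default Runtime:" line. A hedged sketch that checks this from docker info output on stdin (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Succeed if `docker info` output on stdin reports nvidia as the default runtime.
default_runtime_is_nvidia() {
  grep -qE '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}
```

Typical use: docker info 2>/dev/null | default_runtime_is_nvidia && echo "nvidia is the default runtime"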
