System:
CentOS 7.8 (virtual machine)
Problems encountered
1. nouveau: failed to create kernel channel, -22
Disable nouveau:
vi /etc/modprobe.d/blacklist-nouveau.conf
Press i to enter insert mode and add these two lines:
blacklist nouveau
options nouveau modeset=0
Press Esc, then type :wq to save and exit.
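The same file can also be written without opening an editor, which is handy when scripting the setup. This is only a minimal sketch: the function name write_nouveau_blacklist and its optional path argument are my own additions, and the default path is the one used above.

```shell
#!/bin/sh
# Write the two nouveau blacklist lines to a modprobe config file.
# The optional first argument (hypothetical, for testing) overrides the target path.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    cat > "$conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}
```

Run write_nouveau_blacklist as root to write the real file, or pass a scratch path first to inspect the result.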
Rebuild the initramfs image
Back up the current image:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Rebuild it:
dracut /boot/initramfs-$(uname -r).img $(uname -r)
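The backup and rebuild steps above could be wrapped in one function so that dracut only runs if the backup succeeded. This is a sketch, not a tested procedure: the function name rebuild_initramfs and the RUN dry-run variable are hypothetical helpers of mine, while the mv and dracut commands are the ones shown above.

```shell
#!/bin/sh
# Back up and rebuild the initramfs for one kernel version (default: running kernel).
# RUN is a hypothetical dry-run switch: with RUN=echo the commands are printed, not executed.
rebuild_initramfs() {
    kver="${1:-$(uname -r)}"
    img="/boot/initramfs-${kver}.img"
    # dracut is skipped if the backup step fails
    ${RUN:-} mv "$img" "$img.bak" &&
        ${RUN:-} dracut "$img" "$kver"
}
```

Calling RUN=echo rebuild_initramfs previews the two commands; calling rebuild_initramfs as root runs them.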
Reboot the system:
reboot
Check whether nouveau has been disabled. No output means it worked:
lsmod | grep nouveau
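For use in a script, the check can be turned into an exit status. The sketch below assumes the usual lsmod layout with the module name in the first column; nouveau_loaded is a hypothetical helper name. It matches the module name exactly, so a line that only mentions nouveau in the "Used by" column does not count.

```shell
#!/bin/sh
# Read lsmod-style text on stdin; succeed only if a module named exactly
# "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

A possible call is: lsmod | nouveau_loaded && echo "nouveau is still loaded"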
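2. NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
This error can be caused by installing the wrong driver version. The steps below show how to find out which driver version the GPU needs: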
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install nvidia-detect
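In my testing, the default driver version offered on the official download page (https://www.nvidia.com/Download/index.aspx?lang=en-us) may be lower than the version nvidia-detect reports: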
[root@10-64-2-16 ~]# nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1eb8] NVIDIA Corporation TU104GL [Tesla T4]
This device requires the current 470.86 NVIDIA driver kmod-nvidia
[1013:00b8] Cirrus Logic GD 5446
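To reuse the detected version in later commands, it can be pulled out of the output above. This sketch only handles the "This device requires the current ... NVIDIA driver" wording shown in my output; other wordings that nvidia-detect may print are not covered, and recommended_driver is a hypothetical helper name.

```shell
#!/bin/sh
# Read `nvidia-detect -v` output on stdin and print the version number from
# the "This device requires the current <version> NVIDIA driver" line.
recommended_driver() {
    sed -n 's/^This device requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p'
}
```

For example, ver=$(nvidia-detect -v | recommended_driver) would let the installer file be referred to as NVIDIA-Linux-x86_64-${ver}.run.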
Install the kernel-devel package:
yum install -y kernel-devel
3.10.0-1160.49.1.el7.x86_64 is the kernel-devel version I installed:
[root@10-64-2-16 ~]# ls /usr/src/kernels/
3.10.0-1160.49.1.el7.x86_64 3.10.0-1160.49.1.el7.x86_64.debug
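Point the build symlink in the running kernel's module directory at the installed kernel-devel tree:
[root@10-64-2-16 modules]# cd /lib/modules/$(uname -r)
[root@10-64-2-16 3.10.0-1127.el7.x86_64]# rm -f build
[root@10-64-2-16 3.10.0-1127.el7.x86_64]# ln -s /usr/src/kernels/3.10.0-1160.49.1.el7.x86_64 build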
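In the listing above the running kernel is 3.10.0-1127 while the kernel-devel tree is 3.10.0-1160.49.1, which is why the symlink is repointed by hand. Before doing that, it may be worth checking whether a kernel-devel tree for the running kernel already exists. A minimal sketch, where have_kernel_devel and its optional second argument are hypothetical names of mine:

```shell
#!/bin/sh
# Succeed when a kernel-devel tree exists for the given kernel version.
# Arg 1: kernel version (default: running kernel).
# Arg 2: directory holding the trees (default: /usr/src/kernels, as listed above).
have_kernel_devel() {
    kver="${1:-$(uname -r)}"
    src_root="${2:-/usr/src/kernels}"
    [ -d "${src_root}/${kver}" ]
}
```

If the check fails, one option I have not verified on this machine is yum install -y "kernel-devel-$(uname -r)", which only works if the configured repositories still carry that exact version.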
Install the driver:
sh NVIDIA-Linux-x86_64-470.86.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.49.1.el7.x86_64