Installing CUDA on CentOS


0 Revision History

No.  Change   Date
1    Created  2021/1/20

1 Summary

This article describes installing CUDA 10.2 on CentOS 8.1 with the runfile installer.

2 Environment

(1) Operating system

[root@ussuritest004 ~]# cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)
[root@ussuritest004 ~]#

(2) CUDA version

The installer used here is:
cuda_10.2.89_440.33.01_linux.run

3 Procedure

(1) Preparation

3.1.1 Check that the machine has a CUDA-capable GPU

[root@ussuritest004 software]# lspci | grep -i nvidia
af:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
[root@ussuritest004 software]#
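The check above can be wrapped for use in a script. This is a minimal sketch: the function name has_nvidia_gpu is hypothetical, and it only tests whether lspci lists an NVIDIA device. It does not prove that the card is on NVIDIA's list of CUDA-capable GPUs.

```shell
# has_nvidia_gpu (hypothetical helper): reads `lspci` output on stdin and
# succeeds (exit status 0) if at least one line mentions NVIDIA.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Typical use on the target machine:
#   if lspci | has_nvidia_gpu; then
#       echo "NVIDIA device found"
#   else
#       echo "no NVIDIA device found" >&2
#   fi
```

Reading from stdin rather than calling lspci inside the function keeps the helper usable on saved lspci output as well.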

3.1.2 Download

Omitted for now.

(2) Runfile installation

3.2.1 Install basic dependencies

[root@ussuritest004 yum.repos.d]# yum install gcc

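This step is optional. The package names must be the CentOS ones (the Debian/Ubuntu names such as libglu1-mesa-dev are not available through yum):

[root@ussuritest004 yum.repos.d]# yum install mesa-libGLU-devel libXi-devel libXmu-devel freeglut-devel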

3.2.2 Disable the Nouveau driver

3.2.2.1 Check whether the nouveau driver is loaded

[root@ussuritest004 log]#  lsmod | grep nouveau
nouveau              2215936  1
mxm_wmi                16384  1 nouveau
video                  45056  1 nouveau
wmi                    32768  2 mxm_wmi,nouveau
i2c_algo_bit           16384  2 ast,nouveau
drm_kms_helper        217088  2 ast,nouveau
ttm                   110592  2 ast,nouveau
drm                   524288  7 drm_kms_helper,ast,ttm,nouveau
[root@ussuritest004 log]#

Any output means the driver is loaded.
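For scripting, the same check can be made stricter by matching only the module-name column of the lsmod output. This is a sketch, and the function name nouveau_loaded is hypothetical.

```shell
# nouveau_loaded (hypothetical helper): reads `lsmod` output on stdin and
# succeeds only if a module named exactly "nouveau" is listed in column 1.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is loaded"
#   fi
```

Unlike a plain grep, this ignores lines where nouveau appears only in the "used by" column.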

3.2.2.2 Disable the nouveau driver

3.2.2.2.1 Add a blacklist entry

To disable the Nouveau driver, create the file /usr/lib/modprobe.d/blacklist-nouveau.conf with the following content:

blacklist nouveau
options nouveau modeset=0

[root@ussuritest004 ~]# ll /usr/lib/modprobe.d/blacklist-nouveau.conf
ls: cannot access '/usr/lib/modprobe.d/blacklist-nouveau.conf': No such file or directory
[root@ussuritest004 ~]# vim /usr/lib/modprobe.d/blacklist-nouveau.conf
[root@ussuritest004 ~]#

[root@ussuritest004 ~]# cat /usr/lib/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@ussuritest004 ~]#
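The file can also be written without an editor, which is convenient when the steps are scripted. This is a sketch under the assumption that it runs as root on the target machine. The function name write_nouveau_blacklist is hypothetical, and the default path is the one used in this article.

```shell
# write_nouveau_blacklist (hypothetical helper): writes the two blacklist
# lines to the file given as $1, or to the article's path if $1 is omitted.
# Any existing file at that path is overwritten.
write_nouveau_blacklist() {
    target="${1:-/usr/lib/modprobe.d/blacklist-nouveau.conf}"
    cat > "$target" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Typical use (as root):
#   write_nouveau_blacklist
#   cat /usr/lib/modprobe.d/blacklist-nouveau.conf
```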

3.2.2.2.2 Regenerate the kernel initramfs

Back up the current image first:

[root@ussuritest004 boot]# uname -r
4.18.0-147.el8.x86_64
[root@ussuritest004 boot]# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak.orig
[root@ussuritest004 boot]# ll
total 167724
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 30310567 Jan 19 11:50 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#

Then regenerate it:

[root@ussuritest004 boot]# dracut  /boot/initramfs-$(uname -r).img --force
[root@ussuritest004 boot]# ll
total 166988
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 29560525 Jan 20 13:53 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#
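One caveat with the cp command used for the backup: if it is rerun after dracut has already regenerated the image, it overwrites the original backup with the new image. The sketch below guards against that by copying only when no backup exists yet. The function name backup_initramfs and its arguments are hypothetical; the .bak.orig suffix is the one used above.

```shell
# backup_initramfs (hypothetical helper)
#   $1: kernel release, e.g. the output of `uname -r`
#   $2: boot directory, defaults to /boot
# Copies initramfs-<release>.img to initramfs-<release>.img.bak.orig,
# but never overwrites a backup that already exists.
backup_initramfs() {
    img="${2:-/boot}/initramfs-$1.img"
    bak="$img.bak.orig"
    [ -f "$img" ] || { echo "missing: $img" >&2; return 1; }
    if [ -e "$bak" ]; then
        echo "backup already exists: $bak"
    else
        cp -p "$img" "$bak" && echo "backed up to: $bak"
    fi
}

# Typical use (as root), followed by the regeneration step:
#   backup_initramfs "$(uname -r)" && dracut --force /boot/initramfs-$(uname -r).img
```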

3.2.2.3 Change the default target to text mode

[root@ussuritest004 boot]# systemctl set-default multi-user.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
[root@ussuritest004 boot]#

Reboot the machine after making these changes.

3.2.3 Install cuda_10.2.89_440.33.01_linux.run

3.2.3.1 Step by step

[root@ussuritest004 software]# sh cuda_10.2.89_440.33.01_linux.run

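After the command is started, it takes a while before the installer appears.
Type accept at the license prompt.

Then select Install.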
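The interactive prompts can be skipped with the runfile's silent mode. The sketch below assumes the --silent, --driver and --toolkit options described in NVIDIA's runfile documentation; it is worth confirming them with `sh cuda_10.2.89_440.33.01_linux.run --help` before relying on them. The function name install_cuda_silent and the DRY_RUN variable are hypothetical, and by default the function only prints the command it would run.

```shell
# install_cuda_silent (hypothetical helper)
#   $1: path to the CUDA runfile
# With DRY_RUN=1 (the default here) the command is only printed.
# Set DRY_RUN=0 to actually run the installer (as root).
install_cuda_silent() {
    runfile="$1"
    [ -f "$runfile" ] || { echo "runfile not found: $runfile" >&2; return 1; }
    # --silent implies acceptance of the EULA; --driver and --toolkit
    # select the components to install.
    set -- sh "$runfile" --silent --driver --toolkit
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Typical use:
#   install_cuda_silent cuda_10.2.89_440.33.01_linux.run
#   DRY_RUN=0 install_cuda_silent cuda_10.2.89_440.33.01_linux.run
```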

The installation failed. The installer log reports that `make` could not be found on PATH:

[root@ussuritest004 log]# cat nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jan 20 14:59:43 2021
installer version: 440.33.01

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

nvidia-installer command line:
    ./nvidia-installer
    --ui=none
    --no-questions
    --accept-license
    --disable-nouveau
    --no-cc-version-check
    --install-libglvnd

Using built-in stream user interface
-> Detected 48 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 440.33.01.
WARNING: One or more modprobe configuration files to disable Nouveau are already present at: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be sure you have rebooted your system since these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration file.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory.  Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written.  For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk.  Please reboot your system and attempt NVIDIA driver installation again.  Note if you later wish to reenable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
ERROR: Unable to find the development tool `make` in your path; please make sure that you have the package 'make' installed.  If make is installed on your system, then please check that `make` is in your PATH.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[root@ussuritest004 log]#
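Two small helpers make this kind of failure quicker to spot. This is a sketch: the function names installer_errors and missing_tools are hypothetical, and the choice of gcc and make as the tools to check is an assumption based on the dependency step and the log above.

```shell
# installer_errors (hypothetical helper): print only the ERROR lines of an
# nvidia-installer log. $1 is the log path, defaulting to the usual location.
installer_errors() {
    grep '^ERROR:' "${1:-/var/log/nvidia-installer.log}"
}

# missing_tools (hypothetical helper): print each named command that cannot
# be found on PATH. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Typical use before starting the runfile:
#   missing_tools gcc make
# and after a failed run:
#   installer_errors
```

Running the pre-flight check before the installer would have listed make on this machine, going by the log above.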

