Installing the NVIDIA GPU driver and nvidia-docker on CentOS 7.6


Preparation:
Check the GPU model, OS release, and kernel information
cat /etc/centos-release
lshw -numeric -C display
yum install pciutils
lspci | grep -i vga
lspci | grep -i nvidia
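
Data-center cards such as the Tesla M40 used in this guide are usually listed by lspci as a "3D controller" rather than a VGA device, which is why the second grep above is worth running. A minimal sketch of a combined filter follows; the function name nvidia_pci_devices is made up for illustration and it only filters lspci output supplied on stdin.

```shell
# Hypothetical helper (not part of pciutils): keep only NVIDIA display-class
# devices from lspci output supplied on stdin.
nvidia_pci_devices() {
    grep -i 'nvidia' | grep -Ei 'VGA compatible controller|3D controller'
}

# Typical use on the host:
#   lspci | nvidia_pci_devices
```

The function returns a non-zero status when no NVIDIA display device is found, so it can also be used in an if condition.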

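1. Install the build environment: gcc, kernel-devel, kernel-headers (the "kernel-devel-uname-r == $(uname -r)" argument ensures that the kernel-devel package installed matches the currently running kernel). dkms is normally provided by the EPEL repository, so enable EPEL first if yum cannot find it.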
yum -y install gcc kernel-devel "kernel-devel-uname-r == $(uname -r)" dkms

2. Check that the kernel version and the kernel source (kernel-devel) version match; if they differ, upgrade with yum until they do.

ls /boot | grep vmlinu

rpm -aq | grep kernel-devel
The two versions should match.
Remove other kernel versions, then rebuild the kernel boot configuration
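
The comparison can also be scripted instead of done by eye. The sketch below assumes that rpm lists the package as kernel-devel-<release>, where <release> is exactly the string printed by uname -r; the function name has_matching_kernel_devel is hypothetical.

```shell
# Hypothetical helper: succeed when the package list on stdin contains a
# kernel-devel package for the given kernel release.
has_matching_kernel_devel() {
    release="$1"
    # -x matches the whole line, -F treats the dots in the version literally.
    grep -xF "kernel-devel-${release}" >/dev/null
}

# Typical use on the host:
#   rpm -qa | has_matching_kernel_devel "$(uname -r)" && echo "kernel-devel matches"
```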
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot:
reboot
Check whether the nouveau driver is loaded (if the lsmod command is missing, install it with yum)
lsmod | grep  nouveau
Blacklist the nouveau driver that ships with the system

Edit dist-blacklist.conf:
vim /lib/modprobe.d/dist-blacklist.conf

Comment out the nvidiafb entry:
#blacklist nvidiafb

Then add the following lines:
blacklist nouveau
options nouveau modeset=0
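
The manual edit can be expressed as a small filter, which is handy when several machines need the same change. This is only a sketch: blacklist_nouveau is a hypothetical name, and the function reads the original file on stdin and writes the edited version to stdout rather than modifying anything in place.

```shell
# Hypothetical helper: apply the blacklist edits described above to a
# modprobe configuration read from stdin, printing the result to stdout.
blacklist_nouveau() {
    # Comment out the nvidiafb entry, leaving every other line untouched.
    content="$(sed 's/^blacklist[[:space:]]\{1,\}nvidiafb[[:space:]]*$/#&/')"
    if [ -n "$content" ]; then
        printf '%s\n' "$content"
    fi
    # Append each nouveau line only when it is not already present,
    # so running the filter twice gives the same result.
    for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
        if ! printf '%s\n' "$content" | grep -xF "$line" >/dev/null; then
            printf '%s\n' "$line"
        fi
    done
}

# Typical use on the host (keeps a backup of the original file):
#   f=/lib/modprobe.d/dist-blacklist.conf
#   cp "$f" "$f.bak" && blacklist_nouveau < "$f.bak" > "$f"
```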

3. Rebuild the initramfs image (this regenerates the initial ramdisk, not the kernel itself, so that the nouveau driver is no longer loaded at boot)

mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

dracut /boot/initramfs-$(uname -r).img $(uname -r)
Change the default target (runlevel) to text mode

systemctl set-default multi-user.target
Reboot
reboot
Run lsmod | grep nouveau; if it prints nothing, nouveau is not loaded.
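
For use in a script, the same check can be made on the module-name column only and turned into an exit status. The sketch below assumes the usual lsmod layout with the module name in the first column; nouveau_loaded is a hypothetical name.

```shell
# Hypothetical helper: succeed when lsmod output on stdin lists a module
# whose name (first column) is exactly "nouveau".
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on the host:
#   if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; fi
```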



I. Install the NVIDIA GPU driver
Download the driver:
https://www.nvidia.cn/drivers/unix/
Make the installer executable
chmod +x NVIDIA-Linux-x86_64-455.45.01.run
Run it
./NVIDIA-Linux-x86_64-455.45.01.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.15.2.el7.x86_64/ --no-drm
 (Note: keep the --no-drm option, otherwise the installation fails with: ERROR: The nvidia-drm kernel module failed to load. This kernel
 module is required for the proper operation of DRM-KMS. If you do not need to use DRM-KMS, you can try to install
 this driver package again with the '--no-drm' option.)
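
The --kernel-source-path value in the command above is specific to the kernel on the author's machine. A minimal sketch for deriving it on another host follows, assuming kernel-devel installs its tree under /usr/src/kernels/<release> as in that command; kernel_source_path is a hypothetical name.

```shell
# Hypothetical helper: print the kernel-devel source directory for a kernel
# release, defaulting to the currently running kernel.
kernel_source_path() {
    printf '/usr/src/kernels/%s\n' "${1:-$(uname -r)}"
}

# Typical use on the host:
#   ./NVIDIA-Linux-x86_64-455.45.01.run \
#       --kernel-source-path="$(kernel_source_path)" --no-drm
```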
Answer yes to the prompts; after the installation finishes, reboot
reboot

Run nvidia-smi; if the GPU information is displayed, the NVIDIA driver was installed successfully
Sat Feb 27 15:39:09 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   38C    P0    66W / 250W |      0MiB / 22945MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
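
When the check has to run unattended, the driver version can be pulled out of the banner line of this table. The sketch below only parses text in the format shown above; driver_version is a hypothetical name.

```shell
# Hypothetical helper: extract the driver version from nvidia-smi's default
# table output supplied on stdin.
driver_version() {
    sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Typical use on the host:
#   nvidia-smi | driver_version
```

nvidia-smi also has a machine-readable mode (nvidia-smi --query-gpu=driver_version --format=csv,noheader), which is likely the more robust choice if the table layout ever changes.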

Install the Docker service
Install dependencies:
yum install -y yum-utils device-mapper-persistent-data lvm2
Add the repo file
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
List the available versions:
yum list docker-ce --showduplicates | sort -r
Install a specific Docker version
yum install docker-ce-18.09.6-3.el7 docker-ce-cli-18.09.6 containerd.io
Start Docker
systemctl start docker
systemctl status docker
systemctl enable docker
 

Install nvidia-docker
References:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html   (official installation guide)
https://nvidia.github.io/libnvidia-container/ (may only be reachable through a proxy)

Set up the repo
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
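
The distribution variable in the command above is built by sourcing /etc/os-release and concatenating ID and VERSION_ID, which gives centos7 on this system. The sketch below shows the same derivation in isolation so the resulting URL can be inspected before downloading; nvidia_docker_repo_url is a hypothetical name, and the optional file argument exists only to make the function easy to try out.

```shell
# Hypothetical helper: print the nvidia-docker repo URL for the distribution
# described by an os-release file (default: /etc/os-release).
nvidia_docker_repo_url() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so ID and VERSION_ID do not leak out.
    distribution="$(. "$os_release"; printf '%s%s' "$ID" "$VERSION_ID")"
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.repo\n' "$distribution"
}

# Typical use on the host:
#   curl -s -L "$(nvidia_docker_repo_url)" | sudo tee /etc/yum.repos.d/nvidia-docker.repo
```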

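Clear the yum cache
yum clean expire-cache
Rebuild the cache
yum makecache
List the installable nvidia-docker versions:
yum search --showduplicates nvidia-docker
Install nvidia-docker (a version can be specified; by default the latest stable version is installed)
yum install -y nvidia-docker2
Edit the daemon.json file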
root@slash:/home/slash# cat  /etc/docker/daemon.json
# Note (this comment is not part of the file): default-runtime must be set, otherwise Docker containers started by k8s cannot find nvidia-smi
{
   "registry-mirrors": ["https://5twf62k1.mirror.aliyuncs.com"],
   "default-runtime": "nvidia",
   "runtimes": {
       "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
       }
   }
}

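Pay particular attention to the path value above: it has to point at the nvidia-container-runtime binary that is actually installed.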
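
Both points, the default-runtime entry and the path, can be sanity-checked before restarting Docker. The sketch below uses plain text matching rather than a JSON parser, so it assumes each key and its value sit on one line as in the file above; the two function names are hypothetical.

```shell
# Hypothetical helper: succeed when the daemon.json text on stdin sets
# "default-runtime" to "nvidia".
default_runtime_is_nvidia() {
    grep -E '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' >/dev/null
}

# Hypothetical helper: print the first "path" value found in the
# daemon.json text on stdin.
runtime_path() {
    sed -n 's/.*"path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Typical use on the host:
#   default_runtime_is_nvidia < /etc/docker/daemon.json || echo "default-runtime missing"
#   [ -x "$(runtime_path < /etc/docker/daemon.json)" ] || echo "runtime binary not found"
```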
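Restart the Docker daemon
systemctl daemon-reload && systemctl restart docker
Verify nvidia-docker2 (the --gpus flag needs Docker 19.03 or later; with the 18.09.6 packages installed above, use the nvidia-docker command below instead)
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Output like the following indicates a successful installation.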

Run    nvidia-docker run --rm nvidia/cuda nvidia-smi
Mon Mar  1 02:47:16 2021    
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   40C    P0    66W / 250W |      0MiB / 22945MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
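
Which verification command works depends on the Docker version: the docker run --gpus form relies on native GPU support that arrived in Docker 19.03, while this guide pins docker-ce 18.09.6, where the nvidia-docker wrapper (or the nvidia default runtime) is the route that applies. A minimal sketch of a version check follows; docker_supports_gpus_flag is a hypothetical name and the comparison relies on GNU sort -V.

```shell
# Hypothetical helper: succeed when the given Docker version is 19.03 or
# newer, i.e. new enough for "docker run --gpus".
docker_supports_gpus_flag() {
    # sort -V orders version strings; if 19.03 sorts first (or ties),
    # the given version is at least 19.03.
    lowest="$(printf '%s\n%s\n' '19.03' "$1" | sort -V | head -n 1)"
    [ "$lowest" = '19.03' ]
}

# Typical use on the host:
#   v="$(docker version --format '{{.Server.Version}}')"
#   if docker_supports_gpus_flag "$v"; then
#       docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
#   else
#       nvidia-docker run --rm nvidia/cuda:11.0-base nvidia-smi
#   fi
```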

Use nvidia-docker to view GPU information:
nvidia-docker run --rm nvidia/cuda nvidia-smi