Installing the NVIDIA GPU driver, nvidia-docker, and CUDA on Ubuntu 20


# lshw -numeric -C display   # list the display adapters to see how many GPUs are installed
  *-display                 
       description: VGA compatible controller
       product: NVIDIA Corporation [10DE:2206]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:0a:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:113 memory:fb000000-fbffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:f000(size=128) memory:fc000000-fc07ffff
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation [10DE:2206]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:114 memory:f9000000-f9ffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128) memory:fa000000-fa07ffff

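Step 1: Identify the GPU model
Find out the model of your NVIDIA GPU (it is usually listed when you buy the machine; mine was printed on a sticker on the computer). My card is an RTX 3080.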

# lspci -vnn | grep VGA       # the same information via lspci
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1) (prog-if 00 [VGA controller])
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1) (prog-if 00 [VGA controller])
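
The GPU count can also be read off the lspci listing automatically. Below is a minimal sketch, assuming the cards show up as "VGA compatible controller" entries with NVIDIA's vendor ID 10de, as they do in the output above (some cards may be listed under a different class and would not be counted). The function name count_nvidia_vga is made up for this example.

```shell
#!/bin/sh
# Count NVIDIA VGA controllers in `lspci -nn` style output read from stdin.
# 10de is the PCI vendor ID that appears in the listing above.
count_nvidia_vga() {
    awk '/VGA compatible controller/ && /\[10de:[0-9a-f]+\]/ { n++ }
         END { print n + 0 }'
}

# Only query the hardware when lspci is actually available.
if command -v lspci >/dev/null 2>&1; then
    echo "NVIDIA VGA controllers: $(lspci -nn | count_nvidia_vga)"
fi
```

Keeping the parsing in a function that reads stdin means it can be tried on saved lspci output from another machine.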


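Step 2: Find a driver for the RTX 3080
On the NVIDIA driver search page, look up the version number of the latest driver that supports the RTX 3080.
Driver version: 455.45 - Release date: 2020-11-17
Refresh the package index:
# apt-get update
# apt-cache search 'nvidia-*' | grep 455   # check whether a 455 driver package is available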
  nvidia-driver-418-server - NVIDIA Server Driver metapackage
  nvidia-driver-440-server - NVIDIA Server Driver metapackage
  nvidia-driver-450-server - NVIDIA Server Driver metapackage
  nvidia-driver-455 - NVIDIA driver metapackage
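
If you would rather not scan the package list by eye, the newest non-server driver package can be picked out with a small filter. This is only a sketch: the function name latest_nvidia_driver is my own, and it assumes package names of the form nvidia-driver-NNN like those listed above.

```shell
#!/bin/sh
# Print the highest-numbered nvidia-driver-NNN package found in
# `apt-cache search` output on stdin, ignoring the -server variants.
latest_nvidia_driver() {
    awk '$1 ~ /^nvidia-driver-[0-9]+$/ {
             sub(/^nvidia-driver-/, "", $1)
             if ($1 + 0 > max) max = $1 + 0
         }
         END { if (max) print "nvidia-driver-" max }'
}

# Query the local apt cache only when apt-cache exists on this machine.
if command -v apt-cache >/dev/null 2>&1; then
    echo "Newest driver package: $(apt-cache search --names-only '^nvidia-driver-' | latest_nvidia_driver)"
fi
```

The newest package in the archive is not necessarily the right one for your card, so still compare it against the version shown on the NVIDIA driver search page.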
Install the driver:

# apt-get install nvidia-driver-455 -y
Reboot the system once the installation finishes.

Check that the installation succeeded:
# nvidia-smi
Fri Nov 27 09:56:30 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   37C    P8     2W / 320W |    299MiB / 10015MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3080    Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P8     4W / 320W |     10MiB / 10018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1441      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      2997      G   /usr/bin/gnome-shell               54MiB |
|    0   N/A  N/A      3833      G   /usr/lib/xorg/Xorg                 94MiB |
|    0   N/A  N/A      3990      G   /usr/bin/gnome-shell              127MiB |
|    1   N/A  N/A      1441      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      2997      G   /usr/bin/gnome-shell                0MiB |
|    1   N/A  N/A      3833      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3990      G   /usr/bin/gnome-shell                0MiB |
This output shows that the driver was installed successfully.
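
For a scripted check instead of reading the table, nvidia-smi can print selected fields as CSV. The sketch below assumes the expected version is 455.38 (the one reported above) and uses a helper name, all_gpus_on_driver, that I made up for illustration.

```shell
#!/bin/sh
# Succeed only if stdin lists at least one GPU and every CSV line
# ("name, driver_version") ends with the expected driver version in $1.
all_gpus_on_driver() {
    awk -F', ' -v want="$1" '
        { seen++; if (($NF "") != (want "")) bad++ }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }'
}

# Run the real query only when the driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        | all_gpus_on_driver 455.38; then
        echo "all GPUs report driver 455.38"
    else
        echo "driver version mismatch or no GPU found"
    fi
fi
```

The exit status makes it easy to use the check as a gate in a provisioning script.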

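Installing nvidia-docker (use sudo if you lack the required permissions)
Install Docker first (adjust these steps to your own setup)
# apt-get update   # refresh Ubuntu's apt package index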

Install the packages that allow apt to use repositories over HTTPS

# apt-get install apt-transport-https ca-certificates curl software-properties-common

Add Docker's official GPG key

# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

Set up the Docker stable repository

# add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

After adding the repository, refresh the apt package index

# apt-get update

Install the latest Docker CE (Community Edition)

# apt-get install docker-ce
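
The verification step later in this guide uses the --gpus flag, which to my knowledge needs Docker 19.03 or newer, so it may be worth checking the installed version at this point. A rough sketch follows; the helper names are invented here, the version string in the comment is only an illustrative sample, and sort -V is a GNU coreutils extension that Ubuntu provides.

```shell
#!/bin/sh
# Extract the version number from a line such as
# "Docker version 19.03.13, build 4484c46d9d" (sample format only).
docker_version_from() {
    sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2, using version sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check the locally installed Docker, if there is one.
if command -v docker >/dev/null 2>&1; then
    installed=$(docker --version | docker_version_from)
    if version_ge "$installed" 19.03; then
        echo "Docker $installed should support --gpus"
    else
        echo "Docker $installed looks too old for --gpus"
    fi
fi
```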

Configure nvidia-docker
# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
OK

Add the official NVIDIA package repository
# distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
#deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH)
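
The distribution value used in the repository URL is simply the ID and VERSION_ID fields of /etc/os-release glued together. The sketch below shows that derivation in isolation so you can see which list file the curl command would request; distribution_of is a hypothetical helper name.

```shell
#!/bin/sh
# Print "$ID$VERSION_ID" from an os-release style file
# (default: /etc/os-release). The file is sourced inside a subshell
# so the caller's own variables are left untouched.
distribution_of() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Show the repository list URL for this machine, if os-release exists.
if [ -r /etc/os-release ]; then
    echo "https://nvidia.github.io/nvidia-docker/$(distribution_of)/nvidia-docker.list"
fi
```

Printing the URL first is a cheap way to confirm the string looks like ubuntu20.04 or ubuntu18.04 before piping anything into /etc/apt/sources.list.d.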

Refresh the package index
# apt update
Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease [1,139 B]
Hit:2 https://download.docker.com/linux/ubuntu bionic InRelease
Get:3 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease [1,136 B]
Get:4 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease [1,129 B]
Get:5 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages [9,128 B]
Get:6 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages [6,148 B]
Get:7 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages [4,332 B]
Hit:8 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:10 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Fetched 275 kB in 2s (129 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Install nvidia-docker2
# apt install -y nvidia-docker2
# systemctl restart docker

Configure the Docker daemon file
Note: "default-runtime" must be set; otherwise Docker containers started by k8s will not be able to find nvidia-smi.
# cat /etc/docker/daemon.json
{
   "registry-mirrors": ["https://5twf62k1.mirror.aliyuncs.com"],
   "default-runtime": "nvidia",
   "runtimes": {
       "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
       }
   }
}

Pay particular attention to the "path" value above.
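
Since a missing default-runtime is easy to overlook, a quick check of the file can help. This is a rough sketch with a made-up function name; it is a plain-text grep rather than a JSON parser, so it only suits simple files like the one shown above.

```shell
#!/bin/sh
# Succeed if the given daemon.json sets "default-runtime" to "nvidia".
# Text match only: it does not validate the JSON itself.
has_nvidia_default_runtime() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Check the real file when it is present and readable.
if [ -r /etc/docker/daemon.json ]; then
    if has_nvidia_default_runtime /etc/docker/daemon.json; then
        echo "default-runtime is nvidia"
    else
        echo "default-runtime is not set to nvidia"
    fi
fi
```

Once the daemon has been restarted, the "Default Runtime" line printed by docker info should tell the same story from the daemon's side.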
Restart the Docker daemon
# systemctl daemon-reload && systemctl restart docker
Verify nvidia-docker2; a table like the following indicates a successful installation
# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Fri Nov 27 02:54:19 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   36C    P8     1W / 320W |    202MiB / 10015MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3080    Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P8     8W / 320W |     10MiB / 10018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
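
Beyond eyeballing the table, one way to confirm that the container sees every card is to compare GPU counts on the host and inside the container. The sketch below assumes the nvidia-smi -L listing format ("GPU 0: ...", one line per card) and reuses the image from the command above; count_listed_gpus is a name I chose for the example.

```shell
#!/bin/sh
# Count the "GPU N: ..." lines of `nvidia-smi -L` output read from stdin.
count_listed_gpus() {
    awk '/^GPU [0-9]+:/ { n++ } END { print n + 0 }'
}

# Compare host and container only when both tools are installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v docker >/dev/null 2>&1; then
    host=$(nvidia-smi -L | count_listed_gpus)
    ctr=$(docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi -L | count_listed_gpus)
    if [ "$host" -eq "$ctr" ]; then
        echo "container sees all $host GPU(s)"
    else
        echo "host has $host GPU(s) but the container sees $ctr"
    fi
fi
```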
=================================================================
If you run into this symptom:
"No program is running, yet nvidia-smi shows GPU-Util at 100% (very high GPU utilization)"
enable the driver's persistence mode with the following command:
root@node3:~# nvidia-smi -pm 1
=================================================================
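
To see which cards still have persistence mode off, nvidia-smi can report the setting per GPU. A small sketch, assuming the CSV values are "Enabled" and "Disabled"; the function name gpus_without_persistence is hypothetical.

```shell
#!/bin/sh
# From CSV lines "index, persistence_mode" on stdin, print the index of
# every GPU whose persistence mode is not "Enabled".
gpus_without_persistence() {
    awk -F', ' '$2 != "Enabled" { print $1 }'
}

# Query the real GPUs only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    off=$(nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence)
    if [ -n "$off" ]; then
        echo "persistence mode is off on GPU(s):" $off
    else
        echo "persistence mode is enabled on all GPUs"
    fi
fi
```

As far as I know, the setting made with nvidia-smi -pm 1 does not survive a reboot, so a check like this could be rerun after the machine restarts.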


