Installing the NVIDIA driver, nvidia-docker and CUDA on Ubuntu 20


# lshw -numeric -C display   # list display adapters to see how many GPUs are installed
  *-display                 
       description: VGA compatible controller
       product: NVIDIA Corporation [10DE:2206]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:0a:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:113 memory:fb000000-fbffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:f000(size=128) memory:fc000000-fc07ffff
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation [10DE:2206]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:114 memory:f9000000-f9ffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128) memory:fa000000-fa07ffff

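Step 1: Identify the GPU model
Find out the model of your NVIDIA card (it is usually listed when you buy the computer; mine was printed on a sticker on the machine). My card is an RTX 3080.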

# lspci -vnn | grep VGA       # same information from lspci
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1) (prog-if 00 [VGA controller])
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1) (prog-if 00 [VGA controller])
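Counting the GPUs can also be scripted. The sketch below assumes the `lspci -nn` line format shown above: NVIDIA's PCI vendor ID is `10de`, and class `[0300]` (VGA compatible controller) or `[0302]` (3D controller) marks a display device. Both function names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA GPUs in `lspci -nn` (or `lspci -vnn`) output on stdin.
# Only device header lines match: class [0300] or [0302] followed by vendor ID 10de.
count_nvidia_gpus() {
  # grep -c prints the number of matching lines; when nothing matches it prints 0
  # and returns 1, so "|| true" keeps the function's exit status at 0.
  grep -cE '\[030[02]\]: .*\[10de:[0-9a-f]{4}\]' || true
}

# Hypothetical helper: print each distinct NVIDIA device ID with how often it occurs.
list_nvidia_device_ids() {
  grep -E '\[030[02]\]: ' | grep -oE '\[10de:[0-9a-f]{4}\]' | sort | uniq -c
}
```

For example, `lspci -nn | count_nvidia_gpus` should print `2` on a machine like the one above, and `lspci -nn | list_nvidia_device_ids` groups the cards by device ID (`[10de:2206]` twice here).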


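Step 2: Find a driver for the RTX 3080
Look up the latest driver version that supports the RTX 3080 on NVIDIA's driver search page:
Driver version: 455.45 - Release date: 2020-11-17
Refresh the package index, then check whether a 455 driver package is available:
# apt-get update
# apt-cache search nvidia-driver   # look for nvidia-driver-455 in the list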
  nvidia-driver-418-server - NVIDIA Server Driver metapackage
  nvidia-driver-440-server - NVIDIA Server Driver metapackage
  nvidia-driver-450-server - NVIDIA Server Driver metapackage
  nvidia-driver-455 - NVIDIA driver metapackage
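Picking the newest driver branch out of a listing like the one above can be scripted too. A sketch, assuming the `package - description` line format of `apt-cache search`: it skips the `-server` variants and simply takes the highest version number, so it does not check whether that version supports your GPU (NVIDIA's driver search page remains the authority for that). `latest_nvidia_driver` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt-cache search` output on stdin and print the
# nvidia-driver-NNN metapackage with the highest NNN, ignoring -server builds.
latest_nvidia_driver() {
  # awk keeps only package names that are exactly nvidia-driver-<digits>;
  # sort splits on "-" and orders numerically by the third field (the version);
  # tail keeps the highest one.
  awk '$1 ~ /^nvidia-driver-[0-9]+$/ { print $1 }' |
    sort -t- -k3,3n |
    tail -n 1
}
```

For the listing above, `apt-cache search nvidia-driver | latest_nvidia_driver` prints `nvidia-driver-455`.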
Install it:

# apt-get install nvidia-driver-455 -y

Reboot the system once the installation has finished.

Check that the driver was installed correctly:
# nvidia-smi
Fri Nov 27 09:56:30 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   37C    P8     2W / 320W |    299MiB / 10015MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3080    Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P8     4W / 320W |     10MiB / 10018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1441      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      2997      G   /usr/bin/gnome-shell               54MiB |
|    0   N/A  N/A      3833      G   /usr/lib/xorg/Xorg                 94MiB |
|    0   N/A  N/A      3990      G   /usr/bin/gnome-shell              127MiB |
|    1   N/A  N/A      1441      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      2997      G   /usr/bin/gnome-shell                0MiB |
|    1   N/A  N/A      3833      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3990      G   /usr/bin/gnome-shell                0MiB |
Both RTX 3080 cards are listed with driver 455.38, so the installation succeeded.
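For scripted checks, `nvidia-smi` can print selected fields as CSV with `--query-gpu=index,name,driver_version --format=csv,noheader`, which is easier to parse than the table. The sketch below reads that output on stdin and fails if no GPU is listed or if any GPU reports a driver version other than the expected one; `check_driver_version` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the CSV printed by
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
# Usage: check_driver_version EXPECTED_VERSION < csv
# Returns non-zero if no GPU is listed or any GPU reports another driver version.
check_driver_version() {
  local expected="$1"
  # nvidia-smi separates CSV fields with ", ".
  awk -F', ' -v want="$expected" '
    {
      n++
      if ($3 != want) {
        bad++
        printf "GPU %s (%s) reports driver %s, expected %s\n", $1, $2, $3, want
      }
    }
    END {
      printf "%d GPU(s) checked\n", n
      exit (n == 0 || bad > 0)
    }'
}
```

Example: `nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | check_driver_version 455.38`.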

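Installing nvidia-docker (prefix the commands with sudo if you are not root)
First install Docker (skip this if it is already installed):
# apt-get update   # refresh the apt package index

Install the packages that allow apt to use repositories over HTTPS:

# apt-get install apt-transport-https ca-certificates curl software-properties-common

Add Docker's official GPG key:

# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Set up the Docker stable repository:

# add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

After adding the repository, refresh the apt index again:

# apt-get update

Install the latest Docker CE (Community Edition):

# apt-get install docker-ce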

Configure nvidia-docker: first add NVIDIA's GPG key
# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
OK

Add NVIDIA's official package repository (tee also prints the entries it writes; the experimental ones are commented out):
# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH)
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH)
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH)
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH)
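The `distribution` variable is just `ID` followed by `VERSION_ID` from `/etc/os-release` (for example `ubuntu20.04`), so which list file gets downloaded depends on the release the machine reports. A sketch that prints the URL the command above would fetch, without setting `ID`/`VERSION_ID` in the current shell; `nvidia_docker_list_url` is a hypothetical name, and the URL pattern is copied from the curl command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nvidia-docker.list URL that the commands above
# would fetch, computed from an os-release file (default: /etc/os-release).
nvidia_docker_list_url() {
  local os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so ID/VERSION_ID do not leak into the caller.
  (
    . "$os_release" || exit 1
    echo "https://nvidia.github.io/nvidia-docker/${ID}${VERSION_ID}/nvidia-docker.list"
  )
}
```

Running `nvidia_docker_list_url` with no argument uses the machine's own /etc/os-release; comparing the result with the deb lines printed by tee shows which repositories apt will actually use.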

Refresh the package index:
# apt update
Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease [1,139 B]
Hit:2 https://download.docker.com/linux/ubuntu bionic InRelease
Get:3 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease [1,136 B]
Get:4 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease [1,129 B]
Get:5 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages [9,128 B]
Get:6 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages [6,148 B]
Get:7 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages [4,332 B]
Hit:8 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:10 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Fetched 275 kB in 2s (129 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Install nvidia-docker2 and restart Docker:
# apt install -y nvidia-docker2
# systemctl restart docker

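Configure Docker
Note: default-runtime must be set to nvidia; otherwise Docker containers started by Kubernetes cannot find nvidia-smi.
# cat /etc/docker/daemon.json
{
    "registry-mirrors": ["https://5twf62k1.mirror.aliyuncs.com"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Pay particular attention to the path entry above: it must point to the nvidia-container-runtime binary installed on the host.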
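Before restarting Docker it can help to sanity-check daemon.json, since dockerd refuses to start on invalid JSON (comments and trailing commas included). A minimal sketch, assuming python3 is installed (only its standard `json.tool` module is used, to validate syntax); the key checks are simple grep patterns rather than a real JSON query, and only the first `"path"` entry is inspected. `check_daemon_json` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a Docker daemon.json before restarting Docker.
# Assumes python3 is installed (used only to validate JSON syntax).
# Usage: check_daemon_json [/etc/docker/daemon.json]
check_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"
  local path

  # 1. The file must be valid JSON, otherwise dockerd will not start.
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "invalid JSON: $file" >&2
    return 1
  fi

  # 2. default-runtime must be "nvidia" (needed for the Kubernetes case noted above).
  if ! grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$file"; then
    echo "default-runtime is not set to nvidia" >&2
    return 1
  fi

  # 3. The first "path" entry must point to an executable runtime binary.
  path=$(grep -oE '"path"[[:space:]]*:[[:space:]]*"[^"]+"' "$file" | head -n 1 |
    sed -E 's/.*"([^"]+)"$/\1/')
  if [ ! -x "$path" ]; then
    echo "runtime path is not executable: ${path:-<missing>}" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Run `check_daemon_json` (or pass another path) before `systemctl restart docker`, so a typo shows up before the restart rather than after it.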
Restart the Docker daemon:
# systemctl daemon-reload && systemctl restart docker
Verify nvidia-docker2; if a table like the following appears, the installation succeeded:
# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Fri Nov 27 02:54:19 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   36C    P8     1W / 320W |    202MiB / 10015MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3080    Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P8     8W / 320W |     10MiB / 10018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
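Because the test above passes `--gpus all` explicitly, it does not by itself show that the `default-runtime` setting took effect. A sketch for checking that, assuming `docker info --format '{{.DefaultRuntime}}'` (a Go template over the daemon info) reports the runtime chosen in daemon.json; `require_nvidia_default_runtime` and the `DOCKER` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm Docker's default runtime is "nvidia".
# $DOCKER can name another command or shell function (used for testing);
# it defaults to the real docker CLI.
require_nvidia_default_runtime() {
  local runtime
  # Ask the daemon which runtime it uses when none is given on the command line.
  runtime=$("${DOCKER:-docker}" info --format '{{.DefaultRuntime}}') || return 1
  if [ "$runtime" != "nvidia" ]; then
    echo "default runtime is '$runtime', expected 'nvidia'" >&2
    return 1
  fi
  echo "default runtime: $runtime"
}
```

Run `require_nvidia_default_runtime` after restarting Docker; the `DOCKER` override only exists so the function can be exercised without a running daemon.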
=================================================================
If you see this symptom:
"no programs are running, yet nvidia-smi reports GPU-Util at 100%"
set the driver to persistence mode (keep it resident in memory) with:
root@node3:~# nvidia-smi -pm 1
=================================================================
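To see which GPUs still have persistence mode off, `nvidia-smi` can report it per GPU with `--query-gpu=index,persistence_mode --format=csv,noheader` (values `Enabled` or `Disabled`). The sketch below filters that CSV from stdin; `gpus_without_persistence` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GPUs whose persistence mode is still off, reading
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader
# from stdin. Prints one GPU index per line; prints nothing if all are enabled.
gpus_without_persistence() {
  # nvidia-smi separates CSV fields with ", ".
  awk -F', ' '$2 == "Disabled" { print $1 }'
}
```

Example: `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | gpus_without_persistence`. Any index it prints can be targeted with `nvidia-smi -i <index> -pm 1`; without `-i`, `-pm 1` applies to every GPU.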


