A Brief Look at How Docker Mounts GPUs
Basics
For Docker and most other Linux containers, Cgroups are the main mechanism for enforcing resource limits, while Namespaces are the main mechanism for changing a process's view of the system.
What Docker starts is just a process, nothing more.
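This is easy to verify on the host. The sketch below (the image and container name are only examples) starts a container and then finds its entrypoint in the host's ordinary process table:

docker run -d --name just-a-process ubuntu sleep 1000   # start a container whose entrypoint simply sleeps
ps -ef | grep 'sleep 1000'                              # the "container" shows up as a normal process on the host
docker rm -f just-a-process                             # clean up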
Isolation (Namespace)
When you call clone() in code, passing flags such as CLONE_NEWPID/CLONE_NEWNS/CLONE_NEWUTS/CLONE_NEWNET/CLONE_NEWIPC starts a process that is isolated in the corresponding namespaces.
Simply put, a Namespace is a sleight of hand that limits what a process can see (a shell-level sketch follows the list below):
- PID Namespace
- Mount: only the mount points in the current Namespace are visible
- UTS
- IPC
- Network: only the network devices in the current Namespace are visible
- User
- Time cannot be namespaced: if the system time is changed inside one container, the host and every container on that host see the new time
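The same isolation can be demonstrated straight from the shell with util-linux's unshare, which ends up using the same clone/unshare flags. A minimal sketch (requires root; the hostname used is just an example):

sudo unshare --uts --pid --net --ipc --mount --fork /bin/bash
# inside the new namespaces:
hostname isolated-demo   # only changes the hostname seen in this UTS namespace
echo $$                  # prints 1: this bash is PID 1 in its new PID namespace
ip link                  # only the loopback device is visible
exit                     # back on the host, the hostname is unchanged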
Limits (Cgroups)
Cgroups stands for Linux Control Groups. Their main job is to cap the resources a group of processes can use: CPU, memory, disk I/O, network bandwidth, and so on.
The interface Cgroups expose to users is a filesystem: they are organized as files and directories under /sys/fs/cgroup.
Pass the limits when starting the container:
docker run -it --cpu-period=100000 --cpu-quota=20000 ubuntu /bin/bash
After starting this container, we can confirm the limits by reading the files of its control group under the CPU subsystem of the Cgroups filesystem:
$ cat /sys/fs/cgroup/cpu/docker/5d5c9f67d/cpu.cfs_period_us
100000
$ cat /sys/fs/cgroup/cpu/docker/5d5c9f67d/cpu.cfs_quota_us
20000
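Docker is only writing these files on our behalf. Doing the same thing by hand looks roughly like this (a sketch, run as root, assuming cgroup v1 and a made-up group name "demo"):

mkdir /sys/fs/cgroup/cpu/demo                              # create a new control group under the cpu subsystem
echo 100000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_period_us    # 100 ms scheduling period
echo 20000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us      # 20 ms per period, i.e. at most 20% of one core
echo $$ > /sys/fs/cgroup/cpu/demo/tasks                    # move the current shell into the group; its children inherit the limit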
GPU mounting experiments
Using nvidia-docker2
In short, with nvidia-docker2 you get GPU access with almost no effort; all that is needed is configuring the runtime to be nvidia (and restarting the daemon, as shown right after the config below):
cat /etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"exec-opts": ["native.cgroupdriver=systemd"]
}
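After editing daemon.json the docker daemon has to be restarted for the new default runtime to take effect. A quick check (the exact wording of the output depends on your docker version):

systemctl restart docker
docker info | grep -i runtime   # should list the nvidia runtime and show it as the default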
After starting a container, running nvidia-smi shows all of the GPU cards:
[root@localhost] docker run -it 98b41a1e975d bash
root@6db1dd28459d:/notebooks# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 40C P0 57W / 300W | 4053MiB / 16130MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:8B:00.0 Off | 0 |
| N/A 38C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:8C:00.0 Off | 0 |
| N/A 42C P0 46W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:8D:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:B3:00.0 Off | 0 |
| N/A 39C P0 42W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:B4:00.0 Off | 0 |
| N/A 41C P0 57W / 300W | 7279MiB / 16130MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:B5:00.0 Off | 0 |
| N/A 40C P0 45W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:B6:00.0 Off | 0 |
| N/A 41C P0 44W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
With NVIDIA_DRIVER_CAPABILITIES you can inject only a subset of the driver libraries, and with NVIDIA_VISIBLE_DEVICES you can expose only some of the GPU cards; for details see "how to configure resources through environment variables with nvidia-docker".
[root@localhost cuda-9.0]# docker run -it --env NVIDIA_DRIVER_CAPABILITIES="compute,utility" --env NVIDIA_VISIBLE_DEVICES=0,1 98b41a1e975d bash
root@97bf127ff83a:/notebooks# nvidia-smi
Tue Oct 15 09:29:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 39C P0 57W / 300W | 4053MiB / 16130MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:8B:00.0 Off | 0 |
| N/A 37C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Using GPUs with plain Docker
Using GPUs with plain Docker ran into quite a few pitfalls. First, switch the runtime back to the default:
[root@localhost ~]# cat /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
After restarting the docker service, try passing a GPU device straight through:
docker run --device /dev/nvidia0:/dev/nvidia0 -it 98b41a1e975d bash
root@a85d5e5f69d9:/notebooks# nvidia-smi
bash: nvidia-smi: command not found
root@a85d5e5f69d9:/notebooks# ll /dev/|grep nvidia
crw-rw-rw- 1 root root 195, 0 Oct 15 06:06 nvidia0
nvidia-smi does not exist inside the container, so let's map the host directory that contains nvidia-smi straight in:
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 -v /usr/bin/:/usr/bin -it 98b41a1e975d bash
root@cf29b4477304:/notebooks# nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
libnvidia-ml.so cannot be found. libnvidia-ml.so is the NVIDIA Management Library (NVML for short), and it belongs to the NVIDIA driver; nvidia-smi manages GPUs by calling into libnvidia-ml.so. So we need to mount it in as well:
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 -v /usr/bin/:/usr/bin -v /usr/lib64:/usr/lib64 -it 98b41a1e975d bash
root@ee39b2b3b1a4:/notebooks# nvidia-smi
Failed to initialize NVML: Unknown Error
Now initializing NVML fails. The NVML library talks to the NVIDIA driver, so is that communication being blocked somewhere? Let's look at which NVIDIA kernel modules are loaded and which device files exist, and whether all of them need to be mapped into the container:
[root@localhost cuda-9.0]# lsmod|grep nvidia
nvidia_drm 39843 0
nvidia_modeset 1036498 1 nvidia_drm
nvidia_uvm 786729 0
nvidia 16594443 77 nvidia_modeset,nvidia_uvm
ipmi_msghandler 46608 3 ipmi_devintf,nvidia,ipmi_si
drm_kms_helper 163265 2 ast,nvidia_drm
drm 370825 5 ast,ttm,drm_kms_helper,nvidia_drm
i2c_core 40756 6 ast,drm,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
[root@localhost cuda-9.0]# ll /dev/|grep nvidia
crw-rw-rw- 1 root root 195, 0 Jul 23 10:56 nvidia0
crw-rw-rw- 1 root root 195, 1 Jul 23 10:56 nvidia1
crw-rw-rw- 1 root root 195, 2 Jul 23 10:56 nvidia2
crw-rw-rw- 1 root root 195, 3 Jul 23 10:56 nvidia3
crw-rw-rw- 1 root root 195, 4 Jul 23 10:56 nvidia4
crw-rw-rw- 1 root root 195, 5 Jul 23 10:56 nvidia5
crw-rw-rw- 1 root root 195, 6 Jul 23 10:56 nvidia6
crw-rw-rw- 1 root root 195, 7 Jul 23 10:56 nvidia7
crw-rw-rw- 1 root root 195, 255 Jul 23 10:56 nvidiactl
crw-rw-rw- 1 root root 195, 254 Jul 23 10:56 nvidia-modeset
crw-rw-rw- 1 root root 237, 0 Jul 23 10:56 nvidia-uvm
crw-rw-rw- 1 root root 237, 1 Jul 23 10:56 nvidia-uvm-tools
Putting this together, let's try again and also map in /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-uvm-tools and /dev/nvidia-modeset:
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools --device /dev/nvidia-modeset:/dev/nvidia-modeset -v /usr/bin/:/usr/bin -v /usr/lib64:/usr/lib64 -it 98b41a1e975d bash
root@bc21e395d885:/notebooks# nvidia-smi
Tue Oct 15 09:47:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 37C P0 44W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Finally we get the output we were hoping for. This exploration got me thinking: how does nvidia-docker pull this off? Is it also implemented with --device plus mapping in the NVIDIA driver files?
How nvidia-docker works
The references we consulted confirm the guess: nvidia-docker really does work this way. nvidia-container-runtime wraps runc and registers a prestart hook that runs before the container's entrypoint. The hook calls nvidia-container-cli, which works out the GPU devices, library files and executables that need to be mapped, and mounts them into the container once its namespaces exist, so the GPU environment is fully set up by the time the container starts.
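Without starting a container you can peek at what the hook would inject using nvidia-container-cli's list subcommand (a sketch; the exact set of files depends on the driver installed on the host):

nvidia-container-cli list                          # device nodes, libraries and binaries that would be mapped in
nvidia-container-cli list --binaries --libraries   # restrict the listing to individual categories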
Installing the NVIDIA driver
I ran into a lot of problems while testing, mostly because I was not familiar with the various packages NVIDIA ships and which layer each belongs to, so here is a summary.
NVIDIA GPU related software falls into two categories:
- Nvidia driver
- CUDA Toolkit
NVIDIA driver installation methods:
- Download a run file such as NVIDIA-Linux-x86_64-384.59.run and install it directly; after installation the files all end up under /usr/local/nvidia by default, which is why most tutorials pass docker -v /usr/local/nvidia:/usr/local/nvidia.
- The other way is to install via rpm: after configuring the repository, run yum install cuda-drivers-410.79-1 (adjust the version to your own). With this method the files land under /usr/bin and /usr/lib64 by default (a quick way to check is shown right after this list).
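A sketch of checking where an rpm-installed driver put its files (the package names below are the ones unpacked further down; adjust them to your driver version):

rpm -ql nvidia-driver-cuda | grep /usr/bin   # binaries shipped by the driver packages
rpm -ql nvidia-driver-NVML                   # where the NVML library was installed
ldconfig -p | grep libnvidia-ml              # confirm the dynamic linker can resolve NVML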
I unpacked the main NVIDIA driver rpm packages to see what each of them contains.
Library files:
nvidia-driver-410.79-1.el7.x86_64.rpm 29MB core driver
./usr/lib64/nvidia/xorg/libglxserver_nvidia.so 15M
./usr/lib64/xorg/modules/drivers/nvidia_drv.so 7.5M
nvidia-driver-libs-410.79-1.el7.x86_64.rpm 44MB core libraries
./etc/ld.so.conf.d/nvidia-x86_64.conf
./usr/lib64/libEGL_nvidia.so.410.79 1008K
./usr/lib64/libGLESv1_CM_nvidia.so.410.79 59K
./usr/lib64/libGLESv2_nvidia.so.410.79 109K
./usr/lib64/libGLX_nvidia.so.410.79 1.3M
./usr/lib64/libnvidia-cbl.so.410.79 363K
./usr/lib64/libnvidia-cfg.so.410.79 176K
./usr/lib64/libnvidia-eglcore.so.410.79 25M
./usr/lib64/libnvidia-glcore.so.410.79 26M
./usr/lib64/libnvidia-glsi.so.410.79 568K
./usr/lib64/libnvidia-glvkspirv.so.410.79 14M
./usr/lib64/libnvidia-rtcore.so.410.79 26M
./usr/lib64/libnvidia-tls.so.410.79 15K
./usr/lib64/libnvoptix.so.410.79 34M
./usr/lib64/vdpau/libvdpau_nvidia.so.410.79 965K
./usr/share/glvnd/egl_vendor.d/10_nvidia.json
nvidia-driver-NVML-410.79-1.el7.x86_64.rpm 560K Nvidia Management Library
./usr/lib64/libnvidia-ml.so.410.79 1.5M
nvidia-driver-cuda-libs-410.79-1.el7.x86_64.rpm 33M CUDA driver-side libraries (libcuda, NVDEC/NVENC, OpenCL)
./usr/lib64/libcuda.so.410.79 15M
./usr/lib64/libnvcuvid.so.410.79 2.7M
./usr/lib64/libnvidia-compiler.so.410.79 46M
./usr/lib64/libnvidia-encode.so.410.79 165K
./usr/lib64/libnvidia-fatbinaryloader.so.410.79 286K
./usr/lib64/libnvidia-opencl.so.410.79 28M
./usr/lib64/libnvidia-ptxjitcompiler.so.410.79 12M
Executables:
nvidia-driver-cuda-410.79-1.el7.x86_64.rpm 394K MPS and nvidia-smi, the commonly used commands
./usr/bin/nvidia-cuda-mps-control
./usr/bin/nvidia-cuda-mps-server
./usr/bin/nvidia-debugdump
./usr/bin/nvidia-smi
nvidia-modprobe-410.79-1.el7.x86_64.rpm 71K setuid helper that loads the kernel module and creates the /dev/nvidia* device nodes
./usr/bin/nvidia-modprobe
Less commonly used:
nvidia-libXNVCtrl-devel-410.79-1.el7.x86_64 62K headers and library for the NV-CONTROL X extension
./usr/include/NVCtrl
./usr/include/NVCtrl/NVCtrl.h
./usr/include/NVCtrl/NVCtrlLib.h
./usr/include/NVCtrl/nv_control.h
./usr/lib64/libXNVCtrl.so
dkms-nvidia-410.79-1.el7.x86_64.rpm 12M kernel module source registered with DKMS, so the NVIDIA module is rebuilt automatically when the kernel is upgraded ("Registering the NVIDIA Kernel Module with DKMS")
nvidia-driver-NvFBCOpenGL-410.79-1.el7.x86_64.rpm 135K NvFBC (framebuffer capture) and NvIFR (inband frame readback) libraries
./usr/lib64/libnvidia-fbc.so.1
./usr/lib64/libnvidia-fbc.so.410.79
./usr/lib64/libnvidia-ifr.so.1
./usr/lib64/libnvidia-ifr.so.410.79
CUDA Toolkit installation:
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
Once it finishes, everything should be under /usr/local/cuda-9.0/ (adjust the version to whichever toolkit you installed).
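A quick sanity check that the toolkit is in place (substitute your own version directory):

/usr/local/cuda-9.0/bin/nvcc --version   # the toolkit ships its own compiler
ls /usr/local/cuda-9.0/lib64 | head      # and its own libraries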
/usr/local/cuda-9.0/lib64/ contains all of the CUDA library files. From the top of the stack down to the bottom they are (see the check after the list):
- libcublas.so, libcufft.so — CUDA libraries
- libcudart.so — the CUDA runtime
- libcuda.so — the CUDA driver API (NVIDIA driver territory)
- NVIDIA driver, user mode (NVIDIA driver territory)
- NVIDIA driver, kernel mode (NVIDIA driver territory)
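This split is visible on disk: the toolkit-side libraries sit under the CUDA prefix, while the driver API library is shipped by the driver packages. A sketch (paths assume an rpm-installed driver and the toolkit prefix used above):

ls -l /usr/local/cuda-9.0/lib64/libcudart.so* /usr/local/cuda-9.0/lib64/libcublas.so*   # toolkit-side libraries
ls -l /usr/lib64/libcuda.so*                                                            # driver API library, installed by the NVIDIA driver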
Note that /usr/local/cuda-9.0/lib64/stubs contains files such as libcuda.so whose names are exactly the same as the libcuda.so shipped with the NVIDIA driver, but the libraries under stubs are not the real thing: they appear to be stubs meant only for linking at build time on machines without a driver installed, and should not be loaded at runtime.