gpu使用准備
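Before using GPUs through docker-compose, your Docker installation must already be able to start a container with the docker run command, using the --gpus flag to select devices.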
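If you see the error docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. you need to resolve it first. It usually means the NVIDIA container runtime support (the NVIDIA Container Toolkit) is not installed, or Docker has not been restarted since it was installed.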
Writing the docker-compose.yaml file
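In a docker-compose.yaml file, pay attention to three top-level keys: version, services, and networks. version specifies which version of the Compose file format the file follows; services configures the services (containers); networks configures the networks.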
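As a minimal sketch of how these three keys fit together, the skeleton below defines one service attached to one custom network. The service name app, the image name, and the network name demo_net are placeholders introduced for illustration, not names from this post. Note also that recent Docker Compose releases (Compose V2) treat the top-level version key as obsolete and ignore it, so it mainly matters for the older docker-compose tool.

```yaml
# Minimal skeleton: all names below are hypothetical placeholders.
version: "3.8"                    # Compose file format version
services:                         # one entry per container to run
  app:                            # hypothetical service name
    image: "example/image:tag"    # placeholder image reference
    networks:
      - "demo_net"                # attach the service to the network defined below
networks:                         # networks that services can join
  demo_net:
    driver: bridge                # default single-host bridge driver
```

Every network a service lists under its own networks key has to be declared in the top-level networks section, which is why the two names match.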
Here is a test file:
version: "3.8"
services:
  # Service without GPU access
  pdf:
    image: "xxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51002-51003"
    ports:
      # host_port:container_port
      - "51001:22"
      - "51002-51003:51002-51003"
    shm_size: "4g"
    networks:
      - "ana"
    container_name: "literature_pdf"
    tty: "true"
  # Service with access to all GPUs
  fig:
    image: "xxxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51009-51020"
    ports:
      - "51008:22"
      - "51009-51020:51009-51020"
    shm_size: "8g"
    volumes:
      # host_path:container_path
      - "/data/elfin/utils/detectron2-master:/home/appuser/detectron2-master"
    environment:
      - "NVIDIA_VISIBLE_DEVICES=all"
    deploy:
      resources:
        reservations:
          devices:
            # driver, count, and capabilities belong to ONE list item
            - driver: "nvidia"
              count: "all"
              capabilities: ["gpu"]
    networks:
      - "ana"
    container_name: "fig"
    tty: "true"
  # Service pinned to a specific GPU (device ID 1)
  ocr:
    image: "xxxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51005-51007"
    ports:
      - "51004:22"
      - "51005-51007:51005-51007"
    shm_size: "6g"
    deploy:
      resources:
        reservations:
          devices:
            - device_ids: ["1"]
              capabilities: ["gpu"]
              driver: "nvidia"
    networks:
      - "ana"
    container_name: "ocr"
    tty: "true"
    entrypoint: ["supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
# Custom network shared by all three services
networks:
  ana:
    driver: bridge
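Note: the file above is only a test. Many parts of it could be improved, and it is not a particularly good template. In it, image specifies the image each service runs.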
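The file above covers volume mounts, GPU access, a custom network, and port mapping. I find the GPU configuration the hardest part: it is easy to make small mistakes that leave the application unable to start once the container is up. Here is the GPU dependency configuration for a container: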
    deploy:
      resources:
        reservations:
          devices:
            - driver: "nvidia"
              count: "all"
              capabilities: ["gpu"]
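capabilities is mandatory here. In addition, count, driver, and capabilities together form a single list item: only the first key carries the "-", and putting a "-" in front of each one causes an error. For other GPU options, see the official documentation at https://docs.docker.com/compose/gpu-support/ .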
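To make the grouping rule concrete, here is a minimal sketch of a single-service file that can serve as a GPU smoke test. The service name gpu_test and the image tag are assumptions made for illustration; substitute any CUDA-enabled image you have available. The commented-out variant shows the mistake described above: each "-" starts a new list item, so the three keys would become three separate device entries, and the entries that lack capabilities should be rejected.

```yaml
# Sketch only: service name and image tag are assumptions, not from the post.
services:
  gpu_test:
    image: "nvidia/cuda:12.3.1-base-ubuntu20.04"   # assumed tag; any CUDA base image should do
    command: nvidia-smi                            # prints the GPUs visible inside the container
    deploy:
      resources:
        reservations:
          devices:
            # Correct: ONE list item holding all three keys
            - driver: "nvidia"
              count: "all"
              capabilities: ["gpu"]
            # Incorrect: three list items, two of them without capabilities
            # - driver: "nvidia"
            # - count: "all"
            # - capabilities: ["gpu"]
```

If the setup is working, running this file with docker-compose up should print the nvidia-smi table and exit. To pin specific GPUs instead of reserving all of them, replace count with device_ids, as the ocr service in the test file does; the two keys are alternatives and are not meant to be combined in one entry.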
Addendum: the following blog posts are good references: