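Summary of Docker usage problems
Working around gcr.io being unreachable from mainland China
From within China, images can be downloaded through https://dashboard.daocloud.io instead.
For example, to get gcr.io/google_containers/pause, run
dao pull google/pause
then
docker tag google/pause gcr.io/google_containers/pause
docker tag google/pause gcr.io/google_containers/pause:0.8.0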
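When several images need the same treatment, the pull-and-retag steps can be scripted. The following is a minimal sketch, assuming every mirrored image google/NAME corresponds to gcr.io/google_containers/NAME as in the pause example. The function names gcr_name and mirror_cmds are made up for illustration, and the script only prints the commands instead of running them.

```shell
# Map a mirror name such as google/pause to its gcr.io name.
# Assumption: google/NAME <-> gcr.io/google_containers/NAME, as in the pause example.
gcr_name() {
    echo "gcr.io/google_containers/${1#google/}"
}

# Print the pull and tag commands for each mirror image given as an argument.
mirror_cmds() {
    local img
    for img in "$@"; do
        echo "dao pull $img"
        echo "docker tag $img $(gcr_name "$img")"
    done
}
```

Piping the output into sh (mirror_cmds google/pause | sh) would execute the printed commands once they look right.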
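'device or resource busy' error after restarting the Docker server
The fix is to first find the paths that were never unmounted:
cat /proc/mounts | grep "mapper/docker" | awk '{print $2}'
Then unmount each of them in turn with umount.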
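The find-and-unmount steps above can be combined. This is a sketch only: docker_dm_mounts is a hypothetical helper that applies the same "mapper/docker" match as the pipeline above to /proc/mounts-style text on stdin, so it can be checked against saved output before anything is unmounted.

```shell
# Print the mount point (second field) of every line that mentions mapper/docker.
docker_dm_mounts() {
    awk '/mapper\/docker/ {print $2}'
}

# Example (not run here, needs root):
#   docker_dm_mounts < /proc/mounts | while read -r m; do umount "$m"; done
```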
# systemctl stop docker.service
# thin_check /var/lib/docker/devicemapper/devicemapper/metadata
If there were no errors then proceed with:
# thin_check --clear-needs-check-flag /var/lib/docker/devicemapper/devicemapper/metadata
# systemctl start docker.service
If there were errors, you are on your own, but 'man thin_check' and 'man thin_repair' may be helpful...
========================================================
2. iptables rules that Docker adds by default (adjust the IP-related parts to your own setup):
The nat table part:
docker0IP=`ifconfig docker0 |grep 'inet' | cut -d ' ' -f 10`
iptables -A POSTROUTING -t nat -s $docker0IP/30 ! -o docker0 -j MASQUERADE
DockerChain="DOCKER"
iptables -t nat -nL $DockerChain
if [ "x$?" != "x0" ] ; then
iptables -t nat -N $DockerChain
fi
iptables -A PREROUTING -m addrtype --dst-type LOCAL -t nat -j $DockerChain
iptables -A OUTPUT -m addrtype --dst-type LOCAL -t nat -j $DockerChain ! --dst 127.0.0.0/8
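The ifconfig/cut pipeline above depends on the exact column layout of ifconfig output. Below is a sketch of a less layout-sensitive variant, assuming the iproute2 ip command is available. The function name first_inet_cidr is hypothetical, and it only parses text, so it can be tried on saved output first.

```shell
# Print the first IPv4 address/prefix found in `ip -4 -o addr show DEV` output on stdin.
first_inet_cidr() {
    awk '$3 == "inet" {print $4; exit}'
}

# Example (not run here):
#   docker0CIDR=$(ip -4 -o addr show docker0 | first_inet_cidr)
```

The result already carries the bridge's own prefix length, so whether to use it as-is or to substitute a custom mask (as the /30 above does) remains a per-site decision.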
Reference code:
https://github.com/docker/docker/blob/2ad81da856c123acf91eeff7ab607376bd27d9ba/vendor/src/github.com/docker/libnetwork/drivers/bridge/setup_ip_tables.go
https://github.com/docker/docker/blob/2ad81da856c123acf91eeff7ab607376bd27d9ba/vendor/src/github.com/docker/libnetwork/iptables/iptables.go
=========================================================
3. Docker reports an error like the following [chown socket at step GROUP: No such process] and fails to start:
# journalctl -xn
-- Logs begin at Tue 2014-12-30 13:07:53 EST, end at Tue 2014-12-30 13:25:23 EST. --
Dec 30 13:12:30 ITX kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Dec 30 13:22:53 ITX systemd[1]: Starting Cleanup of Temporary Directories...
-- Subject: Unit systemd-tmpfiles-clean.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-tmpfiles-clean.service has begun starting up.
Dec 30 13:22:53 ITX systemd[1]: Started Cleanup of Temporary Directories.
-- Subject: Unit systemd-tmpfiles-clean.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-tmpfiles-clean.service has finished starting up.
--
-- The start-up result is done.
Dec 30 13:25:23 ITX systemd[1]: Starting Docker Socket for the API.
-- Subject: Unit docker.socket has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.socket has begun starting up.
Dec 30 13:25:23 ITX systemd[1868]: Failed to chown socket at step GROUP: No such process
Dec 30 13:25:23 ITX systemd[1]: docker.socket control process exited, code=exited status=216
Dec 30 13:25:23 ITX systemd[1]: Failed to listen on Docker Socket for the API.
-- Subject: Unit docker.socket has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.socket has failed.
--
-- The result is failed.
Dec 30 13:25:23 ITX systemd[1]: Dependency failed for Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is dependency.
Dec 30 13:25:23 ITX systemd[1]: Unit docker.socket entered failed state.
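Solutions:
Method 1: add the docker group (groupadd docker; if /etc/group is managed by a central configuration system, remember to add the docker group to the source group file as well).
Method 2: edit /usr/lib/systemd/system/docker.socket: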
[Unit]
Description=Docker Socket for the API
PartOf=docker.service
[Socket]
ListenStream=/var/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker #change this to: SocketGroup=root or another existing group
[Install]
WantedBy=sockets.target
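Before choosing between the two methods, it can help to confirm that the group named in SocketGroup is really missing. This is a minimal sketch: has_group is a hypothetical helper that reads /etc/group-style text on stdin. On hosts where groups come from a directory service, getent group docker is the more complete check.

```shell
# has_group NAME: succeed if NAME appears as a group name in the text on stdin.
has_group() {
    awk -F: -v g="$1" '$1 == g {found = 1} END {exit !found}'
}

# Example (not run here, needs root):
#   has_group docker < /etc/group || groupadd docker
```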
The following steps are optional:
systemctl enable docker.service && systemctl enable docker.socket:
# systemctl list-unit-files | grep docker
docker.service disabled
docker.socket disabled
# chkconfig docker on #if chkconfig is unavailable, run: systemctl enable docker.service
Note: Forwarding request to 'systemctl enable docker.service'.
ln -s '/usr/lib/systemd/system/docker.service' '/etc/systemd/system/multi-user.target.wants/docker.service'
# systemctl list-unit-files|grep docker
docker.service enabled
docker.socket disabled
# systemctl enable docker.socket
ln -s '/usr/lib/systemd/system/docker.socket' '/etc/systemd/system/sockets.target.wants/docker.socket'
# systemctl list-unit-files|grep docker
docker.service enabled
docker.socket enabled
Reference links:
http://www.milliondollarserver.com/?cat=7
http://www.milliondollarserver.com/?p=622
===============================================================
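4. When the host runs only one container, removing that container sometimes causes a brief network outage on the host
Solution:
1. Edit /etc/sysconfig/ntpd and add the "-L" option, for example: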
cat /etc/sysconfig/ntpd
# Command line options for ntpd
OPTIONS="-g -L"
2. Restart the ntpd service: systemctl restart ntpd
Reference link:
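If the change has to be rolled out to many hosts, the edit in step 1 can be made repeatable. This is a sketch under the assumption that the file contains a single double-quoted OPTIONS="..." line as shown above. add_listen_off_flag is a hypothetical helper, and its check for an existing -L is a plain substring match.

```shell
# Copy stdin to stdout, appending -L inside the quotes of the OPTIONS line
# unless that line already contains -L.
add_listen_off_flag() {
    while IFS= read -r line; do
        case "$line" in
            OPTIONS=*-L*)  printf '%s\n' "$line" ;;           # already present
            OPTIONS=\"*\") printf '%s -L"\n' "${line%?}" ;;  # drop closing quote, re-add after -L
            *)             printf '%s\n' "$line" ;;
        esac
    done
}

# Example (not run here):
#   add_listen_off_flag < /etc/sysconfig/ntpd > /tmp/ntpd.new
```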
https://access.redhat.com/solutions/261123
========================================================
5. With Docker 1.6+, docker login against a private registry set up according to the official documentation fails:
Username: ever
Password:
Email:
Error response from daemon: Unexpected status code [404] : <html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.6.3</center>
</body>
</html>
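Solution: in short, Docker 1.6+ requires registry 2.0, and one extra nginx setting is needed as well. The official documentation gets that setting wrong: it uses add_header where more_set_headers (from the headers-more module) should be used.
For the official tool that migrates v1 images to v2, see https://github.com/docker/migrator. Recommended book: 《Docker 容器和容器雲》 from Zhejiang University.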
========================================================
6. Server-side access log when Docker 1.8 pulls an image:
127.0.0.1 - - [16/Oct/2015:10:08:52 +0000] "GET /v2/ HTTP/1.1" 401 194 "-" "docker/1.8.3 go/go1.4.2 git-commit/f4bf5c7 kernel/4.2.0-1.el7.elrepo.x86_64 os/linux arch/amd64" "-"
127.0.0.1 - - [16/Oct/2015:10:08:52 +0000] "GET /v1/_ping HTTP/1.1" 404 168 "-" "docker/1.8.3 go/go1.4.2 git-commit/f4bf5c7 kernel/4.2.0-1.el7.elrepo.x86_64 os/linux arch/amd64" "-"
127.0.0.1 - - [16/Oct/2015:10:08:52 +0000] "POST /v1/users/ HTTP/1.1" 404 168 "-" "docker/1.8.3 go/go1.4.2 git-commit/f4bf5c7 kernel/4.2.0-1.el7.elrepo.x86_64 os/linux arch/amd64" "-"
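Docker should be talking to the v2 API but falls back to the v1 API instead.
Solution: Docker and the registry negotiate the API version through a response header (Docker-Distribution-Api-Version). If the response to /v2/ does not carry it, the client falls back to v1.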
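Whether the proxy actually returns the version header can be checked from the command line. This is a minimal sketch, assuming the header is Docker-Distribution-Api-Version with a value containing registry/2.0. has_v2_header is a hypothetical helper that inspects raw HTTP response headers on stdin, and registry.example.com is a placeholder host.

```shell
# Succeed if the HTTP headers on stdin advertise the v2 registry API.
has_v2_header() {
    tr -d '\r' | grep -qi '^docker-distribution-api-version:.*registry/2\.0'
}

# Example (not run here):
#   curl -si https://registry.example.com/v2/ | has_v2_header && echo "v2 header present"
```

The check should pass even when the status is 401, since that is the response the client sees first in the log above.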
========================================================
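7. After a container restart or a restart of the host's iptables service, the container no longer receives UDP packets (Failed to receive UDP traffic):
Cause: while the container or the host's iptables service restarts, there is a moment when the NAT rules for Docker are not in effect. During that window the physical host answers with a port-unreachable error (for example: 8888 port unreachable), and that reply leaves a cached entry in the ip_conntrack table like this: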
ipv4 2 udp 17 29 src=xx.xx.xx.xx dst=xx.xx.xx.xx sport=xxxx dport=xxxx [UNREPLIED] src=xx.xx.xx.xx dst=xx.xx.xx.xx sport=xxxx dport=xxxx mark=0 zone=0 use=2
Solution: flush the conntrack cache (for example with conntrack-tools: conntrack -F).
Related link: https://github.com/docker/docker/issues/8795 (flushing conntrack)
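conntrack -F drops every tracked connection on the host. A narrower variant is sketched below, assuming the table is readable in the format shown above (for instance from /proc/net/nf_conntrack). unreplied_udp_dports is a hypothetical helper that lists the destination ports of stale UDP entries so that only those are deleted.

```shell
# Print the original destination port of each UDP entry still marked [UNREPLIED].
unreplied_udp_dports() {
    awk '$3 == "udp" && /\[UNREPLIED\]/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^dport=/) { sub(/^dport=/, "", $i); print $i; break }
    }'
}

# Example (not run here, needs root and conntrack-tools):
#   unreplied_udp_dports < /proc/net/nf_conntrack | sort -u |
#       while read -r p; do conntrack -D -p udp --dport "$p"; done
```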
========================================================
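8. After a new partition (/ssd) is added to the Docker host, Docker has to be restarted before a data volume on that partition (-v /ssd:/ssd) takes effect in newly started containers
Solution (use with caution): edit /usr/lib/systemd/system/docker.service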
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target docker.socket
Requires=docker.socket
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
ExecStart=/usr/bin/docker -d $OPTIONS $DOCKER_STORAGE_OPTIONS
ExecStartPost=/usr/bin/chmod 777 /var/run/docker.sock
LimitNOFILE=1048576
LimitNPROC=1048576
MountFlags=private #change this to MountFlags=shared
[Install]
WantedBy=multi-user.target
Related link: https://huaminchen.wordpress.com/2015/05/19/how-docker-handles-mount-namespace/
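The same change can be applied to a copy of the unit rather than to the packaged file. This is a minimal sketch, relying on systemd preferring a unit in /etc/systemd/system over one of the same name in /usr/lib/systemd/system. set_mount_flags_shared is a hypothetical helper that only rewrites text on stdin.

```shell
# Rewrite any MountFlags= line (including a trailing note on it) to MountFlags=shared.
set_mount_flags_shared() {
    sed 's/^MountFlags=.*/MountFlags=shared/'
}

# Example (not run here, needs root):
#   set_mount_flags_shared < /usr/lib/systemd/system/docker.service \
#       > /etc/systemd/system/docker.service
#   systemctl daemon-reload && systemctl restart docker
```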
========================================================
9. File mount problems with MFS + Docker
mfs is mounted locally like this:
mfsmount /mnt -H ip -P port -S /
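This gives the host an mfs directory at /mnt.
However, after running docker run -it -v /mnt:/mnt image:tags /bin/bash
the directory inside the container turns out to be the plain local directory, not the mfs mount, and its size is wrong as well. The system log shows a warning:
Jul 16 11:52:36 TENCENT64 docker: [error] mount.go:12 [warning]: couldn't run auplink before unmount: exec: "auplink": executable file not found in $PATH
The auplink command is missing on the host, which breaks Docker's mount handling. On CentOS, install it with: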
yum install aufs-util
Then restart Docker:
systemctl restart docker
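After restarting the container, the mount works.
So far, mounting mfs in Docker has failed inexplicably on two occasions:
1. The mfs mount directory was changed but Docker was not restarted. No matter how the container was started or which logs were collected, the mfs mount never became visible inside the container.
2. A large number of files were deleted from inside a running container. The delete operation itself had finished, but mfs has a trash mechanism: the files had only been moved to the trash and the real data cleanup had not run yet (this state is visible on the mfs.cgi page). As a result, mkdir inside the container failed with "device is busy".
Both problems only went away after restarting Docker. My guess is that something in Docker's underlying file handling, cgroup or aufs, is at fault. This one stays open for now.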
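Since both incidents came down to Docker not seeing the real mount, it is cheap to confirm on the host that /mnt is an actual mount of the expected type before starting a container. This is a sketch only: mount_fstype is a hypothetical helper that reads /proc/mounts-style text on stdin. The type string reported for an mfs mount depends on the MFS version, so the sketch does not hard-code one.

```shell
# mount_fstype DIR: print the filesystem type of the last mount on DIR,
# or fail if DIR is not a mount point in the text on stdin.
mount_fstype() {
    awk -v mp="$1" '$2 == mp {t = $3} END {if (t != "") print t; else exit 1}'
}

# Example (not run here):
#   mount_fstype /mnt < /proc/mounts || echo "/mnt is not mounted"
```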
========================================================
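10. With a v1 private Docker registry, the index is written to the db on the first upload of an image but the image upload itself fails (search finds the image, but the delete API fails), and the registry then reports an error.
Cause: the index is already in the db although the image upload failed; the retry writes the index again, which triggers the non-unique name error.
Solution: the index lives in a sqlite database, so delete the offending image's index entry there (sqlite3 docker-registry.db; .tables; select * from repository;).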
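The delete itself could be generated as below. This is a sketch under a loud assumption: the repository table is taken to have a name column (suggested by the non-unique name error, but not confirmed by the source), so check it with .schema repository first. delete_repo_sql is a hypothetical helper and library/busybox is a placeholder image name.

```shell
# Print a DELETE statement for one repository name.
# Assumption: the table "repository" has a text column called "name".
delete_repo_sql() {
    local name
    # Double any single quote so the name is a valid SQL string literal.
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "DELETE FROM repository WHERE name = '%s';\n" "$name"
}

# Example (not run here; back up docker-registry.db first):
#   delete_repo_sql library/busybox | sqlite3 docker-registry.db
```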
========================================================
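11. Crashes caused by device mapper discard.
Cause: this problem recurred on certain servers. After the crash and reboot, logging in through the IPMI console showed that the system had already booted into the CoreDump kernel, which runs a Recover step and a Data Copying step before producing the dump, so recovery is very slow. Analysis of the coredump points to a kernel bug in DM discard; the fix is to disable discard for Docker's devicemapper.
Solution: start Docker with "--storage-opt dm.mountopt=nodiscard --storage-opt dm.blkdiscard=false"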
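With a unit file like the one in item 8, storage flags reach the daemon through DOCKER_STORAGE_OPTIONS in /etc/sysconfig/docker-storage. Below is a sketch of that file with discard disabled, assuming your docker.service reads the same EnvironmentFile and variable name.

```shell
# /etc/sysconfig/docker-storage
# (path and variable name taken from the unit file in item 8; adjust if yours differ)
DOCKER_STORAGE_OPTIONS="--storage-opt dm.mountopt=nodiscard --storage-opt dm.blkdiscard=false"
```

Docker has to be restarted afterwards for the options to apply.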
========================================================
12. Docker fails to start with [error] attach_loopback.go:42 There are no more loopback devices available. Full error log:
systemd[1]: Starting Docker Application Container Engine...
docker[47518]: 2016/02/03 14:50:32 docker daemon: 1.3.2 39fa2fa/1.3.2; execdriver: native; graphdriver:
docker[47518]: [b98612a1] +job serveapi(fd://, tcp://0.0.0.0:2375, unix:///var/run/docker.sock)
docker[47518]: [error] attach_loopback.go:42 There are no more loopback devices available.
docker[47518]: 2016/02/03 14:50:32 loopback mounting failed
systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Docker Application Container Engine.
systemd[1]: Unit docker.service entered failed state.
systemd[1]: docker.service failed.
Cause: the host system has no loopback device files in its /dev for Docker to use.
Solution: run something like this on the host, then start the container and it will pick up the devices.
#!/bin/bash
for i in {0..6}
do
mknod -m0660 /dev/loop$i b 7 $i
done
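The loop above calls mknod for every index, which fails on nodes that already exist. Below is a sketch of a variant that first lists only the missing indices. missing_loop_nodes is a hypothetical helper that takes the directory as a parameter so it can be tried against a scratch directory.

```shell
# missing_loop_nodes DIR COUNT: print each index i in 0..COUNT-1 for which DIR/loop<i> is absent.
missing_loop_nodes() {
    local dir="$1" count="$2" i=0
    while [ "$i" -lt "$count" ]; do
        [ -e "$dir/loop$i" ] || echo "$i"
        i=$((i + 1))
    done
}

# Example (not run here, needs root), matching the 0..6 range above:
#   for i in $(missing_loop_nodes /dev 7); do mknod -m0660 /dev/loop$i b 7 $i; done
```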
Official Docker issue: git issue
=========================Other links================================