1. Intermittent error: Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown
Reference: https://github.com/opencontainers/runc/issues/1740
Most of the memory is being consumed by the page cache; free it with echo 1 > /proc/sys/vm/drop_caches
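A quick read-only check, before dropping anything, that the page cache really is what is consuming the memory (a sketch; the drop itself needs root):

```shell
# Read the current page-cache size from /proc/meminfo (no root needed).
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "page cache: ${cached_kb} kB"
# To actually release it (root only; safe, but I/O slows until the cache refills):
#   sync && echo 1 > /proc/sys/vm/drop_caches
```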
2. Rpmdb checksum is invalid: dCDPT(pkg checksums):
Description:
The rpm database is corrupt and needs to be rebuilt: run rpm --rebuilddb before yum install ...
Fix:
RUN rpm --rebuilddb && yum install -y ...
3. agetty process on the Docker host at 100% CPU
Description:
The container was started with "docker run" using /sbin/init together with the --privileged flag
Fix:
Run the following on both the host and in the container:
systemctl stop getty@tty1.service
systemctl mask getty@tty1.service
4. Failed to get D-Bus connection: Operation not permitted
Description:
Using the systemctl command inside a CentOS 7.2 container
Fix:
docker run --privileged -d centos:7.2.1511 /usr/sbin/init
5. Slow ssh logins
sshd performs a reverse DNS lookup on the client IP; if DNS has no record for that IP, the login hangs until the lookup times out
Fix: in /etc/ssh/sshd_config
set UseDNS no
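A minimal sketch of applying the change with sed, demonstrated here on a temporary copy; on a real host point CFG at /etc/ssh/sshd_config and restart sshd afterwards:

```shell
# Flip UseDNS to "no", whether or not the directive is commented out.
CFG=$(mktemp)
printf '#UseDNS yes\nPermitRootLogin no\n' > "$CFG"
sed -i 's/^#\?UseDNS.*/UseDNS no/' "$CFG"
grep '^UseDNS no' "$CFG"
# On the real host: run the sed against /etc/ssh/sshd_config, then
#   systemctl restart sshd
```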
6. /etc/hosts, /etc/resolv.conf and /etc/hostname are volatile
Description:
These three files in a container do not come from the image: they live under /var/lib/docker/containers/<container_id> and are mounted into the container at startup. Any change made to them inside the container therefore bypasses the container's top layer and is written directly to those underlying files.
Fix:
Use the --add-host option of docker run to map hostnames to IPs for the container
or append an entry at runtime: echo "10.10.10.10 aaa.com" >> /etc/hosts
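What --add-host does under the hood is append an "IP hostname" line to the container's /etc/hosts; a sketch against a temporary file (the name and IP are the example values from above):

```shell
HOSTS=$(mktemp)
echo "10.10.10.10 aaa.com" >> "$HOSTS"   # hosts format: IP first, then names
grep -q 'aaa.com' "$HOSTS" && echo mapped
# Equivalent at container start (note the flag uses host:ip order):
#   docker run --add-host aaa.com:10.10.10.10 ...
```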
7. Chinese language support in the CentOS 7.2 container image
sudo localedef -c -f UTF-8 -i zh_CN zh_CN.utf8
export LC_ALL="zh_CN.UTF-8"
8. Container clock is UTC, 8 hours behind the host
cp -a /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
9. overlayfs: Can't delete file moved from base layer to newly created dir even on ext4
Caused by a compatibility problem between the XFS filesystem shipped with CentOS and OverlayFS; fixed in kernels above 4.4.6 (https://github.com/moby/moby/issues/9572)
Fixed in Linux 4.5, to be backported into the next 4.4.y and other stable branches. A simple test sequence is in the commit message (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=45d11738969633ec07ca35d75d486bf2d8918df6)
Fix:
1. Stop the dependent services:
stop the containers (docker stop $(docker ps -qa))
systemctl stop docker
back up the data under /srv; check for open files with lsof | grep srv
2. Inspect the disk partitions: fdisk -l, mount | grep srv
umount /dev/mapper/centos-srv
reformat: mkfs.xfs -f -n ftype=1 /dev/mapper/centos-srv
verify that ftype is set to 1: xfs_info /srv | grep ftype
mount /dev/mapper/centos-srv /srv/
3. Restore the data to /srv
systemctl start docker
docker start $(docker ps -qa)
10. /var/lib/docker/overlay2 grows huge, occupying hundreds of GB
Description: this is usually data or logs produced by applications inside the containers
Fix:
In /var/lib/docker/overlay2, run du -h --max-depth=1 to see which layer directory is largest; in this case one directory was using 450 GB.
It turned out to be log output: a container left over from debugging had been writing error logs continuously. docker kill and docker rm freed the 450+ GB at once.
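The du step can be sketched as below (on a temporary tree here; on a real host point DIR at /var/lib/docker/overlay2), and the offending layer id can then be mapped back to a container with docker inspect:

```shell
# Rank subdirectories by size, biggest first.
DIR=$(mktemp -d)
mkdir -p "$DIR/aaa" "$DIR/bbb"
dd if=/dev/zero of="$DIR/aaa/blob" bs=1024 count=64 2>/dev/null
du -k --max-depth=1 "$DIR" | sort -rn | head
# Map a layer id back to its container (needs a running daemon, so not runnable here):
#   docker ps -qa | xargs docker inspect \
#     --format '{{.Name}} {{.GraphDriver.Data.UpperDir}}' | grep <layer_id>
```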
11. Error starting daemon: error initializing graphdriver: driver not supported
Description: the docker daemon fails to start when using the overlay2 storage driver
Fix: add the following configuration:
cat /etc/docker/daemon.json
{
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
or pass the equivalent daemon flags:
/usr/bin/dockerd --storage-driver=overlay2 --storage-opt overlay2.override_kernel_check=1
12. Raising the container open-file limit (open files)
Editing /etc/security/limits.conf inside the container has no effect
Do the following on the host:
[lin@node1 ~]$ cat /etc/sysconfig/docker
ulimit -HSn 999999
Then restart the docker daemon: systemctl restart docker
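On hosts that have no /etc/sysconfig/docker (plain systemd installs), the same limit can be set for every container through daemon.json instead; a sketch using the value from above (merge with any existing keys before restarting):

```shell
# Set a default nofile limit for all containers, then restart the daemon.
cat > /etc/docker/daemon.json <<'EOF'
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Hard": 999999, "Soft": 999999 }
  }
}
EOF
systemctl restart docker
# A single container can also be raised ad hoc:
#   docker run --ulimit nofile=999999:999999 ...
```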
13. /var/lib/docker/containers grows too large
Description: container logs live under /var/lib/docker/containers/<container_id>/; the files ending in -json.log (application logs) can become very large.
If the container is still running, deleting a log with rm -rf does not free the space (df -h shows no change). On Linux/Unix, rm -rf or a file manager only unlinks the file from the filesystem's directory tree; while a process still has the file open it can keep using it, and the disk space stays allocated.
Fix 1: truncate the logs with cat /dev/null > *-json.log
#!/bin/sh
logs=$(find /var/lib/docker/containers/ -name "*-json.log")
for log in $logs
do
echo "cleaning log: $log"
cat /dev/null > "$log"
done
Fix 2: configure log rotation for dockerd in /etc/docker/daemon.json:
{
"registry-mirrors": ["https://registry.docker-cn.com"],
"max-concurrent-downloads": 6,
"insecure-registries":["harbor.master.online.local", "harbor.local.com"],
"log-driver":"json-file",
"log-opts": {"max-size":"2G", "max-file":"10"}
}
14. umount.nfs: device is busy
描述:
[root@localhost /]# umount /data/share/gaocheng
umount.nfs: /data/share/gaocheng: device is busy
Fix:
Find the processes holding the mount point with fuser:
fuser -mv /data/share/gaocheng
-v: verbose mode; processes are shown ps-style with PID, USER, COMMAND and ACCESS columns
-m: name a mounted filesystem or block device; every process accessing it is listed
The output shows which processes occupy the mount; kill them and retry the unmount:
kill -9 25869 45636 131466
umount /data/share/gaocheng
15. docker hang
Debugging: curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
The goroutine dump points at docker/container/state.go: everything is blocked waiting to acquire the container state lock and cannot get it, so all requests pile up. Which goroutine is holding the lock has not been identified yet.
func (s *State) IsRunning() bool {
s.Lock()
res := s.Running
s.Unlock()
return res
}
goroutine 68266990 [semacquire, 1350 minutes]:
sync.runtime_SemacquireMutex(0xc421004e04, 0x11f0900)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc421004e00)
/usr/local/go/src/sync/mutex.go:134 +0xee
github.com/docker/docker/container.(*State).IsRunning(0xc421004e00, 0xc42995c2b7)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:250 +0x2d
github.com/docker/docker/daemon.(*Daemon).ContainerStop(0xc420260000, 0xc42995c2b7, 0x40, 0xc421c3cac0, 0xc428235298, 0xc42e5fac01)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stop.go:23 +0x84
github.com/docker/docker/api/server/router/container.(*containerRouter).postContainersStop(0xc420b4bfc0, 0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x2a4e3dd, 0x5)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:186 +0xf0
github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainersStop)-fm(0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x7f85ac36f560, 0xc42844d3e0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:67 +0x69
github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x7f85ac36f560, 0xc42844d3e0)
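To gauge how widespread the contention is, the saved dump can be summarized with grep; a sketch with a two-goroutine sample inlined (point DUMP at the file captured from the curl above):

```shell
DUMP=$(mktemp)
cat > "$DUMP" <<'EOF'
goroutine 68266990 [semacquire, 1350 minutes]:
sync.runtime_SemacquireMutex(0xc421004e04, 0x11f0900)
goroutine 1 [running]:
main.main()
EOF
# Count goroutines blocked acquiring a mutex.
grep -c '\[semacquire' "$DUMP"   # → 1
```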
16. Four fixes for "Too many open files"
1) A single process has too many open file handles
Check with cat /proc/<pid>/limits
2) The operating system has too many file handles open
Change the limit on the fly by writing to /proc/sys/fs/file-max, or permanently by editing /etc/sysctl.conf
3) systemd is limiting the process
Raise LimitNOFILE= in the unit file
4) The inotify limits are reached
inotify monitors filesystem changes and is governed by two kernel parameters: fs.inotify.max_user_instances, the maximum number of inotify instances each user may create, and fs.inotify.max_user_watches, the maximum number of watches each user may add at one time
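Checks 1, 2 and 4 can be read straight from /proc; for check 3 use systemctl show <unit> -p LimitNOFILE (a sketch, using the current shell as the example process):

```shell
grep 'open files' /proc/self/limits          # 1) per-process limit
cat /proc/sys/fs/file-max                    # 2) system-wide limit
cat /proc/sys/fs/inotify/max_user_instances \
    /proc/sys/fs/inotify/max_user_watches    # 4) inotify limits
# 3) systemd limit, e.g.: systemctl show docker -p LimitNOFILE
```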
Reference: four fixes for "Too many open files"
https://bbs.huaweicloud.com/blogs/7c4c0324a5da11e89fc57ca23e93a89f
Reference: setting the maximum number of open files on Linux (nofile, nr_open, file-max)
https://www.cnblogs.com/zengkefu/p/5635153.html
17. Hard-deleting images from a docker harbor registry
docker-compose stop
docker run -it --name gc --rm --volumes-from registry vmware/registry:2.6.2-photon garbage-collect --dry-run /etc/registry/config.yml
docker run -it --name gc --rm --volumes-from registry vmware/registry:2.6.2-photon garbage-collect /etc/registry/config.yml
docker-compose start
18. "failed to disable IPv6 forwarding for container's interface all" in Docker daemon logs
Issue
If IPv6 networking has been disabled in the kernel, log entries of the following format may be seen in the Docker daemon logs:
Aug 21 11:05:39 ucpworker-0 dockerd[16657]: time="2018-08-21T11:05:39Z" level=error msg="failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: no such file or directory"
Aug 21 11:05:39 ucpworker-0 dockerd[16657]: time="2018-08-21T11:05:39.599992614Z" level=warning msg="Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/de1d0ed4fae9\": reexec to set IPv6 failed: exit status 4"
Root Cause
These log entries are displayed when the Docker daemon fails to disable IPv6 forwarding on a container's interfaces, because IPv6 networking has been disabled at the kernel level (i.e. the ipv6.disable=1 parameter has been added to your kernel boot configuration).
In this situation, the log entries are expected and can be ignored.
Resolution
In a future release of the Docker engine the log level will be updated, so that these entries are only warning messages, as this does not indicate an actionable error case.
Reference: https://success.docker.com/article/failed-to-disable-ipv6-forwarding-for-containers-interface-all-error-in-docker-daemon-logs
19. iptables -I FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
Packets sent from the host toward the docker0 bridge are accepted unconditionally if they belong to an already-established connection; the Linux kernel then delivers them along the original connection, i.e. back into the Docker container
20. malformed module path: missing dot in first path element
Since go 1.13, go mod requires the first element of an import path to look like a domain name, e.g. code.be.mingbai.com/tools/soa.
go mod requires all dependency import paths to begin with a domain, so an existing project migrating to go modules on 1.13 must rename its module if the path does not start with one, e.g.:
go mod init csi.storage.com/storage-csi
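After re-initializing the module with a domain-prefixed path, imports inside the project have to be rewritten to match. A sketch of that rewrite on a temporary file (the old module name and the pkg/driver path are hypothetical; in practice run the sed over your .go sources):

```shell
SRC=$(mktemp)
printf 'import "storage-csi/pkg/driver"\n' > "$SRC"
# Prefix the old module path with the new domain-based one.
sed -i 's|"storage-csi/|"csi.storage.com/storage-csi/|' "$SRC"
grep 'csi.storage.com/storage-csi' "$SRC"
```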