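Problem description:
1. A worker node in the K8S cluster frequently runs out of disk space, which then causes service failures.
2. /var/log/syslog contains a large number of messages like the following: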
[1568405.455565] docker0: port 2(vethfd09262) entered forwarding state
[1568490.807194] aufs au_opts_verify:1612:docker[22618]: dirperm1 breaks the protection by the permission bits on the lower branch
[1568490.839695] aufs au_opts_verify:1612:docker[25041]: dirperm1 breaks the protection by the permission bits on the lower branch
3. /var/log/kern.log shows the following errors:
SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
Mar 31 18:52:08 AQA-Worker-CLD kernel: [292333.759874] cache: nf_conntrack_12(1847:58cc5f8478f68d01290885da9a59e974cf0d4575d5b92047bea0c7fd5f82130f), object size: 312, buffer size: 320, default order: 1, min order: 0
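To get a rough idea of how often these messages occur, the log files can be searched for the literal strings quoted above. The following is a minimal sketch, not part of the original diagnosis; the helper name count_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# Usage: count_matches PATTERN FILE
count_matches() {
    # -F treats the pattern as a literal string, -c prints the number of matching lines.
    # grep exits non-zero when the count is 0, so its exit status is ignored here.
    grep -F -c -- "$1" "$2" || true
}

# Example, using the paths and messages from the steps above:
#   count_matches 'dirperm1 breaks the protection' /var/log/syslog
#   count_matches 'SLUB: Unable to allocate memory' /var/log/kern.log
```

A count that keeps growing between checks suggests the problem is ongoing rather than a one-off event.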
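Cause:
AUFS is unstable, so docker cannot cleanly remove an instance when it is deleted. docker ps shows the container as removed, but the system resources are not released, so disk usage keeps rising.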
Reference: https://codeday.me/bug/20181115/395036.html
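One way to check whether removed containers really leave data behind is to compare the space used by the AUFS layer directory with the number of containers docker still reports. This is a minimal sketch under assumptions: the helper name dir_usage_kb is hypothetical, and the path /var/lib/docker/aufs/diff assumes the default docker data root with the aufs driver.

```shell
#!/bin/sh
# Hypothetical helper: print the disk usage of a directory in kilobytes.
# Usage: dir_usage_kb DIR
dir_usage_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the first field.
    du -sk "$1" | cut -f1
}

# Example (assumes the default data root and the aufs driver):
#   dir_usage_kb /var/lib/docker/aufs/diff
#   docker ps -a -q | wc -l
```

If the usage stays large while the container count is zero or close to it, that matches the leak described above.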
docker info
Containers: 0
Images: 0
Storage Driver: aufs
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
<output truncated>
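When checking many workers, it can help to pull just the storage driver out of the docker info output. The sketch below parses the text form shown above; the helper name storage_driver is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the storage driver.
storage_driver() {
    # Split each line on ": " and print the value of the first "Storage Driver" line.
    # Leading spaces are allowed because newer docker versions indent this output.
    awk -F': *' '/^ *Storage Driver:/ { print $2; exit }'
}

# Example:
#   docker info 2>/dev/null | storage_driver
```

docker info also accepts a Go template, so `docker info --format '{{.Driver}}'` should print the same value without any parsing.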
Solution:
1. sudo systemctl stop docker
2. mv /var/lib/docker /var/lib/docker.bk
3. vim /etc/docker/daemon.json
{ "storage-driver": "overlay2" }
4. systemctl restart docker
5. docker info:
Containers: 0
Images: 0
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
<output truncated>
Reference: https://docs.docker.com/storage/storagedriver/overlayfs-driver/
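The configuration step can also be scripted instead of edited by hand in vim. The following is a minimal sketch, assuming no daemon.json exists yet; the helper name write_storage_driver is hypothetical, and it deliberately refuses to overwrite an existing file so that other daemon settings are not lost.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal daemon.json that selects a storage driver.
# Usage: write_storage_driver FILE DRIVER
write_storage_driver() {
    file=$1
    driver=$2
    # Refuse to overwrite an existing file; merge by hand in that case.
    if [ -e "$file" ]; then
        echo "refusing to overwrite existing $file" >&2
        return 1
    fi
    printf '{ "storage-driver": "%s" }\n' "$driver" > "$file"
}

# Example, run as root, mirroring steps 1 to 5 above:
#   systemctl stop docker
#   mv /var/lib/docker /var/lib/docker.bk
#   write_storage_driver /etc/docker/daemon.json overlay2
#   systemctl restart docker
#   docker info | grep 'Storage Driver'
```

Because /var/lib/docker is moved aside, existing images and containers are not visible under the new driver and have to be pulled or recreated; the .bk copy can be deleted once it is no longer needed, which is what actually frees the disk space.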