Symptom:
Containers on a node in the docker swarm cluster keep going up and down.
The logs show the following:
Feb 26 11:43:16 sz-drip dockerd[5065]: time="2020-02-26T11:43:16.124650220+08:00" level=error msg="heartbeat to manager { } failed" error="rpc error: code = NotFound desc = node not register
Feb 26 11:43:16 sz-drip dockerd[5065]: time="2020-02-26T11:43:16.124764380+08:00" level=error msg="agent: session failed" backoff=100ms error="node not registered" module=node/agent node.id=
Feb 26 11:43:16 sz-drip dockerd[5065]: time="2020-02-26T11:43:16.124803737+08:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=a5drrk7eo59x4m6itl
Feb 26 11:43:16 sz-drip dockerd[5065]: time="2020-02-26T11:43:16.124835689+08:00" level=info msg="waiting 47.652971ms before registering session" module=node/agent node.id=a5drrk7eo59x4m6itl
Feb 26 11:43:16 sz-drip dockerd[5065]: time="2020-02-26T11:43:16.278198549+08:00" level=info msg="worker a5drrk7eo59x4m6itlbsf28pq was successfully registered" method="(*Dispatcher).register
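To gauge how often the heartbeat fails, the matching log lines can be counted. This is a minimal sketch, assuming dockerd logs to the systemd journal under a unit named `docker`; the helper name `count_heartbeat_failures` is hypothetical and not part of docker.

```shell
# Hypothetical helper: count heartbeat-failure lines in dockerd log text
# read from stdin.
count_heartbeat_failures() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the function from reporting failure.
  grep -c 'heartbeat to manager.*failed' || true
}

# Usage (assumes systemd and a unit named "docker"):
#   journalctl -u docker --since "1 hour ago" | count_heartbeat_failures
```

A count that keeps growing on one node suggests the heartbeat timeout is being hit repeatedly rather than once.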
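Cause:
The default heartbeat period between docker swarm nodes is 5s. In virtualized environments or on an unreliable network, heartbeats sometimes time out; the node's agent session then fails and re-registers, as the log shows, and its containers are restarted.
Solution:
Increase the swarm heartbeat period.
docker swarm update --dispatcher-heartbeat 60s  # set the docker swarm heartbeat period to 1 minute; run on a manager node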
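After the update, the value in effect can be checked on a manager node. A minimal sketch, assuming `docker info` prints a `Heartbeat Period:` line in its Swarm section (cluster details only appear on managers); the helper name `heartbeat_period` is hypothetical.

```shell
# Hypothetical helper: extract the value of the "Heartbeat Period:" line
# from `docker info` text read from stdin.
heartbeat_period() {
  # Strip the leading whitespace and the label, print only the value.
  sed -n 's/^[[:space:]]*Heartbeat Period:[[:space:]]*//p'
}

# Usage on a manager node:
#   docker info | heartbeat_period
```

The value is printed in human-readable form, so it is best compared before and after the update rather than matched against an exact string.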