etcd frequently re-elects its leader
etcd in the cluster raised the following alert:
Alert Name: A high number of leader changes within the etcd cluster are happening
Severity: warning
Cluster Name: shdmz-prod-diamond (ID: c-n6wc4)
Namespace: cattle-prometheus
Expression: increase(etcd_server_leader_changes_seen_total[1h])>3
Description: Threshold Crossed: datapoint value 4.067796610169491 was greater than to the threshold (3) for (3m)
The logs showed the warning below, along with entries that looked like heartbeat timeouts:
2020-07-08 11:32:11.730958 W | rafthttp: the clock difference against peer db40725e6f94d8e3 is too high [13.717094955s > 1s] (prober "ROUND_TRIPPER_RAFT_MESSAGE")
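To see which peers are drifting and by how much, warnings like the one above can be pulled out of the etcd log with a small script. The following is a minimal sketch, not an official etcd tool: the regular expression is derived only from the single log line shown here, it assumes both durations are printed in plain seconds (as in that line), and the function name parse_clock_drift is made up for illustration.

```python
import re

# Pattern derived from the rafthttp warning shown above.
# Assumption: both durations are printed in plain seconds, e.g. "13.717094955s".
CLOCK_DRIFT = re.compile(
    r"the clock difference against peer (?P<peer>[0-9a-f]+) is too high "
    r"\[(?P<diff>[0-9.]+)s > (?P<limit>[0-9.]+)s\]"
)


def parse_clock_drift(line):
    """Return (peer_id, drift_seconds, limit_seconds), or None if the line does not match."""
    match = CLOCK_DRIFT.search(line)
    if match is None:
        return None
    return match.group("peer"), float(match.group("diff")), float(match.group("limit"))


sample = (
    "2020-07-08 11:32:11.730958 W | rafthttp: the clock difference against "
    "peer db40725e6f94d8e3 is too high [13.717094955s > 1s] "
    '(prober "ROUND_TRIPPER_RAFT_MESSAGE")'
)

# Prints the peer ID, the measured drift and the allowed limit for the sample line.
print(parse_clock_drift(sample))
```

Running this over the full log and grouping by peer ID should show whether a single machine's clock is off or whether several are.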
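Solution
1. Some machines in the cluster had unsynchronized clocks; bring their time back in sync.
2. Increase the heartbeat interval and the election timeout (both values are in milliseconds):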
- --election-timeout=5000
- --heartbeat-interval=500
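Before rolling out new timing values, it may help to sanity-check them. The sketch below is illustrative only: the helper check_etcd_timing is hypothetical, and the two rules it encodes (election timeout at least 5 times the heartbeat interval, and no more than 50000 ms) are recalled from etcd's tuning guidance and should be verified against the documentation for the etcd version in use.

```python
def check_etcd_timing(heartbeat_ms, election_ms):
    """Return a list of warnings for the given etcd timing flags (in milliseconds)."""
    problems = []
    if heartbeat_ms <= 0 or election_ms <= 0:
        problems.append("both values must be positive")
        return problems
    # Assumption: etcd expects election-timeout >= 5 * heartbeat-interval.
    if election_ms < 5 * heartbeat_ms:
        problems.append("election-timeout should be at least 5x heartbeat-interval")
    # Assumption: etcd caps election-timeout at 50000 ms.
    if election_ms > 50000:
        problems.append("election-timeout exceeds the 50000 ms upper limit")
    return problems


# The values used in this note: election timeout is 10x the heartbeat interval.
print(check_etcd_timing(heartbeat_ms=500, election_ms=5000))
```

Note that longer timeouts only make the cluster more tolerant of slow heartbeats; they also mean a genuinely failed leader takes longer to be replaced, and they do not fix the underlying clock drift.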
