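The symptom: a telnet to the zk (ZooKeeper) server address connects and is then closed immediately by the server, as shown below: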
telnet 10.18.0.31 2181
Trying 10.18.0.31...
Connected to 10.18.0.31.
Escape character is '^]'.
Connection closed by foreign host.
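Telnet to the zk server from other addresses works, so the preliminary diagnosis is that this client address has exceeded the zk server's connection limit (maxClientCnxns applies per client IP).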
1. Check the connection-limit setting on the zk server
[root@hdfs-10-18-0-31 ~]# grep maxClient /data/zookeeper-3.4.14/conf/zoo.cfg
maxClientCnxns=2000
2. Check the number of existing connections to port 2181 on the server, grouped by client IP
[root@hdfs-10-18-0-31 ~]# netstat -tan | grep 2181 | awk '{print $5}' | grep -E '([0-9]+\.){3}[0-9]+' -o | sort | uniq -c
2000 10.18.0.27
2002 10.18.0.29
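The per-IP count in step 2 can also be compared against the limit automatically. The following is a minimal sketch, not the original author's method: the function name `flag_heavy_clients` and the `AT_LIMIT`/`ok` labels are hypothetical, and it assumes the usual Linux `netstat -tan` column layout (local address in column 4, remote address in column 5). Unlike a plain `grep 2181`, it only counts lines whose local port is the given port.

```shell
# Count connections per remote IP for one local port and flag every IP
# whose count has reached the limit. Reads `netstat -tan` output on stdin.
# Usage (hypothetical helper): netstat -tan | flag_heavy_clients 2181 2000
flag_heavy_clients() {
    port="$1"
    limit="$2"
    awk -v port="$port" -v limit="$limit" '
        # Keep only TCP lines whose local address ($4) ends in ":<port>".
        $1 ~ /^tcp/ && $4 ~ (":" port "$") {
            addr = $5
            sub(/:[0-9]+$/, "", addr)   # drop the remote port
            sub(/^::ffff:/, "", addr)   # unwrap IPv4-mapped IPv6 addresses
            count[addr]++
        }
        END {
            for (ip in count) {
                flag = (count[ip] >= limit) ? "AT_LIMIT" : "ok"
                print count[ip], ip, flag
            }
        }
    ' | sort -rn
}
```

Passing the value of maxClientCnxns from step 1 as the second argument marks the client IPs that have used up their quota.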
3. On the k8s node, find out which pod established the connections
[root@tbds-10-18-0-27 ~]# cat /proc/net/nf_conntrack | grep 2181 | awk '{print $7}'|sort|uniq -c
1996 src=192.168.237.213
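The command in step 3 relies on the source address being the 7th field and matches any line that contains 2181. A somewhat stricter variant is sketched below, under the assumption that each conntrack line lists the original-direction tuple (`src= dst= sport= dport=`) first and the reply tuple second. The function name `top_conntrack_sources` is hypothetical.

```shell
# Count conntrack entries per original source IP for one destination
# ip:port. Reads /proc/net/nf_conntrack style lines on stdin.
# Usage (hypothetical helper):
#   top_conntrack_sources 10.18.0.31 2181 < /proc/net/nf_conntrack
top_conntrack_sources() {
    dst_ip="$1"
    dst_port="$2"
    awk -v dst="dst=$dst_ip" -v dport="dport=$dst_port" '
        {
            src = ""; seen_dst = 0; seen_dport = 0
            # Only the first src/dst/dport tokens describe the original
            # direction, so stop scanning at the second "src=" token.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src=/) {
                    if (src != "") break
                    src = substr($i, 5)
                } else if ($i == dst) {
                    seen_dst = 1
                } else if ($i == dport) {
                    seen_dport = 1
                }
            }
            if (src != "" && seen_dst && seen_dport) count[src]++
        }
        END { for (s in count) print count[s], s }
    ' | sort -rn
}
```

Because the pod traffic is source-NATed to the node address, the zk server only sees 10.18.0.27; the original-direction tuple on the node is what reveals the pod IP.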
4. Get the pod name from the pod IP
kubectl -n xxx get pod -o wide | grep 192.168.237.213
xxx-pod-name 1/1 Running 0 15h 192.168.237.213 tbds-10-18-0-27
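A plain `grep` on the IP also matches longer addresses that merely contain it (for example, searching for 192.168.237.21 would match 192.168.237.213 as well). A small sketch of an exact-match lookup follows; the helper name `pods_with_ip` is hypothetical, and it assumes the pod name is the first column, which holds when a single namespace is queried with `-n`.

```shell
# Print the first column (pod name) of every line that has a field equal
# to the given IP. Reads `kubectl get pod -o wide` output on stdin.
# Usage (hypothetical helper):
#   kubectl -n xxx get pod -o wide | pods_with_ip 192.168.237.213
pods_with_ip() {
    awk -v ip="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # Whole-field comparison, so partial IP matches are ignored.
                if ($i == ip) { print $1; break }
            }
        }
    '
}
```

With `--all-namespaces` the first column is the namespace instead, so the printed field would need adjusting.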
At this point, we have finally found which pod established so many connections.