Symptom:
Error log:
[2021-04-13 15:39:39,332] ERROR Error while creating ephemeral at /brokers/ids/3, node already exists and owner '0' does not match current session '146785369381863503' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2021-04-13 15:39:39,341] ERROR [KafkaServer id=3] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
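The log prints the session ID in decimal, while zkCli prints ephemeralOwner in hexadecimal, so the two cannot be compared by eye. A minimal sketch of the conversion, assuming a shell whose printf handles 64-bit integers (bash does); session_to_hex is a hypothetical helper name, not part of Kafka or ZooKeeper:

```shell
# Convert a decimal ZooKeeper session ID (as printed in the Kafka log)
# into the hex form that zkCli shows in the ephemeralOwner field.
session_to_hex() {
  printf '0x%x\n' "$1"
}

# Session ID taken from the error log above.
session_to_hex 146785369381863503   # prints 0x2097c84083b004f
```

If the hex value differs from the ephemeralOwner of /brokers/ids/<id>, the znode belongs to some other session and not to the broker that is currently starting.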
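Restarting Kafka repeatedly did not help; the only option left was to delete the stale data nodes in zk.
(Summary of the fix: the kafka describe command reported the cluster as healthy, but the Kafka process on every node had actually stopped. After logging in to zk, nothing was visible at /kafka/nextsfpaycore/controller either. The Kafka broker nodes were therefore removed from zk so that the brokers could register again.)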
[appdeploy@cnsz22VLK10176:/home/appdeploy]$/app/zookeeper-3.4.6/bin/zkCli.sh -server 10.xx.16.70:2181
[zk: 10.xx.16.70:2181(CONNECTED) 0] ls /kafka/nextsfpaycore/controller
[]
[zk: 10.xx.16.70:2181(CONNECTED) 1] get /kafka/nextsfpaycore/controller
{"version":1,"brokerid":1,"timestamp":"1618302213857"}
cZxid = 0x500001648
ctime = Tue Apr 13 16:23:33 CST 2021
mZxid = 0x500001648
mtime = Tue Apr 13 16:23:33 CST 2021
pZxid = 0x500001648
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x1097c84abbb0056
dataLength = 54
numChildren = 0
[zk: 10.xx.16.70:2181(CONNECTED) 2] ls /kafka/nextsfpaycore/brokers/ids
[1, 2, 3, 4, 5]
[zk: 10.xx.16.70:2181(CONNECTED) 3] get /kafka/nextsfpaycore/brokers/ids
[1,2,3,4]
cZxid = 0x3017f09be
ctime = Mon Apr 12 19:37:35 CST 2021
mZxid = 0x3017f09be
mtime = Mon Apr 12 19:37:35 CST 2021
pZxid = 0x50000169e
cversion = 19
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 5
[zk: 10.xx.16.70:2181(CONNECTED) 4] get /kafka/nextsfpaycore/brokers/ids/1
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT","SASL_PLAINTEXT":"SASL_PLAINTEXT"},"endpoints":["PLAINTEXT://10.xx.16.66:9092","SASL_PLAINTEXT://10.xx.16.66:9093"],"jmx_port":7007,"host":"10.xx.16.66","timestamp":"1618299910700","port":9092,"version":4}
cZxid = 0x500000deb
ctime = Tue Apr 13 15:45:10 CST 2021
mZxid = 0x500000deb
mtime = Tue Apr 13 15:45:10 CST 2021
pZxid = 0x500000deb
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x1097c84abbb0056
dataLength = 267
numChildren = 0
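[zk: 10.xx.16.70:2181(CONNECTED) 5] delete /kafka/nextsfpaycore/brokers/ids/2 (remove this Kafka broker node, then restart the corresponding broker)
As above, remove the broker nodes one at a time, restarting the Kafka service on each broker after its node is removed. Finally, check the cluster state with describe; Kafka was back to normal.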
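The per-broker steps can be sketched as a dry run that only prints the commands. The paths, server address, and chroot are copied from the session above. The one-shot form `zkCli.sh -server host:port <command>` is assumed to be available in this ZooKeeper version, and delete_cmd is a hypothetical helper name. Run a delete only after confirming that the broker process is really stopped, because removing the registration of a live broker takes it out of the cluster.

```shell
# Values copied from the session above; adjust for your environment.
ZK_CLI=/app/zookeeper-3.4.6/bin/zkCli.sh
ZK_SERVER=10.xx.16.70:2181
CHROOT=/kafka/nextsfpaycore

# Print (do not run) the zkCli command that removes one broker registration.
delete_cmd() {
  printf '%s -server %s delete %s/brokers/ids/%s\n' \
    "$ZK_CLI" "$ZK_SERVER" "$CHROOT" "$1"
}

# One broker at a time: delete the stale znode, then restart that broker.
for id in 1 2 3 4 5; do
  delete_cmd "$id"
  echo "# then restart broker $id and wait for it to re-register"
done
```

Printing the commands first makes it easy to review the exact znode paths before anything is deleted.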
[mwopr@CNSZ22PL407 scripts]$ /app/kafka_2.12-2.4.0/bin/kafka-topics.sh --zookeeper 10.xx.16.70:2181/kafka/nextsfpaycore --describe
With that, the incident was resolved.
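As an extra check after the restarts, the output of `ls /kafka/nextsfpaycore/brokers/ids` can be compared with the broker IDs that are expected to be registered. A minimal sketch, assuming the `ls` output line is passed in as a string; missing_brokers is a hypothetical helper name:

```shell
# Print every expected broker ID that is absent from a zkCli `ls` result.
# $1 is the ls output line, e.g. "[1, 2, 3, 4, 5]"; the rest are expected IDs.
missing_brokers() {
  ls_line=$1
  shift
  # Turn every run of non-digits into one space: "[1, 2]" becomes " 1 2 ".
  registered=$(printf '%s' "$ls_line" | tr -cs '0-9' ' ')
  for want in "$@"; do
    found=no
    for have in $registered; do
      [ "$have" = "$want" ] && found=yes
    done
    [ "$found" = yes ] || printf '%s\n' "$want"
  done
}

# Example: broker 5 has not re-registered yet.
missing_brokers '[1, 2, 3, 4]' 1 2 3 4 5   # prints 5
```

Empty output means every expected broker has an entry under brokers/ids.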