Kafka: consumers cannot consume after a broker crash


 

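Background

The disks filled up and every server in the Kafka cluster went down. I then restarted the cluster. The services started successfully, but with some errors: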

broker1:

broker2:

broker3: kept printing the following error messages

 

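Despite these errors, Kafka started normally. Command-line tests showed that the cluster could produce and consume messages, but the kafka-manager UI showed an abnormal state with unassigned replicas:

I checked the programs that consume these topics and, sure enough, consumption was failing and they kept printing the following exception:

Note: the IP in the screenshot is the broker3 node.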

 

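At this point it is clear that broker3 has a problem and that the consumer programs cannot connect to it. Oddly, though, when I created a test topic from the command line, it could be consumed on broker3.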

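Further analysis of the broker3 log showed the cause of the error: the cluster requires 2 replicas, but only 1 was found.

So I looked at the details of the affected topics and found that replicas were indeed missing from the ISR list.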
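To make this check repeatable, here is a minimal Python sketch that scans the text printed by `kafka-topics.sh --describe` and lists the partitions whose ISR is smaller than their replica list. The function name `under_replicated` is hypothetical, and the line layout it expects (`Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...`) is an assumption based on the usual output of that tool, so it may need adjusting for your Kafka version.

```python
import re

# Assumed layout of one partition line in `kafka-topics.sh --describe` output:
#   Topic: t  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '1,2' into [1, 2]; an empty string gives an empty list."""
    return [int(x) for x in csv.split(",") if x]


def under_replicated(describe_output):
    """Return (topic, partition, replicas, isr) for every partition
    whose ISR has fewer members than its assigned replica list."""
    found = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # summary lines and blank lines do not match
        replicas = _ids(match.group("replicas"))
        isr = _ids(match.group("isr"))
        if len(isr) < len(replicas):
            found.append(
                (match.group("topic"), int(match.group("partition")), replicas, isr)
            )
    return found
```

One way to use it is to save the describe output for `__consumer_offsets` to a file, read the file into a string, and pass that string to `under_replicated`; an empty result would mean every partition has its full ISR.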

 

 

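My guess was that after the crash some replicas had fallen too far behind the leader and had not yet caught up, so they had dropped out of the ISR list. I decided to wait for them to catch up on their own.

The next day nothing had changed: they still had not caught up. I restarted the Kafka cluster and found that the ISR of some partitions grew back to 2 on its own, but the problem partitions still did not recover.

Next I tried reassigning the partitions with explicitly specified replicas, to see whether that would make the replicas recover, using the following command: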

bin/kafka-reassign-partitions.sh --zookeeper 10.0.xx.x:2181,10.0.xx.x:2181,10.0.xx.x:2181 --reassignment-json-file reassign.json --execute
Contents of reassign.json:
{"version":1, "partitions":[
  {"topic":"__consumer_offsets","partition":0,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":1,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":2,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":3,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":4,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":5,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":6,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":7,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":8,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":9,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":10,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":11,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":12,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":13,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":14,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":15,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":16,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":17,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":18,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":19,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":20,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":21,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":22,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":23,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":24,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":25,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":26,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":27,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":28,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":29,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":30,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":31,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":32,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":33,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":34,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":35,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":36,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":37,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":38,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":39,"replicas":[2,1]},  
  {"topic":"__consumer_offsets","partition":40,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":41,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":42,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":43,"replicas":[3,1]},
  {"topic":"__consumer_offsets","partition":44,"replicas":[1,2]},
  {"topic":"__consumer_offsets","partition":45,"replicas":[2,1]},
  {"topic":"__consumer_offsets","partition":46,"replicas":[3,2]},
  {"topic":"__consumer_offsets","partition":47,"replicas":[1,3]},
  {"topic":"__consumer_offsets","partition":48,"replicas":[2,3]},
  {"topic":"__consumer_offsets","partition":49,"replicas":[3,1]}  
]}
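Writing 50 entries by hand is error-prone, so a file in the same format can be generated instead. The following is a minimal Python sketch under stated assumptions: the function name `build_reassignment` is hypothetical, and it uses a plain round-robin placement over the broker ids, which does not reproduce the exact replica order listed above.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment document in the format shown above, placing
    replicas round-robin across broker_ids (an illustrative choice)."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication_factor exceeds the number of brokers")
    count = len(broker_ids)
    partitions = []
    for partition in range(num_partitions):
        # Consecutive brokers starting at a per-partition offset, so the
        # replicas of one partition are always distinct.
        replicas = [
            broker_ids[(partition + i) % count] for i in range(replication_factor)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": partitions}


def write_reassignment(path, plan):
    """Write the plan as JSON so it can be passed via --reassignment-json-file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(plan, handle, indent=1)
```

For the cluster in this post, the call would be something like `build_reassignment("__consumer_offsets", 50, [1, 2, 3], 2)`, followed by `write_reassignment("reassign.json", plan)`.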

 

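Reassigning the partitions with specified replicas did not work either, so I edited the Kafka configuration and changed the number of replicas required by the cluster to 1: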

vi server.properties
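The post does not say which property was edited. As a hedged aid, the Python sketch below reads a server.properties file and reports a few replication-related broker settings that are plausible candidates (`min.insync.replicas`, `default.replication.factor`, `offsets.topic.replication.factor`). Picking these three keys is an assumption, not something the post confirms, and the function names are hypothetical.

```python
# Candidate keys only: the original post does not name the property it changed.
REPLICATION_KEYS = (
    "min.insync.replicas",
    "default.replication.factor",
    "offsets.topic.replication.factor",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def replication_settings(text):
    """Return each candidate key with its configured value, or None if unset."""
    props = parse_properties(text)
    return {key: props.get(key) for key in REPLICATION_KEYS}
```

Running `replication_settings(open("server.properties").read())` on each broker before and after the change gives a quick record of what was actually modified.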

 

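After restarting the Kafka cluster, broker3 no longer reported errors, and once the consumer programs were restarted they could connect to Kafka and consume normally again.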

 

 

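Summary:

After a Kafka crash, replicas can drop out of the ISR list because they have fallen too far behind the leader. Normally they catch up gradually and then rejoin the ISR list automatically, but in my case they had not done so after 20 hours, and restarting the cluster did not restore them either. This is what caused the problems once the services were started.

The temporary workaround for now is to set the required replica count to 1, let the cluster run for a while to see whether the replicas recover, and then set it back to 2.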

 

 

Open issues:

1. The root cause has not been found yet;

2. With this change, Kafka can lose data (all data from the period of the outage was lost).

