Today I continue plugging some gaps left over from the earlier Kafka cluster work.
Rebalancing replicas after cluster expansion:
Checking Kafka Manager today, I found that the leader replicas of the __consumer_offsets topic (the internal topic that stores consumer offsets) sat on only two of the three existing nodes instead of being spread evenly across all three brokers. Concretely:
Opening the topic, we can see that although we now have three nodes, only two of them are carrying this topic's partitions, which is badly unbalanced.
So we need to redistribute this topic's replicas evenly across all three nodes.
Following the method in the official documentation, we can run
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1,2" --generate
The format of consumer_offsets_adjust.json follows the example given in the official docs:
> cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
 "version": 1
}
Here, topics is simply the list of topics we want to reassign.
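In our case the file only needs to name the offsets topic; a rough sketch of what consumer_offsets_adjust.json would contain for this reassignment:

# Reconstructed for illustration: the file just lists __consumer_offsets as the topic to move.
cat > ~/consumer_offsets_adjust.json <<'EOF'
{"topics": [{"topic": "__consumer_offsets"}], "version": 1}
EOF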
Running this command produces the proposed reassignment. The parameters are all common ones, so I won't go over them in detail; the output contains both the current partition assignment and the proposed assignment after reassignment.
Here is the proposed partition reassignment, formatted for readability:
{ "version": 1, "partitions": [{ "topic": "__consumer_offsets", "partition": 19, "replicas": [2], "log_dirs": ["any"] }, { "topic": "__consumer_offsets", "partition": 10, "replicas": [2], "log_dirs": ["any"] }, { "topic": "__consumer_offsets", "partition": 14, "replicas": [0], "log_dirs": ["any"] }
//中間省略若干 partitions
}
We can see that the replicas are now spread over brokers 0, 1, and 2. To apply this plan, copy it into ~/target.json and run:
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --execute
This starts the reassignment, and Kafka prints the following back to us:
Current partition replica assignment
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":19,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":30,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":47,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":29,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":41,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":39,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":10,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":17,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":14,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":40,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":18,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":26,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":0,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":24,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":33,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":20,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":21,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":3,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":5,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":22,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":12,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":8,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":23,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":15,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":48,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":11,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":13,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":49,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":6,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":28,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":4,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":37,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":31,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":44,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":42,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":34,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":46,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":25,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":45,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":27,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":32,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":43,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":36,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":35,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":7,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":9,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":38,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":1,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":16,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":2,"replicas":[1],"log_dirs":["any"]}]}
Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions.
It hands us a rollback plan, and at this point the reassignment has in fact already started; the reassignment is handed to the controller, which is responsible for carrying out the changes.
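As a side note, a minimal sketch of the rollback, assuming the "Current partition replica assignment" JSON above was saved into a hypothetical ~/rollback.json:

# Re-executing the saved original assignment moves the replicas back to where they
# were before, which is exactly the rollback path the tool suggests above.
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/rollback.json --execute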
We can run
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --verify
to check the reassignment progress:
Reassignment of partition __consumer_offsets-38 completed successfully
Reassignment of partition __consumer_offsets-13 completed successfully
Reassignment of partition __consumer_offsets-8 completed successfully
Reassignment of partition __consumer_offsets-5 completed successfully
Reassignment of partition __consumer_offsets-39 completed successfully
Reassignment of partition __consumer_offsets-36 completed successfully
Reassignment of partition __consumer_offsets-40 completed successfully
Reassignment of partition __consumer_offsets-45 completed successfully
At this point we can check whether the expansion has succeeded and whether the leader replicas are now evenly distributed across the nodes.
First the brokers spread, then the leader distribution (Kafka Manager screenshots).
We can see we are done. IT WORKED! One more note: if we want to keep a particular broker from holding replicas of a topic, we can use the same reassignment mechanism and simply leave that node out of the broker list.
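For example, a sketch of regenerating the plan while excluding broker 2 (hypothetical; we didn't actually need this here):

# Only brokers 0 and 1 are offered to the generator, so the proposed plan
# will not place any replica of the topic on broker 2.
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1" --generate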
Adding replicas to existing partitions:
You may have noticed another problem in the screenshots above: for historical reasons, our single most important topic, the offsets-commit topic, has only one replica per partition, i.e. just the leader. Normally we would run with three replicas so that losing a broker allows an easy failover, so we need to add new replicas. Create the plan with vi ~/inc.json:
{"version":1, "partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]}]}
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/inc.json --execute
I've only shown adjusting one partition here; in reality all 50 partitions need adjusting.... I won't paste the script I used, but a rough sketch of one possible version follows.
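A minimal sketch (not the script I actually ran) that writes a plan giving every one of the 50 __consumer_offsets partitions the same replica list [0,1,2]:

# Builds ~/inc.json covering partitions 0..49, each with replicas [0,1,2],
# ready to feed to kafka-reassign-partitions.sh --execute as above.
{
  printf '{"version":1,"partitions":['
  for p in $(seq 0 49); do
    [ "$p" -gt 0 ] && printf ','
    printf '{"topic":"__consumer_offsets","partition":%d,"replicas":[0,1,2]}' "$p"
  done
  printf ']}\n'
} > ~/inc.json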
After the adjustment, Replicas becomes three copies, but this brings a problem: the new ISR replicas need time to catch up, so the Under Replicated count (the number of lagging replicas) spikes and the flag flips to true.
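While waiting, the lagging partitions can be watched with the standard topics tool (a sketch):

# Lists every partition whose ISR is currently smaller than its replica list;
# the output shrinks to nothing once all new replicas have caught up.
./kafka-topics.sh --zookeeper 10.171.97.1:2181 --describe --under-replicated-partitions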
The ISR will gradually catch up, though:
[2020-01-10 12:03:52,781] INFO [Controller id=0] Updated assigned replicas for partition backend-events-21 being reassigned to 0,1,2 (kafka.controller.KafkaController)
[2020-01-10 12:03:52,784] INFO [Controller id=0] Removed partition backend-events-21 from the list of reassigned partitions in zookeeper (kafka.controller.KafkaController)
[2020-01-10 12:03:53,124] INFO [Controller id=0] 3/3 replicas have caught up with the leader for partition backend-events-36 being reassigned. Resuming partition reassignment (kafka.controller.KafkaController)
Eventually everything catches up and looks like the screenshot below.
But the Preferred Leader column is still abnormal. Why? A partition has its preferred leader only when the current leader is the first replica in its Replicas list; every row showing false above breaks this rule, i.e. the leader is not the first replica. So a Kafka preferred replica election will be triggered shortly.
[2020-01-10 12:07:09,735] INFO [Controller id=0] Starting preferred replica leader election for partitions backend-events-43 (kafka.controller.KafkaController)
[2020-01-10 12:07:09,735] INFO [PartitionStateMachine controllerId=0] Invoking state change to OnlinePartition for partitions backend-events-43 (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,740] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader 2 for partition backend-events-43 is not the preferred replica. Triggering preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2020-01-10 12:07:09,753] DEBUG [PartitionStateMachine controllerId=0] After leader election, leader cache for backend-events-43 is updated to (Leader:0,ISR:2,0,1,LeaderEpoch:3,ControllerEpoch:42) (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,753] INFO [Controller id=0] Partition backend-events-43 completed preferred replica leader election. New leader is 0 (kafka.controller.KafkaController)
Eventually it settles into the state below.
Clearly something is still wrong. That's because when I set up the replicas I blindly wrote 0,1,2 for every partition, and since the first replica in the list becomes the preferred leader, the leaders ended up heavily skewed.
To fix this, we just re-run the reassignment from the expansion section above to rebalance again. If your cluster does not have
auto.leader.rebalance.enable = true
enabled, then once that finishes you may also need to run
./kafka-preferred-replica-election.sh --zookeeper 10.171.97.1:2181
to redistribute the leaders. To skip this extra leader-rebalancing round, we could instead set the preferred leaders correctly up front when adding the replicas, as sketched below.
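A sketch of the adjusted plan generator, assuming we rotate the replica list so the first replica (the preferred leader) cycles through brokers 0, 1, 2:

# Same idea as the earlier sketch, but the first replica rotates per partition,
# so the preferred leaders end up spread evenly across the three brokers.
{
  printf '{"version":1,"partitions":['
  for p in $(seq 0 49); do
    case $((p % 3)) in
      0) r='[0,1,2]' ;;
      1) r='[1,2,0]' ;;
      2) r='[2,0,1]' ;;
    esac
    [ "$p" -gt 0 ] && printf ','
    printf '{"topic":"__consumer_offsets","partition":%d,"replicas":%s}' "$p" "$r"
  done
  printf ']}\n'
} > ~/inc.json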
Finally, let's take another look at the cluster state:
Not Bad!
----------------------------------------- divider -----------------------------------------
One more thing below the divider: periodic bursts of high disk IO can slow Kafka's responses enough that a node fails to answer ZooKeeper's heartbeat checks in time; the controller then treats the node as dead and kicks it out of the cluster, which hurts cluster stability.
[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-play-atom-online-33. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-33 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition __consumer_offsets-5. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-5 since it is not in the ISR. Leader = 2 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,1 for partition maxwell_mysql_sync_user-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition maxwell_mysql_sync_user-4 since it is not in the ISR. Leader = 1 ; ISR = List(2, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition __consumer_offsets-6. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-6 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition test_sync_tidb-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition test_sync_tidb-4 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-atom-online-32. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-32 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition double_write-0. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition double_write-0 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition sync_tidb-23. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition sync_tidb-23 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 0,2 for partition new-play-atom-online-2. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-2 since it is not in the ISR. Leader = 0 ; ISR = List(0, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition sync_to_aliyun-12. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] INFO [Controller id=0] New leader and ISR for partition sync_to_aliyun-12 is {"leader":2,"leader_epoch":35,"isr":[1,2]} (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 0,1 for partition new-atom-online-39. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-39 since it is not in the ISR. Leader = 0 ; ISR = List(0, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition refresh-wjh-15. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition refresh-wjh-15 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,147] DEBUG The stop replica request (delete = true) sent to broker 3 is (kafka.controller.ControllerBrokerRequestBatch)
The log shows replica 3 being removed from the ISR of partitions across the board.
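One broker setting worth knowing for this failure mode is the ZooKeeper session timeout in server.properties; a sketch with an illustrative value (defaults vary by Kafka version, 6000 ms on older releases):

# server.properties: how long ZooKeeper waits for the broker's session heartbeat
# before declaring it expired; raising it gives the broker more room to ride out
# short IO stalls, at the cost of slower failure detection.
zookeeper.session.timeout.ms=18000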
References:
http://kafka.apache.org/10/documentation.html#basic_ops_cluster_expansion
http://kafka.apache.org/10/documentation.html#basic_ops_increase_replication_factor
https://zhuanlan.zhihu.com/p/38721205 (Data migration after Kafka cluster expansion)