Kafka in Practice: Rebalancing Replicas After Cluster Expansion, and Adding Replicas to Existing Partitions


Today I continue plugging the gaps left over from earlier work on our Kafka cluster.

 

Rebalancing replicas after expansion:

Today, while checking Kafka Manager, I noticed that the leader replicas of the __consumer_offsets topic (the topic that stores consumer offsets) sat on only two of our three existing nodes; the replicas were not spread evenly across all three brokers. Concretely:

 

Clicking into the topic,

 

we can see that although we have three nodes, only two of them, quite unevenly, are holding this topic's partitions.

So we need to redistribute the topic's replicas evenly across all three nodes.

Following the method in the official documentation, we can use:

./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1,2" --generate

The official docs give the format of consumer_offsets_adjust.json as:

> cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
"version":1
}

Here topics lists the topics we want to reassign.
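
So in our case the file presumably needs just the one internal topic:

{"topics": [{"topic": "__consumer_offsets"}],
"version":1
}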

 

Running the command with --generate produces the reassigned layout. The parameters are all common ones, so I won't go through them here. After executing it we get both the current partition assignment and the proposed one.

  

Here is a pretty-printed copy of the proposed partition reassignment:

{
    "version": 1,
    "partitions": [{
        "topic": "__consumer_offsets",
        "partition": 19,
        "replicas": [2],
        "log_dirs": ["any"]
    },
    {
        "topic": "__consumer_offsets",
        "partition": 10,
        "replicas": [2],
        "log_dirs": ["any"]
    },
    {
        "topic": "__consumer_offsets",
        "partition": 14,
        "replicas": [0],
        "log_dirs": ["any"]
    }
    // ... remaining partitions omitted ...
    ]
}

We can see the replicas are now spread across all three brokers 0, 1 and 2. We decide to go with this plan, copy it into the file ~/target.json, and run:

./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --execute

This starts the reassignment job. Kafka also prints back:

Current partition replica assignment

{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":19,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":30,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":47,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":29,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":41,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":39,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":10,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":17,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":14,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":40,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":18,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":26,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":0,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":24,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":33,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":20,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":21,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":3,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":5,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":22,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":12,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":8,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":23,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":15,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":48,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":11,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":13,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":49,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":6,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":28,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":4,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":37,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":31,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":44,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":42,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":34,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":46,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":25,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":45,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":27,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":32,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":43,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":36,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":35,"replicas":[0],"log_dirs":["any"]},{"topic":"__co
nsumer_offsets","partition":7,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":9,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":38,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":1,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":16,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":2,"replicas":[1],"log_dirs":["any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions.

It hands us a rollback plan, and note that at this point the reassignment has in fact already begun. The reassignment notifies the controller, which is responsible for carrying out these changes.

We can run

./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --verify

to check the reassignment progress:

Reassignment of partition __consumer_offsets-38 completed successfully
Reassignment of partition __consumer_offsets-13 completed successfully
Reassignment of partition __consumer_offsets-8 completed successfully
Reassignment of partition __consumer_offsets-5 completed successfully
Reassignment of partition __consumer_offsets-39 completed successfully
Reassignment of partition __consumer_offsets-36 completed successfully
Reassignment of partition __consumer_offsets-40 completed successfully
Reassignment of partition __consumer_offsets-45 completed successfully

At this point we can verify whether the rebalance succeeded and whether the leader replicas have been evenly distributed across the nodes.
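
As a cross-check without Kafka Manager, the standard describe command should show the same assignment (same ZooKeeper address assumed):

./kafka-topics.sh --zookeeper 10.171.97.1:2181 --describe --topic __consumer_offsets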

 

First, the broker spread in Kafka Manager:

 

Then the leader distribution:

 

We can see we're done. IT WORKED! One more point worth mentioning: if we ever need to keep a particular broker from storing replicas of a topic, the same reassignment mechanism works; just leave that node out of the broker list.
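
For instance, a hypothetical variant of the generate step above that excludes broker 2 from the new plan:

./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1" --generate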

 

 

Adding replicas to existing partitions:

From the screenshots above you may also have noticed a problem: due to historical baggage, our single most important topic, the one that stores offset commits, has only one replica per partition, i.e. only the leader. Normally we run three replicas so the cluster can fail over painlessly when a broker dies, so we need to add new replicas. Create the plan with vi ~/inc.json:

{"version":1,
"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]}]}
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/inc.json --execute

 

I've only demonstrated adjusting a single partition here; in reality we need to adjust all 50 partitions... I won't paste the actual script, but a rough sketch of the idea follows.
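
A minimal sketch, not the script we actually ran: generate an inc.json covering all 50 partitions of __consumer_offsets, rotating the replica order per partition so that leadership ends up spread evenly across brokers 0, 1 and 2 (see the preferred-leader discussion below):

#!/usr/bin/env bash
# Emit a reassignment plan for partitions 0..49 of __consumer_offsets.
{
  echo '{"version":1,"partitions":['
  for p in $(seq 0 49); do
    # Rotate the replica list so the preferred (first) replica varies.
    case $((p % 3)) in
      0) r="0,1,2" ;;
      1) r="1,2,0" ;;
      2) r="2,0,1" ;;
    esac
    sep=","
    [ "$p" -eq 49 ] && sep=""   # no trailing comma on the last entry
    echo "{\"topic\":\"__consumer_offsets\",\"partition\":${p},\"replicas\":[${r}]}${sep}"
  done
  echo ']}'
} > ~/inc.json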

Once the adjustment completes, Replicas becomes 3, but it brings a problem: the ISR replicas inevitably need time to catch up, so at first the Under Replicated count (the number of lagging replicas) is maxed out and the flag shows true.
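
While waiting, a convenient way to watch the lagging partitions from the CLI (a standard kafka-topics.sh flag; same ZooKeeper address assumed) is:

./kafka-topics.sh --zookeeper 10.171.97.1:2181 --describe --under-replicated-partitions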

The ISR does gradually catch up, though:

[2020-01-10 12:03:52,781] INFO [Controller id=0] Updated assigned replicas for partition backend-events-21 being reassigned to 0,1,2  (kafka.controller.KafkaController)
[2020-01-10 12:03:52,784] INFO [Controller id=0] Removed partition backend-events-21 from the list of reassigned partitions in zookeeper (kafka.controller.KafkaController)
[2020-01-10 12:03:53,124] INFO [Controller id=0] 3/3 replicas have caught up with the leader for partition backend-events-36 being reassigned.Resuming partition reassignment (kafka.controller.KafkaController)

 

Eventually they all catch up, as in the screenshot below.

 

But the Preferred Leader column is still abnormal. Why? A partition has its preferred leader when the first replica in its Replicas list is the leader; note that every false entry in the screenshot above violates this rule, i.e. the current leader is not the first replica in the list. So Kafka will shortly trigger a preferred replica election:

[2020-01-10 12:07:09,735] INFO [Controller id=0] Starting preferred replica leader election for partitions backend-events-43 (kafka.controller.KafkaController)
[2020-01-10 12:07:09,735] INFO [PartitionStateMachine controllerId=0] Invoking state change to OnlinePartition for partitions backend-events-43 (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,740] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader 2 for partition backend-events-43 is not the preferred replica. Triggering preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2020-01-10 12:07:09,753] DEBUG [PartitionStateMachine controllerId=0] After leader election, leader cache for backend-events-43 is updated to (Leader:0,ISR:2,0,1,LeaderEpoch:3,ControllerEpoch:42) (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,753] INFO [Controller id=0] Partition backend-events-43 completed preferred replica leader election. New leader is 0 (kafka.controller.KafkaController)

It eventually ends up like this:

 

This is clearly wrong. The cause is that when I set the replicas earlier I mindlessly wrote 0 1 2 for every partition; since the first replica in the list becomes the preferred leader, the ordering matters, and our leaders ended up heavily skewed.

Now we can correct this by simply re-running the rebalancing steps from the cluster-expansion section above. If your cluster does not have

auto.leader.rebalance.enable = true

enabled, then after it finishes you may also need to run

./kafka-preferred-replica-election.sh --zookeeper 10.171.97.1:2181

to redistribute the leaders. To avoid this extra leader-rebalancing pass altogether, we could have set the preferred leaders up front when adding the replicas, by varying the replica order per partition.
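
For illustration, a hypothetical three-partition excerpt of such a plan, staggering the replica order so each broker is the preferred leader for a third of the partitions (the same rotation as in the script sketch earlier):

{"version":1,
"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]},
{"topic":"__consumer_offsets","partition":1,"replicas":[1,2,0]},
{"topic":"__consumer_offsets","partition":2,"replicas":[2,0,1]}]}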

 

Finally, let's look at the cluster state one more time:

 

Not Bad! 

----------------------------------------- divider -----------------------------------------

One more note below the divider: periodic spikes of high disk I/O can slow Kafka's responses to the point where a node fails to answer ZooKeeper's heartbeat checks in time; the controller then treats the node as dead and kicks it out of the cluster.

This hurts cluster stability.

[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-play-atom-online-33. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-33 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition __consumer_offsets-5. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-5 since it is not in the ISR. Leader = 2 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,1 for partition maxwell_mysql_sync_user-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition maxwell_mysql_sync_user-4 since it is not in the ISR. Leader = 1 ; ISR = List(2, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition __consumer_offsets-6. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-6 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition test_sync_tidb-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition test_sync_tidb-4 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-atom-online-32. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-32 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition double_write-0. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition double_write-0 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition sync_tidb-23. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition sync_tidb-23 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 0,2 for partition new-play-atom-online-2. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-2 since it is not in the ISR. Leader = 0 ; ISR = List(0, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition sync_to_aliyun-12. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] INFO [Controller id=0] New leader and ISR for partition sync_to_aliyun-12 is {"leader":2,"leader_epoch":35,"isr":[1,2]} (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 0,1 for partition new-atom-online-39. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-39 since it is not in the ISR. Leader = 0 ; ISR = List(0, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition refresh-wjh-15. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition refresh-wjh-15 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,147] DEBUG The stop replica request (delete = true) sent to broker 3 is  (kafka.controller.ControllerBrokerRequestBatch)

From the logs you can see replica 3 being removed from partition ISRs on a large scale.
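
If this kind of flapping bites you, one knob worth evaluating (a general suggestion, not a change we made here) is the broker's ZooKeeper session timeout in server.properties, which gives a briefly stalled broker more slack before its session expires:

# server.properties: allow a slower broker more time before its ZooKeeper
# session expires (the default in this era of Kafka was 6000 ms)
zookeeper.session.timeout.ms=18000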

 

 

References:

http://kafka.apache.org/10/documentation.html#basic_ops_cluster_expansion

http://kafka.apache.org/10/documentation.html#basic_ops_increase_replication_factor

https://zhuanlan.zhihu.com/p/38721205 (Data migration after Kafka cluster expansion, in Chinese)

 

