Today I continue plugging some gaps left over from the earlier Kafka cluster work.
Rebalancing replicas after cluster expansion:
Checking Kafka Manager today, I found that the leader replicas of the __consumer_offsets topic (the internal topic that stores consumer offsets) sat on only two of the three existing nodes instead of being spread evenly across all three brokers. Concretely:
Opening the topic, we can see that although we now have three nodes, only two of them are carrying this topic's partitions, which is badly unbalanced.
So we need to redistribute this topic's replicas evenly across all three nodes.
Following the method in the official documentation, we can run
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1,2" --generate
The format of consumer_offsets_adjust.json follows the example given in the official docs:
> cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
 "version": 1
}
Here, topics is simply the list of topics we want to reassign.
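In our case the file only needs to name the offsets topic; a rough sketch of what consumer_offsets_adjust.json would contain for this reassignment:

# Reconstructed for illustration: the file just lists __consumer_offsets as the topic to move.
cat > ~/consumer_offsets_adjust.json <<'EOF'
{"topics": [{"topic": "__consumer_offsets"}], "version": 1}
EOF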
Running this command produces the proposed reassignment. The parameters are all common ones, so I won't go over them in detail; the output contains both the current partition assignment and the proposed assignment after reassignment.
Here is the proposed partition reassignment, formatted for readability:
{ "version": 1, "partitions": [{ "topic": "__consumer_offsets", "partition": 19, "replicas": [2], "log_dirs": ["any"] }, { "topic": "__consumer_offsets", "partition": 10, "replicas": [2], "log_dirs": ["any"] }, { "topic": "__consumer_offsets", "partition": 14, "replicas": [0], "log_dirs": ["any"] }
//中間省略若干 partitions
}
We can see that the replicas are now spread over brokers 0, 1, and 2. To apply this plan, copy it into ~/target.json and run:
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --execute
This starts the reassignment, and Kafka prints the following back to us:
Current partition replica assignment
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":19,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":30,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":47,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":29,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":41,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":39,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":10,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":17,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":14,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":40,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":18,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":26,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":0,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":24,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":33,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":20,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":21,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":3,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":5,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":22,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":12,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":8,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":23,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":15,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":48,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":11,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":13,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":49,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":6,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":28,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":4,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":37,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":31,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":44,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":42,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":34,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":46,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":25,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":45,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":27,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":32,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":43,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":36,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":35,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":7,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":9,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":38,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":1,"replicas":[0],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":16,"replicas":[1],"log_dirs":["any"]},{"topic":"__consumer_offsets","partition":2,"replicas":[1],"log_dirs":["any"]}]}
Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions.
It hands us a rollback plan, and at this point the reassignment has in fact already started; the reassignment is handed to the controller, which is responsible for carrying out the changes.
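As a side note, a minimal sketch of the rollback, assuming the "Current partition replica assignment" JSON above was saved into a hypothetical ~/rollback.json:

# Re-executing the saved original assignment moves the replicas back to where they
# were before, which is exactly the rollback path the tool suggests above.
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/rollback.json --execute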
We can run
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/target.json --verify
to check the reassignment progress:
Reassignment of partition __consumer_offsets-38 completed successfully
Reassignment of partition __consumer_offsets-13 completed successfully
Reassignment of partition __consumer_offsets-8 completed successfully
Reassignment of partition __consumer_offsets-5 completed successfully
Reassignment of partition __consumer_offsets-39 completed successfully
Reassignment of partition __consumer_offsets-36 completed successfully
Reassignment of partition __consumer_offsets-40 completed successfully
Reassignment of partition __consumer_offsets-45 completed successfully
At this point we can check whether the expansion has succeeded and whether the leader replicas are now evenly distributed across the nodes.
First the brokers spread, then the leader distribution (Kafka Manager screenshots).
We can see we are done. IT WORKED! One more note: if we want to keep a particular broker from holding replicas of a topic, we can use the same reassignment mechanism and simply leave that node out of the broker list.
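For example, a sketch of regenerating the plan while excluding broker 2 (hypothetical; we didn't actually need this here):

# Only brokers 0 and 1 are offered to the generator, so the proposed plan
# will not place any replica of the topic on broker 2.
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --topics-to-move-json-file ~/consumer_offsets_adjust.json --broker-list "0,1" --generate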
Adding replicas to existing partitions:
You may have noticed another problem in the screenshots above: for historical reasons, our single most important topic, the offsets-commit topic, has only one replica per partition, i.e. just the leader. Normally we would run with three replicas so that losing a broker allows an easy failover, so we need to add new replicas. Create the plan with vi ~/inc.json:
{"version":1, "partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]}]}
./kafka-reassign-partitions.sh --zookeeper 10.171.97.1:2181 --reassignment-json-file ~/inc.json --execute
I've only shown adjusting one partition here; in reality all 50 partitions need adjusting.... I won't paste the script I used, but a rough sketch of one possible version follows.
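A minimal sketch (not the script I actually ran) that writes a plan giving every one of the 50 __consumer_offsets partitions the same replica list [0,1,2]:

# Builds ~/inc.json covering partitions 0..49, each with replicas [0,1,2],
# ready to feed to kafka-reassign-partitions.sh --execute as above.
{
  printf '{"version":1,"partitions":['
  for p in $(seq 0 49); do
    [ "$p" -gt 0 ] && printf ','
    printf '{"topic":"__consumer_offsets","partition":%d,"replicas":[0,1,2]}' "$p"
  done
  printf ']}\n'
} > ~/inc.json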
After the adjustment, Replicas becomes three copies, but this brings a problem: the new ISR replicas need time to catch up, so the Under Replicated count (the number of lagging replicas) spikes and the flag flips to true.
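While waiting, the lagging partitions can be watched with the standard topics tool (a sketch):

# Lists every partition whose ISR is currently smaller than its replica list;
# the output shrinks to nothing once all new replicas have caught up.
./kafka-topics.sh --zookeeper 10.171.97.1:2181 --describe --under-replicated-partitions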
The ISR will gradually catch up, though:
[2020-01-10 12:03:52,781] INFO [Controller id=0] Updated assigned replicas for partition backend-events-21 being reassigned to 0,1,2 (kafka.controller.KafkaController)
[2020-01-10 12:03:52,784] INFO [Controller id=0] Removed partition backend-events-21 from the list of reassigned partitions in zookeeper (kafka.controller.KafkaController)
[2020-01-10 12:03:53,124] INFO [Controller id=0] 3/3 replicas have caught up with the leader for partition backend-events-36 being reassigned. Resuming partition reassignment (kafka.controller.KafkaController)
Eventually everything catches up and looks like the screenshot below.
But the Preferred Leader column is still abnormal. Why? A partition has its preferred leader only when the current leader is the first replica in its Replicas list; every row showing false above breaks this rule, i.e. the leader is not the first replica. So a Kafka preferred replica election will be triggered shortly.
[2020-01-10 12:07:09,735] INFO [Controller id=0] Starting preferred replica leader election for partitions backend-events-43 (kafka.controller.KafkaController)
[2020-01-10 12:07:09,735] INFO [PartitionStateMachine controllerId=0] Invoking state change to OnlinePartition for partitions backend-events-43 (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,740] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader 2 for partition backend-events-43 is not the preferred replica. Triggering preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2020-01-10 12:07:09,753] DEBUG [PartitionStateMachine controllerId=0] After leader election, leader cache for backend-events-43 is updated to (Leader:0,ISR:2,0,1,LeaderEpoch:3,ControllerEpoch:42) (kafka.controller.PartitionStateMachine)
[2020-01-10 12:07:09,753] INFO [Controller id=0] Partition backend-events-43 completed preferred replica leader election. New leader is 0 (kafka.controller.KafkaController)
Eventually it settles into the state below.
Clearly something is still wrong. That's because when I set up the replicas I blindly wrote 0,1,2 for every partition, and since the first replica in the list becomes the preferred leader, the leaders ended up heavily skewed.
To fix this, we just re-run the reassignment from the expansion section above to rebalance again. If your cluster does not have
auto.leader.rebalance.enable = true
enabled, then once that finishes you may also need to run
./kafka-preferred-replica-election.sh --zookeeper 10.171.97.1:2181
to redistribute the leaders. To skip this extra leader-rebalancing round, we could instead set the preferred leaders correctly up front when adding the replicas, as sketched below.
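A sketch of the adjusted plan generator, assuming we rotate the replica list so the first replica (the preferred leader) cycles through brokers 0, 1, 2:

# Same idea as the earlier sketch, but the first replica rotates per partition,
# so the preferred leaders end up spread evenly across the three brokers.
{
  printf '{"version":1,"partitions":['
  for p in $(seq 0 49); do
    case $((p % 3)) in
      0) r='[0,1,2]' ;;
      1) r='[1,2,0]' ;;
      2) r='[2,0,1]' ;;
    esac
    [ "$p" -gt 0 ] && printf ','
    printf '{"topic":"__consumer_offsets","partition":%d,"replicas":%s}' "$p" "$r"
  done
  printf ']}\n'
} > ~/inc.json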
Finally, let's take another look at the cluster state:
Not Bad!
----------------------------------------- divider -----------------------------------------
One more thing below the divider: periodic bursts of high disk IO can slow Kafka's responses enough that a node fails to answer ZooKeeper's heartbeat checks in time; the controller then treats the node as dead and kicks it out of the cluster, which hurts cluster stability.
[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-play-atom-online-33. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-33 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,125] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition __consumer_offsets-5. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-5 since it is not in the ISR. Leader = 2 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,1 for partition maxwell_mysql_sync_user-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition maxwell_mysql_sync_user-4 since it is not in the ISR. Leader = 1 ; ISR = List(2, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition __consumer_offsets-6. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition __consumer_offsets-6 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition test_sync_tidb-4. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition test_sync_tidb-4 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,126] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition new-atom-online-32. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-32 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 2,0 for partition double_write-0. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition double_write-0 since it is not in the ISR. Leader = 2 ; ISR = List(2, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 1,0 for partition sync_tidb-23. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition sync_tidb-23 since it is not in the ISR. Leader = 0 ; ISR = List(1, 0) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,127] DEBUG [Controller id=0] Removing replica 3 from ISR 0,2 for partition new-play-atom-online-2. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-play-atom-online-2 since it is not in the ISR. Leader = 0 ; ISR = List(0, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,128] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2,3 for partition sync_to_aliyun-12. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] INFO [Controller id=0] New leader and ISR for partition sync_to_aliyun-12 is {"leader":2,"leader_epoch":35,"isr":[1,2]} (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 0,1 for partition new-atom-online-39. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition new-atom-online-39 since it is not in the ISR. Leader = 0 ; ISR = List(0, 1) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] DEBUG [Controller id=0] Removing replica 3 from ISR 1,2 for partition refresh-wjh-15. (kafka.controller.KafkaController)
[2020-05-07 23:22:38,131] WARN [Controller id=0] Cannot remove replica 3 from ISR of partition refresh-wjh-15 since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.controller.KafkaController)
[2020-05-07 23:22:38,147] DEBUG The stop replica request (delete = true) sent to broker 3 is (kafka.controller.ControllerBrokerRequestBatch)
The log shows replica 3 being removed from the ISR of partitions across the board.
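One broker setting worth knowing for this failure mode is the ZooKeeper session timeout in server.properties; a sketch with an illustrative value (defaults vary by Kafka version, 6000 ms on older releases):

# server.properties: how long ZooKeeper waits for the broker's session heartbeat
# before declaring it expired; raising it gives the broker more room to ride out
# short IO stalls, at the cost of slower failure detection.
zookeeper.session.timeout.ms=18000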
References:
http://kafka.apache.org/10/documentation.html#basic_ops_cluster_expansion
http://kafka.apache.org/10/documentation.html#basic_ops_increase_replication_factor
https://zhuanlan.zhihu.com/p/38721205 (Data migration after Kafka cluster expansion)