Deploying Flink JobManager in HA Mode (Standalone)


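Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#bootstrap-zookeeper

In a typical Flink job, the client submits the job to a single JobManager, which schedules and coordinates its tasks on the TaskManagers.

The JobManager is therefore a single point of failure (SPOF), so making Flink highly available mainly means making the JobManager highly available. How this is done depends on the deployment mode, Standalone or on YARN; this article covers Standalone mode only.

JobManager HA is implemented with ZooKeeper, so a ZooKeeper cluster must be set up first. The HA metadata is also stored in HDFS, so a Hadoop cluster is needed as well. The last step is to modify the Flink configuration files.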

1. Deploy the ZooKeeper cluster

Reference post: http://www.cnblogs.com/liugh/p/6671460.html

2. Deploy the Hadoop cluster

Reference post: http://www.cnblogs.com/liugh/p/6624872.html

3. Deploy the Flink cluster

Reference post: http://www.cnblogs.com/liugh/p/7446295.html

4. Modify conf/flink-conf.yaml

4.1 Required options

high-availability: zookeeper
high-availability.zookeeper.quorum: DEV-SH-MAP-01:2181,DEV-SH-MAP-02:2181,DEV-SH-MAP-03:2181
high-availability.zookeeper.storageDir: hdfs:///flink/ha

 

4.2 Optional settings

high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.path.cluster-id: /map_flink

After editing, use scp to copy flink-conf.yaml to the other nodes.
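
The copy step can be sketched as a small loop. This is only a sketch under assumptions: the helper name `sync_conf` is hypothetical, `FLINK_HOME` must already point at the Flink installation (the same path on every node), and passwordless SSH between the nodes is assumed. With `DRY_RUN=1` it only prints the scp commands it would run.

```shell
# Hypothetical helper: copy one file under $FLINK_HOME/conf to the other nodes.
# Assumes passwordless SSH and the same FLINK_HOME path on every node.
sync_conf() {
  local file="$1"; shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      echo "scp ${FLINK_HOME}/conf/${file} ${host}:${FLINK_HOME}/conf/"
    else
      scp "${FLINK_HOME}/conf/${file}" "${host}:${FLINK_HOME}/conf/"
    fi
  done
}

# Example (run on DEV-SH-MAP-01):
# sync_conf flink-conf.yaml DEV-SH-MAP-02 DEV-SH-MAP-03
```

The file name is a parameter, so the same helper can be reused for any other file under conf/.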

5. Modify conf/masters

List the hosts that should run a JobManager, together with the port of each web UI:

dev-sh-map-01:8081
dev-sh-map-02:8081

After editing, use scp to copy the masters file to the other nodes.

6. Modify conf/zoo.cfg

# ZooKeeper quorum peers
server.1=DEV-SH-MAP-01:2888:3888
server.2=DEV-SH-MAP-02:2888:3888
server.3=DEV-SH-MAP-03:2888:3888

After editing, use scp to copy the zoo.cfg file to the other nodes.
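
The hosts listed as quorum peers in zoo.cfg should be the same hosts that appear in high-availability.zookeeper.quorum in flink-conf.yaml. As a rough consistency check, the quorum string can be derived from zoo.cfg. The helper name `quorum_from_zoo_cfg` is hypothetical, and the client port 2181 is an assumption taken from the quorum setting in section 4.1, not read from zoo.cfg.

```shell
# Hypothetical helper: derive the client quorum string from the zoo.cfg peers.
# Usage: quorum_from_zoo_cfg <path to zoo.cfg> [client port, default 2181]
quorum_from_zoo_cfg() {
  local cfg="$1" port="${2:-2181}"
  # Keep only server.N=host:peerPort:electionPort lines, extract the host,
  # append the client port, and join the entries with commas.
  grep -E '^server\.[0-9]+=' "$cfg" \
    | sed -E 's/^server\.[0-9]+=([^:]+):.*/\1/' \
    | awk -v p="$port" '{ printf "%s%s:%s", (NR > 1 ? "," : ""), $1, p } END { print "" }'
}

# Example:
# quorum_from_zoo_cfg "$FLINK_HOME/conf/zoo.cfg"
```

The printed value can then be compared by eye with the high-availability.zookeeper.quorum line.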

7. Start HDFS

[root@DEV-SH-MAP-01 conf]# start-dfs.sh
Starting namenodes on [DEV-SH-MAP-01]
DEV-SH-MAP-01: starting namenode, logging to /usr/hadoop-2.7.3/logs/hadoop-root-namenode-DEV-SH-MAP-01.out
DEV-SH-MAP-02: starting datanode, logging to /usr/hadoop-2.7.3/logs/hadoop-root-datanode-DEV-SH-MAP-02.out
DEV-SH-MAP-03: starting datanode, logging to /usr/hadoop-2.7.3/logs/hadoop-root-datanode-DEV-SH-MAP-03.out
DEV-SH-MAP-01: starting datanode, logging to /usr/hadoop-2.7.3/logs/hadoop-root-datanode-DEV-SH-MAP-01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-DEV-SH-MAP-01.out

 

8. Start the ZooKeeper cluster

[root@DEV-SH-MAP-01 conf]# start-zookeeper-quorum.sh 
Starting zookeeper daemon on host DEV-SH-MAP-01.
Starting zookeeper daemon on host DEV-SH-MAP-02.
Starting zookeeper daemon on host DEV-SH-MAP-03.

Note: the start-zookeeper-quorum.sh command used here is the script shipped in FLINK_HOME/bin.
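
To confirm that each ZooKeeper process actually came up, one option is ZooKeeper's `ruok` four-letter-word command, which a running server answers with `imok`. The sketch below assumes `nc` is installed, that four-letter-word commands are enabled on the servers (newer ZooKeeper releases restrict them through a whitelist), and that the client port is 2181; the helper name `zk_is_ok` is hypothetical.

```shell
# Hypothetical helper: succeed if the ZooKeeper server at host:port answers "imok".
zk_is_ok() {
  local host="$1" port="${2:-2181}"
  local reply
  # "ruok" is a ZooKeeper four-letter-word command; a running server replies "imok".
  reply="$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)"
  [ "$reply" = "imok" ]
}

# Example:
# for h in DEV-SH-MAP-01 DEV-SH-MAP-02 DEV-SH-MAP-03; do
#   if zk_is_ok "$h"; then echo "$h: ok"; else echo "$h: not responding"; fi
# done
```

An `imok` reply only shows that the server process is running; it does not by itself prove that the quorum has formed.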

 

9. Start the Flink cluster

[root@DEV-SH-MAP-01 conf]# start-cluster.sh 
Starting HA cluster with 2 masters.
Starting jobmanager daemon on host DEV-SH-MAP-01.
Starting jobmanager daemon on host DEV-SH-MAP-02.
Starting taskmanager daemon on host DEV-SH-MAP-01.
Starting taskmanager daemon on host DEV-SH-MAP-02.
Starting taskmanager daemon on host DEV-SH-MAP-03.

As the output shows, two JobManagers are started; one becomes the leader and the other a standby.

10. Test HA

10.1 Open the leader's web UI

10.2 Open the standby's web UI

This request is also redirected to the leader's web UI.
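
The redirect can also be observed from the command line. The sketch below is an assumption-heavy illustration: it assumes the standby answers with an HTTP redirect, that the web UI listens on port 8081 as configured in conf/masters, and the helper name `webui_probe` is hypothetical. It prints the HTTP status code followed by the redirect target, if there is one.

```shell
# Hypothetical helper: print the HTTP status code and the redirect target (if any)
# returned by a JobManager web frontend.
webui_probe() {
  local host="$1" port="${2:-8081}"
  # -s: silent, -o /dev/null: discard the body, -w: print selected response fields.
  curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "http://${host}:${port}/"
}

# Example: compare both JobManagers.
# webui_probe dev-sh-map-01
# webui_probe dev-sh-map-02
```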

 

10.3 Kill the leader

[root@DEV-SH-MAP-01 flink-1.3.2]# jps
14240 Jps
34929 TaskManager
33106 DataNode
33314 SecondaryNameNode
34562 JobManager
33900 FlinkZooKeeperQuorumPeer
32991 NameNode
[root@DEV-SH-MAP-01 flink-1.3.2]# kill -9 34562
[root@DEV-SH-MAP-01 flink-1.3.2]# jps
34929 TaskManager
33106 DataNode
33314 SecondaryNameNode
14275 Jps
33900 FlinkZooKeeperQuorumPeer
32991 NameNode

Opening the Flink web UI again shows that leadership has switched to the other JobManager.
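
Looking up the PID by hand can be replaced with a small helper that filters the jps output. This is a sketch, not part of the original procedure: the helper name `jobmanager_pid` is hypothetical, and it assumes the JobManager shows up in jps under the name `JobManager`, as in the transcript above.

```shell
# Hypothetical helper: print the PID of the local JobManager JVM, if any.
jobmanager_pid() {
  # jps prints "<pid> <main class name>" per line; match the class name exactly.
  jps | awk '$2 == "JobManager" { print $1 }'
}

# Example: simulate a crash of the local JobManager.
# pid="$(jobmanager_pid)"
# [ -n "$pid" ] && kill -9 "$pid"
```

Matching the second column exactly keeps TaskManager and the other JVMs out of the result.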

 

10.4 Restart the killed JobManager

   

[root@DEV-SH-MAP-01 bin]# jobmanager.sh start cluster DEV-SH-MAP-01
Starting jobmanager daemon on host DEV-SH-MAP-01.
[root@DEV-SH-MAP-01 bin]# jps
34929 TaskManager
33106 DataNode
33314 SecondaryNameNode
15506 JobManager
15559 Jps
33900 FlinkZooKeeperQuorumPeer
32991 NameNode

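Checking the web UI again shows that, although the previously killed leader is running again, it is now a standby; the current leader stays in place. This matches the failover behavior illustrated in the Flink documentation.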

 

 

11. Known issue

When the JobManager leadership switches, the TaskManagers restart as well.

