Flink high availability (standalone): configuration and troubleshooting

Reposted from the original article, 2020-11-26 17:38. Tags: Flink HA configuration / Flink HA fails to start

First, as before, edit flink-conf.yaml in the conf directory of the Flink installation. Find the three settings below, uncomment them, and fill in your own HDFS and ZooKeeper addresses.

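Note that my HDFS here is the HA cluster I set up earlier, and mycluster is the name of that HDFS cluster. The rest of the path is created in HDFS automatically; you can choose it freely and do not need to create it beforehand.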

high-availability: zookeeper
high-availability.storageDir: hdfs://mycluster/flink/ha/
high-availability.zookeeper.quorum: node7-2:2181,node7-3:2181,node7-4:2181
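To double-check the edit before distributing it, here is a minimal Python sketch that reads flink-conf.yaml as flat `key: value` lines (an assumption: it is not a full YAML parser) and reports whether the three HA keys are present, whether the storage directory is an hdfs:// URI, and whether every quorum entry looks like host:port. The helper names are hypothetical.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = (
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookeeper.quorum",
)


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines, ignoring blank lines and # comments.

    Assumption: only flat keys are used, so no real YAML parser is needed.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def check_ha_config(text):
    """Return a list of problems in the HA settings (empty list means it looks OK)."""
    conf = parse_flat_yaml(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in conf]
    if conf.get("high-availability", "zookeeper").lower() != "zookeeper":
        problems.append("high-availability should be 'zookeeper'")
    storage = conf.get("high-availability.storageDir")
    if storage is not None:
        parsed = urlparse(storage)
        if parsed.scheme != "hdfs" or not parsed.netloc:
            problems.append(f"storageDir is not an hdfs:// URI: {storage}")
    for entry in conf.get("high-availability.zookeeper.quorum", "").split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        if entry and (not host or not port.isdigit()):
            problems.append(f"bad quorum entry: {entry}")
    return problems


if __name__ == "__main__":
    sample = (
        "high-availability: zookeeper\n"
        "high-availability.storageDir: hdfs://mycluster/flink/ha/\n"
        "high-availability.zookeeper.quorum: node7-2:2181,node7-3:2181,node7-4:2181\n"
    )
    print(check_ha_config(sample))  # []
```

Treating commented-out lines as missing mirrors the "uncomment them" step: a key that is still behind `#` is reported.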

Modifying workers

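As mentioned in the previous post, older Flink versions had a file called slaves; newer versions call it workers. It lists the TaskManager nodes. I had configured three before; now one of them has become a JobManager, so after removing one the contents are: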

node7-2
node7-3
node7-4
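Because start-cluster.sh starts a TaskManager for each host listed in workers, a stray blank line or a duplicated host is easy to overlook. A minimal Python sketch, with hypothetical helper names, that reads such a file (skipping blank lines and anything after `#`, an assumption of this sketch) and flags duplicated hosts:

```python
from collections import Counter


def read_host_file(text):
    """Return host names from a workers-style file, one host per line.

    Blank lines and anything after '#' are ignored (assumption for this sketch).
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def duplicate_hosts(hosts):
    """Hosts listed more than once (assumed to get one TaskManager per line)."""
    return sorted(host for host, count in Counter(hosts).items() if count > 1)


if __name__ == "__main__":
    hosts = read_host_file("node7-2\nnode7-3\nnode7-4\n")
    print(hosts)                   # ['node7-2', 'node7-3', 'node7-4']
    print(duplicate_hosts(hosts))  # []
```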

Modifying masters

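When I set up the simple Flink cluster earlier, I left this file alone because there was only one JobManager. Now that there are two JobManagers, this file has to be edited to specify the JobManager nodes.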

In fact, the names of these files, especially the old masters and slaves, make the master/worker relationship obvious.

The modified masters file:

node7-1:8081
node7-2:8081
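Each masters entry is host:webui-port. A minimal Python sketch (hypothetical function names) that parses the file and, combined with the workers list above, shows which daemons each host is expected to run. node7-2 appears in both files, which matches the start-cluster.sh output shown further down, where node7-2 starts both a standalonesession and a taskexecutor.

```python
def parse_masters(text):
    """Parse 'host:port' lines from a masters file into (host, port) tuples."""
    masters = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {line!r}")
        masters.append((host, int(port)))
    return masters


def roles_by_host(masters, workers):
    """Map each host to the daemons start-cluster.sh is expected to launch on it."""
    roles = {}
    for host, _port in masters:
        roles.setdefault(host, []).append("standalonesession")
    for host in workers:
        roles.setdefault(host, []).append("taskexecutor")
    return roles


if __name__ == "__main__":
    masters = parse_masters("node7-1:8081\nnode7-2:8081\n")
    workers = ["node7-2", "node7-3", "node7-4"]
    for host, daemons in roles_by_host(masters, workers).items():
        print(host, daemons)
```

Run as-is, it prints `node7-1 ['standalonesession']`, `node7-2 ['standalonesession', 'taskexecutor']`, then `node7-3` and `node7-4` with `['taskexecutor']`.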

Distributing the configuration files

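As with HDFS and with the simple Flink cluster, all of these modified configuration files must be copied to every node. scp does the job, so I won't go into it.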
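For reference, a minimal Python sketch of that copy step. It assumes Flink is installed at the same path on every node (the /opt/flink path below is hypothetical) and that passwordless SSH is already set up; it builds one scp command per host and only executes them when `dry_run` is False.

```python
import subprocess

# Hypothetical install path; replace with your own Flink conf directory.
FLINK_CONF_DIR = "/opt/flink/conf"
FILES = ("flink-conf.yaml", "masters", "workers")


def scp_commands(hosts, conf_dir=FLINK_CONF_DIR, files=FILES):
    """Build one scp argument list per host, copying the edited files into conf_dir."""
    sources = [f"{conf_dir}/{name}" for name in files]
    return [["scp", *sources, f"{host}:{conf_dir}/"] for host in hosts]


def distribute(hosts, dry_run=True):
    """Print the commands, or run them (check=True stops at the first failed copy)."""
    for cmd in scp_commands(hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Run on node7-1 and push to the other nodes; the dry run only prints.
    distribute(["node7-2", "node7-3", "node7-4"])
```

Passing argument lists instead of a shell string avoids quoting problems if a path ever contains spaces.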

Downloading the Hadoop dependency jar

With all of that done, I started the cluster with start-cluster.sh and saw the following output:

Starting HA cluster with 2 masters.
Starting standalonesession daemon on host node7-1.
Starting standalonesession daemon on host node7-2.
Starting taskexecutor daemon on host node7-2.
Starting taskexecutor daemon on host node7-3.
Starting taskexecutor daemon on host node7-4.

There were no errors, so I assumed it had worked. However, the web UI could not be reached at either http://node001:8081 or http://node002:8081. Checking the Flink log files, I found the following exception:

2020-11-26 04:48:36,426 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not start cluster entrypoint StandaloneSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneSessionClusterEntrypoint.
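Reading the logs by hand works, but with several JobManager and TaskManager log files it helps to filter them. A minimal Python sketch that pulls out ERROR lines and exception headers like the two above; the log directory passed to `scan_log_dir` is hypothetical (typically the log directory of the Flink installation), and the line format is assumed from the excerpt in this post.

```python
import re
from pathlib import Path

# Matches the level column of lines like
# "2020-11-26 04:48:36,426 ERROR org.apache.flink..."
ERROR_LINE = re.compile(r"^\S+ \S+ ERROR ")
# Matches exception headers like "java.io.IOException: Could not ..."
EXCEPTION_LINE = re.compile(r"^[\w.$]+(Exception|Error): ")


def find_problems(lines):
    """Return (line_number, text) for ERROR log lines and exception headers."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if ERROR_LINE.match(text) or EXCEPTION_LINE.match(text):
            hits.append((number, text))
    return hits


def scan_log_dir(log_dir, pattern="*.log"):
    """Scan every matching file in log_dir and return {file_name: hits}."""
    report = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits = find_problems(handle)
        if hits:
            report[path.name] = hits
    return report
```

Stack-frame lines starting with `at` are skipped on purpose, so the report stays short; open the file at the reported line number to read the full trace.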

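It looks as though Flink cannot recognize or connect to HDFS. The actual cause is a missing dependency: the Hadoop jar that Flink depends on has to be downloaded into the lib directory of the Flink installation.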

This component is available on the Flink website at https://flink.apache.org/downloads.html; the Hadoop component Flink depends on is listed under Additional Components on that page.

Advice online says to download the version matching your Hadoop version, but my Hadoop is 3.1.3 and the newest one on that page was 2.8.3, so that is the one I downloaded.
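Because the fix is just a jar dropped into lib, a quick check before restarting can save a round trip. A minimal Python sketch; it assumes the downloaded jar follows the naming of the Additional Components downloads, such as flink-shaded-hadoop-2-uber-2.8.3-10.0.jar, and reads the bundled Hadoop version from the file name. The /opt/flink/lib path is hypothetical.

```python
import re
from pathlib import Path

# Assumed file-name pattern, e.g. flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
SHADED_HADOOP = re.compile(r"^flink-shaded-hadoop-\d+-uber-(\d+(?:\.\d+)*)-[\d.]+\.jar$")


def shaded_hadoop_jars(lib_dir):
    """Return {jar_name: bundled_hadoop_version} for shaded Hadoop jars in lib_dir."""
    found = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        match = SHADED_HADOOP.match(jar.name)
        if match:
            found[jar.name] = match.group(1)
    return found


if __name__ == "__main__":
    jars = shaded_hadoop_jars("/opt/flink/lib")  # hypothetical install path
    if not jars:
        print("no flink-shaded-hadoop jar found in lib/")
    for name, version in jars.items():
        print(f"{name} bundles Hadoop {version}")
```

The printed version makes the 2.8.3 vs. 3.1.3 mismatch described above visible at a glance.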

After running start-cluster.sh again, the log no longer showed the exception above, both web UIs opened successfully, and two TaskManagers were visible.
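The same check can be scripted against the JobManager's REST API, which is served on the same port as the web UI. A minimal Python sketch, assuming the `/overview` endpoint and its `taskmanagers`, `slots-total`, `jobs-running` and `flink-version` fields (verify them against the REST API reference for your Flink version); the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def summarize_overview(payload):
    """Pick the fields of interest out of a decoded /overview response."""
    return {
        "taskmanagers": payload.get("taskmanagers", 0),
        "slots_total": payload.get("slots-total", 0),
        "jobs_running": payload.get("jobs-running", 0),
        "version": payload.get("flink-version", "unknown"),
    }


def fetch_overview(base_url, timeout=5.0):
    """GET <base_url>/overview and return the summarized JSON body."""
    url = f"{base_url.rstrip('/')}/overview"
    with urlopen(url, timeout=timeout) as response:
        return summarize_overview(json.load(response))
```

For example, `fetch_overview("http://node7-1:8081")` should report the same TaskManager count as the web UI. A JobManager that is down raises `urllib.error.URLError`, which is a subclass of `OSError`, so wrapping the call in `try/except OSError` covers both masters.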

After submitting the Flink job jar I built last time through the web UI, it also reached the RUNNING state. HA mode seemed to be working, but in fact it was not.

Checking the log turned up the following exception:

StandaloneSessionClusterEntrypoint down with application status FAILED. Diagnostics java.io.IOException: Could not create FileSystem for highly available storage path (hdfs://jh/flink/ha/flinkCluster)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:103)
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:89)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:117)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:306)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:269)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:211)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:520)
    at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:64)

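After some searching and experimenting I found the fix: set two environment variables. They can be set in several ways, either system-wide or per user. I set them system-wide by running vi /etc/profile and adding these two lines: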

export HADOOP_HOME=/data/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
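The error says Flink could not create a FileSystem for the hdfs:// storage path. One plausible reading, not confirmed in this post, is that a logical cluster name such as mycluster can only be resolved by the HDFS client from hdfs-site.xml, and HADOOP_CONF_DIR is what points Flink to that file. A minimal Python sketch, assuming the standard layout with hdfs-site.xml directly inside HADOOP_CONF_DIR, that checks both variables and whether the cluster name in the storage path is declared under `dfs.nameservices`:

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse


def hadoop_property(xml_path, name):
    """Return the <value> of a named <property> in a Hadoop *-site.xml, or None."""
    root = ET.parse(xml_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def check_hadoop_env(storage_dir, env=os.environ):
    """List problems that may stop Flink resolving an hdfs://<nameservice>/ path.

    Assumes the URI authority is a logical nameservice (like mycluster),
    not a host:port NameNode address.
    """
    problems = []
    for var in ("HADOOP_HOME", "HADOOP_CONF_DIR"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        return problems
    hdfs_site = Path(conf_dir) / "hdfs-site.xml"
    if not hdfs_site.is_file():
        problems.append(f"{hdfs_site} not found")
        return problems
    nameservice = urlparse(storage_dir).netloc
    declared = hadoop_property(hdfs_site, "dfs.nameservices") or ""
    if nameservice not in [ns.strip() for ns in declared.split(",")]:
        problems.append(f"nameservice {nameservice!r} not in dfs.nameservices")
    return problems


if __name__ == "__main__":
    print(check_hadoop_env("hdfs://mycluster/flink/ha/") or "looks OK")
```

Run it on every node after `source /etc/profile`, since each JobManager resolves the storage path itself.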

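After saving, run source /etc/profile to reload the changes. After I resubmitted the Flink job jar, the log showed no more errors, and when I typed words into nc again, the expected results appeared in the Stdout tab of the web UI. With that, Flink HA mode was up and running, and working through the setup gave me a better understanding of Flink's design and architecture.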
