Flink : Setup Flink on Yarn


  • Yarn Session: starts a long-running YARN application that launches a Job Manager and Task Managers in separate containers, providing a Flink cluster; every Flink app is then submitted to this YARN-backed Flink cluster
  • Single Job on Yarn: no long-running Flink cluster is needed; each run of a Flink program is an independent YARN application that starts its own JobManager and TaskManager, runs the Flink app, and exits completely when the program finishes

Comparing the two modes:

  • Yarn Session must be allocated enough resources up front when the cluster is started through YARN, because it may need to support many jobs. Once the session's resources are exhausted, no more jobs can be submitted even if YARN itself still has spare capacity, and if anything goes wrong with the session, every job running on it is affected. The advantages are that the Flink cluster does not have to be started for each job, and a single Job Manager manages all jobs
  • Single Job resource configuration is more flexible: each job is sized strictly to its own needs, no resources are held when no job is running, and as long as YARN has capacity more jobs can be submitted, each in its own independent YARN application. The drawbacks are that a Flink cluster must be started for every job, and the jobs are all independent with no single Job Manager managing them
  • Yarn Session is generally used in test environments, or for jobs with light workloads and short run times
  • Single Job is generally used in production, or for heavy, long-running jobs
  • The two modes can coexist in one environment (the job-submission commands are different)
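
The two submission paths above can be sketched as shell commands (the jar path is the WordCount example shipped with the distribution; the options are illustrative):

```shell
# Yarn Session mode: start a long-running session first, then submit jobs to it
./bin/yarn-session.sh -d                       # JobManager starts in a YARN container
./bin/flink run examples/batch/WordCount.jar   # job goes to the running session

# Single Job mode: each run creates its own short-lived YARN application
./bin/flink run -m yarn-cluster examples/batch/WordCount.jar
```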

Installation

Download the Flink distribution (https://flink.apache.org/downloads.html):

wget https://archive.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-bin-scala_2.12.tgz

tar xzf flink-1.10.0-bin-scala_2.12.tgz

cd flink-1.10.0/lib
wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar

Flink on Yarn does not require configuring the masters and slaves files

Yarn Session

Start a session with the following command

./bin/yarn-session.sh

The command's options are as follows

Usage:
   Optional
     -at,--applicationType <arg>     Set a custom application type for the application on YARN
     -D <property=value>             use value for given property
     -d,--detached                   If present, runs the job in detached mode
     -h,--help                       Help for the Yarn session CLI.
     -id,--applicationId <arg>       Attach to running YARN session
     -j,--jar <arg>                  Path to Flink jar file
     -jm,--jobManagerMemory <arg>    Memory for JobManager Container with optional unit (default: MB)
     -m,--jobmanager <arg>           Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
     -nl,--nodeLabel <arg>           Specify YARN node label for the YARN application
     -nm,--name <arg>                Set a custom name for the application on YARN
     -q,--query                      Display available YARN resources (memory, cores)
     -qu,--queue <arg>               Specify YARN queue.
     -s,--slots <arg>                Number of slots per TaskManager
     -t,--ship <arg>                 Ship files in the specified directory (t for transfer)
     -tm,--taskManagerMemory <arg>   Memory per TaskManager Container with optional unit (default: MB)
     -yd,--yarndetached              If present, runs the job in detached mode (deprecated; use non-YARN specific option instead)
     -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths for high availability mode
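
For example, a detached session sized for a handful of small jobs might be started as follows (the name and memory values are illustrative, not recommendations):

```shell
# 1 GB JobManager, 2 GB per TaskManager, 2 slots per TaskManager
./bin/yarn-session.sh -d \
    -nm "flink-session-test" \
    -jm 1024m \
    -tm 2048m \
    -s 2
```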

The command's output looks like this

lin@Ubuntu-VM-1:$ ./bin/yarn-session.sh
2020-05-16 08:28:41,872 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2020-05-16 08:28:41,877 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2020-05-16 08:28:41,878 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2020-05-16 08:28:41,879 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.process.size, 1568m
2020-05-16 08:28:41,879 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-05-16 08:28:41,882 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2020-05-16 08:28:41,885 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-05-16 08:28:42,579 WARN  org.apache.flink.runtime.util.HadoopUtils                     - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2020-05-16 08:28:44,641 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-16 08:28:45,093 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to lin (auth:SIMPLE)
2020-05-16 08:28:45,169 INFO  org.apache.flink.runtime.security.modules.JaasModule          - Jaas file will be created as /tmp/jaas-8610345087872643912.conf.
2020-05-16 08:28:45,278 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/home/lin/myTest/flink/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-16 08:28:45,592 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-16 08:28:46,495 INFO  org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils  - The derived from fraction jvm overhead memory (156.800mb (164416719 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2020-05-16 08:28:47,184 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-05-16 08:28:47,621 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1568, slotsPerTaskManager=1}
2020-05-16 08:28:49,579 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-05-16 08:28:56,822 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1589570585722_0007
2020-05-16 08:28:57,435 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1589570585722_0007
2020-05-16 08:28:57,436 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated
2020-05-16 08:28:57,464 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED
2020-05-16 08:29:29,514 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.
2020-05-16 08:29:29,516 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Found Web Interface ubuntu-vm-1:45993 of application 'application_1589570585722_0007'.
JobManager Web Interface: http://ubuntu-vm-1:45993

Use the yarn command to check the running applications

lin@Ubuntu-VM-1:$ ./bin/yarn application -list
20/05/16 09:51:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
application_1589570585722_0007  Flink session cluster           Apache Flink           lin         default                 RUNNING               UNDEFINED                 100%            http://Ubuntu-VM-1:45993

Both the yarn-session.sh output and the yarn application -list output show http://Ubuntu-VM-1:45993. This is the Job Manager's Web UI, which you can open in a browser

The jps -ml command shows that the launched process is

org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint

Note that yarn-session.sh does not return; to run it in detached mode, add the -d (detached) option

./bin/yarn-session.sh -d

When not running in detached mode, yarn-session.sh accepts a stop command to shut down

stop
2020-05-16 09:58:54,146 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Deleted Yarn properties file at /tmp/.yarn-properties-lin
2020-05-16 09:58:54,759 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Application application_1589570585722_0007 finished with state FINISHED and final state SUCCEEDED at 1589594332039

In detached mode, you can re-attach with the following command

./bin/yarn-session.sh -id <appId>

And shut the session down with

echo "stop" | ./bin/yarn-session.sh -id <appId>

At this point no Task Manager has actually been started yet. In Flink on Yarn mode, the number of Task Managers is decided by the system from the parallelism of the submitted jobs and the configured number of slots per Task Manager, and Task Managers are created and stopped dynamically

As the official documentation puts it:
The example invocation starts a single container for the ApplicationMaster which runs the Job Manager.
The session cluster will automatically allocate additional containers which run the Task Managers when jobs are submitted to the cluster.

So that the Yarn Session's ID is known when a job is submitted, the ID is recorded in a file whose location is configured by

# flink-conf.yaml
yarn.properties-file.location

If this is not configured, the location falls back to

System.getProperty("java.io.tmpdir");

which by default is the /tmp directory

lin@Ubuntu-VM-1:$ cat /tmp/.yarn-properties-lin
#Generated YARN properties file
#Sat May 16 10:02:09 CST 2020
dynamicPropertiesString=
applicationID=application_1589570585722_0008
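
The run command only needs the applicationID entry from this file; extracting it is a one-liner. The sketch below writes a sample file first so it is self-contained:

```shell
# Recreate a sample .yarn-properties file and pull out the application ID
props=$(mktemp)
cat > "$props" <<'EOF'
#Generated YARN properties file
#Sat May 16 10:02:09 CST 2020
dynamicPropertiesString=
applicationID=application_1589570585722_0008
EOF
app_id=$(sed -n 's/^applicationID=//p' "$props")
echo "$app_id"   # application_1589570585722_0008
```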

Submitting a Job

lin@Ubuntu-VM-1:$ ./bin/flink run -p 5 examples/batch/WordCount.jar
2020-05-16 08:03:07,531 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-lin.
2020-05-16 08:03:07,531 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-lin.
2020-05-16 08:03:08,080 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/home/lin/myTest/flink/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-16 08:03:08,080 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/home/lin/myTest/flink/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2020-05-16 08:03:09,443 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-16 08:03:09,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-05-16 08:03:09,783 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-05-16 08:03:10,052 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Found Web Interface ubuntu-vm-1:35973 of application 'application_1589570585722_0006'.
Job has been submitted with JobID 6d0be776c392f5408e2d611e5f71a011
Program execution finished
Job with JobID 6d0be776c392f5408e2d611e5f71a011 has finished.
Job Runtime: 27096 ms
Accumulator Results:
- c2f8af07593812e7ae866e6da873aa95 (java.util.ArrayList) [170 elements]

(after,1)
(bare,1)
(coil,1)
......

The run command finds the Yarn Session's application ID through the /tmp/.yarn-properties-lin file and submits the job to it

After a job is submitted, the jps -ml command shows one or more new processes of the following kind

org.apache.flink.yarn.YarnTaskExecutorRunner

These processes are the Task Managers

Shortly after the program finishes, these Task Managers go away

The run command stays in the foreground until the job completes; to return immediately after submission, use the -d (detached) option
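
A detached submission looks like this; ./bin/flink list can be used afterwards to check on running jobs:

```shell
# Submit and return immediately; the job keeps running on the session cluster
./bin/flink run -d -p 5 examples/batch/WordCount.jar

# List the jobs currently known to the cluster
./bin/flink list
```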

Some of the run command's options are shown below

Action "run" compiles and runs a program.

  Syntax: run [OPTIONS] <jar-file> <arguments>
  "run" action options:
     -c,--class <classname>               Class with the program entry point
                                          ("main()" method). Only needed if the
                                          JAR file does not specify the class in
                                          its manifest.

     -C,--classpath <url>                 Adds a URL to each user code
                                          classloader  on all nodes in the cluster. 

     -d,--detached                        If present, runs the job in detached mode

     -n,--allowNonRestoredState           Allow to skip savepoint state that
                                          cannot be restored. You need to allow
                                          this if you removed an operator from
                                          your program that was part of the
                                          program when the savepoint was
                                          triggered.

     -p,--parallelism <parallelism>       The parallelism with which to run the
                                          program. Optional flag to override the
                                          default value specified in the configuration.

     -py,--python <pythonFile>            Python script with the program entry
                                          point. The dependent resources can be
                                          configured with the `--pyFiles` option.

     -pyarch,--pyArchives <arg>           Add python archive files for job. 

     -pyfs,--pyFiles <pythonFiles>        Attach custom python files for job.

     -pym,--pyModule <pythonModule>       Python module with the program entry point.

     -pyreq,--pyRequirements <arg>        Specify a requirements.txt file

     -s,--fromSavepoint <savepointPath>   Path to a savepoint to restore the job
                                          from (for example hdfs:///flink/savepoint-1537).

     -sae,--shutdownOnAttachedExit        If the job is submitted in attached
                                          mode, perform a best-effort cluster
                                          shutdown when the CLI is terminated
                                          abruptly, e.g., in response to a user
                                          interrupt, such as typing Ctrl + C.

  Options for yarn-cluster mode:
     -d,--detached                        If present, runs the job in detached mode
     -m,--jobmanager <arg>                Address of the JobManager
     -yat,--yarnapplicationType <arg>     Set a custom application type for the application on YARN
     -yD <property=value>                 use value for given property
     -yh,--yarnhelp                       Help for the Yarn session CLI.
     -yid,--yarnapplicationId <arg>       Attach to running YARN session
     -yj,--yarnjar <arg>                  Path to Flink jar file
     -yjm,--yarnjobManagerMemory <arg>    Memory for JobManager Container with optional unit (default: MB)
     -ynl,--yarnnodeLabel <arg>           Specify YARN node label for the YARN application
     -ynm,--yarnname <arg>                Set a custom name for the application on YARN
     -yq,--yarnquery                      Display available YARN resources (memory, cores)
     -yqu,--yarnqueue <arg>               Specify YARN queue.
     -ys,--yarnslots <arg>                Number of slots per TaskManager
     -yt,--yarnship <arg>                 Ship files in the specified directory (t for transfer)
     -ytm,--yarntaskManagerMemory <arg>   Memory per TaskManager Container with optional unit (default: MB)
     -yz,--yarnzookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths for high availability mode
     -z,--zookeeperNamespace <arg>        Namespace to create the Zookeeper sub-paths for high availability mode

  Options for executor mode:
     -D <property=value>   Generic configuration options for
                           execution/deployment and for the configured executor.
                           The available options can be found at
                           https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html
     -e,--executor <arg>   The name of the executor to be used for executing the
                           given job, which is equivalent to the
                           "execution.target" config option. The currently
                           available executors are: "remote", "local",
                           "kubernetes-session", "yarn-per-job", "yarn-session".

  Options for default mode:
     -m,--jobmanager <arg>           Address of the JobManager (master) to which to connect.
     -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths for high availability mode

If the program throws an exception it shows up in the Web UI, but the detailed logs seem hard to retrieve afterwards, because the Task Managers have exited.
In non-detached mode the flink run command shows all the logs.

Single Job

There is no need to start a Flink cluster first; just submit the job directly

./bin/flink run \
            -m yarn-cluster \
            -ynm "Word Counter Test" \
            examples/batch/WordCount.jar \
            --input /home/lin/myTest/flink/flink-1.10.0/README.txt

The output is

lin@Ubuntu-VM-1:$ ./bin/flink run -m yarn-cluster -ynm "Word Counter Test" examples/batch/WordCount.jar --input /home/lin/myTest/flink/flink-1.10.0/README.txt
2020-05-16 16:23:49,429 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/home/lin/myTest/flink/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-05-16 16:23:49,429 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/home/lin/myTest/flink/flink-1.10.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
Printing result to stdout. Use --output to specify output path.
2020-05-16 16:23:50,316 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032
2020-05-16 16:23:50,680 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-05-16 16:23:51,042 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-05-16 16:23:51,159 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1568, slotsPerTaskManager=1}
2020-05-16 16:23:51,534 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-05-16 16:23:52,847 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1589570585722_0014
2020-05-16 16:23:52,957 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1589570585722_0014
2020-05-16 16:23:52,958 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated
2020-05-16 16:23:52,964 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED
2020-05-16 16:24:06,594 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.
2020-05-16 16:24:06,596 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Found Web Interface ubuntu-vm-1:33437 of application 'application_1589570585722_0014'.
Job has been submitted with JobID f0c62fc833814a2dfc59eecd360bc2a8
Program execution finished
Job with JobID f0c62fc833814a2dfc59eecd360bc2a8 has finished.
Job Runtime: 16036 ms
Accumulator Results:
- 8a1ab0e55aed34e646fc443e00ab961d (java.util.ArrayList) [111 elements]

(1,1)
(13,1)
(5d002,1)
(740,1)
(about,1)
(account,1)
(administration,1)
......

Check the YARN applications

lin@Ubuntu-VM-1:$ ./bin/yarn application -list
20/05/16 16:21:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
application_1589570585722_0014     Word Counter Test            Apache Flink           lin         default                 RUNNING               UNDEFINED                 100%            http://Ubuntu-VM-1:36203

View the logs

./bin/yarn logs -applicationId application_1589570585722_0014

However, logs printed by the job via System.out.println don't seem to show up here

In non-detached mode, the flink run command shows all the logs

Recovery Behaviour

Flink’s YARN client has the following configuration parameters to control how to behave in case of container failures. These parameters can be set either from the conf/flink-conf.yaml or when starting the YARN session, using -D parameters.

  • yarn.application-attempts: The number of ApplicationMaster (+ its TaskManager containers) attempts. If this value is set to 1 (default), the entire YARN session will fail when the Application master fails. Higher values specify the number of restarts of the ApplicationMaster by YARN.
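
For instance, allowing two ApplicationMaster attempts could be set either way (the value 2 is illustrative):

```shell
# Option 1: persist it in conf/flink-conf.yaml
echo "yarn.application-attempts: 2" >> conf/flink-conf.yaml

# Option 2: pass it when starting the session
./bin/yarn-session.sh -d -D yarn.application-attempts=2
```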

Setup for application priority on YARN

Flink’s YARN client has the following configuration parameters to setup application priority. These parameters can be set either from the conf/flink-conf.yaml or when starting the YARN session, using -D parameters.

  • yarn.application.priority: A non-negative integer indicating the priority for submitting a Flink YARN application. It will only take effect if YARN priority scheduling setting is enabled. Larger integer corresponds with higher priority. If priority is negative or set to ‘-1’(default), Flink will unset yarn priority setting and use cluster default priority. Please refer to YARN’s official documentation for specific settings required to enable priority scheduling for the targeted YARN version.

Some YARN clusters use firewalls to control the network traffic between the cluster and the rest of the network. In those setups, Flink jobs can only be submitted to a YARN session from within the cluster’s network (behind the firewall). If this is not feasible for production use, Flink allows configuring a port range for its REST endpoint, which is used for client-cluster communication. With this range configured, users can also submit jobs to Flink across the firewall.

The configuration parameter for specifying the REST endpoint port is the following:

  • rest.bind-port
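
A sketch of pinning the REST endpoint to a fixed range (the port numbers are illustrative and must match the firewall rules):

```shell
# Bind the REST endpoint to a firewall-friendly range in conf/flink-conf.yaml
echo "rest.bind-port: 50100-50200" >> conf/flink-conf.yaml
```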


The YARN client needs to access the Hadoop configuration to connect to the YARN resource manager and HDFS. It determines the Hadoop configuration using the following strategy:

  • Test if YARN_CONF_DIR, HADOOP_CONF_DIR or HADOOP_CONF_PATH are set (in that order). If one of these variables is set, it is used to read the configuration.
  • If the above strategy fails (this should not be the case in a correct YARN setup), the client uses the HADOOP_HOME environment variable. If it is set, the client tries to access HADOOP_HOME/etc/hadoop (Hadoop 2) and HADOOP_HOME/conf (Hadoop 1).
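
The lookup order above can be mirrored in a small shell function; this is only a sketch of the documented strategy, not Flink's actual implementation:

```shell
# Resolve the Hadoop configuration directory the way the docs describe:
# YARN_CONF_DIR, then HADOOP_CONF_DIR, then HADOOP_CONF_PATH, then HADOOP_HOME
find_hadoop_conf() {
  for v in "$YARN_CONF_DIR" "$HADOOP_CONF_DIR" "$HADOOP_CONF_PATH"; do
    if [ -n "$v" ]; then echo "$v"; return 0; fi
  done
  if [ -n "$HADOOP_HOME" ]; then
    if [ -d "$HADOOP_HOME/etc/hadoop" ]; then echo "$HADOOP_HOME/etc/hadoop"; return 0; fi  # Hadoop 2
    if [ -d "$HADOOP_HOME/conf" ]; then echo "$HADOOP_HOME/conf"; return 0; fi              # Hadoop 1
  fi
  return 1
}

find_hadoop_conf || echo "no Hadoop configuration found"
```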

When starting a new Flink YARN session, the client first checks if the requested resources (memory and vcores for the ApplicationMaster) are available. After that, it uploads a jar that contains Flink and the configuration to HDFS (step 1).

The next step of the client is to request (step 2) a YARN container to start the ApplicationMaster (step 3). Since the client registered the configuration and jar-file as a resource for the container, the NodeManager of YARN running on that particular machine will take care of preparing the container (e.g. downloading the files). Once that has finished, the ApplicationMaster (AM) is started.

The JobManager and AM are running in the same container. Once they successfully started, the AM knows the address of the JobManager (its own host). It is generating a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager). The file is also uploaded to HDFS. Additionally, the AM container is also serving Flink’s web interface. All ports the YARN code is allocating are ephemeral ports. This allows users to execute multiple Flink YARN sessions in parallel.

After that, the AM starts allocating the containers for Flink’s TaskManagers, which will download the jar file and the modified configuration from the HDFS. Once these steps are completed, Flink is set up and ready to accept Jobs.



