spark-submit Parameter Reference (on YARN)

Reposted article. Published 2017-06-06 18:25. Categories: Linux / Spark

Usage: spark-submit [--option value] <application jar> [application arguments]

Parameter	Description
--master MASTER_URL	Set to yarn
--deploy-mode DEPLOY_MODE	Where the driver program runs: client or cluster
--class CLASS_NAME	Fully qualified name (FQCN, including the package) of the class containing the application's main method, for example `org.apache.spark.examples.SparkPi`
--name NAME	Application name
--jars JARS	Comma-separated list of third-party jars that the driver and executors depend on
--properties-file FILE	Path to the application properties file; defaults to conf/spark-defaults.conf
Driver settings
--driver-cores NUM	Number of CPU cores used by the driver (cluster mode only); defaults to 1
--driver-memory MEM	Amount of memory used by the driver
--driver-library-path	Extra library path entries to pass to the driver
--driver-class-path	Extra class path entries to pass to the driver
--driver-java-options	Extra Java options to pass to the driver
Executor settings
--num-executors NUM	Number of executors to launch, each in its own YARN container; defaults to 2. Alternatively, set the `spark.executor.instances` configuration parameter
--executor-cores NUM	Number of processor cores to allocate to each executor; defaults to 1
--executor-memory MEM	Maximum heap size to allocate to each executor; defaults to 1G. Alternatively, set the `spark.executor.memory` configuration parameter
--queue QUEUE_NAME	The YARN queue to submit the application to; defaults to the default queue
--archives ARCHIVES	Comma-separated list of archives to be extracted into the working directory of each executor
--files FILES	Comma-separated list of files to be placed in the working directory of each executor
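
To show how the options in the table combine, here is a minimal Python sketch that assembles a spark-submit command line for YARN. Only the flags come from the table; the helper name `build_spark_submit`, the jar path `/path/to/app.jar`, and the resource values are hypothetical and chosen purely for illustration.

```python
import shlex


def build_spark_submit(app_jar, main_class, app_args=(), deploy_mode="cluster", **options):
    """Return the argv list for a spark-submit call against YARN.

    Keyword options map to flags by replacing "_" with "-",
    e.g. num_executors=4 becomes "--num-executors 4".
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")

    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
    ]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    # Options come first, then the application jar, then the application arguments.
    argv.append(app_jar)
    argv += [str(arg) for arg in app_args]
    return argv


# Hypothetical jar path and resource sizes (assumptions, not from the article).
command = build_spark_submit(
    "/path/to/app.jar",
    "org.apache.spark.examples.SparkPi",
    app_args=[100],
    num_executors=4,
    executor_cores=2,
    executor_memory="2G",
    queue="default",
)
print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be handed to `subprocess.run(command)` on a host where spark-submit is installed; building a list rather than a single string avoids shell-quoting problems in application arguments.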

1. Deployment Mode Overview

In YARN, each application instance has an ApplicationMaster process, which is the first container started for that application.

The ApplicationMaster is responsible for requesting resources from the ResourceManager and, once they are allocated, instructing NodeManagers to start containers on its behalf.

ApplicationMasters obviate the need for an active client: the process starting the application can terminate, and coordination continues from a process managed by YARN running on the cluster.

2. Deployment Mode: Cluster

In cluster mode, the driver runs in the ApplicationMaster on a cluster host chosen by YARN.

This means that the same process, which runs in a YARN container, is responsible for both driving the application and requesting resources from YARN.

The client that launches the application doesn't need to continue running for the entire lifetime of the application.

Cluster mode is not well suited to using Spark interactively.

Spark applications that require user input, such as spark-shell and pyspark, need the Spark driver to run inside the client process that initiates the Spark application.
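
The rule in the last two paragraphs can be expressed as a small check. This is only a sketch: the function name `check_deploy_mode` and the `interactive` flag are hypothetical, and the check reflects the guidance above rather than anything spark-submit exposes as an API.

```python
def check_deploy_mode(deploy_mode, interactive):
    """Return deploy_mode if it fits the application, otherwise raise ValueError.

    interactive: True for applications that need user input
    (such as spark-shell or pyspark sessions).
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    # Interactive applications need the driver inside the client process,
    # which only client mode provides.
    if interactive and deploy_mode == "cluster":
        raise ValueError("interactive applications need --deploy-mode client")
    return deploy_mode


print(check_deploy_mode("cluster", interactive=False))
print(check_deploy_mode("client", interactive=True))
```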

3. Deployment Mode: Client

In client mode, the driver runs on the host where the job is submitted.

The ApplicationMaster is merely present to request executor containers from YARN.

The client communicates with those containers to schedule work after they start.
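
The differences between the two deployment modes can be summarized in a small lookup. This is an illustrative sketch only: the `Placement` tuple and `describe_placement` function are hypothetical names, and the values simply restate the descriptions in sections 2 and 3.

```python
from collections import namedtuple

# Hypothetical record type summarizing where each role runs.
Placement = namedtuple(
    "Placement", ["driver_location", "am_role", "client_must_stay_alive"]
)

PLACEMENTS = {
    "cluster": Placement(
        driver_location="ApplicationMaster container on a cluster host chosen by YARN",
        am_role="runs the driver and requests resources from YARN",
        client_must_stay_alive=False,
    ),
    "client": Placement(
        driver_location="host where the job is submitted",
        am_role="only requests executor containers from YARN",
        # The driver lives in the client process, so that process must keep running.
        client_must_stay_alive=True,
    ),
}


def describe_placement(deploy_mode):
    """Look up the placement summary for a --deploy-mode value."""
    try:
        return PLACEMENTS[deploy_mode]
    except KeyError:
        raise ValueError("deploy_mode must be 'client' or 'cluster'") from None


for mode in ("cluster", "client"):
    print(mode, describe_placement(mode))
```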

4. References:

https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_running_spark_on_yarn.html

http://spark.apache.org/docs/1.3.0/running-on-yarn.html
