Spark run modes and their common parameters

Reposted article. 2019-05-29 15:51. Tag: spark

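YARN cluster mode

Routine scheduled jobs are usually run in this mode.

Specifying a fixed number of executors

The commonly used job parameters are all specified in this script; the later scripts omit them.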

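# Option notes (a comment placed after a trailing "\" breaks the line continuation,
# so the notes are collected here):
#   --master yarn --deploy-mode cluster   run on YARN in cluster mode
#                                         ("--master yarn-cluster" is deprecated since Spark 2.0)
#   --name                                job name
#   --queue                               YARN queue
#   spark.default.parallelism             parallelism: default partition count after a shuffle
#   spark.yarn.executor.memoryOverhead    off-heap overhead per executor, in MiB
#                                         (named spark.executor.memoryOverhead since Spark 2.3)
#   --num-executors                       number of executors
#   --executor-memory                     heap memory per executor
#   --executor-cores                      cores per executor; set it greater than 1
#   --driver-memory                       driver memory; does not need to be large
#   --class, ${jar_path}, param_list      main class, jar location, arguments for the main class
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name "wordcount_${date}" \
    --queue production.group.yanghao \
    --conf spark.default.parallelism=1000 \
    --conf spark.network.timeout=1800s \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --conf spark.scheduler.executorTaskBlacklistTime=30000 \
    --conf spark.core.connection.ack.wait.timeout=300s \
    --num-executors 200 \
    --executor-memory 4G \
    --executor-cores 2 \
    --driver-memory 2G \
    --class "${main_class}" \
    "${jar_path}" \
    param_list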
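
In a backslash-continued command, any text after the trailing backslash (including a comment) ends the continuation, so per-option comments cannot sit on the same line. One possible way to keep a comment beside each option is to collect the arguments in a bash array, as in the minimal sketch below. The values of `date`, `main_class` and `jar_path` are hypothetical placeholders, and the script only prints the command (a dry run) instead of launching it.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder values; replace them with real ones.
date="20190529"
main_class="com.example.WordCount"
jar_path="/path/to/wordcount.jar"

# Elements of a bash array may be separated by newlines and comments,
# so no backslash continuations are needed.
submit_args=(
    --master yarn
    --deploy-mode cluster                   # cluster run mode
    --name "wordcount_${date}"              # job name
    --queue production.group.yanghao        # YARN queue
    --conf spark.default.parallelism=1000   # default partition count after a shuffle
    --num-executors 200                     # number of executors
    --executor-memory 4G                    # heap memory per executor
    --executor-cores 2                      # cores per executor
    --driver-memory 2G                      # driver memory
    --class "${main_class}"                 # main class
    "${jar_path}"                           # application jar
    param_list                              # arguments passed to the main class
)

# Dry run: print the command, shell-quoted, instead of launching it.
printf 'spark-submit'
printf ' %q' "${submit_args[@]}"
printf '\n'

# To actually submit the job: spark-submit "${submit_args[@]}"
```

Quoting the array expansion as `"${submit_args[@]}"` keeps each argument intact even if a value contains spaces.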

Dynamically adjusting the number of executors

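# Option notes:
#   spark.dynamicAllocation.enabled        turn on dynamic allocation
#   spark.shuffle.service.enabled          external shuffle service; keeps shuffle files
#                                          available when an executor is removed
#   spark.dynamicAllocation.minExecutors   minimum number of executors
#   spark.dynamicAllocation.maxExecutors   maximum number of executors
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name "wordcount_${date}" \
    --queue production.group.yanghao \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=200 \
    --conf spark.dynamicAllocation.maxExecutors=500 \
    --class "${main_class}" \
    "${jar_path}" \
    param_list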
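
To get a feel for what the 200 to 500 executor range means for the queue, the rough arithmetic can be sketched as follows. It assumes the per-executor settings omitted from this script are the same as in the fixed-executor example (4G heap, 1024 MiB overhead, 2 cores), and it ignores the driver and any rounding YARN applies to container sizes.

```shell
# Settings assumed to be carried over from the fixed-executor example.
executor_memory_mib=4096   # --executor-memory 4G
memory_overhead_mib=1024   # spark.yarn.executor.memoryOverhead=1024
executor_cores=2           # --executor-cores 2
min_executors=200          # spark.dynamicAllocation.minExecutors
max_executors=500          # spark.dynamicAllocation.maxExecutors

# Each executor container needs heap plus overhead.
per_executor_mib=$(( executor_memory_mib + memory_overhead_mib ))

min_total_gib=$(( min_executors * per_executor_mib / 1024 ))
max_total_gib=$(( max_executors * per_executor_mib / 1024 ))
min_cores=$(( min_executors * executor_cores ))
max_cores=$(( max_executors * executor_cores ))

echo "memory: ${min_total_gib}-${max_total_gib} GiB, cores: ${min_cores}-${max_cores}"
# prints: memory: 1000-2500 GiB, cores: 400-1000
```

Under these assumptions the job holds roughly 1000 GiB and 400 cores at its minimum, and can grow to about 2500 GiB and 1000 cores.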

YARN client mode

# --queue, --num-executors, --executor-memory, --executor-cores and --driver-memory
# have the same meaning as in the cluster-mode example; --jars gives the location of the jar.
spark-shell \
    --master yarn \
    --deploy-mode client \
    --queue production.group.yanghao \
    --num-executors 200 \
    --executor-memory 4G \
    --executor-cores 2 \
    --driver-memory 2G \
    --jars "${jar_path}"

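YARN cluster mode vs. YARN client mode

YARN cluster mode: the Spark driver runs on the same node as the application master (inside the application master process).
YARN client mode: the Spark driver runs on the same node as the client (inside the client process), so interactive shells are supported.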
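
The difference above suggests a simple rule of thumb, sketched below: interactive shells need the driver in the local client process, while routine batch jobs can leave the driver on the cluster. The function name `deploy_mode_for` is a hypothetical helper for illustration, not part of Spark.

```shell
# Hypothetical helper: suggest a --deploy-mode for a given Spark launcher.
deploy_mode_for() {
    case "$1" in
        spark-shell|pyspark|spark-sql)
            # Interactive shells keep the driver in the local client process.
            echo client ;;
        *)
            # Batch jobs (e.g. spark-submit) can run the driver on the cluster.
            echo cluster ;;
    esac
}

echo "spark-shell  -> $(deploy_mode_for spark-shell)"
echo "spark-submit -> $(deploy_mode_for spark-submit)"
```

A scheduled job may still use client mode when the submitting machine should host the driver; the helper only encodes the default choice described in this article.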
