When configuring the parallelism of Spark jobs, two parameters come up frequently: spark.sql.shuffle.partitions and spark.default.parallelism. So what exactly is the difference between them?
First, let's look at their definitions:
Property Name | Default | Meaning |
---|---|---|
spark.sql.shuffle.partitions | 200 | Configures the number of partitions to use when shuffling data for joins or aggregations. |
spark.default.parallelism | For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager: Local mode: number of cores on the local machine; Mesos fine grained mode: 8; Others: total number of cores on all executor nodes or 2, whichever is larger | Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user. |
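As a quick sanity check of the defaults in the table, you can ask a session what these values resolve to. A minimal sketch, assuming a 4-core local run; the app name is purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// "inspect-defaults" is just an illustrative app name.
val spark = SparkSession.builder()
  .appName("inspect-defaults")
  .master("local[4]") // assumes 4 local cores
  .getOrCreate()

// Prints "200" unless overridden.
println(spark.conf.get("spark.sql.shuffle.partitions"))

// In local[4] mode this resolves to 4 (the number of cores),
// matching the "Local mode" rule in the table above.
println(spark.sparkContext.defaultParallelism)
```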
Their definitions read quite similarly, but actual testing shows:

- spark.default.parallelism only takes effect when working with RDDs; it has no effect on Spark SQL.
- spark.sql.shuffle.partitions is a setting specific to Spark SQL.

A sketch demonstrating both behaviors follows.
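The sketch below is a minimal, illustrative example rather than a definitive test: the partition counts (100 and 50), the app name, and the toy data are arbitrary, and adaptive query execution is explicitly disabled so the SQL shuffle count stays deterministic on Spark 3.x:

```scala
import org.apache.spark.sql.SparkSession

object ParallelismDemo extends App {
  val spark = SparkSession.builder()
    .appName("parallelism-demo")                   // hypothetical app name
    .master("local[4]")                            // assumes a 4-core local run
    .config("spark.default.parallelism", "100")    // illustrative value
    .config("spark.sql.shuffle.partitions", "50")  // illustrative value
    .config("spark.sql.adaptive.enabled", "false") // keep the SQL shuffle count fixed (Spark 3.x)
    .getOrCreate()
  import spark.implicits._
  val sc = spark.sparkContext

  // RDD path: the reduceByKey shuffle picks up spark.default.parallelism.
  val reduced = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))).reduceByKey(_ + _)
  println(s"RDD shuffle partitions: ${reduced.getNumPartitions}") // expected: 100

  // SQL path: the groupBy shuffle uses spark.sql.shuffle.partitions instead;
  // spark.default.parallelism plays no role here.
  val agg = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("k", "v").groupBy("k").count()
  println(s"SQL shuffle partitions: ${agg.rdd.getNumPartitions}") // expected: 50

  spark.stop()
}
```

Running this, the RDD aggregation lands in 100 partitions while the structurally identical DataFrame aggregation lands in 50, which is exactly the split described in the two bullets above.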