Hive Tuning (Part 3): Tez Optimization


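We run a cluster built on Amazon EMR. A Hive query failed with the error FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. I went through the parameters below and found them quite helpful. In my case, the setting I applied was: set hive.tez.auto.reducer.parallelism=true;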

Tez Memory Tuning

1. AM and Container Size Settings

tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

 

hive.tez.container.size

Description: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
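As a rough sketch of the sizing rule above, the following Python helper picks both values from the YARN limits and prints them as Hive set statements. The function name pick_container_sizes and the example figures (4096 MB minimum, 16384 MB maximum) are assumptions made for illustration, not values from any particular cluster.

```python
def pick_container_sizes(yarn_min_mb: int, yarn_max_mb: int, multiple: int = 1) -> dict:
    """Return AM and container sizes (MB) that follow the sizing rule.

    Hypothetical helper: the AM gets the YARN minimum allocation, and the
    Hive container gets 1x or 2x that, never above the YARN maximum.
    """
    if multiple not in (1, 2):
        raise ValueError("use a small multiple (1 or 2) of the YARN minimum")
    container_mb = yarn_min_mb * multiple
    if container_mb > yarn_max_mb:
        raise ValueError(
            "container size must not exceed yarn.scheduler.maximum-allocation-mb"
        )
    return {
        "tez.am.resource.memory.mb": yarn_min_mb,
        "hive.tez.container.size": container_mb,
    }


# Example figures are assumptions, not recommendations.
sizes = pick_container_sizes(yarn_min_mb=4096, yarn_max_mb=16384, multiple=2)
for key, value in sizes.items():
    print(f"set {key}={value};")
```

The printed lines can be pasted into a Hive session or adapted for hive-site.xml.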

2. AM and Container JVM Options

tez.am.launch.cmd-opts 

Default: 80% * tez.am.resource.memory.mb

Description: usually does not need to be adjusted.

 

hive.tez.java.opts

Default: 80% * hive.tez.container.size

Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC"

 

tez.container.max.java.heap.fraction

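Default: 0.8

Description: the fraction of the task/AM container memory given to the JVM heap (Xmx). Adjusting this parameter is recommended; choose the value according to your workload.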

3. Hive In-Memory Map Join Settings

tez.runtime.io.sort.mb

Default: 100

Description: the amount of memory needed to sort output. Recommended: 40% * hive.tez.container.size, generally no more than 2 GB.

 

hive.auto.convert.join.noconditionaltask

Default: true

Description: whether to merge multiple map joins into one. Use the default.

 

hive.auto.convert.join.noconditionaltask.size

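Default:

Description: when multiple map joins are converted into one, this is the maximum total file size of all the small tables. The value only limits the size of the input table files; it does not represent the actual size of the hash table built during the map join. Recommended: 1/3 * hive.tez.container.size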

 

tez.runtime.unordered.output.buffer.size-mb

Default: 100

Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% * hive.tez.container.size
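The three recommendations in this section are all fractions of hive.tez.container.size, so they can be derived together. The sketch below is only an illustration: derive_memory_settings is a hypothetical helper, the 2048 MB cap reflects the "generally no more than 2 GB" advice, and it assumes hive.auto.convert.join.noconditionaltask.size is expressed in bytes while the other two values are in MB.

```python
def derive_memory_settings(container_mb: int) -> dict:
    """Derive the recommended values from hive.tez.container.size (in MB).

    Hypothetical helper; integer arithmetic is used so results are exact.
    """
    sort_mb = min(container_mb * 40 // 100, 2048)   # 40%, capped at 2 GB
    join_bytes = container_mb * 1024 * 1024 // 3    # 1/3, assumed to be in bytes
    buffer_mb = container_mb * 10 // 100            # 10%
    return {
        "tez.runtime.io.sort.mb": sort_mb,
        "hive.auto.convert.join.noconditionaltask.size": join_bytes,
        "tez.runtime.unordered.output.buffer.size-mb": buffer_mb,
    }


# 4096 MB is an assumed container size used only for illustration.
for key, value in derive_memory_settings(4096).items():
    print(f"set {key}={value};")
```

Treat the output as a starting point and adjust it against what the actual workload needs.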

4. Container Reuse Settings

tez.am.container.reuse.enabled

Default: true

Description: switch that turns container reuse on or off.

Mapper/Reducer Tuning

1. Number of Mappers

tez.grouping.min-size

Default: 50*1024*1024

Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024

Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
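To get a feel for what these two bounds imply, the sketch below computes the fewest and most grouped splits (and therefore mapper tasks) that a given input size allows. It is a simplification under stated assumptions: grouped_split_bounds is a hypothetical helper, and the count Tez actually chooses within this range depends on factors not covered in this post.

```python
def grouped_split_bounds(
    total_input_bytes: int,
    min_size: int = 50 * 1024 * 1024,      # tez.grouping.min-size
    max_size: int = 1024 * 1024 * 1024,    # tez.grouping.max-size
) -> tuple:
    """Return (fewest, most) grouped splits allowed by the size bounds."""
    # No split may exceed max_size, so at least ceil(total / max_size) splits.
    fewest = max(1, -(-total_input_bytes // max_size))
    # No split should fall below min_size, so at most floor(total / min_size).
    most = max(1, total_input_bytes // min_size)
    return fewest, most


ten_gib = 10 * 1024 * 1024 * 1024  # assumed input size for illustration
print(grouped_split_bounds(ten_gib))
```

Raising tez.grouping.min-size lowers the upper end of the range (fewer, larger mappers); lowering tez.grouping.max-size raises the lower end.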


2. Number of Reducers

hive.tez.auto.reducer.parallelism

Default: false

Description: Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Recommended: set this to true.

hive.tez.min.partition.factor

Default: 0.25

Description: When auto reducer parallelism is enabled this factor will be used to put a lower limit to the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0

Description: When auto reducer parallelism is enabled this factor will be used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000

Description: Size per reducer. The default before Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

 

The number of reducers is determined by the following formula:

Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
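The formula above is easy to try out with concrete numbers. The Python sketch below is a direct transcription using the defaults listed in this section; the function name estimate_reducers is hypothetical, and rounding the ratio up to a whole reducer is an assumption, since the formula does not say how fractions are handled.

```python
def estimate_reducers(
    stage_estimate_bytes: int,
    bytes_per_reducer: int = 256_000_000,   # hive.exec.reducers.bytes.per.reducer
    reducers_max: int = 1009,               # hive.exec.reducers.max
    max_partition_factor: float = 2.0,      # hive.tez.max.partition.factor
) -> int:
    """Max(1, Min(reducers_max, estimate / bytes_per_reducer)) x factor."""
    # Assumption: round the ratio up to a whole number of reducers.
    ratio = -(-stage_estimate_bytes // bytes_per_reducer)
    base = max(1, min(reducers_max, ratio))
    return int(base * max_partition_factor)


# A reducer stage estimated at 10 GB (10,000,000,000 bytes, an assumed figure).
print(estimate_reducers(10_000_000_000))
```

With the defaults, a very large stage estimate is capped by hive.exec.reducers.max before the partition factor is applied, so the result tops out at 1009 x 2 = 2018.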

3. Shuffle Settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25

Description: the fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75

Description: once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. Number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

 

Example:

hive.exec.reducers.bytes.per.reducer=1073741824; // 1 GB

tez.shuffle-vertex-manager.min-src-fraction=0.25;

tez.shuffle-vertex-manager.max-src-fraction=0.75;

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there's at least 1 GB of data being output (i.e. if 25% of mappers don't send 1 GB of data, we will wait till at least 1 GB is sent out).
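The linear scaling between the two fractions can be written down in a few lines. The Python sketch below models only that interpolation, under the assumption that nothing is scheduled before min-src-fraction and everything is schedulable at max-src-fraction; schedulable_fraction is a hypothetical helper, and it deliberately ignores the data-size condition described in the example above.

```python
def schedulable_fraction(
    completed_src_fraction: float,
    min_src_fraction: float = 0.25,   # tez.shuffle-vertex-manager.min-src-fraction
    max_src_fraction: float = 0.75,   # tez.shuffle-vertex-manager.max-src-fraction
) -> float:
    """Fraction of the current vertex's tasks that may be scheduled."""
    if completed_src_fraction < min_src_fraction:
        return 0.0   # too few source tasks have finished
    if completed_src_fraction >= max_src_fraction:
        return 1.0   # every task on the vertex can be scheduled
    # Linear interpolation between the two thresholds.
    span = max_src_fraction - min_src_fraction
    return (completed_src_fraction - min_src_fraction) / span


for done in (0.10, 0.25, 0.50, 0.75, 1.00):
    print(done, schedulable_fraction(done))
```

Lowering both fractions starts reducers earlier so the shuffle overlaps with the map phase; raising them delays reducers so they spend less time holding containers while waiting for mapper output.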

Hope this helps!

