Configuring and Tuning Hive to Use Tez as Its Execution Engine


1 Introduction to Tez

2 Downloading and Installing Tez

2.1 Download

Download page: https://tez.apache.org/releases/index.html

Version used in this walkthrough: Apache TEZ® 0.9.2 (Jul 01, 2021)

Example download command: wget 'https://dlcdn.apache.org/tez/tez-0.9.2/apache-tez-0.9.2-bin.tar.gz' --no-check-certificate
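Since the download above skips certificate checking, it may be worth verifying the archive before unpacking it. The following is a minimal sketch, assuming the expected SHA-512 digest has been obtained separately from the official release page; the function name verify_sha512 and its argument order are illustrative and not part of the original walkthrough.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only when FILE's SHA-512 digest equals EXPECTED_HEX (case-insensitive).
verify_sha512() {
    actual=$(sha512sum "$1" 2>/dev/null | awk '{ print $1 }')
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # An empty digest means the file could not be read, so treat it as a failure.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Usage: verify_sha512 apache-tez-0.9.2-bin.tar.gz "<digest from the release page>" && echo "checksum OK"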

2.2 Installation

Reference 1:

2.2.1 Choose the Tez installation path

Installation path: /home/Hadoop/Tez/

2.2.2 Unpack Tez

Step1: cp apache-tez-0.9.2-bin.tar.gz /home/Hadoop/Tez/

Step2: cd /home/Hadoop/Tez/ && tar -zxvf apache-tez-0.9.2-bin.tar.gz

Step3: mv apache-tez-0.9.2-bin tez-0.9.2

Step4: cd ./tez-0.9.2/share/

Step5: In "./tez-0.9.2/share/", locate the file "tez.tar.gz"; this is the archive that is uploaded to HDFS in section 2.2.3.
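The unpacking steps can be collected into one small function. This is only a sketch: the function name unpack_tez and its two arguments are assumptions made for illustration, and the archive is expected to unpack into a single apache-tez-0.9.2-bin directory, as in the steps above.

```shell
# unpack_tez ARCHIVE DEST_DIR
# Copies ARCHIVE into DEST_DIR, extracts it, renames the extracted directory
# to tez-0.9.2 and prints the path of share/tez.tar.gz.
unpack_tez() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp "$archive" "$dest/" || return 1
    (
        cd "$dest" || exit 1
        tar -zxf "$(basename "$archive")" || exit 1
        mv apache-tez-0.9.2-bin tez-0.9.2 || exit 1
    ) || return 1
    # The tarball that is later uploaded to HDFS lives under share/.
    ls "$dest/tez-0.9.2/share/tez.tar.gz"
}
```

Usage: unpack_tez apache-tez-0.9.2-bin.tar.gz /home/Hadoop/Tez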

2.2.3 Create the Tez directory on HDFS

Step1: hadoop fs -mkdir -p /user/tez

Step2: hdfs dfs -put ./tez.tar.gz /user/tez

Step3: hadoop fs -ls /user/tez
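The HDFS commands can be wrapped so that they are printed for review before being run for real. This is a hedged sketch: DRY_RUN and the helper names run and upload_tez are hypothetical, and the target directory is assumed to be /user/tez, the path used by the put and ls steps.

```shell
# Print commands by default; set DRY_RUN=0 to execute them for real.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upload_tez LOCAL_TARBALL: create /user/tez on HDFS, upload the tarball, list the result.
upload_tez() {
    run hadoop fs -mkdir -p /user/tez &&
    run hadoop fs -put "$1" /user/tez/ &&
    run hadoop fs -ls /user/tez
}

upload_tez ./tez.tar.gz
```

The -p flag makes the mkdir step safe to repeat, and the trailing slash on the put target makes it explicit that /user/tez is a directory rather than a file name.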

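3 Tez Environment and Parameter Configuration

The configuration files that need changes include tez-site.xml, hadoop-env.sh and mapred-site.xml.

3.1 Configuring tez-site.xml

3.1.1 Create tez-site.xml

This file belongs to the Hadoop configuration: create tez-site.xml in $HADOOP_HOME/etc/hadoop/ on the Hadoop master node.

Step1: cd ~/Hadoop/hadoop-3.3.1/etc/hadoop/

Step2: vi tez-site.xml

Step3: Add the following configuration: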

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HDFS location of the Tez tarball uploaded in section 2.2.3 -->
  <property>
    <name>tez.lib.uris</name>
    <value>/user/tez/tez.tar.gz</value>
  </property>
  <!-- Use the Hadoop libraries already deployed on the cluster -->
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>

  <!-- Resources for the Tez Application Master -->
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
  </property>

  <!-- Resources for each task container; the heap fraction sizes the JVM heap -->
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
  </property>
  <property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
  </property>

  <!-- Compress intermediate data with Snappy -->
  <property>
    <name>tez.runtime.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
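To see how the memory settings interact: when no explicit -Xmx is supplied, Tez sizes the JVM heap by applying tez.container.max.java.heap.fraction to the container size. With the values above, the task heap can be estimated as follows. This is a rough sketch; the variable names are illustrative and Tez's own rounding may differ slightly.

```shell
container_mb=1024     # tez.task.resource.memory.mb
heap_fraction=0.4     # tez.container.max.java.heap.fraction

# awk does the floating-point multiplication; %d truncates to whole megabytes.
heap_mb=$(awk -v m="$container_mb" -v f="$heap_fraction" 'BEGIN { printf "%d", m * f }')
echo "estimated task heap: ${heap_mb} MB of ${container_mb} MB"
```

With a 1024 MB container and a fraction of 0.4, this gives roughly 409 MB of heap, leaving the rest of the container for non-heap memory.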

For the full list of tez-site.xml parameters, see:

http://tez.apache.org/releases/0.9.2/tez-api-javadocs/configs/TezConfiguration.html

For the Tez runtime parameters, see:

http://tez.apache.org/releases/0.9.2/tez-runtime-library-javadocs/configs/TezRuntimeConfiguration.html

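3.2 Configuring hadoop-env.sh

Add Tez to the Hadoop environment by appending the Tez configuration directory and all Tez jars to HADOOP_CLASSPATH.

Append the following to the end of hadoop-env.sh:

# Directory that contains tez-site.xml (a classpath entry must be a directory or a jar, not the XML file itself)
TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
TEZ_JARS=/home/Hadoop/Tez/tez-0.9.2
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

To check the resulting Hadoop classpath, run: hdfs classpath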
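Because the classpath is one long colon-separated string, it is easier to inspect one entry per line. This is a small sketch, assuming the Tez jars live under /home/Hadoop/Tez/tez-0.9.2 as configured above; the helper name list_tez_entries is hypothetical.

```shell
# list_tez_entries CLASSPATH
# Splits a colon-separated classpath and prints only the entries under the Tez directory.
# Fails (non-zero exit status) when no Tez entry is present.
list_tez_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep '/Tez/tez-0.9.2'
}
```

Usage: list_tez_entries "$(hdfs classpath)"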

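3.3 Configuring mapred-site.xml

The parameter to change in mapred-site.xml is mapreduce.framework.name, which selects the runtime framework for MapReduce jobs.

Before the change, its value is: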

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

After the change, its value is:

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
</property>
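The same edit can be scripted instead of made by hand. This is a sketch under stated assumptions: the function name use_tez_framework is hypothetical, and it relies on the name and value elements sitting on consecutive lines, as in the snippets above. It keeps a .bak copy of the original file.

```shell
# use_tez_framework FILE
# Switches mapreduce.framework.name from yarn to yarn-tez, leaving other values alone.
use_tez_framework() {
    cp "$1" "$1.bak" || return 1
    sed '/<name>mapreduce\.framework\.name<\/name>/{
n
s/<value>yarn<\/value>/<value>yarn-tez<\/value>/
}' "$1.bak" > "$1"
}
```

Usage: use_tez_framework ~/Hadoop/hadoop-3.3.1/etc/hadoop/mapred-site.xml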

3.4 Syncing the configuration files to the cluster nodes

Copy the modified files to the corresponding installation path on every other node in the cluster.

Files to copy: tez-site.xml, hadoop-env.sh, mapred-site.xml

Example for node slave1 (repeat for the other nodes):

scp tez-site.xml Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/

scp hadoop-env.sh Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/

scp mapred-site.xml Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/
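With more than a couple of nodes, a loop saves typing. The sketch below only prints the scp commands unless DRY_RUN=0 is set. The node list (slave1 slave2) is a placeholder, DRY_RUN and sync_conf are hypothetical names, and the environment file is assumed to carry Hadoop's standard name hadoop-env.sh.

```shell
# Print commands by default; set DRY_RUN=0 to actually copy the files.
DRY_RUN=${DRY_RUN:-1}

# sync_conf NODE...: copy the three edited files to each listed node.
sync_conf() {
    for node in "$@"; do
        for f in tez-site.xml hadoop-env.sh mapred-site.xml; do
            if [ "$DRY_RUN" = "1" ]; then
                echo "scp $f Hadoop@$node:~/Hadoop/hadoop-3.3.1/etc/hadoop/"
            else
                scp "$f" "Hadoop@$node:~/Hadoop/hadoop-3.3.1/etc/hadoop/" || return 1
            fi
        done
    done
}

sync_conf slave1 slave2
```

Run it from the directory that holds the edited files, and replace the node names with those of the actual cluster.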

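3.5 Restarting the Hadoop cluster

Because tez-site.xml was added and hadoop-env.sh and mapred-site.xml were modified, the Hadoop cluster must be restarted.

Note: before restarting, shut down running programs and applications in an orderly fashion.

(1) Shut down Hadoop

1) Suspend external services;

2) Pause or stop data uploads and downloads;

3) Stop or pause running compute jobs;

4) Shut down Hive, Spark, HBase and Hadoop, in that order.

(2) Restart Hadoop

1) Start the Hadoop cluster first

2) Start HBase

3) Start Hive, Spark and other applications

4) Resume external services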
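For the Hadoop part of the sequence above, the stock scripts under $HADOOP_HOME/sbin can be chained as sketched below. This covers only HDFS and YARN, not Hive, Spark or HBase. DRY_RUN, run and restart_hadoop are hypothetical names, and the default HADOOP_HOME is an assumption based on the paths used earlier in this walkthrough.

```shell
# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
HADOOP_HOME=${HADOOP_HOME:-$HOME/Hadoop/hadoop-3.3.1}

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# restart_hadoop: stop YARN before HDFS, then start HDFS before YARN.
restart_hadoop() {
    run "$HADOOP_HOME/sbin/stop-yarn.sh" &&
    run "$HADOOP_HOME/sbin/stop-dfs.sh" &&
    run "$HADOOP_HOME/sbin/start-dfs.sh" &&
    run "$HADOOP_HOME/sbin/start-yarn.sh"
}

restart_hadoop
```

Stopping YARN first lets running applications finish using HDFS before the file system goes away; starting in the reverse order ensures HDFS is available when YARN comes up.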

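4 Configuring Hive to Use Tez as the Execution Engine

4.1 Configuration

4.1.1 Setting tez in the Hive CLI (temporary)

A temporary setting is made directly at the hive command line and applies only to the current session:

set hive.execution.engine=tez;

4.1.2 Setting tez in hive-site.xml (permanent)

A permanent setting makes tez the default engine of the Hive environment. It goes in the configuration file hive-site.xml, for example: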

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>
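To confirm which engine a given hive-site.xml actually selects, the value can be pulled out with a few lines of awk. This is a minimal sketch, not a general XML parser: it assumes the name and value elements are on separate lines, as in the snippet above, and the function name engine_in_site is hypothetical.

```shell
# engine_in_site FILE
# Prints the <value> that follows the hive.execution.engine <name> element.
engine_in_site() {
    awk '
        /<name>hive\.execution\.engine<\/name>/ { found = 1; next }
        found && /<value>/ {
            gsub(/.*<value>|<\/value>.*/, "")   # strip the tags, keep the text
            print
            exit
        }
    ' "$1"
}
```

Usage: engine_in_site "$HIVE_HOME/conf/hive-site.xml" (the path is an assumption about where Hive keeps its configuration). Inside the Hive CLI, entering set hive.execution.engine; without a value prints the setting currently in effect.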

4.2 Performance comparison: mr vs. tez in Hive

(1) View the data in the Hive table

hive (test)> select * from t1;
OK
id    name
1    aaa
2    bbb
3    ccc
Time taken: 2.54 seconds, Fetched: 3 row(s)

(2) Count with mr

hive (test)> select count(*) from t1;
Query ID = grid_20211113222059_242412fd-0f78-4a5f-a89e-cae75f3c4718
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Cannot run job locally: Number of Input Files (= 6) is larger than hive.exec.mode.local.auto.input.files.max(= 4)
Starting Job = job_1636812530170_0002, Tracking URL = http://master:8088/proxy/application_1636812530170_0002/
Kill Command = /home/Hadoop/Hadoop/hadoop-3.3.1/bin/mapred job  -kill job_1636812530170_0002
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2021-11-13 22:23:38,715 Stage-1 map = 0%,  reduce = 0%
2021-11-13 22:24:31,818 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 2.02 sec
2021-11-13 22:25:14,411 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 3.52 sec
2021-11-13 22:25:28,757 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.09 sec
2021-11-13 22:25:36,925 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.69 sec
MapReduce Total cumulative CPU time: 6 seconds 690 msec
Ended Job = job_1636812530170_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 6.69 sec   HDFS Read: 27167 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 690 msec
OK
_c0
3
Time taken: 283.315 seconds, Fetched: 1 row(s)

(3) Count with tez

hive (test)> set hive.execution.engine=tez;
hive (test)> select * from t1;
OK
id    name
1    aaa
2    bbb
3    ccc
Time taken: 0.878 seconds, Fetched: 3 row(s)
hive (test)> select count(*) from t1;
OK
_c0
3
Time taken: 0.707 seconds, Fetched: 1 row(s)

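(4) Comparison

Statement                   mr          tez
select * from t1;           2.54s       0.878s
select count(*) from t1;    283.315s    0.707s

On this three-row table, both statements ran faster with tez than with mr; the count(*) query dropped from 283.315s to 0.707s.

Note: the author did not run broader tests, and in particular did not measure performance on large tables.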
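To put the timings in proportion, the mr/tez ratio can be computed directly from the "Time taken" figures reported above. This is a throwaway calculation; the helper name speedup is illustrative.

```shell
# speedup MR_SECONDS TEZ_SECONDS
# Prints how many times faster the tez run was, to one decimal place.
speedup() {
    awk -v mr="$1" -v tez="$2" 'BEGIN { printf "%.1f\n", mr / tez }'
}

speedup 2.54 0.878      # select * from t1;
speedup 283.315 0.707   # select count(*) from t1;
```

This prints 2.9 and 400.7. These are single measurements on a three-row table, so the ratios describe this one run and should not be read as a general benchmark.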

5 Configuring tez-ui

5.1 Introduction to tez-ui

5.2 Deploying Tomcat

5.2.1 Download Tomcat

Download page: https://tomcat.apache.org/

Version used here: https://dlcdn.apache.org/tomcat/tomcat-8/v8.5.72/bin/apache-tomcat-8.5.72.zip

5.2.2 Install Tomcat

Step1: unzip apache-tomcat-8.5.72.zip

Step2: mv apache-tomcat-8.5.72 /opt/tomcat-8.5.72

5.2.3 Configure Tomcat

Step1: cd /opt/tomcat-8.5.72/conf/

Step2: vi tomcat-users.xml

Step3: In tomcat-users.xml, add the role and user lines shown below immediately before the closing </tomcat-users> tag.

  <role rolename="manager-gui"/>
  <role rolename="admin-gui"/>
  <user username="admin" password="admin" roles="manager-gui,admin-gui"/>
</tomcat-users>

5.2.4 Change the port

Step1: cd /opt/tomcat-8.5.72/conf

Step2: vi server.xml

Step3: In the Connector element shown below, change the default port "8080" to "8822".

    <Connector port="8822" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

Step4: Open the port in the firewall

Run: firewall-cmd --add-port=8822/tcp --permanent && firewall-cmd --reload
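The port change in server.xml can also be scripted rather than edited in vi. This is a sketch under stated assumptions: the function name set_http_port is hypothetical, and it expects the HTTP connector to start with the text <Connector port="8080", as in a default server.xml. A .bak copy of the original is kept.

```shell
# set_http_port FILE NEW_PORT
# Rewrites the HTTP connector port 8080 to NEW_PORT; other ports are left untouched.
set_http_port() {
    cp "$1" "$1.bak" || return 1
    sed "s/<Connector port=\"8080\"/<Connector port=\"$2\"/" "$1.bak" > "$1"
}
```

Usage: set_http_port /opt/tomcat-8.5.72/conf/server.xml 8822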

5.2.5 Starting and stopping Tomcat

(1) Start Tomcat

Step1: cd /opt/tomcat-8.5.72/bin

Step2: chmod 775 *.sh

Step3: ./startup.sh

[root@master bin]# ./startup.sh 
Using CATALINA_BASE:   /opt/tomcat-8.5.72
Using CATALINA_HOME:   /opt/tomcat-8.5.72
Using CATALINA_TMPDIR: /opt/tomcat-8.5.72/temp
Using JRE_HOME:        /usr
Using CLASSPATH:       /opt/tomcat-8.5.72/bin/bootstrap.jar:/opt/tomcat-8.5.72/bin/tomcat-juli.jar
Using CATALINA_OPTS:   
Tomcat started.

Step4: Check port 8822

Run: netstat -nlpt | grep 8822
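A plain grep for 8822 can also match unrelated columns, such as a process ID that happens to contain the same digits. A slightly stricter check, sketched below, looks only at the local-address and state columns of the netstat output. The function name is_listening is hypothetical, and the column positions assume the usual Linux netstat -nlpt layout.

```shell
# is_listening PORT
# Reads `netstat -nlpt` output on stdin; succeeds if some socket is in
# LISTEN state on PORT (column 4 is the local address, column 6 the state).
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { ok = 1 }
        END { exit !ok }
    '
}
```

Usage: netstat -nlpt | is_listening 8822 && echo "Tomcat is listening on 8822"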

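(2) Stop Tomcat

Step1: cd /opt/tomcat-8.5.72/bin

Step2: ./shutdown.sh

5.2.6 Verify that Tomcat is running

Note: port 8822 must be open in the firewall and Tomcat must be started (./startup.sh).

Open localhost:8822 in a browser (on a LAN, use the server's fixed IP instead). If the Tomcat welcome page appears, Tomcat is installed and running.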

5.3 Deploying tez-ui on Tomcat

5.4 Configuring timelineserver

5.4.1 Configure yarn-site.xml

5.4.2 Configure tez-site.xml

5.5 Enabling timelineserver
