Hive with Tez as the Execution Engine: Parameter Configuration and Tuning


1 Introduction to Tez

2 Downloading and Installing Tez

2.1 Download

Download page: https://tez.apache.org/releases/index.html

Version used by the author in this example: Apache TEZ® 0.9.2 (Jul 01, 2021)

Example download: wget  'https://dlcdn.apache.org/tez/0.9.2/apache-tez-0.9.2-bin.tar.gz' --no-check-certificate
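Because the example above skips certificate checking, it may be worth comparing the downloaded file against the checksum published with the release. The following is a minimal sketch, assuming GNU sha512sum is available; the function name verify_sha512 is hypothetical, and the digest itself has to be taken from the official download page.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Returns 0 on match, non-zero on mismatch or unreadable file.
verify_sha512() {
  file=$1
  expected=$2
  [ -r "$file" ] || return 1
  # sha512sum prints "<digest>  <file>"; keep the digest only.
  actual=$(sha512sum "$file" | awk '{print $1}')
  # Normalise case so an upper-case published digest also matches.
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  [ "$actual" = "$expected" ]
}

# Usage (replace <digest> with the value published for the release):
#   verify_sha512 apache-tez-0.9.2-bin.tar.gz "<digest>" && echo "checksum OK"
```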

2.2 Installation

2.2.1 Choose the Tez install path

Install path: /home/Hadoop/Tez/

2.2.2 Unpack Tez

Step1: cp  apache-tez-0.9.2-bin.tar.gz   /home/Hadoop/Tez/

Step2: cd /home/Hadoop/Tez/ && tar -zxvf apache-tez-0.9.2-bin.tar.gz

Step3: mv  apache-tez-0.9.2-bin  tez-0.9.2

Step4: cd ./tez-0.9.2/share/

Step5: In "./tez-0.9.2/share/", locate the file "tez.tar.gz".
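The unpack steps can be collected into one function. This is a sketch only: install_tez is a hypothetical name, and it assumes the tarball unpacks to a top-level directory called apache-tez-<version>-bin, as the steps above imply.

```shell
# Unpack a Tez binary tarball into an install directory and rename it.
# $1: path to apache-tez-<version>-bin.tar.gz   $2: install dir   $3: version
install_tez() {
  tarball=$1
  install_dir=$2
  version=$3
  mkdir -p "$install_dir" || return 1
  # -C unpacks straight into the install dir, so no separate cp/cd is needed.
  tar -zxf "$tarball" -C "$install_dir" || return 1
  mv "$install_dir/apache-tez-$version-bin" "$install_dir/tez-$version" || return 1
  # The archive that is later uploaded to HDFS lives under share/.
  ls "$install_dir/tez-$version/share/tez.tar.gz"
}

# Usage:
#   install_tez apache-tez-0.9.2-bin.tar.gz /home/Hadoop/Tez 0.9.2
```

The final ls fails if tez.tar.gz is missing, so a non-zero exit status signals an incomplete unpack.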

2.2.3 Create the Tez directory on HDFS

Step1: hadoop fs -mkdir -p /user/tez

Step2: hdfs dfs -put  ./tez.tar.gz  /user/tez

Step3: hadoop fs -ls /user/tez
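The three HDFS steps can be wrapped in a small function. This is a sketch that assumes the target directory is /user/tez, the path that tez.lib.uris points to in section 3.1. The HDFS_CMD variable and the name upload_tez are hypothetical; HDFS_CMD exists only so the commands can be previewed with HDFS_CMD="echo hdfs".

```shell
# Command used to talk to HDFS; override with "echo hdfs" for a dry run.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Create the HDFS directory, upload the Tez archive, and list the result.
# $1: local path to tez.tar.gz   $2: HDFS target directory
upload_tez() {
  local_tar=$1
  hdfs_dir=$2
  $HDFS_CMD dfs -mkdir -p "$hdfs_dir" &&
  $HDFS_CMD dfs -put -f "$local_tar" "$hdfs_dir/" &&
  $HDFS_CMD dfs -ls "$hdfs_dir"
}

# Usage:
#   upload_tez ./tez.tar.gz /user/tez
```

Here -mkdir -p tolerates an existing directory and -put -f overwrites an earlier upload, so the function can be rerun safely.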

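3 Tez Environment and Parameter Configuration

The parameter files to configure include tez-site.xml, hadoop-env.sh, and mapred-site.xml.

3.1 Configuring tez-site.xml

3.1.1 Create tez-site.xml

This file is configured on the Hadoop side: create tez-site.xml in $HADOOP_HOME/etc/hadoop/ on the Hadoop master node.

Step1: cd  ~/Hadoop/hadoop-3.3.1/etc/hadoop/

Step2: vi  tez-site.xml

Step3: Add the following parameters: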

<configuration>
  <!-- Location on HDFS of the Tez archive uploaded in section 2.2.3 -->
  <property>
    <name>tez.lib.uris</name>
    <value>/user/tez/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>

  <!-- Application Master resources -->
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
  </property>

  <!-- Container heap fraction and task resources -->
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
  </property>
  <property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
  </property>

  <!-- Compress intermediate data with Snappy -->
  <property>
    <name>tez.runtime.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>

For the full list of tez-site.xml parameters, see:

http://tez.apache.org/releases/0.9.2/tez-api-javadocs/configs/TezConfiguration.html

For the full list of Tez runtime parameters, see:

http://tez.apache.org/releases/0.9.2/tez-runtime-library-javadocs/configs/TezRuntimeConfiguration.html
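To see what the tez.container.max.java.heap.fraction and tez.task.resource.memory.mb values in section 3.1.1 imply together, the arithmetic can be sketched as below. This assumes that Tez sizes the JVM heap as container memory multiplied by the fraction when no explicit -Xmx is given in the launch options; the rounding-down and the function name tez_heap_mb are illustrative assumptions, not taken from the Tez source.

```shell
# Estimate the JVM heap (in MB) for a container:
# container memory (MB) multiplied by the heap fraction, rounded down.
tez_heap_mb() {
  awk -v mem="$1" -v frac="$2" 'BEGIN { printf "%d\n", mem * frac }'
}

tez_heap_mb 1024 0.4   # task container from the configuration above; prints 409
```

With these settings roughly 409 MB of each 1024 MB container would go to the Java heap, leaving the rest for non-heap memory.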

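3.2 Configuring hadoop-env.sh

Add Tez to the environment, that is, add all Tez jars to HADOOP_CLASSPATH.

Append the following to the end of hadoop-env.sh:

# Directory that holds tez-site.xml (a classpath entry must be a directory or a jar, not a single file)
TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
TEZ_JARS=/home/Hadoop/Tez/tez-0.9.2
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

To print the resulting Hadoop classpath, run: hdfs classpath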
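The classpath output is one long colon-separated line, so the Tez entries are easy to miss. A small filter, sketched here with the hypothetical name list_tez_entries, prints only the entries that mention Tez.

```shell
# Read a colon-separated classpath on stdin and print only the Tez entries,
# one per line. Exits non-zero when no Tez entry is present.
list_tez_entries() {
  tr ':' '\n' | grep -i 'tez'
}

# Usage on a configured node:
#   hdfs classpath | list_tez_entries
```

An empty result with a non-zero exit status suggests the HADOOP_CLASSPATH addition was not picked up.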

3.3 Configuring mapred-site.xml

The parameter to change in mapred-site.xml is mapreduce.framework.name.

Before the change, its value is:

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

After the change, it reads:

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
</property>

3.4 Syncing the parameter files across cluster nodes

Copy the modified parameter files to the corresponding installation directory on every other node in the cluster.

Files to copy: tez-site.xml, hadoop-env.sh, mapred-site.xml

Using node slave1 as an example (repeat for the other nodes):

scp tez-site.xml Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/

scp hadoop-env.sh Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/

scp mapred-site.xml Hadoop@slave1:~/Hadoop/hadoop-3.3.1/etc/hadoop/
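For more than one worker node, the copies can be run in a loop. This sketch assumes the Hadoop configuration directory has the same absolute path on every node and that the environment file is named hadoop-env.sh, as Hadoop ships it. The node name slave2, the SCP_CMD variable, and the function name sync_tez_configs are hypothetical.

```shell
# Copy command; override with "echo" to preview what would be copied.
SCP_CMD=${SCP_CMD:-scp}

# Copy the three changed files to every node given as an argument.
# $1: Hadoop config directory (same path on all nodes); remaining args: host names.
sync_tez_configs() {
  conf_dir=$1
  shift
  for node in "$@"; do
    for f in tez-site.xml hadoop-env.sh mapred-site.xml; do
      # Stop at the first failed copy so a partial sync is noticed.
      $SCP_CMD "$conf_dir/$f" "Hadoop@$node:$conf_dir/" || return 1
    done
  done
}

# Usage:
#   sync_tez_configs /home/Hadoop/Hadoop/hadoop-3.3.1/etc/hadoop slave1 slave2
```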

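3.5 Restarting the Hadoop cluster

Because tez-site.xml was added and hadoop-env.sh and mapred-site.xml were modified, the Hadoop cluster must be restarted.

Note: before restarting the cluster, shut down running programs and applications in an orderly way.

(1) Shut down Hadoop

1) Suspend external services;

2) Pause or stop data uploads and downloads;

3) Stop or pause running compute jobs;

4) Exit Hive, Spark, HBase, and Hadoop, in that order.

(2) Restart Hadoop

1) Start the Hadoop cluster first;

2) Start HBase;

3) Start Hive, Spark, and other applications;

4) Resume external services.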
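The HBase and Hadoop part of this sequence can be scripted as a sketch. It assumes the stock start and stop scripts under $HADOOP_HOME/sbin and $HBASE_HOME/bin, and that HBASE_HOME is set (an assumption, since the article does not give the HBase path). Hive, Spark, and the external services are left out because how they are stopped depends on the deployment. The helper names run and restart_cluster are hypothetical.

```shell
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop HBase before the Hadoop layers it depends on, then start bottom-up.
restart_cluster() {
  run "$HBASE_HOME/bin/stop-hbase.sh"   || return 1
  run "$HADOOP_HOME/sbin/stop-yarn.sh"  || return 1
  run "$HADOOP_HOME/sbin/stop-dfs.sh"   || return 1
  run "$HADOOP_HOME/sbin/start-dfs.sh"  || return 1
  run "$HADOOP_HOME/sbin/start-yarn.sh" || return 1
  run "$HBASE_HOME/bin/start-hbase.sh"
}

# Usage: set DRY_RUN=1 first to review the order, then unset it and run:
#   restart_cluster
```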

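4 Configuring Hive to Use Tez as the Execution Engine

4.1 Parameter configuration

4.1.1 Setting tez as the engine in the Hive CLI (temporary setting)

A temporary setting is made directly at the hive command line and applies only to the current session. Use the following statement: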

set hive.execution.engine=tez;
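The same per-session choice can be made from the shell with the Hive CLI options --hiveconf and -e, which is convenient for one-off statements. This is a sketch: the wrapper name hive_with_engine and the HIVE_CMD variable are hypothetical, and the accepted engine names are the ones listed in the hive.execution.engine description (mr, tez, spark).

```shell
# Hive launcher; override with "echo hive" to preview the command line.
HIVE_CMD=${HIVE_CMD:-hive}

# Run one statement with the given engine without touching hive-site.xml.
# $1: engine name   $2: HiveQL statement
hive_with_engine() {
  engine=$1
  sql=$2
  case $engine in
    mr|tez|spark) ;;
    *) echo "unsupported engine: $engine" >&2; return 2 ;;
  esac
  $HIVE_CMD --hiveconf "hive.execution.engine=$engine" -e "$sql"
}

# Usage:
#   hive_with_engine tez 'select count(*) from test.t1;'
```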

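4.1.2 Setting tez as the engine in hive-site.xml (permanent setting)

A permanent setting makes tez the default engine of the Hive runtime. It is configured through the hive.execution.engine property in the parameter file hive-site.xml, for example: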

  <property> 
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description>
      Expects one of [mr, tez, spark]. 
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>

4.2 Performance comparison: Hive on mr vs. tez

(1) View the data in the Hive table

hive (test)> select * from t1;
OK
id    name
1    aaa
2    bbb
3    ccc
Time taken: 2.54 seconds, Fetched: 3 row(s)

(2) Count rows using mr

hive (test)> select count(*) from t1;
Query ID = grid_20211113222059_242412fd-0f78-4a5f-a89e-cae75f3c4718
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Cannot run job locally: Number of Input Files (= 6) is larger than hive.exec.mode.local.auto.input.files.max(= 4)
Starting Job = job_1636812530170_0002, Tracking URL = http://master:8088/proxy/application_1636812530170_0002/
Kill Command = /home/Hadoop/Hadoop/hadoop-3.3.1/bin/mapred job  -kill job_1636812530170_0002
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2021-11-13 22:23:38,715 Stage-1 map = 0%,  reduce = 0%
2021-11-13 22:24:31,818 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 2.02 sec
2021-11-13 22:25:14,411 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 3.52 sec
2021-11-13 22:25:28,757 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.09 sec
2021-11-13 22:25:36,925 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.69 sec
MapReduce Total cumulative CPU time: 6 seconds 690 msec
Ended Job = job_1636812530170_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 6.69 sec   HDFS Read: 27167 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 690 msec
OK
_c0
3
Time taken: 283.315 seconds, Fetched: 1 row(s)

(3) Count rows using tez

hive (test)> set hive.execution.engine=tez;
hive (test)> select * from t1;
OK
id    name
1    aaa
2    bbb
3    ccc
Time taken: 0.878 seconds, Fetched: 3 row(s)
hive (test)> select count(*) from t1;
OK
_c0
3
Time taken: 0.707 seconds, Fetched: 1 row(s)

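(4) Comparison

Statement                   mr          tez
select * from t1;           2.54s       0.878s
select count(*) from t1;    283.315s    0.707s

On this three-row table, both statements completed much faster with tez than with mr.

Note: the author has not run broader tests, and in particular has not measured performance on large tables.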
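To put a number on the difference, the ratio of the two wall-clock times can be computed directly. The sketch below uses only the timings from the single run shown above, so the ratios describe that run and nothing more; the function name speedup is hypothetical.

```shell
# Ratio of two wall-clock times (slow / fast), to one decimal place.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 2.54 0.878      # select * from t1;        prints 2.9
speedup 283.315 0.707   # select count(*) from t1; prints 400.7
```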

5 Configuring tez-ui for Tez

5.1 Introduction to tez-ui

5.2 Deploying tomcat

5.2.1 Download tomcat

Download page: https://tomcat.apache.org/

Package downloaded by the author: https://dlcdn.apache.org/tomcat/tomcat-8/v8.5.72/bin/apache-tomcat-8.5.72.zip

5.2.2 Install tomcat

Step1: unzip apache-tomcat-8.5.72.zip

Step2: mv apache-tomcat-8.5.72 /opt/tomcat-8.5.72

5.2.3 Configure tomcat

Step1: cd /opt/tomcat-8.5.72/conf/

Step2: vi tomcat-users.xml

Step3: In tomcat-users.xml, add the role and user lines below just before the closing </tomcat-users> tag.

  <role rolename="manager-gui"/>
  <role rolename="admin-gui"/>
  <user username="admin" password="admin" roles="manager-gui,admin-gui"/>
</tomcat-users>

5.2.4 Change the port

Step1: cd /opt/tomcat-8.5.72/conf

Step2: vi server.xml

Step3: In the Connector element shown below, change the default port "8080" to "8822".

    <Connector port="8822" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

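Step4: Open the port in the firewall

Run: firewall-cmd --add-port=8822/tcp --permanent

Then reload the firewall so the permanent rule takes effect: firewall-cmd --reload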
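The port edit in Step3 can also be scripted instead of made by hand in vi. The sketch below assumes the connector line starts with <Connector port="8080", as in the stock server.xml excerpt above; the function name set_tomcat_port is hypothetical.

```shell
# Rewrite the HTTP connector port in a Tomcat server.xml, keeping a .bak copy.
# $1: path to server.xml   $2: old port   $3: new port
set_tomcat_port() {
  conf=$1
  old=$2
  new=$3
  cp "$conf" "$conf.bak" || return 1
  # Only a port attribute that directly follows "<Connector" is touched,
  # so the <Server port="..."> shutdown port stays as it is.
  sed "s/<Connector port=\"$old\"/<Connector port=\"$new\"/" "$conf.bak" > "$conf"
}

# Usage:
#   set_tomcat_port /opt/tomcat-8.5.72/conf/server.xml 8080 8822
```

Writing from the backup copy avoids sed -i, whose syntax differs between GNU and BSD sed.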

5.2.5 Starting and stopping tomcat

(1) Start tomcat

Step1: cd /opt/tomcat-8.5.72/bin

Step2: chmod 775 *.sh

Step3: ./startup.sh

[root@master bin]# ./startup.sh 
Using CATALINA_BASE:   /opt/tomcat-8.5.72
Using CATALINA_HOME:   /opt/tomcat-8.5.72
Using CATALINA_TMPDIR: /opt/tomcat-8.5.72/temp
Using JRE_HOME:        /usr
Using CLASSPATH:       /opt/tomcat-8.5.72/bin/bootstrap.jar:/opt/tomcat-8.5.72/bin/tomcat-juli.jar
Using CATALINA_OPTS:   
Tomcat started.

Step4: Check port 8822

Run: netstat -nlpt |grep 8822
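A plain grep for 8822 also matches other numbers that contain those digits, such as port 88220 or a process ID. A stricter check, sketched here with the hypothetical name port_listening, matches the port only when it follows a colon and is followed by whitespace.

```shell
# Succeeds if the socket listing on stdin shows something bound to port $1.
# Requiring ":" before and whitespace after the number avoids matching
# longer port numbers or a PID that happens to contain the same digits.
port_listening() {
  grep -Eq ":$1[[:space:]]"
}

# Usage:
#   netstat -nlpt | port_listening 8822 && echo "tomcat is listening"
```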

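(2) Stop tomcat

Step1: cd /opt/tomcat-8.5.72/bin

Step2: ./shutdown.sh

5.2.6 Verify that tomcat is running

Note: port 8822 must be open in the firewall and tomcat must be started (./startup.sh).

In a browser, go to localhost:8822 (from elsewhere on the LAN, use the server's fixed IP instead). If the Tomcat welcome page appears, tomcat was installed and started successfully.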

5.3 Deploying tez-ui on tomcat

5.4 Configuring timelineserver

5.4.1 Configure yarn-site.xml

5.4.2 Configure tez-site.xml

5.5 Enabling timelineserver

