Hive on Tez


 

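Hive execution modes

  1. Hive on MapReduce: offline batch computation (the default)
  2. Hive on Tez: a computation framework on top of YARN that supports DAG jobs
  3. Hive on Spark: in-memory computation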

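Hive on Tez

Tez is a data-processing framework built on top of YARN that supports complex DAG tasks. It was open-sourced by Hortonworks. Tez splits the MapReduce process into several sub-steps and can combine multiple MapReduce jobs into one larger DAG job. This cuts down the intermediate files written between MapReduce stages, and combining the sub-steps sensibly substantially improves the performance of MapReduce-style jobs.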

Installing Tez

Tez can be installed from source or from the binary package. This guide uses the binary package.

Hadoop version: 2.9.1

Hive version: 2.1.1

Tez version: 0.9.0

Prerequisite: a working Hadoop environment, including YARN (Tez runs on YARN) and Hive.

Download

wget http://mirror.bit.edu.cn/apache/tez/0.9.0/apache-tez-0.9.0-bin.tar.gz

Install

# tar zxvf apache-tez-0.9.0-bin.tar.gz
# mv apache-tez-0.9.0-bin/ /tez-0.9.0
# hdfs dfs -mkdir -p /tez-0.9.0
# cd /tez-0.9.0/
# hdfs dfs -put share/tez.tar.gz /tez-0.9.0

Configuring Tez

# cd /data1/hadoop/hadoop/etc/hadoop/
# cat tez-site.xml

<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs://MY-HADOOP/tez-0.9.0/tez.tar.gz</value>    <!-- the Tez archive uploaded to HDFS -->
</property>
<property>
<name>tez.dag.recovery.enabled</name>        <!-- enable DAG recovery; default true -->
<value>true</value>
</property>
<property>
<name>tez.dag.recovery.io.buffer.size</name>   <!-- buffer size used when recovering a DAG; default 8192 -->
<value>4096</value>
</property>
<property>
<name>tez.dag.recovery.flush.interval.secs</name>  <!-- flush interval; default 30 s -->
<value>60</value>
</property>
<property>
<name>tez.dag.recovery.max.unflushed.events</name>   <!-- maximum number of events buffered before flushing to disk; default 100 -->
<value>100</value>
</property>
<property>
<name>tez.task.heartbeat.timeout.check-ms</name>      <!-- how often task heartbeat timeouts are checked; default 30 s -->
<value>10000</value>
</property>
<property>
<name>tez.task.timeout-ms</name>   <!-- task timeout; default 300 s -->
<value>300000</value>
</property>
<property>
<name>tez.am.acls.enabled</name>   <!-- ACL access control; default true -->
<value>true</value>
</property>
<property>
<name>tez.am.client.am.thread-count</name>   <!-- number of threads handling client requests; default 2 -->
<value>50</value>
</property>
<property>
<name>tez.am.containerlauncher.thread-count-limit</name>    <!-- upper limit on container-launcher threads; default 500 -->
<value>500</value>
</property>
<property>
<name>tez.am.dag.scheduler.class</name>
<value>org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder</value>
</property>
<property>
<name>tez.am.deletion.tracker.class</name>
<value>org.apache.tez.dag.app.launcher.DeletionTrackerImpl</value>
</property>
<property>
<name>tez.am.launch.cmd-opts</name>
<value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC</value>
</property>
<property>
<name>tez.am.resource.cpu.vcores</name>    <!-- virtual CPU cores used by the application master; default 1 -->
<value>2</value>
</property>
<property>
<name>tez.am.resource.memory.mb</name>   <!-- memory used by the application master; default 1024 -->
<value>2048</value>
</property>
<property>
<name>tez.am.am-rm.heartbeat.interval-ms.max</name>   <!-- maximum heartbeat interval between the AM and the RM, in ms; set to 3 s here -->
<value>3000</value>
</property>
<property>
<name>tez.am.shuffle.auxiliary-service.id</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>tez.am.task.listener.thread-count</name> <!-- number of threads listening for task heartbeat requests; default 30 -->
<value>60</value>
</property>
<property>
<name>tez.am.tez-ui.webservice.enable</name>   <!-- enables the web service used by the Tez UI -->
<value>true</value>
</property>
<property>
<name>tez.client.timeout-ms</name>
<value>30000</value>
</property>
<property>
<name>tez.task.resource.cpu.vcores</name>     <!-- CPU cores used per task; default 1 -->
<value>1</value>
</property>
<property>
<name>tez.task.resource.memory.mb</name> <!-- memory used per task; default 1024 -->
<value>2048</value>
</property>
<property>
<name>tez.container.max.java.heap.fraction</name>   <!-- fraction of container memory given to the JVM heap; default 0.8. The author suggests lowering it when memory runs short. -->
<value>0.2</value>
</property>
</configuration>

Reference: /tez-0.9.0/conf/tez-default-template.xml
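To get a feel for what tez.container.max.java.heap.fraction means for the values above, here is a small arithmetic sketch. It assumes, as I understand Tez's behavior, that when the launch options contain no explicit -Xmx, the heap is derived as container memory multiplied by the fraction. The helper name heap_mb is made up for this illustration and is not part of Tez.

```shell
#!/usr/bin/env bash
# Estimate the JVM heap (in MB) for a container:
#   heap ~= container memory (MB) * tez.container.max.java.heap.fraction
# heap_mb is a hypothetical helper, not a Tez command.
heap_mb() {
  local memory_mb=$1 fraction=$2
  # awk does the floating-point multiplication; %d truncates to whole MB
  awk -v m="$memory_mb" -v f="$fraction" 'BEGIN { printf "%d\n", m * f }'
}

# Values from the tez-site.xml above
heap_mb 2048 0.2   # 2048 MB container with fraction 0.2
heap_mb 2048 0.8   # same container with the default fraction 0.8
```

With the settings in this article a 2048 MB container would get roughly a 409 MB heap, versus about 1638 MB with the default fraction, so 0.2 leaves most of the container memory outside the heap.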

Environment variables (~/.bashrc)

Add the following:
export TEZ_CONF_DIR=$HADOOP_CONF_DIR

export TEZ_JARS=/tez-0.9.0/*:/tez-0.9.0/lib/*

export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH

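Run "source ~/.bashrc" so the environment variables take effect.

Hadoop version compatibility

The Tez 0.9.0 binary package ships the 2.7.0 MapReduce client jars in its lib directory. Replace them with the 2.9.1 jars from the local Hadoop installation: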

[root@hadoop01 ~]# cd /tez-0.9.0/lib

[root@hadoop01 lib]# rm -rf hadoop-mapreduce-client-core-2.7.0.jar hadoop-mapreduce-client-common-2.7.0.jar

 

[root@hadoop01 lib]# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.1.jar /tez-0.9.0/lib/

[root@hadoop01 lib]# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.9.1.jar /tez-0.9.0/lib/
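After swapping the jars, it can help to confirm that no Hadoop jar with a different version is left in the Tez lib directory. The following is a minimal sketch; the function name mismatched_hadoop_jars is made up for this illustration, and it simply assumes that the jars are named hadoop-<module>-<version>.jar as in the commands above.

```shell
#!/usr/bin/env bash
# Print the hadoop-*.jar files in a directory whose version suffix differs
# from the expected Hadoop version.
# mismatched_hadoop_jars is a hypothetical helper, not a Tez or Hadoop tool.
mismatched_hadoop_jars() {
  local lib_dir=$1 expected=$2 jar name
  for jar in "$lib_dir"/hadoop-*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    name=${jar##*/}                    # strip the directory part
    case "$name" in
      *-"$expected".jar) ;;            # version matches, nothing to report
      *) printf '%s\n' "$name" ;;
    esac
  done
}

# Usage with the paths from this article:
#   mismatched_hadoop_jars /tez-0.9.0/lib 2.9.1
```

An empty result means every bundled Hadoop jar carries the expected version.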

Starting Hive

# hive
hive> SET hive.execution.engine=tez;    -- use Tez as the execution engine (the default is MapReduce)
Alternatively, edit Hive's configuration file hive-site.xml and add the following:

<property>
<name>hive.user.install.directory</name>
<value>/user/</value>
</property>
<property>
<name>hive.execution.engine</name>    <!-- make Tez the default engine -->
<value>tez</value>
</property>

Test data

Create a table
hive> create table user_info(user_id bigint, firstname string, lastname string, count string);
Insert data
hive> insert into user_info values(1,'dennis','hu','CN'),(2,'Json','Lv','Jpn'),(3,'Mike','Lu','USA');

Query ID = root_20190618043047_bfc41253-60f9-469d-b6a9-c26c93a92e82
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.55 s
----------------------------------------------------------------------------------------------
Loading data to table default.user_info
OK
Time taken: 9.488 seconds

Query

> select count(1) from user_info;
Query ID = root_20190618043342_5f83efb4-39bf-4d67-bac4-d67205086ae7
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 4.46 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 4.979 seconds, Fetched: 1 row(s)
hive> select count(1) from user_info;
Query ID = root_20190618043349_ecee5657-7c95-43ab-80e9-101dd36d6fc7
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 0.72 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 1.156 seconds, Fetched: 1 row(s)

Checking in the YARN web UI

 

 This shows that the application type is now TEZ.

 

 Configuring tez-ui

Edit tez-site.xml

Add the following:

<property>
   <name>tez.history.logging.service.class</name>
   <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value> 
 </property> 

 <property> 
  <description>URL for where the Tez UI is hosted</description> 
  <name>tez.tez-ui.history-url.base</name> 
  <value>http://master:9999/tez-ui/</value>     <!-- address where tez-ui is served -->
 </property>

<property>
<name>tez.allow.disabled.timeline-domains</name>
<value>true</value>
</property>

<property> 
    <name>tez.runtime.convert.user-payload.to.history-text</name> 
    <value>true</value> 
</property> 
<property> 
    <name>tez.task.generate.counters.per.io</name> 
    <value>true</value> 
</property> 

Edit yarn-site.xml

Add the following:

<property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.timeline-service.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.timeline-service.http-cross-origin.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.timeline-service.generic-application-history.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
    <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.address</name>
  <value>${yarn.timeline-service.hostname}:10200</value>
</property>

<property>
  <name>yarn.timeline-service.webapp.address</name>
  <value>${yarn.timeline-service.hostname}:8188</value>
</property>

<property>
  <name>yarn.timeline-service.webapp.https.address</name>
  <value>${yarn.timeline-service.hostname}:8190</value>
</property>

<property>
  <description>Handler thread count to serve the client RPC requests.</description>
  <name>yarn.timeline-service.handler-thread-count</name>
  <value>10</value>
</property>
<property>
  <name>yarn.timeline-service.generic-application-history.enabled</name>
  <value>false</value>
</property>

<property>
  <name>yarn.timeline-service.generic-application-history.store-class</name>
  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
</property>
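Note that the yarn-site.xml snippet above sets yarn.timeline-service.generic-application-history.enabled twice, once to true and once to false. As far as I know, when a property appears twice in one file the later definition normally wins, so it is worth checking for such duplicates. Below is a minimal sketch; duplicate_props is a made-up helper name, and it assumes each <name>...</name> element sits on a single line, as in the snippets in this article.

```shell
#!/usr/bin/env bash
# Print every property name that occurs more than once in a Hadoop-style
# XML configuration file.
# duplicate_props is a hypothetical helper; it is a line-based check,
# not a real XML parser.
duplicate_props() {
  grep -o '<name>[^<]*</name>' "$1" \
    | sed -e 's/<name>//' -e 's/<\/name>//' \
    | sort | uniq -d
}

# Usage with the path from this article:
#   duplicate_props /data1/hadoop/hadoop/etc/hadoop/yarn-site.xml
```

If the function prints a name, decide which value is intended and keep only that definition.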

Copying files

Copy tez-site.xml and yarn-site.xml to the other machines.

Installing Tomcat

Download: https://tomcat.apache.org/download-80.cgi

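1. Delete the files under Tomcat's webapps directory, then copy tez-ui-0.9.0.war from the tez-0.9.0 directory above into webapps:

# mkdir /data1/apache-tomcat-8.5.42/webapps/tez-ui
# cd /data1/apache-tomcat-8.5.42/webapps/tez-ui
# cp /tez-0.9.0/tez-ui-0.9.0.war /data1/apache-tomcat-8.5.42/webapps/tez-ui/tez-ui.war
# unzip tez-ui.war    # unpack the war

Then edit config/configs.env in the current directory:

Change localhost to the hostname or IP address, and remove the leading // so the line is no longer commented out.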
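The manual edit described above can also be scripted. This is only a sketch under an assumption about the file layout: it expects configs.env to contain commented-out lines of the form //timeline: "http://localhost:8188", and //rm: "http://localhost:8088", so check your copy of the file first. The function name set_ui_hosts is made up for this illustration.

```shell
#!/usr/bin/env bash
# Uncomment the timeline and rm host lines in a Tez UI configs.env file and
# point them at the given host instead of localhost.
# Assumes lines such as:   //timeline: "http://localhost:8188",
# set_ui_hosts is a hypothetical helper; a backup is kept as <file>.bak.
set_ui_hosts() {
  local file=$1 host=$2
  sed -i.bak -E \
    "s#^([[:space:]]*)//[[:space:]]*((timeline|rm): \"http://)localhost#\1\2${host}#" \
    "$file"
}

# Usage, run inside the unpacked tez-ui directory:
#   set_ui_hosts config/configs.env master
```

The ports in the file are left untouched; only the comment marker and the host name change.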

 

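2. Edit Tomcat's configuration file server.xml and change port 8080 to 9999, matching the tez.tez-ui.history-url.base setting above.

3. Because the configuration changed, restart the Hadoop cluster and Hive, and also start the timelineserver service.

./stop-all.sh     # stop the cluster
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
./yarn-daemon.sh start timelineserver       # can only be started once HDFS is up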


root@master:/data1/apache-tomcat-8.5.42/webapps# jps
101719 Bootstrap     # the Tomcat process
2551 HMaster
99878 DFSZKFailoverController
102662 JobHistoryServer
94729 RunJar
103561 Jps
94392 RunJar
99275 NameNode
99610 JournalNode
588 QuorumPeerMain
102271 ResourceManager
102798 ApplicationHistoryServer    # this is the timelineserver service
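Instead of scanning the jps listing by eye, the check can be scripted. The sketch below reads jps output on standard input and prints the expected daemons that do not appear in it; missing_daemons is a made-up helper name, and the list of daemons to pass is up to you.

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin and print each expected daemon name that is
# not present in it. missing_daemons is a hypothetical helper.
missing_daemons() {
  local listing daemon
  listing=$(cat)                       # capture the whole jps listing
  for daemon in "$@"; do
    # -w matches whole words, so "HistoryServer" would not match
    # "JobHistoryServer"
    printf '%s\n' "$listing" | grep -qw "$daemon" || printf '%s\n' "$daemon"
  done
}

# Usage on the master node:
#   jps | missing_daemons ResourceManager JobHistoryServer ApplicationHistoryServer
```

No output means all the named daemons are running.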

Starting Hive

nohup hive --service metastore &
nohup hive --service hiveserver2 &

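After startup, visit ports 8088 and 8188 on the host. If the page on port 8188 loads just like the one on port 8088, the setup is working:

Visiting port 8088:

 

 Visiting port 8188:

 

Next, visit port 9999 on the host:

In the browser, enter: 192.168.4.46:9999/tez-ui

Awkwardly, this failed: the page stayed at "loading", and an error message appeared at the bottom of the page.

 Looking at the URL in that message: why is the host localhost? I am accessing the server from a browser on my local machine, so localhost there is clearly wrong.

My solution:

In tez-site.xml, change the tez-ui port in the following setting to 8081: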

 <property> 
  <description>URL for where the Tez UI is hosted</description> 
  <name>tez.tez-ui.history-url.base</name> 
  <value>http://master:8081/tez-ui/</value>     <!-- tez-ui address; previously port 9999 -->
 </property>

 Also change the port in Tomcat's server.xml configuration file to 8081.

 

 Then restart YARN and Tomcat.

 

Open question: it is unclear why the port caused the problem described above.
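The port change in Tomcat's conf/server.xml can be done with a one-line sed. This is a sketch that assumes the HTTP connector is declared as <Connector port="8080" ..., which is how a stock Tomcat 8.5 server.xml writes it; set_tomcat_port is a made-up helper name, and the old port to pass is whatever the file currently contains.

```shell
#!/usr/bin/env bash
# Change the port of the HTTP <Connector> in a Tomcat server.xml.
# Only lines where the port attribute directly follows "<Connector" are
# rewritten. set_tomcat_port is a hypothetical helper; a backup is kept
# as <file>.bak.
set_tomcat_port() {
  local file=$1 old=$2 new=$3
  sed -i.bak "s/<Connector port=\"${old}\"/<Connector port=\"${new}\"/" "$file"
}

# Usage with the paths and ports from this article:
#   set_tomcat_port /data1/apache-tomcat-8.5.42/conf/server.xml 9999 8081
```

Tomcat has to be restarted afterwards for the new port to take effect.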

Reference: https://blog.csdn.net/duguyiren3476/article/details/46349177

Reference: https://blog.csdn.net/gobitan/article/details/85109644

Reference: http://tez.apache.org/install.html (official site)

Reference: https://www.58jb.com/html/114.html

