[Hive Study Notes, Part 6] Hive Lateral View, Views, and Indexes


Environment
  Virtual machine: VMware 10
  Linux version: CentOS-6.5-x86_64
  Client: Xshell4
  FTP: Xftp4
  jdk8
  hadoop-3.1.1
  apache-hive-3.1.1

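I. Hive Lateral View
Lateral View is used together with UDTFs (user-defined table-generating functions) such as explode. split, which is often mentioned alongside it, is an ordinary UDF that returns an array; that array is what explode typically expands.
The UDTF first splits each input row into multiple rows, and Lateral View then combines those rows into a virtual table that can be given an alias.
It mainly solves the restriction that a SELECT which uses a UDTF directly can contain only that single UDTF: no other columns and no additional UDTFs.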

Syntax:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
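
To make the restriction above concrete, here is a minimal sketch. It assumes the psn2 table used in the example that follows, with likes as an array column; the alias names likes_view and hobby are illustrative only.

```sql
-- Rejected by Hive: a UDTF in the SELECT list cannot be mixed with other columns.
-- SELECT id, name, explode(likes) FROM psn2;

-- With LATERAL VIEW, every exploded element is joined back to its source row,
-- so ordinary columns can appear next to the generated column.
SELECT id, name, hobby
FROM psn2
LATERAL VIEW explode(likes) likes_view AS hobby;
```

Each source row should appear once per element of its likes array, with id and name repeated on every generated row.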

Example: count how many distinct hobbies and how many distinct cities appear in the person table.

hive> select * from psn2;
OK
psn2.id    psn2.name    psn2.likes    psn2.address    psn2.age
1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
Time taken: 0.138 seconds, Fetched: 12 row(s)
hive> select explode(likes) from psn2;
OK
col
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
movie
Time taken: 0.294 seconds, Fetched: 36 row(s)
hive> select count(distinct(myCol1)), count(distinct(myCol2)) from psn2 
> LATERAL VIEW explode(likes) myTable1 AS myCol1 
> LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
Query ID = root_20190216171853_af297af9-dcc6-4e1e-8674-fa0969727b23
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0012, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0012/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-02-16 17:19:00,480 Stage-1 map = 0%, reduce = 0%
2019-02-16 17:19:04,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.08 sec
2019-02-16 17:19:09,693 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.24 sec
MapReduce Total cumulative CPU time: 7 seconds 240 msec
Ended Job = job_1548397153910_0012
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.24 sec HDFS Read: 15860 HDFS Write: 103 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 240 msec
OK
_c0    _c1
3    2
Time taken: 16.894 seconds, Fetched: 1 row(s)
hive>
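
The same pattern can list the distinct values instead of only counting them. This is a sketch against the psn2 table above; the aliases likes_view, addr_view, hobby, city, and place are illustrative names, not part of the original example.

```sql
-- Distinct hobbies: explode on an array column yields one output column.
SELECT DISTINCT hobby
FROM psn2
LATERAL VIEW explode(likes) likes_view AS hobby;

-- Distinct cities: explode on a map column yields two output columns (key, value),
-- which is why two column aliases are required here.
SELECT DISTINCT city
FROM psn2
LATERAL VIEW explode(address) addr_view AS city, place;
```

On the sample data these queries should return the three hobbies and the two cities that the counts above refer to.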

 

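II. Hive Views
Like ordinary views in a relational database, Hive also supports views.
Characteristics:
  Ordinary views are purely logical and are never materialized. (Older Hive releases had no materialized views at all, unlike Oracle; Hive 3.0 and later add them as a separate object created with CREATE MATERIALIZED VIEW.)
  Views are read-only: they can be queried but cannot be used as the target of a data load.
  Creating a view only stores metadata; the view's underlying query runs only when the view is queried.
  If the view definition contains ORDER BY/LIMIT and the query against the view has its own ORDER BY/LIMIT, the clauses in the view definition are evaluated first.
  Views can be defined on top of other views (nested views).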
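
The ORDER BY/LIMIT precedence and nested views can be sketched as follows. The view names v_psn2_oldest and v_psn2_oldest_names are hypothetical; the sketch assumes the psn2 table from Section I.

```sql
-- Hypothetical view that keeps only the three rows with the highest age.
CREATE VIEW IF NOT EXISTS v_psn2_oldest AS
SELECT id, name, age FROM psn2 ORDER BY age DESC LIMIT 3;

-- The view's ORDER BY/LIMIT is applied first; the outer ORDER BY
-- then only re-sorts the three rows the view produced.
SELECT * FROM v_psn2_oldest ORDER BY id;

-- A view defined on top of another view.
CREATE VIEW IF NOT EXISTS v_psn2_oldest_names AS
SELECT name FROM v_psn2_oldest;

-- Clean up, dropping the dependent view first.
DROP VIEW IF EXISTS v_psn2_oldest_names;
DROP VIEW IF EXISTS v_psn2_oldest;
```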

View syntax
Creating a view:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
[(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ... ;
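
As an illustration of the optional clauses, the following sketch fills in each part of the grammar. The view name, column names, comments, and the 'creator' property are made up for this example; only the psn2 table and its id and name columns come from this article.

```sql
CREATE VIEW IF NOT EXISTS v_psn2_basic (
  person_id   COMMENT 'copied from psn2.id',
  person_name COMMENT 'copied from psn2.name'
)
COMMENT 'id and name columns of psn2'
TBLPROPERTIES ('creator' = 'example')
AS SELECT id, name FROM psn2;
```

The column list renames the SELECT output positionally, so person_id maps to id and person_name maps to name.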

Example (note that a view has no files in HDFS):

hive> create view v_psn2 as select id,name from psn2;
OK
id    name
Time taken: 0.127 seconds
hive> show tables;
OK
tab_name
cell_drop_monitor
cell_monitor
docs
logtbl
person
person3
psn2
psn21
psn22
psn3
psn31
psn4
psnbucket
student
test01
v_psn2
wc
Time taken: 0.02 seconds, Fetched: 17 row(s)
hive> select * from v_psn2;
OK
v_psn2.id    v_psn2.name
1    小明1
2    小明2
3    小明3
4    小明4
5    小明5
6    小明6
1    小明1
2    小明2
3    小明3
4    小明4
5    小明5
6    小明6
Time taken: 0.11 seconds, Fetched: 12 row(s)
hive> drop view v_psn2;
OK
Time taken: 0.08 seconds
hive> select * from v_psn2;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'v_psn2'
hive> 
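
One way to confirm that a view is metadata only is to inspect it before dropping it. This is a sketch that recreates v_psn2 as above; the exact layout of the output may vary between Hive versions.

```sql
CREATE VIEW IF NOT EXISTS v_psn2 AS SELECT id, name FROM psn2;

-- The "Table Type" field is expected to read VIRTUAL_VIEW, and the
-- stored query text appears in the view-related fields of the output.
DESCRIBE FORMATTED v_psn2;

-- Prints the CREATE VIEW statement that Hive keeps in the metastore.
SHOW CREATE TABLE v_psn2;

DROP VIEW IF EXISTS v_psn2;
```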

 

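III. Hive Indexes

 Purpose: optimize query and lookup performance. Note that index support was removed in Hive 3.0, so the statements below apply to Hive 2.x and earlier rather than to the apache-hive-3.1.1 release listed in the environment.

Create an index on table psn2: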
create index t1_index on table psn2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild
in table t1_index_table;

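as: specifies the index handler;
in table: specifies the table that stores the index; if omitted, the index is stored by default in the table default__psn2_t1_index__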

create index t1_index on table psn2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;

Show indexes
show index on psn2;

Rebuild the index (an index created WITH DEFERRED REBUILD starts out empty, so it must be rebuilt before it takes effect)
ALTER INDEX t1_index ON psn2 REBUILD;
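
Once the index has been rebuilt, a filtered query can be allowed to use it. The following is a sketch only, assuming a Hive 2.x or earlier installation where indexes still exist and assuming the index was created with IN TABLE t1_index_table as in the first statement above; whether the optimizer actually uses the index depends on the version and on data size.

```sql
-- Let the optimizer consider indexes when evaluating filters.
SET hive.optimize.index.filter=true;
-- Lower the minimum input size for compact indexes so that a small
-- test table such as psn2 qualifies.
SET hive.optimize.index.filter.compact.minsize=0;

-- A lookup on the indexed column.
SELECT id, name FROM psn2 WHERE name = '小明1';

-- The index data itself is an ordinary table and can be inspected.
SELECT * FROM t1_index_table LIMIT 5;
```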

Drop the index
DROP INDEX IF EXISTS t1_index ON psn2;

 

