1.explode
The Hive wiki explains explode() as follows:
explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
As an example of using explode() in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:
Then running the query:
SELECT explode(myCol) AS myNewCol FROM myTable;
will produce:
The usage with Maps is similar:
SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;
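In one sentence: explode splits a complex array or map value held in a single Hive row into multiple rows, one per element (or per key/value pair).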
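The following is a small Python model of the row semantics described above. It is not Hive code. The function names explode and select_explode and the sample values are hypothetical.

- An array produces one single-column row per element.
- A map produces one (key, value) row per entry.
- A NULL (None) or empty collection produces no rows at all.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple, Union

# Hypothetical alias for the kinds of values explode() accepts
Collection = Union[List[Any], Dict[Any, Any], None]


def explode(value: Collection) -> Iterator[Tuple[Any, ...]]:
    """Yield the rows Hive's explode() would produce for one input value (model only)."""
    if value is None:              # NULL input: no output rows
        return
    if isinstance(value, dict):    # map: one (key, value) row per entry
        for key, val in value.items():
            yield (key, val)
    else:                          # array: one single-column row per element
        for element in value:
            yield (element,)


def select_explode(column: Iterable[Collection]) -> List[Tuple[Any, ...]]:
    """SELECT explode(col) FROM t: the rows for every input row, concatenated."""
    return [row for value in column for row in explode(value)]


if __name__ == "__main__":
    print(select_explode([[1, 2, 3], [4, 5]]))        # array column, hypothetical values
    print(select_explode([{"k1": "v1", "k2": "v2"}]))  # map column, hypothetical values
```

Python dicts iterate in insertion order. Hive makes no such ordering promise for maps, so only the set of output rows should be compared.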
Usage example:
Table xxx has a field mvt of type string, holding data in the following format:
[{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"},{"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"},{"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"},{"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"},{"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"},{"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"},{"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"},{"eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}]
A first try with explode:
select explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) from ods_mvt_hourly where day=20160710 limit 10;
The result:
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"
"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"
"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"
"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"
"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"
"eid":"47","ex":"ihotelDetail_Android","val":"0","vid":"38","vr":"var1"}
{"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"
"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"
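The leftover braces come from the split itself. Once '[' and ']' are stripped, the regex \},\{ consumes the '},{' between objects. As a result, only the first fragment keeps its '{' and only the last keeps its '}'.

The Python sketch below reproduces the two string steps with the re module; the helper name strip_and_split is hypothetical. Hive's regexp_replace and split also take Java regular expressions. Note that '\\[' written in a Hive string literal reaches the regex engine as \[.

```python
import re
from typing import List


def strip_and_split(mvt: str) -> List[str]:
    """Python mirror of split(regexp_replace(mvt, '\\[|\\]', ''), '\\},\\{')."""
    # Step 1: drop every '[' and ']' (the regex \[|\] matches either bracket)
    no_brackets = re.sub(r"\[|\]", "", mvt)
    # Step 2: cut at each '},{'; the separator, braces included, is consumed
    return re.split(r"\},\{", no_brackets)


if __name__ == "__main__":
    # A shortened, made-up value in the same shape as the mvt column
    sample = '[{"eid":"38","vr":"var1"},{"eid":"42","vr":"var1"},{"eid":"40","vr":"var1"}]'
    for fragment in strip_and_split(sample):
        print(fragment)
```

The sample prints three fragments: the first keeps only its '{', the middle one has no braces, and the last keeps only its '}'. This approach assumes no value inside the objects contains '[', ']' or the sequence '},{'.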
2.lateral view
The Hive wiki describes it as follows:
Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
Description
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
Example
Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page)
An example table with two rows:
and the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list into separate rows using the query:
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
The resulting output will be
Then in order to count the number of times a particular ad appears, count/group by can be used:
SELECT adid, count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
The resulting output will be
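So lateral view and UDTFs such as explode are natural partners. explode splits a complex structure in one row into multiple rows. Lateral view then joins those rows back to the other columns of the original row, so the result can be grouped and aggregated like any other table.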
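The join described in the wiki text can be modelled in a few lines of Python. This is a sketch: the function names and the sample rows are made up for illustration. Every pageid is paired with each element of its own adid_list, and counting the second column mirrors the GROUP BY adid query.

```python
from collections import Counter
from typing import Dict, List, Tuple

PageRow = Tuple[str, List[int]]  # (pageid, adid_list)


def lateral_view_explode(page_ads: List[PageRow]) -> List[Tuple[str, int]]:
    """Mirror of: SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid."""
    # Each base row is joined with every row the UDTF produced from it,
    # so pageid is repeated next to each exploded adid.
    return [(pageid, adid) for pageid, adid_list in page_ads for adid in adid_list]


def count_by_adid(page_ads: List[PageRow]) -> Dict[int, int]:
    """Mirror of: SELECT adid, count(1) ... GROUP BY adid."""
    return dict(Counter(adid for _, adid in lateral_view_explode(page_ads)))


if __name__ == "__main__":
    sample = [("page_a", [1, 2]), ("page_b", [2, 3])]  # illustrative rows
    print(lateral_view_explode(sample))
    print(count_by_adid(sample))
```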
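3.Example
Back to the example from part 1: the rows produced by explode above are not valid JSON. By combining lateral view with explode, we can turn each fragment into a well-formed JSON object: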
SELECT ecrd,
       CASE WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')=0 THEN concat('{',mvtstr,'}')
            WHEN instr(mvtstr,'{')=0 AND instr(mvtstr,'}')>0 THEN concat('{',mvtstr)
            WHEN instr(mvtstr,'}')=0 AND instr(mvtstr,'{')>0 THEN concat(mvtstr,'}')
            ELSE mvtstr
       END AS mvt
FROM ods.ods_mvt_hourly
LATERAL VIEW explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) addTable AS mvtstr
WHERE DAY='20160710' AND ecrd IS NOT NULL
LIMIT 10
Query result:
xxx {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
xxx {"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
xxx {"eid":"40","ex":"new_rpname_Android","val":"1","vid":"1","vr":"var1"}
xxx {"eid":"19","ex":"hotellistlpage_Android","val":"1","vid":"1","vr":"var01"}
xxx {"eid":"29","ex":"bookhotelpage_Android","val":"0","vid":"1","vr":"var01"}
xxx {"eid":"17","ex":"trainMode_Android","val":"1","vid":"1","vr":"mode_Android"}
xxx {"eid":"44","ex":"ihotelList_Android","val":"1","vid":"36","vr":"var1"}
xxx {"eid":"47","ex":"ihotelDetail_Android","val":"1","vid":"38","vr":"var1"}
xxx {"eid":"38","ex":"affirm_time_Android","val":"1","vid":"31","vr":"var1"}
xxx {"eid":"42","ex":"new_comment_Android","val":"1","vid":"34","vr":"var1"}
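The CASE expression adds whichever brace a fragment is missing. The Python sketch below mirrors that logic and checks that the repaired fragments parse as JSON. The helper name repair_braces and the shortened fragments are hypothetical.

Like Hive's instr(), the test looks for a brace anywhere in the string. A '{' or '}' inside a value would therefore fool it, in both the SQL and this sketch.

```python
import json


def repair_braces(mvtstr: str) -> str:
    """Python mirror of the CASE expression in the query above.

    Hive's instr(s, '{') returns 0 when '{' is absent; `in` plays that role here.
    """
    has_open = "{" in mvtstr
    has_close = "}" in mvtstr
    if not has_open and not has_close:   # middle fragment: add both braces
        return "{" + mvtstr + "}"
    if not has_open and has_close:       # last fragment: add the opening brace
        return "{" + mvtstr
    if not has_close and has_open:       # first fragment: add the closing brace
        return mvtstr + "}"
    return mvtstr                        # already complete (a one-element array)


if __name__ == "__main__":
    # Shortened fragments of the kind produced by the explode step (illustrative only)
    fragments = ['{"eid":"38","vr":"var1"', '"eid":"42","vr":"var1"', '"eid":"47","vr":"var1"}']
    for fragment in fragments:
        print(json.loads(repair_braces(fragment)))  # each now parses as a JSON object
```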
4.Ending
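Lateral View usually appears together with a UDTF. It exists to get around the rule that a UDTF cannot be combined with other columns in the SELECT list.
Several lateral views in one query produce something like a Cartesian product of the exploded values.
The OUTER keyword keeps input rows for which the UDTF produces no output, emitting NULL for the generated columns so that no data is lost.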
References:
1. http://blog.csdn.net/oopsoom/article/details/26001307 (worked examples of lateral view)
2. https://my.oschina.net/leejun2005/blog/120463 (usage of composite functions, fairly detailed)
3. http://blog.csdn.net/zhaoli081223/article/details/46637517 (an introduction to UDTFs)
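Lateral View Usage and the Hive UDTF explode
Lateral View is the construct Hive provides to pair with UDTFs (their "conjunction"). It solves the problem that a UDTF cannot be selected together with additional columns.
1. Why do we need Lateral View?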
select game_id, explode(split(user_ids, '\\[\\[\\[')) as user_id from login_game_log where dt='2014-05-15'
FAILED: Error in semantic analysis: UDTF's are not supported outside the SELECT clause, nor nested in expressions
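Hive rejects the query during semantic analysis because a UDTF may not be mixed with other expressions in the SELECT list. Rather frustrating...
So what if we need exactly that? This is where Lateral View comes in.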
2. Lateral View explained
2.1 A single Lateral View
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
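In other words, a lateral view is meant to be used together with UDTFs such as explode. It puts the rows generated by the UDTF into a virtual table and joins that table back to each input row (here, each game_id). This is how columns outside the UDTF end up in the same SELECT list as its output.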
Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
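1. LATERAL VIEW is written in front of the UDTF call.
2. The lateral view clause goes after FROM baseTable.
For example:
1. First create a file with two columns separated by \t: game_id and user_ids.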
hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 2.451 seconds
hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
Copying data from file:/home/hadoop/test_lateral
Copying file: file:/home/hadoop/test_lateral
Loading data to table dw.test_lateral_view_shengli
OK
Time taken: 6.716 seconds
hive> select * from test_lateral_view_shengli;
OK
game101 15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
game66 winning1ren[[[ 13810537508
game101 hu330602003[[[hu330602004[[[hu330602005[[[ 15967506560
Now query it with a lateral view:
hive> select game_id, user_id
> from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
> ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403301416_445839
OK
game101 hu330602003
game101 hu330602004
game101 hu330602005
game101 15967506560
game101 15358083654
game101 ab33787873
game101 zjy18052480603
game101 shlg1881826
game101 lxqab110
game66 winning1ren
game66 13810537508
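The session above can be mirrored in Python to show what the lateral view does to each row. This is a sketch; lateral_view_split is a hypothetical name. The input rows are adapted from the sample data above, with the space after [[[ in the game66 row dropped.

Hive's split() takes a regular expression, which is why the separator is escaped as \[\[\[. Python's re.split is used here in the same way.

```python
import re
from typing import List, Tuple


def lateral_view_split(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Mirror of the session above:

    select game_id, user_id from test_lateral_view_shengli
    lateral view explode(split(userl_ids, '\\[\\[\\[')) snTable as user_id
    """
    result = []
    for game_id, user_ids in rows:
        # The pattern is a regex; \[\[\[ matches the literal separator "[[["
        for user_id in re.split(r"\[\[\[", user_ids):
            result.append((game_id, user_id))  # game_id is repeated for every element
    return result


if __name__ == "__main__":
    rows = [
        ("game101", "15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110"),
        ("game66", "winning1ren[[[13810537508"),
    ]
    for game_id, user_id in lateral_view_split(rows):
        print(game_id, user_id)
```

The sketch keeps input order. Hive makes no ordering guarantee without ORDER BY, so its output above need not follow the order of the input rows.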
2.2 Multiple Lateral Views
Consider a base table baseTable:
Array<int> col1 | Array<string> col2
[1, 2]          | ["a", "b", "c"]
[3, 4]          | ["d", "e", "f"]
The query
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
produces the following. Within each base row, every myCol1 value is paired with every myCol2 value:
int myCol1 | string myCol2
1          | "a"
1          | "b"
1          | "c"
2          | "a"
2          | "b"
2          | "c"
3          | "d"
3          | "e"
3          | "f"
4          | "d"
4          | "e"
4          | "f"
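Chaining two lateral views behaves like nested loops over each base row. The Python sketch below uses a hypothetical function name, with baseTable's two rows copied from the table above. It reproduces the 12-row result: the pairing happens inside a base row, never across rows, so 1 is never paired with "d".

```python
from typing import Any, List, Tuple

BaseRow = Tuple[List[Any], List[Any]]  # (col1, col2)


def two_lateral_views(base_table: List[BaseRow]) -> List[Tuple[Any, Any]]:
    """Mirror of:

    SELECT myCol1, myCol2 FROM baseTable
    LATERAL VIEW explode(col1) myTable1 AS myCol1
    LATERAL VIEW explode(col2) myTable2 AS myCol2
    """
    result = []
    for col1, col2 in base_table:
        for my_col1 in col1:        # rows produced by the first lateral view
            for my_col2 in col2:    # the second lateral view runs once per such row
                result.append((my_col1, my_col2))
    return result


if __name__ == "__main__":
    base_table = [([1, 2], ["a", "b", "c"]), ([3, 4], ["d", "e", "f"])]
    for row in two_lateral_views(base_table):
        print(*row)
```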
3. Outer Lateral View
hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;
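This returns no rows at all: explode(array()) generates nothing for any input row, so every input row is dropped. With OUTER, each input row is kept and the generated column a is NULL: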
SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL
...
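The difference between the two queries can be modelled in Python. This is a sketch: lateral_view and explode_empty_array are hypothetical names, and the two src rows are copied from the output above. Without OUTER, a base row whose UDTF output is empty disappears. With OUTER, that row is kept once, with None standing in for NULL.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[Any, ...]


def explode_empty_array(row: Row) -> List[Any]:
    """Stands in for explode(array()): generates nothing for any row."""
    return []


def lateral_view(rows: List[Row], udtf: Callable[[Row], List[Any]], outer: bool = False) -> List[Row]:
    """Model of FROM t LATERAL VIEW [OUTER] udtf(...) C AS a (one generated column)."""
    result: List[Row] = []
    for row in rows:
        generated = udtf(row)
        if generated:
            # Normal case: one output row per generated value, base columns repeated
            result.extend(row + (value,) for value in generated)
        elif outer:
            # OUTER: keep the base row once, with None standing in for NULL
            result.append(row + (None,))
        # Without OUTER, a base row with no generated values is simply dropped
    return result


if __name__ == "__main__":
    src = [(238, "val_238"), (86, "val_86")]
    print(lateral_view(src, explode_empty_array))              # no rows
    print(lateral_view(src, explode_empty_array, outer=True))  # rows padded with None
```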
4. Summary: