Original article: https://yq.aliyun.com/articles/654743
Official documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
Date functions
1) date_format (formats a date according to a pattern)
hive (gmall)> select date_format('2020-02-10','yyyy-MM');
2020-02
2) date_add (adds or subtracts days)
hive (gmall)> select date_add('2020-02-10',-1);
2020-02-09
hive (gmall)> select date_add('2020-02-10',1);
2020-02-11
3) next_day
(1) Get the first Monday after the given day
hive (gmall)> select next_day('2020-02-12','MO');
2020-02-17
Note: the second argument is an English day name, Monday through Sunday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); two-letter and three-letter abbreviations such as 'MO' are also accepted.
(2) Get the Monday of the current week
hive (gmall)> select date_add(next_day('2020-02-12','MO'),-7);
2020-02-10
4) last_day (returns the last day of the month)
hive (gmall)> select last_day('2020-02-10');
2020-02-29
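As a cross-check of the date functions above, the following sketch derives the Monday-to-Sunday week range and the month boundaries for one sample date. The literal '2020-02-12' (a Wednesday) and the column aliases are illustrative assumptions, not part of the original notes; the month_first_day expression assumes date_format passes the literal digits "01" in the pattern through unchanged.

```sql
-- Sketch: week range and month boundaries for one sample date.
-- next_day returns the first matching weekday strictly after the date,
-- so subtracting 7 days lands on the Monday of the date's own week.
select
  date_add(next_day('2020-02-12','MO'), -7) as week_monday,     -- 2020-02-10
  date_add(next_day('2020-02-12','MO'), -1) as week_sunday,     -- 2020-02-16
  date_format('2020-02-12','yyyy-MM-01')    as month_first_day, -- 2020-02-01
  last_day('2020-02-12')                    as month_last_day;  -- 2020-02-29 (2020 is a leap year)
```

Because next_day always moves forward, the same expression still returns the date itself when the input is already a Monday.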
1.1 concat: when concatenating strings, concat returns NULL if any one of its arguments is NULL.
hive> select concat('a','b');
OK
ab
Time taken: 0.477 seconds, Fetched: 1 row(s)
hive> select concat('a','b',null);
OK
NULL
Time taken: 0.181 seconds, Fetched: 1 row(s)
Source: https://blog.csdn.net/henrrywan/java/article/details/86543202
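concat_ws skips NULL arguments instead of propagating them, so as long as at least one string is not NULL the result is not NULL. Its first argument is the separator, which is required.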
hive> select concat_ws('-','a','b');
OK
a-b
Time taken: 0.245 seconds, Fetched: 1 row(s)
hive> select concat_ws('-','a','b',null);
OK
a-b
Time taken: 0.177 seconds, Fetched: 1 row(s)
hive> select concat_ws('','a','b',null);
OK
ab
Time taken: 0.184 seconds, Fetched: 1 row(s)
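When concat has to be used but one argument may be NULL, one common guard is to substitute an empty string first. The sketch below is only an illustration: coalesce and the explicit cast are not part of the original notes, and the literals stand in for real columns.

```sql
-- concat with a NULL guard: the NULL argument is replaced by '' before concatenation
select concat('a', coalesce(cast(null as string), ''), 'b');   -- ab

-- concat_ws needs no guard: NULL arguments are skipped
select concat_ws('-', 'a', cast(null as string), 'b');          -- a-b
```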
collect_set function
1) Create the source table
hive (gmall)> drop table if exists stud;
create table stud (name string, area string, course string, score int);
2) Insert data into the source table
hive (gmall)> insert into table stud values('zhang3','bj','math',88);
insert into table stud values('li4','bj','math',99);
insert into table stud values('wang5','sh','chinese',92);
insert into table stud values('zhao6','sh','chinese',54);
insert into table stud values('tian7','bj','chinese',91);
3) Query the table
hive (gmall)> select * from stud;
stud.name  stud.area  stud.course  stud.score
zhang3     bj         math         88
li4        bj         math         99
wang5      sh         chinese      92
zhao6      sh         chinese      54
tian7      bj         chinese      91
4) Aggregate values from different rows of the same group into one set (duplicates removed)
hive (gmall)> select course, collect_set(area), avg(score) from stud group by course;
chinese  ["sh","bj"]  79.0
math     ["bj"]       93.5
5) Use a subscript to pick a single element
hive (gmall)> select course, collect_set(area)[0], avg(score) from stud group by course;
chinese  sh  79.0
math     bj  93.5
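collect_set combines naturally with concat_ws, which also accepts an array of strings. The following sketch reuses the stud table from above; collect_list (which keeps duplicates) and the aliases areas and all_areas are additions for comparison, not part of the original notes.

```sql
-- Flatten the collected set into one delimited string, and compare with collect_list.
select course,
       concat_ws(',', collect_set(area)) as areas,      -- distinct areas joined by ','
       collect_list(area)                as all_areas   -- every area, duplicates kept
from stud
group by course;
-- For course = 'math' this gives areas = bj and all_areas = ["bj","bj"].
-- Element order inside either collection is not guaranteed.
```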
2、explode
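explode(ARRAY): generates one row for each element of the array
explode(MAP): generates one row for each key-value pair of the map, with the key in one column and the value in another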
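Both forms can be tried on literal values without any table. This is a minimal sketch; the literals and the aliases item, k and v are made up for illustration.

```sql
-- explode(ARRAY): three output rows, one column
select explode(array('a', 'b', 'c')) as item;

-- explode(MAP): two output rows, key and value in separate columns
select explode(map('k1', 'v1', 'k2', 'v2')) as (k, v);
```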
Limitations (when explode is called directly in the SELECT list):
1、No other expressions are allowed in SELECT
SELECT pageid, explode(adid_list) AS myCol... is not supported
2、UDTF's can't be nested
SELECT explode(explode(adid_list)) AS myCol... is not supported
3、GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
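One way around the third limitation, sketched here on a literal array rather than on a real table, is to run explode in a subquery and group in the outer query. The alias t and the sample values are assumptions for illustration only.

```sql
-- explode runs alone in the inner SELECT; GROUP BY is applied outside it
select myCol, count(*) as cnt
from (
  select explode(array(1, 1, 2)) as myCol
) t
group by myCol;
-- myCol = 1 has cnt = 2, myCol = 2 has cnt = 1
```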
2、lateral view
LATERAL VIEW lifts the limitations above. Syntax:
lateralView: LATERAL VIEW explode(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
Example:
The table is named pageAds.
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
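Output: one row per (pageid, adid) pair, that is, every element of adid_list paired with the pageid of the row it came from.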
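To make the query above reproducible without an existing table, the sketch below builds pageAds inline from two made-up rows; the page names and ad ids are hypothetical sample data, not values from the original article. It assumes a Hive version that supports common table expressions and SELECT without FROM.

```sql
-- Hypothetical sample rows standing in for the pageAds table
with pageAds as (
  select 'front_page' as pageid, array(1, 2, 3) as adid_list
  union all
  select 'contact_page' as pageid, array(3, 4, 5) as adid_list
)
select pageid, adid
from pageAds lateral view explode(adid_list) adTable as adid;
-- Six rows in total: front_page with 1, 2, 3 and contact_page with 3, 4, 5
-- (row order is not guaranteed).
```

Because adid is now an ordinary column of the lateral view, it can also appear next to other expressions or in GROUP BY, which plain explode in the SELECT list does not allow.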
3、Multiple lateral views
A FROM clause can be followed by more than one LATERAL VIEW clause.
Example:
Table name: baseTable
With a single lateral view after FROM:
SELECT myCol1, col2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1;
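Result: each element of col1 becomes its own row (myCol1), paired with the unchanged col2 value of its source row.
With multiple lateral views: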
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
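Result: the second lateral view is applied to every row produced by the first, so each source row yields every combination of a col1 element with a col2 element.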
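The combination behaviour can be checked with a one-row stand-in for baseTable. The array contents below are invented sample data, and the sketch again assumes a Hive version with common table expressions.

```sql
-- Hypothetical single-row baseTable: 2 elements in col1, 3 elements in col2
with baseTable as (
  select array(1, 2) as col1, array('a', 'b', 'c') as col2
)
select myCol1, myCol2
from baseTable
lateral view explode(col1) myTable1 as myCol1
lateral view explode(col2) myTable2 as myCol2;
-- 2 x 3 = 6 rows: each of 1 and 2 paired with each of a, b and c
```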
4、Outer Lateral Views
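If an array column is empty but the row must still be returned, use the OUTER keyword.
For example: select * from src LATERAL VIEW explode(array()) C AS a limit 10;
Here the array is an empty list, so the result is empty no matter how many rows src contains.
By contrast: select * from src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
Every row of src is returned (up to the limit of 10), with column a set to NULL.
For example: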
238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL
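The difference between the two forms is easy to confirm by counting rows. The sketch below reuses the src table from the example; only the count(*) wrapper is an addition.

```sql
-- Without OUTER: the empty array produces no rows, so the count is 0
select count(*) from src lateral view explode(array()) C as a;

-- With OUTER: every src row survives with a = NULL,
-- so the count matches select count(*) from src
select count(*) from src lateral view outer explode(array()) C as a;
```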