Hive functions: lateral view and explode in Hive


1. Using the explode function to split Map and Array fields in a Hive table

  lateral view is used together with UDTFs such as split and explode. It can turn one row of data into multiple rows, and the split results can then be aggregated. lateral view first calls the UDTF for each row of the original table; the UDTF turns that row into one or more rows; lateral view then combines the results, producing a virtual table that supports table and column aliases.

  explode by itself can also be used to split a complex array or map column of a Hive table into multiple rows.
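
The general shape looks like the following minimal sketch (the table name t and its columns id and arr are hypothetical, not from this article):

-- explode(arr) is called for each row of t; every array element becomes its own
-- output row, and the other columns of t can still be selected next to it
SELECT t.id, e.item
FROM t
LATERAL VIEW explode(t.arr) e AS item;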

Requirement: we have data in the following format

zhangsan     child1,child2,child3,child4      k1:v1,k2:v2
lisi  child5,child6,child7,child8      k3:v3,k4:v4

  The fields are separated by \t. The requirement is to split out all the child values into a single column:

+----------+--+
| mychild  |
+----------+--+
| child1   |
| child2   |
| child3   |
| child4   |
| child5   |
| child6   |
| child7   |
| child8   |
+----------+--+

Also split the map's keys and values apart, producing the following result:

+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+

Step 1: Create the Hive database

hive (default)> create database hive_explode;
hive (default)> use hive_explode;

Step 2: Create the Hive table, then use explode to split the map and array

hive (hive_explode)> create  table t3(name string,children array<string>,address Map<string,string>)
                    row format delimited fields terminated by '\t'
                    collection items terminated by ','
                    map keys terminated by ':' stored as textFile;

Step 3: Load the data

On node03, run the following commands to create the data file for the table:

mkdir -p /export/servers/hivedatas/
cd /export/servers/hivedatas/
vim maparray

zhangsan     child1,child2,child3,child4      k1:v1,k2:v2
lisi  child5,child6,child7,child8      k3:v3,k4:v4

Load the data into the Hive table:

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/maparray' into table t3;

Step 4: Use explode to split apart the data in Hive

Split the data in the array:

hive (hive_explode)> SELECT explode(children) AS myChild FROM t3;

Split the data in the map:

hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;

2. Using explode to split a JSON string

Requirement: we now have some data in the following format:

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

The fields are separated by |.

We want to parse out the value of monthSales from every JSON object, producing the following single column (one input row expanded into multiple rows):

4900
2090
6987

Step 1: Create the Hive table

hive (hive_explode)> create table explode_lateral_view
                   (`area` string,
                   `goods_id` string,
                   `sale_info` string)
                   ROW FORMAT DELIMITED
                   FIELDS TERMINATED BY '|'
                   STORED AS textfile;

Step 2: Prepare and load the data

Prepare the data as follows:

cd /export/servers/hivedatas
vim explode_json
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

Load the data into the Hive table:

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;

Step 3: Use explode to split the Array

hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from explode_lateral_view;

Step 4: Use explode to split the Map-style data (the area column is stored as a plain string, so we split it on ',' first and then explode the result):

hive (hive_explode)> select explode(split(area,',')) as area from explode_lateral_view;
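
Each row returned here is still a key:value string such as a:shandong. If we also want the key and the value in separate columns, one way (a sketch, not part of the original steps) is to split each entry again on ':' in an outer query:

-- t is just a subquery alias; split(kv, ':') turns "a:shandong" into ["a","shandong"]
select split(kv, ':')[0] as area_key,
       split(kv, ':')[1] as area_value
from (select explode(split(area, ',')) as kv from explode_lateral_view) t;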


Step 5: Parse the JSON field

hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as  sale_info from explode_lateral_view;

Next, we try to use get_json_object to fetch the value of the monthSales key:

hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),'$.monthSales') as  sale_info from explode_lateral_view;

This fails with: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

A UDTF such as explode cannot be written inside another function.

Likewise, if you write it this way, trying to select two fields: select explode(split(area,',')) as area,good_id from explode_lateral_view;

you get the error FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'

When using a UDTF, only a single expression is supported in the SELECT clause; this is where LATERAL VIEW comes in.

3. Using explode together with LATERAL VIEW

Query multiple fields together with lateral view:

hive (hive_explode)> select goods_id2,sale_info from explode_lateral_view LATERAL VIEW explode(split(goods_id,','))goods as goods_id2;

Here, LATERAL VIEW explode(split(goods_id,',')) goods behaves like a virtual table that is joined back to the original table explode_lateral_view in a Cartesian-product fashion: each original row is paired with every row exploded out of it.
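
For intuition about the fan-out, here is a quick sanity check (a sketch; the result of 9 assumes only the single sample row prepared above has been loaded):

-- the one source row carries 9 comma-separated goods ids, so the lateral view
-- pairs that row with 9 exploded rows and count(*) returns 9
select count(*)
from explode_lateral_view
lateral view explode(split(goods_id, ',')) goods as goods_id2;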

It can also be used multiple times:

hive (hive_explode)> select goods_id2,sale_info,area2
                    from explode_lateral_view
                    LATERAL VIEW explode(split(goods_id,','))goods as goods_id2
                    LATERAL VIEW explode(split(area,','))area as area2;

This is likewise the Cartesian-product-style result across the three tables.

Finally, with the following statement we can turn this one row of JSON data into a fully flattened two-dimensional table:

hive (hive_explode)> select get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,
                    get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,
                    get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount,
                    get_json_object(concat('{',sale_info_1,'}'),'$.score') as score from explode_lateral_view
                    LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{'))sale_info as sale_info_1;

Summary:

Lateral View usually appears together with a UDTF, to work around the restriction that a UDTF cannot be combined with other expressions in the SELECT clause.
Multiple Lateral Views can produce a Cartesian-product-like expansion.
The OUTER keyword turns rows for which the UDTF produces no output into rows with NULL, so that no data is lost.
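
A minimal sketch of the OUTER behaviour against the t3 table created earlier (array() is just an always-empty array used here to force the empty-output case; it is not from the original article):

-- explode(array()) emits zero rows for every input row, so without OUTER
-- this query would return nothing; with OUTER each name is kept once and
-- the exploded column comes back as NULL
select name, mychild
from t3
lateral view outer explode(array()) tmp as mychild;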

