Using arrays of nested maps and row expansion (explode) in Hive


1. Source data

{"student": {"name":"king","age":11,"sex":"M"},"sub_score":[{"subject":"語文","score":80},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king1","age":11,"sex":"M"},"sub_score":[{"subject":"語文","score":81},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king2","age":12,"sex":"M"},"sub_score":[{"subject":"語文","score":82},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king3","age":13,"sex":"M"},"sub_score":[{"subject":"語文","score":83},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king4","age":14,"sex":"M"},"sub_score":[{"subject":"語文","score":84},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":15,"sex":"M"},"sub_score":[{"subject":"語文","score":85},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":16,"sex":"M"},"sub_score":[{"subject":"語文","score":86},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":17,"sex":"M"},"sub_score":[{"subject":"語文","score":87},{"subject":"數學","score":80},{"subject":"英語","score":80}]}

2. Create the Hive table

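The source data is JSON, so the table maps it as follows: the student field is a map, and the sub_score field is an array of maps.

The advantage of this layout is that as long as the first-level fields (student and sub_score) do not change, keys can be added or removed inside them without touching the table definition, which makes the table fairly tolerant of changes in the source.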

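The create table statement is shown below. Use the JSON SerDe package linked here, so that a record that fails to parse does not bring down the whole job.

Download: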

https://github.com/rcongiu/Hive-JSON-Serde

http://www.congiu.net/hive-json-serde/
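Before the table can use the downloaded SerDe, the jar has to be visible to the Hive session. A minimal sketch, assuming the jar was saved locally; the path and version in the file name are placeholders, not values from the original post:

```sql
-- Hypothetical local path and version; substitute the jar you actually downloaded.
add jar /path/to/json-serde-1.3.8-jar-with-dependencies.jar;

-- Optional check: list the jars registered in the current session.
list jars;
```

The jar registered this way only lasts for the current session; a permanent setup would normally go through the Hive configuration instead.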

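-- Only one ROW FORMAT SERDE clause is allowed per table, so a single SerDe is kept.
-- The openx SerDe is used because ignore.malformed.json (set below) is one of its properties;
-- Hive's bundled alternative is 'org.apache.hive.hcatalog.data.JsonSerDe'.
create external table if not exists dw_stg.stu_score(
student map<string,string> comment 'student info',
sub_score array<map<string,string>> comment 'per-subject scores'
)
comment 'student score table'
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;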

To skip malformed records instead of failing on them, add the following property:

ALTER TABLE dw_stg.stu_score SET SERDEPROPERTIES ( "ignore.malformed.json" = "true");

 

3. Upload the data

Upload score.txt to the directory of the Hive table stu_score:

hdfs dfs -put score.txt hdfs://dwtest-name1:9000/user/hive/warehouse/dw_stg.db/stu_score/
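The same upload can also be done from inside Hive with load data, which avoids spelling out the warehouse path by hand. This is only a sketch: the local path is a placeholder, and it assumes score.txt sits on the machine where the Hive client runs.

```sql
-- Hypothetical local path; Hive copies the file into the table's directory.
load data local inpath '/path/to/score.txt' into table dw_stg.stu_score;

-- Quick sanity check that rows are readable through the SerDe.
select count(*) from dw_stg.stu_score;
```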

4. Querying the data

1) Basic query
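The original query for this step is not preserved in the text. A plausible minimal version, assuming the table was created in dw_stg as above, might look like this:

```sql
-- Switch to the database so the table can be referenced without a prefix,
-- as the later queries in this post do.
use dw_stg;

-- Each row comes back with student as a map and sub_score as an array of maps.
select * from stu_score limit 10;
```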

2) Query a single student's scores
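The original query here is also missing. One way to write it, as a sketch using the map-key syntax that the later queries rely on:

```sql
-- Map values are read with map['key']; array elements with array[index].
select student['name'] as name,
       sub_score,                                  -- the whole array of maps
       sub_score[0]['score'] as first_subject_score
from stu_score
where student['name'] = 'king1';
```

Indexing with sub_score[0] only reaches one element at a time, which is what motivates explode in the next step.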

3) Expanding an array into rows with explode ★★★

select explode(sub_score) from stu_score where student['name'] = 'king1';
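To see what explode does without depending on the table, it can be applied to a literal array. This is an illustrative sketch with made-up values, assuming a Hive version that allows select without a from clause (0.13 or later):

```sql
-- explode emits one output row per array element;
-- here each element is a map, so two rows come back, each holding one map.
select explode(
         array(
           map('subject', 'math',    'score', '80'),
           map('subject', 'english', 'score', '90')
         )
       ) as score;
```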

 

4) A more advanced form: lateral view ... explode ★★★

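When explode is used directly in the select list, no other columns can be selected alongside it; such a query fails with an error.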
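The failing query itself is not preserved in the text. A query of the following shape illustrates the restriction; Hive typically rejects it at compile time with a SemanticException saying that only a single expression is supported in the select clause with a UDTF (the exact wording may differ between versions):

```sql
-- Fails: explode is a table-generating function (UDTF) and cannot be
-- combined with other expressions in the same select list.
select student['name'], explode(sub_score)
from stu_score
where student['name'] = 'king1';
```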

The way around this is to use explode through a lateral view:

select student['name'],score['subject'],score['score'] 
from stu_score 
lateral view explode(sub_score) sc as score 
where student['name'] = 'king1';

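5) Keeping rows whose array is NULL or empty: lateral view outer explode(field)

If a student has no scores in the source data, that student may not show up in the query result. For example, suppose the data contains a student, Xiao Ming, who has no scores.

With the query from 4), Xiao Ming does not appear in the result, because explode produces no rows for a NULL or empty array.

To include Xiao Ming as well, use the lateral view outer explode(field) form: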

select student['name'],score 
from stu_score 
lateral view outer explode(sub_score) sc as score;

 

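Alternatively, the individual keys can be selected in the same way as in 4), using score['subject'] and score['score'].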
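The alternative query is not preserved in the original text. A sketch of what it could look like, with the hypothetical no-score record shown in a comment for context (the record is an illustration, not data from the post):

```sql
-- Hypothetical extra record in score.txt, a student with an empty score list:
--   {"student": {"name":"xiaoming","age":11,"sex":"M"},"sub_score":[]}

-- With "outer", a student whose array is empty or NULL still yields one row,
-- and the columns derived from the exploded element are NULL.
select student['name']  as name,
       score['subject'] as subject,
       score['score']   as score_value
from stu_score
lateral view outer explode(sub_score) sc as score;
```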

With steps 3), 4), and 5), essentially any field in this structure can be queried and used.
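As one illustration of combining these pieces, the exploded rows can be aggregated like any other rows. This is a sketch, not part of the original post; because the table declares map<string,string>, the score is cast before summing:

```sql
-- Total score per student name. Map values are strings in this table,
-- so cast them to int before aggregating.
-- Note: the sample data contains several records named king5, and
-- grouping by name alone merges them into a single row.
select student['name']                  as name,
       sum(cast(score['score'] as int)) as total_score
from stu_score
lateral view explode(sub_score) sc as score
group by student['name'];
```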

