Hive Study Notes (16): Loading and Parsing JSON Files in Hive


Parsing a JSON file with Hive and landing it in a table

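Unlike Hive Study Notes (5) (using JSON when table columns change frequently), which simply stores each record as a string and parses it later, this note has Hive parse the JSON file directly into table columns. References: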

https://www.cnblogs.com/30go/p/8328869.html

https://blog.csdn.net/lsr40/article/details/103020021

(1) Prepare the JSON data and transfer it to the Linux host with Xftp

# test_json_load
{"student": {"name":"king","age":11,"sex":"M"},"sub_score":[{"subject":"語文","score":80},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king1","age":11,"sex":"M"},"sub_score":[{"subject":"語文","score":81},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king2","age":12,"sex":"M"},"sub_score":[{"subject":"語文","score":82},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king3","age":13,"sex":"M"},"sub_score":[{"subject":"語文","score":83},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king4","age":14,"sex":"M"},"sub_score":[{"subject":"語文","score":84},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":15,"sex":"M"},"sub_score":[{"subject":"語文","score":85},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":16,"sex":"M"},"sub_score":[{"subject":"語文","score":86},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
{"student": {"name":"king5","age":17,"sex":"M"},"sub_score":[{"subject":"語文","score":87},{"subject":"數學","score":80},{"subject":"英語","score":80}]}
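A text-backed Hive table treats each line of the file as one record, so every JSON object has to sit on its own line (the JSON Lines layout). If the data is generated rather than hand-written, a minimal sketch with Python's standard json module could look like this; make_record, to_json_lines and write_json_lines are hypothetical helper names, and the record shape simply mirrors the sample above.

```python
import json


def make_record(name, age, chinese_score):
    # Same shape as the sample data: a nested "student" object plus a
    # "sub_score" array of {subject, score} objects (hypothetical helper)
    return {
        "student": {"name": name, "age": age, "sex": "M"},
        "sub_score": [
            {"subject": "語文", "score": chinese_score},
            {"subject": "數學", "score": 80},
            {"subject": "英語", "score": 80},
        ],
    }


def to_json_lines(records):
    """Serialize records as JSON Lines: one complete JSON object per line."""
    # ensure_ascii=False keeps 語文/數學/英語 readable instead of \uXXXX escapes
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def write_json_lines(records, path):
    """Write the JSON Lines text to a local file before uploading it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json_lines(records))
```

For example, write_json_lines([make_record("king", 11, 80)], "test_json_load") produces a one-line file in the same format as the sample.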

(2) Create the table

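Looking at the JSON source, the student field is declared as a map and sub_score as an array of maps. The benefit is that as long as the top-level fields of the source do not change, changes inside the nested objects (for example an added or removed key) have no effect on the table, so the schema is quite tolerant.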
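To see why this layout is tolerant, here is a rough Python emulation (not Hive's actual code) of what the declared types imply: every value under student ends up as a string in a map<string,string>, and sub_score becomes a list of string-to-string maps, so an extra key inside student just becomes one more map entry. to_hive_like_row and _as_hive_string are hypothetical names, and the number-to-string conversion is an approximation chosen to match the "age":"11" seen in the query output later.

```python
import json


def _as_hive_string(value):
    # Approximation: strings pass through, None stays NULL, other scalars
    # use their JSON text (11 -> "11")
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def to_hive_like_row(line):
    """Emulate (student map<string,string>, sub_score array<map<string,string>>)
    for one JSON line. Only the two top-level names are bound to columns."""
    obj = json.loads(line)
    student = obj.get("student")
    sub_score = obj.get("sub_score")
    student_map = (
        None if student is None
        else {k: _as_hive_string(v) for k, v in student.items()}
    )
    score_list = (
        None if sub_score is None
        else [{k: _as_hive_string(v) for k, v in item.items()} for item in sub_score]
    )
    return student_map, score_list
```

A record whose student object gains, say, a "class" key still converts without any schema change; the new key simply appears in the map.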

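The CREATE TABLE statement is below. Note the row format serde clause, which uses the JSON SerDe class org.apache.hive.hcatalog.data.JsonSerDe; this way a JSON parsing error does not bring the whole program down.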

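Tip: to handle errors raised on malformed records, you can add the following property (whether it takes effect depends on the SerDe implementation and version in use): ALTER TABLE dw_stg.stu_score SET SERDEPROPERTIES ("ignore.malformed.json" = "true"); This is not covered here.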
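Since malformed records are left aside here, one complementary option (a different technique from the SerDe property, not something the SerDe does) is to check the file locally before uploading it. A minimal sketch, assuming a local JSON Lines file; find_malformed_lines and check_file are hypothetical names. It reports every line that is not a single JSON object, and it deliberately flags blank lines as well.

```python
import json


def find_malformed_lines(lines):
    """Return (line_number, reason) for every line that is not a JSON object.

    Hypothetical pre-upload check; it does not reproduce any Hive SerDe logic.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, exc.msg))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, "not a JSON object"))
    return problems


def check_file(path):
    # Read the local file that will later be pushed with `hdfs dfs -put`
    with open(path, encoding="utf-8") as f:
        return find_malformed_lines(f.read().splitlines())
```

Running check_file("test_json_load") on the sample data should return an empty list; anything else points at the exact line to fix before the upload.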

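# Assumes `sc` here is a Hive-enabled SparkSession (a plain SparkContext has no .sql), e.g.:
# from pyspark.sql import SparkSession
# sc = SparkSession.builder.enableHiveSupport().getOrCreate()
sc.sql("""
create table if not exists test_youhua.test_json_load(
  student map<string,string> comment 'student info',
  sub_score array<map<string,string>> comment 'score list'
)
comment 'json student score table'
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
""")
# Using the JsonSerDe class directly like this fails, because the class is not loaded onto the classpath at startup. The error is: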
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found);'
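When working from PySpark, the jar likewise has to be registered before the CREATE TABLE runs. Spark SQL accepts an ADD JAR statement, so one option is to issue it through the same session first. This is a sketch under assumptions: run_sql stands in for spark.sql (or the `sc.sql` above), create_json_table is a hypothetical helper, and the jar path is the one used later in this post. Whether the Hive metastore client in your Spark build picks up the jar this way should be verified on your cluster; setting spark.jars when the session is created is another route.

```python
# Hypothetical helper: issue ADD JAR before the DDL that needs the SerDe class.
HCATALOG_JAR = "/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar"

CREATE_SQL = """
create table if not exists test_youhua.test_json_load(
  student map<string,string> comment 'student info',
  sub_score array<map<string,string>> comment 'score list'
)
comment 'json student score table'
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
"""


def create_json_table(run_sql, jar_path=HCATALOG_JAR):
    """Register the jar first, then create the table.

    run_sql is any callable that executes one SQL string, e.g. spark.sql
    on a Hive-enabled SparkSession (assumption: verify on your cluster).
    Returns the statements in the order they were executed.
    """
    statements = [f"ADD JAR {jar_path}", CREATE_SQL.strip()]
    for stmt in statements:
        run_sql(stmt)
    return statements
```

Usage would be create_json_table(spark.sql); passing the executor in as a callable keeps the ordering logic testable without a cluster.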

(3) Load the JsonSerDe class

Run ADD JAR ${HIVE_HOME}/hcatalog/share/hcatalog/hive-hcatalog-core....jar here. The jar path may differ slightly between versions.
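Because the jar file name carries the Hive version, a small lookup avoids hard-coding it. A sketch, assuming the ${HIVE_HOME}/hcatalog/share/hcatalog layout mentioned above (adjust the pattern if your distribution puts the jar elsewhere); find_hcatalog_core_jars and add_jar_statement are hypothetical names, and the generated line is meant for the Hive CLI, not bash.

```python
import glob
import os


def find_hcatalog_core_jars(hive_home=None):
    """List hive-hcatalog-core jars under ${HIVE_HOME}/hcatalog/share/hcatalog.

    Returns every match, sorted; an empty list means the layout differs.
    """
    hive_home = hive_home or os.environ.get("HIVE_HOME")
    if not hive_home:
        raise ValueError("HIVE_HOME is not set and no hive_home was given")
    pattern = os.path.join(hive_home, "hcatalog", "share", "hcatalog",
                           "hive-hcatalog-core-*.jar")
    return sorted(glob.glob(pattern))


def add_jar_statement(jar_path):
    # A Hive CLI statement, not a shell command
    return f"add jar {jar_path};"
```

With HIVE_HOME=/opt/module/hive as in this post, the lookup should find hive-hcatalog-core-1.2.1.jar, and add_jar_statement turns it into the line to paste at the hive> prompt.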

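[root@hadoop02 hive]# add jar ../hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar 
-bash: add: command not found
-- Note: add jar is a Hive statement and must be run inside the hive CLI, not in bash
[root@hadoop02 hive]# bin/hive
ls: cannot access /opt/module/spark/lib/spark-assembly-*.jar: No such file or directory
-- The JsonSerDe class is not loaded onto the classpath at startup, so add its jar inside the CLI (the ls warning above concerns a Spark jar and is unrelated)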
Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties 
hive> add jar /opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar; 
Added [/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar] to class path 
Added resources: [/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar]
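A jar added with add jar is only registered for the current CLI session, so it has to be added again after every restart. Besides putting the jar on a default path, the Hive CLI (not Beeline) also runs the statements in ~/.hiverc when it starts. A sketch that appends the add jar line there exactly once; with_add_jar and ensure_hiverc_add_jar are hypothetical helpers, and the jar path is the one from this post.

```python
from pathlib import Path

JAR = "/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar"


def with_add_jar(existing, jar_path=JAR):
    """Return .hiverc text that contains `add jar <jar_path>;` exactly once."""
    line = f"add jar {jar_path};"
    if line in existing.splitlines():
        return existing  # already registered, nothing to change
    # Keep earlier content and make sure the new statement starts on its own line
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + line + "\n"


def ensure_hiverc_add_jar(jar_path=JAR, hiverc=None):
    """Update ~/.hiverc (or the given file); return True if it was changed."""
    path = Path(hiverc) if hiverc else Path.home() / ".hiverc"
    old = path.read_text(encoding="utf-8") if path.exists() else ""
    new = with_add_jar(old, jar_path)
    if new != old:
        path.write_text(new, encoding="utf-8")
    return new != old
```

Calling ensure_hiverc_add_jar() once on the Hive client host is enough; running it again leaves the file untouched.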

(4) Create the table again; this time it succeeds

(5) Upload the data to the table's HDFS directory; the query now succeeds

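# Upload the file to the table's HDFS directory
[root@hadoop02 hive]# hdfs dfs -put /opt/module/hive/my_input/test_json_load  hdfs:///user/hive/warehouse/test_youhua.db/test_json_load;
# Start the Hive CLI
[root@hadoop02 hive]# bin/hive
ls: cannot access /opt/module/spark/lib/spark-assembly-*.jar: No such file or directory
Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties

# The JsonSerDe jar has to be added again in every new session, otherwise queries fail. If you use it often, register it on a default path (for example via hive.aux.jars.path in hive-site.xml) to avoid repeating this each time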
hive> select * from test_youhua.test_json_load;
FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found)

hive> add jar /opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar
    > ;
Added [/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar] to class path
Added resources: [/opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar]
# The query now succeeds
hive> select * from test_youhua.test_json_load;
OK
{"name":"king","age":"11","sex":"M"}    [{"subject":"語文","score":"80"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king1","age":"11","sex":"M"}    [{"subject":"語文","score":"81"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king2","age":"12","sex":"M"}    [{"subject":"語文","score":"82"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king3","age":"13","sex":"M"}    [{"subject":"語文","score":"83"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king4","age":"14","sex":"M"}    [{"subject":"語文","score":"84"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king5","age":"15","sex":"M"}    [{"subject":"語文","score":"85"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king5","age":"16","sex":"M"}    [{"subject":"語文","score":"86"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
{"name":"king5","age":"17","sex":"M"}    [{"subject":"語文","score":"87"},{"subject":"數學","score":"80"},{"subject":"英語","score":"80"}]
Time taken: 0.518 seconds, Fetched: 8 row(s)
hive> 
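The output shows every value coming back as a string ("age":"11", "score":"80") because both columns were declared with string values, so numbers need a cast before aggregating. In HiveQL this is typically done with LATERAL VIEW explode(sub_score) plus a cast; the Python sketch below only emulates that flattening on the raw JSON lines to show the resulting row shape. explode_scores is a hypothetical name.

```python
import json


def explode_scores(lines):
    """Emulate `LATERAL VIEW explode(sub_score)`: one output row per subject.

    Yields (name, age, subject, score) with age/score as int; int() accepts
    both raw JSON numbers and the string form Hive returns. Illustration only.
    """
    for line in lines:
        rec = json.loads(line)
        student = rec["student"]
        for item in rec.get("sub_score", []):
            yield (student["name"], int(student["age"]),
                   item["subject"], int(item["score"]))
```

Each input record yields three rows for the sample data (語文, 數學, 英語), which is the shape a per-subject average or filter would work on.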

 

