Hive Study Notes (17): inline(array(struct)) and explode

Reposted article, 2021-01-25 20:18. Category: Hive

inline

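Recap: inline does not work on map or array(map)

About inline: as already tried in the section on converting between wide and long tables, inline cannot be applied to a map.

Here the map is wrapped in an array, and inline still fails. It appears that inline accepts only the array(struct) type.

# Wrapping the map in an array still does not allow lateral view inline; inline accepts only array(struct)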
sc.sql(''' select id
,array(str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))))
from test_youhua.zongbiao 
group by id ''')
# The query result is now an ARRAY (holding one map per row)
1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]
2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]

# inline still cannot be used
sc.sql(''' select map_tmp_tbl.id,c1 from  (
    select id
            ,array(str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string)))))) as array_map_col
        from test_youhua.zongbiao 
        group by id
) as map_tmp_tbl  lateral view inline(map_tmp_tbl.array_map_col) t1 as c1  ''').show()

# Error: inline expects array(struct); the array(map) type here does not match
AnalysisException: "cannot resolve 'inline(map_tmp_tbl.`array_map_col`)' due to data type mismatch: input to function inline should be array of struct type, not ArrayType(MapType(StringType,StringType,true),false);

After reading the following post, it looks as if inline could be applied to array(map)?

https://blog.csdn.net/JnYoung/article/details/106169019

It is not the same case: in that example the named_struct_1 column was stored as a struct type from the start.
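Why the analyzer can reject array(map) before reading any data can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Hive's or Spark's actual type checker; the tuple encoding of types and the function name `inline_output_columns` are made up for illustration. The point is that a struct's field names are part of the type, so the output columns are known up front, while a map's keys exist only in the data.

```python
# Toy model of the type check behind inline(); not the real analyzer.
# Types are nested tuples, e.g. ("array", ("struct", [field names])).

def inline_output_columns(col_type):
    """Return the columns inline() would produce, judged from the type alone."""
    kind, element = col_type
    if kind != "array":
        raise TypeError("inline expects an array, got " + kind)
    element_kind, fields = element
    if element_kind != "struct":
        # A map's keys live in the data, not in the type, so no column list exists.
        raise TypeError("inline expects array of struct, got array of " + element_kind)
    return list(fields)

array_of_struct = ("array", ("struct", ["baoxian", "cunkuan", "jijin"]))
array_of_map = ("array", ("map", ("string", "string")))

print(inline_output_columns(array_of_struct))
try:
    inline_output_columns(array_of_map)
except TypeError as err:
    print("rejected:", err)
```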

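So the next step is to build a proper array(struct) column and work with that.

(1) Data preparation, create table plus insert-select: can the array(map) data that inline rejected be stored directly into an array(struct) column? No, it fails with a column type mismatch.

This is a little surprising at first: Hive is schema on read, so does insert really check that the column types agree? The likely explanation is that insert ... select is compiled as a query, so the analyzer compares the select output types with the target column types before any data is written. Schema on read applies to files that are placed under the table directory and only interpreted at query time.

For example, when Parquet did not support the date type, inserted values simply showed up as empty fields instead of the insert failing up front.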

# Create the table
create table if not exists test_youhua.test_array_struct_inline(
custom_id int comment "customer id",
all_bal array<struct<baoxian:float, cunkuan:float, jijin:float>> comment 'asset allocation'
) 
comment "array_struct customer asset allocation table"
;
# Insert data
insert overwrite table test_youhua.test_array_struct_inline
select id
,array(str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))))
from test_youhua.zongbiao 
group by id

# Error: column types do not match
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table 
because column number/types are different 'test_array_struct_inline': 
Cannot convert column 1 from array<map<string,string>> to array<struct<baoxian:float,cunkuan:float,jijin:float>>.

(2) Data preparation: load a file directly into test_youhua.test_array_struct_inline

# Prepare the file test_array_struct_inline and copy it to the Linux host with xftp
1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]
2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]

# Put the file on HDFS
hdfs dfs -put /opt/module/hive/my_input/test_array_struct_inline  hdfs:///user/hive/warehouse/test_youhua.db/test_array_struct_inline

# Query the Hive table: the file has been uploaded, but the data cannot be read
sc.sql(""" select * from test_youhua.test_array_struct_inline""").show()
+---------+-------+
|custom_id|all_bal|
+---------+-------+
|     null|   null|
|     null|   null|
+---------+-------+

# Guessing that the delimiters are the cause, recreate the table with explicit delimiters
sc.sql(""" drop table test_youhua.test_array_struct_inline""")
sc.sql("""create table if not exists test_youhua.test_array_struct_inline(
custom_id int comment "customer id",
all_bal array<struct<baoxian:float, cunkuan:float, jijin:float>> comment 'asset allocation'
) 
comment "array_struct customer asset allocation table"
row format delimited fields terminated by ','
collection items terminated by '_'
""")

!hdfs dfs -put /opt/module/hive/my_input/test_array_struct_inline  hdfs:///user/hive/warehouse/test_youhua.db/test_array_struct_inline

sc.sql(""" select * from test_youhua.test_array_struct_inline""").show()

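# No delimiter setting makes the data readable. The first guess was that nesting a struct inside an array interferes with the delimiters; the real cause is that the file is JSON and needs a JSON SerDe.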
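The NULL rows can be reproduced with a rough Python model of a delimited-text reader. This is a simplified sketch, not Hive's actual SerDe; `read_row` and `to_int` are hypothetical helpers, and the input is assumed to be tab-separated as in the file above. It shows that the JSON-style line never splits into a clean integer plus an array, whichever field delimiter is chosen.

```python
# Simplified model of a delimited-text row reader; Hive's real SerDe is more involved.

def to_int(text):
    """Mimic schema on read: an unparsable value becomes NULL (None), not an error."""
    try:
        return int(text)
    except ValueError:
        return None

def read_row(line, field_delim="\x01"):
    """Split one line into (custom_id, raw all_bal text) as a delimited table would."""
    parts = line.split(field_delim)
    custom_id = to_int(parts[0])
    all_bal = parts[1] if len(parts) > 1 else None
    return custom_id, all_bal

line = '1\t[{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]'

# Default field delimiter (Ctrl-A): the whole line is one field, so both columns are NULL.
print(read_row(line))
# fields terminated by ',': the commas inside the JSON text cut the line in the wrong places.
print(read_row(line, ","))
```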

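(3) Data preparation: read the file through a JSON SerDe row format

The data could not be read earlier because the file is JSON rather than delimited text, so the JSON SerDe jar has to be added.

Reference: Hive Study Notes (16), loading and parsing JSON files in Hive. The file was modified slightly: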

{"custom_id":"1","all_bal":[{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]}
{"custom_id":"2","all_bal":[{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]}
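The reshaping from the earlier two-column file into one JSON object per line could be scripted. A minimal sketch, assuming the id and the JSON array in the source file are separated by whitespace; `to_json_line` is a made-up helper name, not something from the original post.

```python
import json

def to_json_line(line):
    """Turn '<id><whitespace><json array>' into one JSON object per line."""
    custom_id, array_text = line.strip().split(None, 1)
    record = {"custom_id": custom_id, "all_bal": json.loads(array_text)}
    # Compact separators keep the output in the same shape as the file above.
    return json.dumps(record, separators=(",", ":"))

source_lines = [
    '1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]',
    '2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]',
]
for line in source_lines:
    print(to_json_line(line))
```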

hive> add jar /opt/module/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
hive> create table if not exists test_youhua.test_array_struct_inline(
> custom_id string comment "customer id",
> all_bal array<struct<baoxian:string, cunkuan:string, jijin:string>> comment 'asset allocation'
> ) 
> comment "array_struct customer asset allocation table"
> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';
OK
Time taken: 0.09 seconds
hive> select * from test_youhua.test_array_struct_inline;
OK
Time taken: 0.089 seconds

# After the data is loaded, the read succeeds
hive> select * from test_youhua.test_array_struct_inline;
OK
1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]
2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]
Time taken: 0.107 seconds, Fetched: 2 row(s)

# Note: all field types were changed to string here; otherwise select fails with:
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Current token (VALUE_STRING) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream@ab327c; line: 1, column: 41]
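The error says that a quoted JSON string was found where a numeric column was declared. Besides switching the columns to string, another option might be to write the balances as unquoted JSON numbers so that float columns can be kept. This was not tried in the original notes, so treat it as an assumption; `unquote_balances` is a hypothetical helper, and custom_id is left as a string to match the table definition above.

```python
import json

def unquote_balances(json_line):
    """Rewrite quoted balances such as "1.2" as JSON numbers such as 1.2."""
    record = json.loads(json_line)
    record["all_bal"] = [
        {name: float(value) for name, value in entry.items()}
        for entry in record["all_bal"]
    ]
    return json.dumps(record, separators=(",", ":"))

quoted = '{"custom_id":"1","all_bal":[{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]}'
print(unquote_balances(quoted))
```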

(4) inline opens up array(struct); by comparison, explode only opens the array

Reference: https://blog.csdn.net/weixin_42003671/article/details/88132666

# Original data
hive> select * from test_youhua.test_array_struct_inline;
OK
1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]
2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]
Time taken: 0.103 seconds, Fetched: 2 row(s)

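# To read the jijin, baoxian and cunkuan balances at this point, index into the array:
# This works because each array holds a single struct. With several structs per array, looking up a key's value through array[i] gets awkward; inline and explode can then turn the values of one key across all structs into a column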
hive> select all_bal[0].jijin,all_bal[0].baoxian,all_bal[0].cunkuan from test_youhua.test_array_struct_inline;
OK
1.1    1.2    1.3
2.67    2.34    2.1
Time taken: 0.587 seconds, Fetched: 2 row(s)

# Use inline to turn each struct field into its own column, one row per struct
hive> select tmp.custom_id,c1,c2,c3 from test_youhua.test_array_struct_inline  as tmp lateral view inline(tmp.all_bal) t1 as c1,c2,c3;
OK
1    1.2    1.3    1.1
2    2.34    2.1    2.67
Time taken: 0.093 seconds, Fetched: 2 row(s)

# For comparison, explode opens only one level, i.e. it removes the array's []
hive> select tmp.custom_id,c1 from test_youhua.test_array_struct_inline as tmp lateral view explode(tmp.all_bal) t1 as c1;
OK
1    {"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}
2    {"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}

# After explode, struct.key is still needed to get each value. So inline opens one level deeper than explode: inline yields the values directly, while explode needs a further struct.key access
hive> select tmp.custom_id,c1.jijin from test_youhua.test_array_struct_inline as tmp lateral view explode(tmp.all_bal) t1 as c1; 
OK
1    1.1
2    2.67
Time taken: 0.122 seconds, Fetched: 2 row(s)
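The difference between the two functions shows more clearly when an array holds more than one struct. Below is a plain-Python model of the row-level behaviour, not Hive code; the second struct for customer 1 is invented for illustration, and `lateral_explode` and `lateral_inline` are hypothetical names.

```python
# Plain-Python model of LATERAL VIEW explode() and inline(); not Hive code.
rows = [
    ("1", [{"baoxian": "1.2", "cunkuan": "1.3", "jijin": "1.1"},
           {"baoxian": "9.9", "cunkuan": "8.8", "jijin": "7.7"}]),  # second struct is made up
    ("2", [{"baoxian": "2.34", "cunkuan": "2.1", "jijin": "2.67"}]),
]

def lateral_explode(rows):
    """One output row per array element; the element stays a whole struct."""
    return [(custom_id, element) for custom_id, all_bal in rows for element in all_bal]

def lateral_inline(rows):
    """One output row per array element; each struct field becomes its own column."""
    return [(custom_id, *element.values()) for custom_id, all_bal in rows for element in all_bal]

for row in lateral_inline(rows):
    print(row)
```

In both cases customer 1 produces two rows; only the shape of the remaining columns differs.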

# The approach from the referenced link does not work here; it needs data with explicit key and value fields
select tmp.custom_id,c1.value from test_youhua.test_array_struct_inline as tmp lateral view explode(tmp.all_bal) t1 as c1 where c1.key="jijin";
# Error:
RuntimeException cannot find field key(lowercase form: key) in [baoxian, cunkuan, jijin]
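A key/value filter of that kind fits a map column, since exploding a map in Hive produces one row per entry with a key column and a value column. The sketch below models that behaviour in plain Python; it is not Hive code, and storing the balances as a map is an assumption that differs from the struct column used in this post.

```python
# Model of LATERAL VIEW explode(map_col) t AS key, value; not Hive code.
def explode_map(custom_id, balances):
    """A map explodes into one (custom_id, key, value) row per entry."""
    return [(custom_id, key, value) for key, value in balances.items()]

as_map = {"baoxian": "1.2", "cunkuan": "1.3", "jijin": "1.1"}  # balances stored as a map
jijin_rows = [row for row in explode_map("1", as_map) if row[1] == "jijin"]
print(jijin_rows)
```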

tips

(1) org.apache.hive.hcatalog.data.JsonSerDe has limited support for complex types

Reference: https://www.cnblogs.com/aprilrain/p/6916359.html

(2) Inserting data into array(struct) with named_struct

See: https://blog.csdn.net/random0815/article/details/85252250

Fix for the error below: https://blog.csdn.net/qq_36203774/article/details/102599260

insert into test_youhua.test_array_struct_inline 
select "4",array(named_struct('baoxian','1.46','cunkuan','1.46','jijin','1.46'));
# Error
ParseException line 1:124 Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement

# Fix: wrap the row in a CTE named tmp so that the insert selects FROM it; the struct fields must follow the table's declared order
with tmp as 
(select "3",array(named_struct('baoxian','1.45','cunkuan','1.45','jijin','1.45')))
insert into test_youhua.test_array_struct_inline
select * from tmp;

select * from test_youhua.test_array_struct_inline;
# The insert succeeded
3    [{"baoxian":"1.45","cunkuan":"1.45","jijin":"1.45"}]
1    [{"baoxian":"1.2","cunkuan":"1.3","jijin":"1.1"}]
2    [{"baoxian":"2.34","cunkuan":"2.1","jijin":"2.67"}]

# Selecting with struct() works, but the result cannot be inserted into the declared array(struct) column
with tmp as 
(select "6",array(struct('1.45','1.45','1.45')))
insert into test_youhua.test_array_struct_inline
select * from tmp;
# struct() fails: its auto-generated field names (col1, col2, col3) do not match the column's field names
Cannot insert into target table because column number/types are different 'test_array_struct_inline': Cannot convert column 1 from array<struct<col1:string,col2:string,col3:string>> to array<struct<baoxian:string,cunkuan:string,jijin:string>>.
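Since both the field names and their order have to match the column definition, the named_struct expression can be generated from the declared order instead of being typed by hand. A minimal sketch in Python; `named_struct_sql` is a hypothetical helper, it quotes every value as a string, and it does not escape quotes inside values.

```python
def named_struct_sql(values, field_order):
    """Build a named_struct(...) expression with fields in the table's declared order."""
    parts = []
    for name in field_order:
        parts.append("'{}'".format(name))
        parts.append("'{}'".format(values[name]))
    return "named_struct({})".format(",".join(parts))

# Field order as declared in the table; the input dict may list keys in any order.
order = ["baoxian", "cunkuan", "jijin"]
expr = named_struct_sql({"jijin": "1.45", "baoxian": "1.45", "cunkuan": "1.45"}, order)
print(expr)
```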

(3) How to read values from map, array and struct

select array[1],map['xiao song'],struct.city from test
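The three access forms line up with familiar Python operations. The analogy below is only an illustration with invented values; as with Python lists, Hive arrays are indexed from 0, so array[1] is the second element.

```python
from collections import namedtuple

Address = namedtuple("Address", ["city"])  # hypothetical struct with a single field

array_col = ["first", "second"]       # array: access by position
map_col = {"xiao song": 18}           # map: access by key (made-up value)
struct_col = Address(city="beijing")  # struct: access by field name (made-up value)

print(array_col[1], map_col["xiao song"], struct_col.city)
```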
