Using Complex Data Types in Impala & Hive


1. Environment

CDH 5.16.1

2. Using complex data types in Hive

2.1 Data format

1       zhangsan:man    football,basketball
2       lisi:female     sing,dance

2.2 Creating the Hive table

create table studentInfo(
    id int,
    info map<string,string> comment 'map<name,gender>',
    favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t'    -- column (field) delimiter
collection items terminated by ','   -- delimiter between the items of an array
map keys terminated by ':'           -- delimiter between the key and value of a map entry
lines terminated by '\n';            -- row delimiter

2.3 Loading data

load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;

2.4 Running queries

select *  from studentInfo;

+-----------------+---------------------+----------------------------+--+
| studentinfo.id  |  studentinfo.info   |    studentinfo.favorite    |
+-----------------+---------------------+----------------------------+--+
| 1               | {"zhangsan":"man"}  | ["football","basketball"]  |
| 2               | {"lisi":"female"}   | ["sing","dance"]           |
+-----------------+---------------------+----------------------------+--+




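-- Map lookup: map[key] (returns NULL when the key is absent, as for id 2 below)
-- Array lookup: array[index] (the index is 0-based, so favorite[1] is the second item)
select id, info['zhangsan'] as sex, favorite[1] as favorite from studentInfo;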

+-----+-------+-------------+--+
| id  |  sex  |  favorite   |
+-----+-------+-------------+--+
| 1   | man   | basketball  |
| 2   | NULL  | dance       |
+-----+-------+-------------+--+
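Besides direct lookups, Hive has built-in functions for inspecting and flattening complex columns. The following is a minimal sketch against the studentInfo table above; the aliases (favorite_count, hobby, name, gender, f, m) are hypothetical names chosen for illustration.

```sql
-- Number of elements in the array and number of entries in the map
select id, size(favorite) as favorite_count, size(info) as info_count
from studentInfo;

-- Keys and values of the map, each returned as an array
select id, map_keys(info) as names, map_values(info) as genders
from studentInfo;

-- One output row per array element
select id, hobby
from studentInfo
lateral view explode(favorite) f as hobby;

-- One output row per map entry (explode on a map yields a key and a value column)
select id, name, gender
from studentInfo
lateral view explode(info) m as name, gender;
```

lateral view explode is the Hive counterpart of the join-style syntax that Impala uses for the same purpose.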

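3. Using complex types in Impala

Note: in the Impala version shipped with CDH 5.x, complex data types can be queried only when the table is stored in Parquet format.

3.1 Creating the table in Hive (Parquet format) and loading data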

create table student_parquet(
    id int,
    info map<string,string> comment 'map<name,gender>',
    favorite array<string> comment 'array[football,basketball]'
)
stored as parquet;

insert overwrite table student_parquet select id,info,favorite from studentInfo;

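3.2 Refreshing Impala metadata

-- refresh reloads the metadata of a table that Impala already knows about.
-- If the table was just created in Hive and Impala has not seen it yet,
-- run "invalidate metadata default.student_parquet;" instead.
refresh default.student_parquet;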

3.3 Running queries

select 
    id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array;

+----+------------+----------+--------+
| id | item       | key      | value  |
+----+------------+----------+--------+
| 1  | football   | zhangsan | man    |
| 1  | basketball | zhangsan | man    |
| 2  | sing       | lisi     | female |
| 2  | dance      | lisi     | female |
+----+------------+----------+--------+
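The result has four rows because each table row is joined with every element of its array and every entry of its map: 2 items x 1 map entry per student. With this comma-style (inner) join, a row whose array or map is empty or NULL produces no output at all. A hedged sketch of keeping such rows, assuming the LEFT OUTER JOIN form described in the Cloudera complex-types documentation is available in your Impala version (the alias s is hypothetical):

```sql
-- Rows with an empty or NULL favorite array are kept, with item returned as NULL
select s.id, favorite_array.item
from student_parquet s
    left outer join s.favorite as favorite_array;
```

With the sample data every student has at least one favorite, so this returns the same ids as the inner join.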




select 
    id ,favorite_array.item
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.POS = 0;

+----+----------+
| id | item     |
+----+----------+
| 1  | football |
| 2  | sing     |
+----+----------+
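The query above also joins info_map even though no map column is selected. That is harmless here because every map has exactly one entry, but a map with several entries would repeat each row once per entry. A minimal sketch that joins only the array it needs:

```sql
-- First element of each student's favorite array; pos is the 0-based element index
select id, favorite_array.item
from student_parquet,
    student_parquet.favorite as favorite_array
where favorite_array.pos = 0;
```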




select 
    id ,favorite_array.item,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';

+----+------+--------+
| id | item | value  |
+----+------+--------+
| 2  | sing | female |
+----+------+--------+
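Hive can look up a map value directly with info['lisi']. To my knowledge the Impala version used here has no such bracket syntax, so the equivalent lookup is a join on the map plus a filter on key. A minimal sketch (the alias gender is hypothetical):

```sql
-- Impala-style equivalent of Hive's info['lisi']
select id, info_map.value as gender
from student_parquet,
    student_parquet.info as info_map
where info_map.key = 'lisi';
```

Unlike the Hive lookup, which returns NULL for rows that lack the key, this form drops those rows entirely.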

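Summary:

  1. An array is treated as a table; its element column is named item, and pos holds the element's index

  2. A map is treated as a table with two columns: key and value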
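These pseudo-columns can be inspected from impala-shell, and because each collection behaves like a table it can also be aggregated. A small sketch, assuming DESCRIBE accepts a path to a complex column as in the Cloudera documentation (favorite_count is a hypothetical alias):

```sql
-- Show the pseudo-columns that Impala exposes for each complex column
describe student_parquet.favorite;
describe student_parquet.info;

-- Count the favorites of each student by joining the array and grouping
select id, count(*) as favorite_count
from student_parquet,
    student_parquet.favorite as favorite_array
group by id;
```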

References:

  1. https://blog.csdn.net/rav009/article/details/86750850
  2. https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/impala_complex_types.html

