Hive Row/Column Conversion: Expanding a Delimited Column into Rows


1. Problem

How can Hive convert the following data

a       1,2,3
b       4,7
c       5

into this form, with one value per row:

a       1
a       2
a       3
b       4
b       7
c       5

2. Raw data

cat row_column.txt
a       1,2,3
b       4,7
c       5

3. Solution

3.1 Splitting a delimited string column

3.1.1 Create the table

-- Create the table
create table tmp.row_column
(
col1 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;
-- Load the data (the table lives in the tmp database, so qualify the name)
load data local inpath '/tmp/row_column.txt' into table tmp.row_column;
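
As an alternative to loading a file, the three sample rows can also be inserted directly. This is a minimal sketch, assuming Hive 0.14 or later, where INSERT ... VALUES is available. Run it instead of the load statement, not in addition to it, or the rows will be duplicated.

```sql
-- Insert the sample rows directly instead of loading /tmp/row_column.txt.
insert into table tmp.row_column values
  ('a', '1,2,3'),
  ('b', '4,7'),
  ('c', '5');
```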

3.1.2 View the data

hive> select * from tmp.row_column;
OK
a       1,2,3
b       4,7
c       5

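3.1.3 Expand each element into its own row

-- split() turns the comma-separated string into an array;
-- explode() emits one row per array element.
select col1, name
from tmp.row_column
lateral view explode(split(col3, ',')) exploded as name;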
---------------------------------------------------------------
Total MapReduce CPU Time Spent: 2 seconds 20 msec
OK
a       1
a       2
a       3
b       4
b       7
c       5
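
One caveat with the query above: a plain lateral view drops any source row for which explode produces no output, for example when col3 is NULL. A minimal sketch, assuming you want to keep such rows and are on Hive 0.12 or later, uses LATERAL VIEW OUTER. The alias exploded is only illustrative.

```sql
-- Rows whose col3 is NULL are kept; their generated column is NULL
-- instead of the whole row being dropped.
select col1, name
from tmp.row_column
lateral view outer explode(split(col3, ',')) exploded as name;
```

With the sample data, which contains no NULL values, this should return the same six rows as the query without OUTER.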

3.2 Exploding an array column

3.2.1 Create the table

create table tmp.row_column_array
(
  col1 string,
  col3 array<int>
)
row format delimited 
fields terminated by '\t'
collection items terminated by ','
stored as textfile;

3.2.2 Load the data

load data local inpath '/tmp/row_column.txt' into table tmp.row_column_array;

3.2.3 View the data

hive> select * from tmp.row_column_array;
OK
a       [1,2,3]
b       [4,7]
c       [5]

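3.2.4 Expand each array element into its own row

-- col3 is already an array<int>, so no split() is needed.
select col1, name
from tmp.row_column_array
lateral view explode(col3) exploded as name;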

3.2.5 Result

a       1
a       2
a       3
b       4
b       7
c       5
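
If the position of each element within the array matters, posexplode can be used in place of explode. A minimal sketch, assuming Hive 0.13 or later. The alias exploded and the column names pos and val are illustrative.

```sql
-- posexplode emits (position, value) pairs; positions start at 0.
select col1, pos, val
from tmp.row_column_array
lateral view posexplode(col3) exploded as pos, val;
```

For the sample data, the row a [1,2,3] should become (a, 0, 1), (a, 1, 2) and (a, 2, 3).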

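4. Supplement

Access the comma-separated elements by index

-- An index beyond the end of the array returns NULL.
select t.list[0], t.list[1], t.list[2]
from (
  select split(col3, ',') as list from tmp.row_column
) t;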
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
1       2       3
4       7       NULL
5       NULL    NULL
Time taken: 15.264 seconds, Fetched: 3 row(s)
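
Note that the second argument of split is a regular expression, not a literal string. A comma needs no escaping, but delimiters such as | or . do. A small sketch using a hypothetical pipe-delimited literal, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later):

```sql
-- '|' is a regex metacharacter, so it must be escaped; the string
-- literal '\\|' reaches the regex engine as \|.
select t.list[0], t.list[1], t.list[2]
from (
  select split('1|2|3', '\\|') as list
) t;
```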

Get the number of elements

select col1, size(split(col3, ',')) as element_count from tmp.row_column;
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
a       3
b       2
c       1
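
The per-row sizes also give a quick sanity check for the explode queries. A minimal sketch, assuming col3 is never NULL (size returns -1 for a NULL array, which would skew the total). The alias total_elements is illustrative.

```sql
-- Total number of elements across all rows; this should match the
-- number of rows returned by the lateral view explode query.
select sum(size(split(col3, ','))) as total_elements
from tmp.row_column;
```

For the sample data the total is 3 + 2 + 1 = 6, which matches the six rows produced in section 3.1.3.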
