Implementing group_concat in Hive


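MySQL's group_concat is a powerful grouped-concatenation function: it groups rows, joins each group's values into a single string, and can sort the values as it joins them. Hive has no such function, so how can the same result be achieved there?

The answer combines the concat_ws function with collect_list or collect_set.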
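As a quick orientation, the sketch below shows each building block in isolation. It assumes a Hive version that allows select without a from clause (0.13 or later, to my knowledge); the inline names and the alias t are illustrative only. The MySQL statement in the first comment is included purely for comparison.

```sql
-- MySQL, for comparison:
--   select id, group_concat(content order by content separator ',')
--   from test_tb group by id;

-- concat_ws joins the elements of a string array with a separator.
select concat_ws(',', array('Tom', 'Bob', 'Wendy'));   -- Tom,Bob,Wendy

-- collect_list gathers a column's values into an array, keeping duplicates;
-- collect_set does the same but drops duplicates.
-- Element order is not guaranteed for either function.
select
    concat_ws(',', collect_list(name)) as all_names,      -- three entries
    concat_ws(',', collect_set(name))  as distinct_names  -- Tom and Bob once each
from (
    select 'Tom' as name
    union all
    select 'Tom' as name
    union all
    select 'Bob' as name
) t;
```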

1. Create a test table (non-partitioned):

create table if not exists db_name.test_tb(id string,content string,comment string) row format delimited fields terminated by '\1' stored as textfile;

2. Insert a few rows:

insert into db_name.test_tb values('1','Tom','test1');
insert into db_name.test_tb values('1','Bob','test2');
insert into db_name.test_tb values('1','Wendy','test3');
insert into db_name.test_tb values('2','Bob','test22');
insert into db_name.test_tb values('2','Tom','test11');
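Each single-row insert typically runs as its own job, so loading the rows in one statement can be noticeably quicker. The sketch below is an alternative to the five statements above, not an addition to them (running both would duplicate the rows). It assumes a Hive version that accepts several rows in one values clause (0.14 or later, to my knowledge) and uses the same ids and names with placeholder comment strings.

```sql
-- Load all five test rows with a single statement.
insert into table db_name.test_tb values
    ('1', 'Tom',   'test1'),
    ('1', 'Bob',   'test2'),
    ('1', 'Wendy', 'test3'),
    ('2', 'Bob',   'test22'),
    ('2', 'Tom',   'test11');
```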

3. concat_ws + collect_set + group by:

select
    id,
    concat_ws(',',collect_set(content)) as con_con,
    concat_ws(',',collect_set(comment)) as con_com
from db_name.test_tb
group by id

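Result: unordered, and the positions in con_con and con_com do not correspond to each other. Note also that collect_set removes duplicate values, because it returns a set.

The order of the concatenated result may differ from one run to the next.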
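If only a stable order is needed, one option is to sort the collected array with Hive's built-in sort_array before joining it. This is a minimal sketch of that idea rather than part of the original recipe. It makes the output repeatable for a single column, but sorting each column independently still does not line up con_con with con_com.

```sql
-- Deduplicate with collect_set, then sort the array so the result is repeatable.
select
    id,
    concat_ws(',', sort_array(collect_set(content))) as con_con
from db_name.test_tb
group by id;
-- With the five test rows: id 1 -> Bob,Tom,Wendy ; id 2 -> Bob,Tom
```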

4. concat_ws + collect_list + group by:

select
    id,
    concat_ws(',',collect_list(content)) as con_con,
    concat_ws(',',collect_list(comment)) as con_com
from db_name.test_tb
group by id

Result: the positions in con_con and con_com correspond to each other, but the values are unordered.
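When the correspondence between the two columns matters more than having them as separate strings, a simple way to make it explicit is to join each row's values first and collect the pairs. The sketch below assumes ':' does not occur in the data; the column name con_pairs is illustrative.

```sql
-- Build "content:comment" per row, then collect the pairs for each id.
-- Each pair stays intact regardless of the order in which rows are collected.
select
    id,
    concat_ws(',', collect_list(concat(content, ':', comment))) as con_pairs
from db_name.test_tb
group by id;
```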

5. concat_ws + collect_list + group by + row_number():

select
    id,
    concat_ws(',',collect_list(content)) as con_con,
    concat_ws(',',collect_list(comment)) as con_com,
    concat_ws(',',collect_list(cast(rn as string))) as con_rn
from
(
-- number the rows within each id, ordered by content
select
    id,
    content,
    comment,
    row_number() over(partition by id order by content asc) as rn
from db_name.test_tb
) t  -- Hive requires an alias for a subquery in the from clause
group by id

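Result: the positions in con_con and con_com correspond to each other, and the values are ordered. The ordering relies on collect_list receiving rows in the order the subquery produces them, which usually holds here but is not a documented guarantee.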
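For an ordering that does not depend on how rows reach collect_list, a commonly used workaround is to prefix each value with its zero-padded row number, sort the collected array, and strip the prefixes afterwards. The following is a sketch under stated assumptions, not the original author's method: it assumes fewer than 100,000 rows per id (five-digit padding) and that no value itself contains five digits followed by a colon.

```sql
select
    id,
    -- prefix with 00001:, 00002:, ... so sort_array orders by rn, then strip the prefix
    regexp_replace(
        concat_ws(',', sort_array(collect_list(
            concat(lpad(cast(rn as string), 5, '0'), ':', content)))),
        '[0-9]{5}:', '') as con_con,
    regexp_replace(
        concat_ws(',', sort_array(collect_list(
            concat(lpad(cast(rn as string), 5, '0'), ':', comment)))),
        '[0-9]{5}:', '') as con_com
from (
    select
        id,
        content,
        comment,
        row_number() over (partition by id order by content asc) as rn
    from db_name.test_tb
) t
group by id;
-- con_con with the five test rows: id 1 -> Bob,Tom,Wendy ; id 2 -> Bob,Tom
```

Because both columns carry the same rn prefix, con_com comes out in the same row order as con_con. Changing the order by clause in the window function changes the order of both strings together.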

 


https://www.cnblogs.com/zhangqian27/p/12836126.html

https://blog.csdn.net/changzoe/article/details/81181820

