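MySQL's group_concat is quite powerful. It groups rows and then concatenates each group's values into a single string, and it can also sort the values as it concatenates them. Hive has no such function, so how can we get the same result in Hive?
The answer is to combine the concat_ws function with collect_list or collect_set.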
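As a rough mental model, GROUP BY plus concat_ws(sep, collect_list(col)) groups rows by a key, gathers one column of each group into an array, and joins that array with a separator. The Python sketch below models this pipeline on plain tuples. `group_concat` is a hypothetical helper name used only for illustration, not a Hive function, and the model simply keeps the input list order, which Hive does not promise.

```python
from collections import defaultdict


def group_concat(rows, key_idx, val_idx, sep=","):
    """Hypothetical helper: group rows by rows[key_idx] and join rows[val_idx] with sep.

    Models GROUP BY key + concat_ws(sep, collect_list(val)) in plain Python.
    Here the values keep the order of the input list; Hive gives no such guarantee.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_idx]].append(row[val_idx])
    return {key: sep.join(vals) for key, vals in groups.items()}


rows = [("1", "Tom"), ("1", "Bob"), ("2", "Bob")]
print(group_concat(rows, 0, 1))  # {'1': 'Tom,Bob', '2': 'Bob'}
```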
1. Create a test table (non-partitioned):
create table if not exists db_name.test_tb(id string, content string, comment string)
row format delimited fields terminated by '\1'
stored as textfile;
2. Insert a few rows:
insert into db_name.test_tb values('1', 'Tom', 'test1');
insert into db_name.test_tb values('1', 'Bob', 'test2');
insert into db_name.test_tb values('1', 'Wendy', 'test3');
insert into db_name.test_tb values('2', 'Bob', 'test22');
insert into db_name.test_tb values('2', 'Tom', 'test11');

3. concat_ws + collect_set + group by:
select id,
       concat_ws(',', collect_set(content)) as con_con,
       concat_ws(',', collect_set(comment)) as con_com
from db_name.test_tb
group by id;

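Result: the values are unordered and do not correspond, meaning an item's position in con_con does not necessarily match the position of its related item in con_com. Also note that collect_set removes duplicate values, because it builds a set.
The order of the concatenated values may differ from one run to the next.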
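The following Python sketch roughly models why step 3 behaves this way. It is a model of the semantics, not Hive itself: each column is collected into its own independent set per id, so duplicates disappear and nothing ties position i of one set to position i of the other. The extra row ('1', 'Tom', 'test4') is a hypothetical addition to the test data, included only to show de-duplication.

```python
from collections import defaultdict

# Rows from the test table (id, content, comment), plus one hypothetical
# extra row ('1', 'Tom', 'test4') to show de-duplication.
rows = [
    ("1", "Tom", "test1"),
    ("1", "Bob", "test2"),
    ("1", "Wendy", "test3"),
    ("2", "Bob", "test22"),
    ("2", "Tom", "test11"),
    ("1", "Tom", "test4"),
]


def collect_set_by_key(rows, col):
    """Rough model of GROUP BY id + collect_set(col): one independent set per id."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[0]].add(row[col])
    return groups


contents = collect_set_by_key(rows, 1)
comments = collect_set_by_key(rows, 2)

# 'Tom' appears twice for id 1 but is kept once, while all four comments survive,
# so the two concatenated strings do not even contain the same number of items.
print(len(contents["1"]), len(comments["1"]))  # 3 4

# Python's string hashing is randomized per process, so this order may vary between runs.
print(",".join(contents["1"]))
```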
4. concat_ws + collect_list + group by:
select id,
       concat_ws(',', collect_list(content)) as con_con,
       concat_ws(',', collect_list(comment)) as con_com
from db_name.test_tb
group by id;

Result: the values correspond (positions in con_con and con_com match) but are unordered.
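A rough Python model of step 4 (again a sketch of the semantics, not Hive itself) shows where the alignment comes from. Both lists are appended from the same row in the same step, so index i of con_con and index i of con_com always come from the same input row. Since Hive does not guarantee the order in which rows are aggregated, the sketch shuffles the input to stand in for an arbitrary arrival order.

```python
import random
from collections import defaultdict

rows = [
    ("1", "Tom", "test1"),
    ("1", "Bob", "test2"),
    ("1", "Wendy", "test3"),
    ("2", "Bob", "test22"),
    ("2", "Tom", "test11"),
]


def collect_lists(rows):
    """Rough model of GROUP BY id + collect_list(content), collect_list(comment).

    Both lists are filled from the same row at the same time, so they stay aligned.
    """
    con_con, con_com = defaultdict(list), defaultdict(list)
    for id_, content, comment in rows:
        con_con[id_].append(content)
        con_com[id_].append(comment)
    return con_con, con_com


# Stand-in for an arbitrary processing order: the order changes, the pairing does not.
arrived = rows[:]
random.shuffle(arrived)
con_con, con_com = collect_lists(arrived)
print(",".join(con_con["1"]), "|", ",".join(con_com["1"]))
```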
5. concat_ws + collect_list + group by + row_number():
select id,
       concat_ws(',', collect_list(content)) as con_con,
       concat_ws(',', collect_list(comment)) as con_com,
       concat_ws(',', collect_list(cast(rn as string))) as con_rn
from (
  -- number the rows within each id, ordered by content
  select id, content, comment,
         row_number() over (partition by id order by content asc) as rn
  from db_name.test_tb
) t
group by id;

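Result: the values correspond (positions in con_con and con_com match) and are ordered by content. The con_rn column lists the row numbers, so you can check the order in the output.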
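A rough Python model of step 5 follows. It numbers the rows within each id by content in ascending order, mimicking row_number() over (partition by id order by content asc), and then concatenates content, comment and rn in that order. In this model the sort is explicit. In Hive the order comes from the rows the subquery feeds into collect_list, which is why the con_rn column is useful for checking it. `ordered_concat` is a hypothetical helper name.

```python
from collections import defaultdict

rows = [
    ("1", "Tom", "test1"),
    ("1", "Bob", "test2"),
    ("1", "Wendy", "test3"),
    ("2", "Bob", "test22"),
    ("2", "Tom", "test11"),
]


def ordered_concat(rows, sep=","):
    """Hypothetical helper modelling step 5: (con_con, con_com, con_rn) per id."""
    by_id = defaultdict(list)
    for id_, content, comment in rows:
        by_id[id_].append((content, comment))

    result = {}
    for id_, items in by_id.items():
        items.sort(key=lambda item: item[0])  # ORDER BY content ASC within the id
        # row_number(): 1, 2, 3, ... in the sorted order
        numbered = [(rn, c, m) for rn, (c, m) in enumerate(items, start=1)]
        result[id_] = (
            sep.join(c for _, c, _ in numbered),
            sep.join(m for _, _, m in numbered),
            sep.join(str(rn) for rn, _, _ in numbered),
        )
    return result


print(ordered_concat(rows)["1"])  # ('Bob,Tom,Wendy', 'test2,test1,test3', '1,2,3')
```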
