While helping a colleague test today, I came across a couple of handy Hive functions in the code:
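1. collect_set lets you output a column that is not in the GROUP BY clause, as long as that column has exactly one distinct value per group key. collect_set(b) returns the array of distinct b values in each group, so [0] picks that single value. If a key maps to several b values, which one [0] returns is not guaranteed.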
select a,
       collect_set(b)[0],  -- also output the b value for each key
       count(*)
from (
    select 'a' a, 'b' b from test.dual
) a
group by a;  -- group by a
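To make the "one value per key" condition concrete, here is a minimal in-memory emulation in Python of what the query above computes. It is only a sketch of the semantics, not Hive itself; the function name group_with_first_of_set is hypothetical, and taking the first-seen value stands in for Hive's unspecified collect_set ordering.

```python
from collections import defaultdict


def group_with_first_of_set(rows):
    """Emulate: SELECT a, collect_set(b)[0], count(*) ... GROUP BY a.

    rows: iterable of (a, b) tuples.
    Returns {a: (one b value, row count)}.
    """
    distinct_b = defaultdict(list)  # distinct b values per key, first-seen order
    counts = defaultdict(int)       # emulates count(*) per key
    for a, b in rows:
        counts[a] += 1
        if b not in distinct_b[a]:  # collect_set keeps distinct values only
            distinct_b[a].append(b)
    # Hive does not guarantee collect_set element order; this sketch takes the
    # first value seen, which is only meaningful when b is unique per key.
    return {a: (distinct_b[a][0], counts[a]) for a in counts}


result = group_with_first_of_set([("a", "b"), ("a", "b"), ("x", "y")])
print(result)  # {'a': ('b', 2), 'x': ('y', 1)}
```

When b is not unique per key, for example rows ("k", "p") and ("k", "q"), the sketch returns "p", but Hive may return either value.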
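2. concat_ws combined with collect_set collapses each group of the GROUP BY result into a single record, joining the group's values into one delimited string. Note that collect_set drops duplicates and does not guarantee element order, so the joined string may come out as 2,1 rather than 1,2.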
For example, the table
col_1 | col_0 |
hello | 1 |
hello | 2 |
is merged into a single row:
col_1 | col_0s |
hello | 1,2 |
select col_1,
       concat_ws(',', collect_set(cast(col_0 as string))) as col_0s  -- concat_ws needs strings, hence the cast
from (
    select 1 col_0, 'hello' col_1 from test.dual
    union all
    select 2 col_0, 'hello' col_1 from test.dual
) a
group by col_1;
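The same merge can be sketched in Python to show what each piece contributes: str() plays the role of cast(... as string), the distinct check plays collect_set, and str.join plays concat_ws. This is an illustrative emulation under those assumptions, not Hive's implementation. The function name concat_ws_collect_set is hypothetical, and the sorting step is an addition for deterministic output that plain collect_set does not perform.

```python
from collections import defaultdict


def concat_ws_collect_set(rows, sep=","):
    """Emulate: SELECT col_1, concat_ws(sep, collect_set(cast(col_0 AS string)))
    ... GROUP BY col_1.

    rows: iterable of (col_0, col_1) tuples.
    Returns {col_1: joined string}.
    """
    groups = defaultdict(list)
    for col_0, col_1 in rows:
        value = str(col_0)              # mirrors cast(col_0 as string)
        if value not in groups[col_1]:  # collect_set keeps distinct values only
            groups[col_1].append(value)
    # Sorting (as strings) makes the result deterministic; Hive's collect_set
    # itself guarantees no particular element order.
    return {key: sep.join(sorted(values)) for key, values in groups.items()}


rows = [(1, "hello"), (2, "hello"), (2, "hello")]
print(concat_ws_collect_set(rows))  # {'hello': '1,2'}
```

The duplicate (2, "hello") row contributes only once, matching collect_set's de-duplication. Because the sort is lexicographic on strings, "10" would sort before "2".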