Hive functions: GROUPING SETS, GROUPING__ID, CUBE, ROLLUP


Reference: lxw大數據田地: http://lxw1234.com/archives/2015/04/193.htm

Data preparation:

CREATE EXTERNAL TABLE test_data (
month STRING,
day STRING, 
cookieid STRING 
) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
stored as textfile location '/user/jc_rc_ftp/test_data';

select * from test_data l;
+----------+-------------+-------------+--+
| l.month  |    l.day    | l.cookieid  |
+----------+-------------+-------------+--+
| 2015-03  | 2015-03-10  | cookie1     |
| 2015-03  | 2015-03-10  | cookie5     |
| 2015-03  | 2015-03-12  | cookie7     |
| 2015-04  | 2015-04-12  | cookie3     |
| 2015-04  | 2015-04-13  | cookie2     |
| 2015-04  | 2015-04-13  | cookie4     |
| 2015-04  | 2015-04-16  | cookie4     |
| 2015-03  | 2015-03-10  | cookie2     |
| 2015-03  | 2015-03-10  | cookie3     |
| 2015-04  | 2015-04-12  | cookie5     |
| 2015-04  | 2015-04-13  | cookie6     |
| 2015-04  | 2015-04-15  | cookie3     |
| 2015-04  | 2015-04-15  | cookie2     |
| 2015-04  | 2015-04-16  | cookie1     |
+----------+-------------+-------------+--+
14 rows selected (0.249 seconds)

GROUPING SETS

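GROUPING SETS aggregates over several dimension combinations within a single GROUP BY query. The result is equivalent to running one GROUP BY per dimension combination and combining the result sets with UNION ALL.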

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM test_data 
GROUP BY month,day 
GROUPING SETS (month,day) 
ORDER BY GROUPING__ID;

This is equivalent to:
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day

+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
+----------+-------------+-----+---------------+--+
8 rows selected (177.299 seconds)
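
The UNION ALL equivalence can be checked outside Hive. The following is a minimal sketch, not Hive itself: it loads the 14 sample rows into an in-memory SQLite database (an assumption made purely for illustration, since SQLite has no GROUPING SETS) and runs the UNION ALL form of the query. The function name grouping_sets_month_day and the column alias grouping_id are hypothetical.

```python
import sqlite3

# The 14 sample rows of test_data as (month, day, cookieid).
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]


def grouping_sets_month_day():
    """Emulate GROUPING SETS (month, day) with one GROUP BY per set plus UNION ALL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (month TEXT, day TEXT, cookieid TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT month, NULL AS day, COUNT(DISTINCT cookieid) AS uv, 1 AS grouping_id
        FROM test_data GROUP BY month
        UNION ALL
        SELECT NULL, day, COUNT(DISTINCT cookieid), 2
        FROM test_data GROUP BY day
        ORDER BY 4, 1, 2  -- grouping_id, then month, then day
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grouping_sets_month_day():
        print(row)
```

Calling grouping_sets_month_day() should return eight rows whose uv values match the table above. The row order differs from the Hive output because the sketch also sorts by month and day.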

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM test_data 
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID;

This is equivalent to:
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL 
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-03-10  | 4   | 2             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
+----------+-------------+-----+---------------+--+

Note: GROUPING__ID indicates which grouping set each result row belongs to.
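
The values 1, 2, and 3 in the results above behave like a bitmask over the GROUP BY columns. The sketch below reproduces that numbering as a plain Python function (grouping_id_legacy, a hypothetical name). It also includes a second function, grouping_id_standard, for the numbering that, as far as the Hive documentation describes it, applies from Hive 2.3.0 onward: a bit is set when a column is aggregated away, and the first GROUP BY column is the most significant bit. Treat the second function as an assumption to verify against your own Hive version.

```python
def grouping_id_legacy(group_by_cols, grouping_set):
    """Numbering seen in this article's output.

    Bit i (lowest bit first) is set when the i-th GROUP BY column
    is part of the grouping set.
    """
    return sum(1 << i for i, col in enumerate(group_by_cols) if col in grouping_set)


def grouping_id_standard(group_by_cols, grouping_set):
    """Assumed numbering for Hive 2.3.0 and later (verify before relying on it).

    The first GROUP BY column is the most significant bit, and a bit is 1
    when that column is NOT part of the grouping set (it is aggregated away).
    """
    gid = 0
    for col in group_by_cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


if __name__ == "__main__":
    cols = ["month", "day"]
    for grouping_set in [set(), {"month"}, {"day"}, {"month", "day"}]:
        print(
            sorted(grouping_set),
            grouping_id_legacy(cols, grouping_set),
            grouping_id_standard(cols, grouping_set),
        )
```

Under that assumption, the month-only and day-only rows keep the IDs 1 and 2 on newer versions, while the (month, day) rows and the grand-total row swap between 3 and 0.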

CUBE

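CUBE aggregates over every possible combination of the GROUP BY dimensions, including the empty combination (the grand total).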

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM test_data 
GROUP BY month,day 
WITH CUBE 
ORDER BY GROUPING__ID;

This is equivalent to:
SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM test_data
UNION ALL 
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL 
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | NULL        | 6   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
+----------+-------------+-----+---------------+--+
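
To make "every combination" concrete, here is a small sketch in plain Python rather than Hive. It enumerates all subsets of the dimensions and counts distinct cookieids for each one. The helper names cube_grouping_sets and count_uv are hypothetical, and NULL is represented by None.

```python
from itertools import combinations

# The 14 sample rows of test_data as (month, day, cookieid).
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]
DIMS = ("month", "day")


def cube_grouping_sets(dims):
    """Every subset of dims, from the empty set (grand total) up to all dims."""
    return [combo for size in range(len(dims) + 1) for combo in combinations(dims, size)]


def count_uv(rows, dims, grouping_sets):
    """Return (month, day, uv) tuples; a dimension outside the grouping set is None."""
    result = []
    for grouping_set in grouping_sets:
        cookies_by_key = {}
        for row in rows:
            record = dict(zip(dims, row))  # pairs month and day; cookieid is row[2]
            key = tuple(record[d] if d in grouping_set else None for d in dims)
            cookies_by_key.setdefault(key, set()).add(row[2])
        result.extend(key + (len(cookies),) for key, cookies in cookies_by_key.items())
    return result


cube_rows = count_uv(ROWS, DIMS, cube_grouping_sets(DIMS))

if __name__ == "__main__":
    for row in cube_rows:
        print(row)
```

With two dimensions there are four grouping sets, and on the sample data they produce 1 + 2 + 6 + 6 = 15 rows, the same count as the CUBE result above.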

ROLLUP

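ROLLUP produces a subset of the CUBE result. It aggregates hierarchically, starting from the leftmost dimension: GROUP BY a, b WITH ROLLUP uses the grouping sets (a, b), (a), and ().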

For example, hierarchical aggregation with month as the leading dimension:
SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID  
FROM test_data 
GROUP BY month,day
WITH ROLLUP 
ORDER BY GROUPING__ID;
This gives the following roll-up path: UV per month and day -> UV per month -> total UV
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
+----------+-------------+-----+---------------+--+
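
The grouping sets that ROLLUP uses are simply the prefixes of the GROUP BY column list. A minimal Python sketch of that rule follows; rollup_grouping_sets is a hypothetical helper, not a Hive function.

```python
def rollup_grouping_sets(dims):
    """Prefixes of dims, from the full list down to the empty set (grand total)."""
    return [tuple(dims[:n]) for n in range(len(dims), -1, -1)]


if __name__ == "__main__":
    # Month-led hierarchy: (month, day), then (month,), then ().
    print(rollup_grouping_sets(["month", "day"]))
    # Day-led hierarchy: (day, month), then (day,), then ().
    print(rollup_grouping_sets(["day", "month"]))
```

Note that ("day",) never appears for the month-led list. This is why the result above has no day-only rows, which CUBE would include.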
 
-- Swapping the order of month and day makes day the leading dimension of the hierarchy:
SELECT 
day,
month,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID  
FROM test_data 
GROUP BY day,month 
WITH ROLLUP 
ORDER BY GROUPING__ID;
+-------------+----------+-----+---------------+--+
|     day     |  month   | uv  | grouping__id  |
+-------------+----------+-----+---------------+--+
| NULL        | NULL     | 7   | 0             |
| 2015-04-12  | NULL     | 2   | 1             |
| 2015-04-15  | NULL     | 2   | 1             |
| 2015-03-12  | NULL     | 1   | 1             |
| 2015-04-16  | NULL     | 2   | 1             |
| 2015-03-10  | NULL     | 4   | 1             |
| 2015-04-13  | NULL     | 3   | 1             |
| 2015-04-16  | 2015-04  | 2   | 3             |
| 2015-04-15  | 2015-04  | 2   | 3             |
| 2015-04-13  | 2015-04  | 3   | 3             |
| 2015-03-12  | 2015-03  | 1   | 3             |
| 2015-03-10  | 2015-03  | 4   | 3             |
| 2015-04-12  | 2015-04  | 2   | 3             |
+-------------+----------+-----+---------------+--+

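This gives the following roll-up path:
UV per day and month -> UV per day -> total UV
(Here, aggregating by day and month gives the same result as aggregating by day alone, because the two dimensions have a parent-child relationship: each day belongs to exactly one month. With dimensions that are not related in this way, the two levels would differ.)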
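
The parent-child observation can be verified directly on the sample data. The sketch below, in plain Python with hypothetical names, checks that every day maps to a single month and that grouping by (day, month) yields the same counts as grouping by day.

```python
from collections import defaultdict

# The 14 sample rows of test_data as (month, day, cookieid).
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]


def uv_by(key_of):
    """Count distinct cookieids per key; key_of maps (month, day) to a group key."""
    cookies = defaultdict(set)
    for month, day, cookieid in ROWS:
        cookies[key_of(month, day)].add(cookieid)
    return {key: len(ids) for key, ids in cookies.items()}


# Parent-child relationship: each day should map to exactly one month.
months_per_day = defaultdict(set)
for month, day, _cookieid in ROWS:
    months_per_day[day].add(month)
is_hierarchy = all(len(months) == 1 for months in months_per_day.values())

uv_day = uv_by(lambda month, day: day)
uv_day_month = uv_by(lambda month, day: (day, month))
# Dropping month from the (day, month) keys should give back the per-day counts.
collapsed = {day: uv for (day, _month), uv in uv_day_month.items()}

if __name__ == "__main__":
    print(is_hierarchy, collapsed == uv_day)
```

If a day could belong to more than one month, the (day, month) level would contain more groups than the day level and the two dictionaries would no longer match.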

 

