Hive學習之路 (十七)Hive分析窗口函數(五) GROUPING SETS、GROUPING__ID、CUBE和ROLLUP


概述

GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

這幾個分析函數通常用於OLAP中,不能累加,而且需要根據不同維度上鑽和下鑽的指標統計,比如,分小時、天、月的UV數。

數據准備

數據格式

2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1

創建表

use cookie;
drop table if exists cookie5;
create table cookie5(month string, day string, cookieid string) 
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie5.txt" into table cookie5;
select * from cookie5;

 

玩一玩GROUPING SETS和GROUPING__ID

說明

在一個GROUP BY查詢中,根據不同的維度組合進行聚合,等價於將不同維度的GROUP BY結果集進行UNION ALL

GROUPING__ID,表示結果屬於哪一個分組集合。

查詢語句

select 
  month,
  day,
  count(distinct cookieid) as uv,
  GROUPING__ID
from cookie.cookie5 
group by month,day 
grouping sets (month,day) 
order by GROUPING__ID;

等價於

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day

 

查詢結果

 

結果說明

第一列是按照month進行分組

第二列是按照day進行分組

第三列是按照month或day分組是,統計這一組有幾個不同的cookieid

第四列grouping_id表示這一組結果屬於哪個分組集合,根據grouping sets中的分組條件month,day,1是代表month,2是代表day

再比如

SELECT  month, day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM cookie5 
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID;

等價於

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL 
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day

玩一玩CUBE

說明

根據GROUP BY的維度的所有組合進行聚合

查詢語句

SELECT  month, day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM cookie5 
GROUP BY month,day 
WITH CUBE 
ORDER BY GROUPING__ID;

等價於

SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM cookie5
UNION ALL 
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL 
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day

 

查詢結果

玩一玩ROLLUP

說明

是CUBE的子集,以最左側的維度為主,從該維度進行層級聚合

查詢語句

-- 比如,以month維度進行層級聚合

SELECT  month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID  
FROM cookie5 
GROUP BY month,day WITH ROLLUP  ORDER BY GROUPING__ID;

可以實現這樣的上鑽過程:
月天的UV->月的UV->總UV

--把month和day調換順序,則以day維度進行層級聚合:

可以實現這樣的上鑽過程:
天月的UV->天的UV->總UV
(這里,根據天和月進行聚合,和根據天聚合結果一樣,因為有父子關系,如果是其他維度組合的話,就會不一樣)

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM