The original dataset is as follows:
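The five sample rows can also be written compactly as an inline table. This is only a sketch: the alias t and the use of a values clause are assumptions for illustration, while the rows and column names are taken from the subqueries used in the examples later in this post.

```sql
-- Sketch: the sample rows as an inline table (alias t is hypothetical).
-- All three columns are strings, matching the later examples,
-- which is why id is cast to double before summing.
select name, date, id
from (
  values
    ('張三', '2021-04-11', '100'),
    ('李四', '2021-04-09', '100'),
    ('趙四', '2021-04-16', '200'),
    ('張三', '2021-03-10', '300'),
    ('李四', '2020-01-01', '150')
) as t(name, date, id)
```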
group by syntax
Aggregation over the dataset is done mainly with the group by clause, as follows (group by name and sum each person's id):
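A minimal sketch of such a query, reusing the inline subquery from the later examples as the data source (the output column name num is borrowed from those examples):

```sql
-- Sum id per person; id is stored as a string, so cast it first.
select name, sum(cast(id as double)) as num
from
(
select '張三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '趙四' as name, '2021-04-16' as date, '200' as id
union all
select '張三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by name
-- Expected rows (row order is not guaranteed):
-- 張三 400.0, 李四 250.0, 趙四 200.0
```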
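When a single SQL statement needs more complex aggregation, Presto's GROUPING SETS, CUBE, and ROLLUP syntax can be used.
A complex grouping operation is usually equivalent to a UNION ALL of simple group by expressions. This equivalence does not hold, however, when the data source for the aggregation is non-deterministic.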
grouping sets syntax
select date,name,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id
union all
select '李四'as name,'2021-04-09' as date,'100'as id
union all
select '趙四'as name,'2021-04-16' as date,'200'as id
union all
select '張三'as name,'2021-03-10'as date,'300'as id
union all
select '李四'as name,'2020-01-01'as date,'150'as id
)a
group by grouping sets((date),(date,name),(name))
The query above uses grouping sets to sum id three ways: by date, by date and name, and by name.
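To illustrate the UNION ALL equivalence, the grouping sets query can be spelled out as three plain group by queries glued together. This is a sketch, assuming the same sample rows moved into a with clause for brevity. Columns that are not part of a given grouping are filled with null, which is what grouping sets does as well.

```sql
with a as (
  select '張三' as name, '2021-04-11' as date, '100' as id
  union all
  select '李四' as name, '2021-04-09' as date, '100' as id
  union all
  select '趙四' as name, '2021-04-16' as date, '200' as id
  union all
  select '張三' as name, '2021-03-10' as date, '300' as id
  union all
  select '李四' as name, '2020-01-01' as date, '150' as id
)
-- grouping set (date): name is not grouped, so it is null
select date, cast(null as varchar) as name, sum(cast(id as double)) as num
from a
group by date
union all
-- grouping set (date, name)
select date, name, sum(cast(id as double)) as num
from a
group by date, name
union all
-- grouping set (name): date is not grouped, so it is null
select cast(null as varchar) as date, name, sum(cast(id as double)) as num
from a
group by name
```

With these sample rows the result has 13 rows: 5 for date, 5 for date and name, and 3 for name. The grouping sets form reads the source only once, which is the practical reason to prefer it.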
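cube syntax
The cube operator generates every possible grouping set for the given columns, that is, every subset of them, which gives 2^n grouping sets for n columns. For example, cube(A,B) aggregates by (A,B), (A), (B), and ().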
select date,name,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id
union all
select '李四'as name,'2021-04-09' as date,'100'as id
union all
select '趙四'as name,'2021-04-16' as date,'200'as id
union all
select '張三'as name,'2021-03-10'as date,'300'as id
union all
select '李四'as name,'2020-01-01'as date,'150'as id
)a
group by cube(date,name) -- equivalent to group by grouping sets((date),(date,name),(name),())
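To make the 2^n count concrete, here is a sketch with three columns, which should expand to 2^3 = 8 grouping sets. The extra sex column is an assumption at this point in the post; it is borrowed from the sample rows used in the section on combining grouping expressions.

```sql
with a as (
  select '張三' as name, '2021-04-11' as date, '100' as id, '男' as sex
  union all
  select '李四' as name, '2021-04-09' as date, '100' as id, '男' as sex
  union all
  select '趙四' as name, '2021-04-16' as date, '200' as id, '女' as sex
  union all
  select '張三' as name, '2021-03-10' as date, '300' as id, '男' as sex
  union all
  select '李四' as name, '2020-01-01' as date, '150' as id, '男' as sex
)
select date, name, sex, sum(cast(id as double)) as num
from a
-- Written out by hand; intended to match group by cube(date,name,sex)
group by grouping sets(
  (date, name, sex),                    -- all three columns
  (date, name), (date, sex), (name, sex), -- every pair
  (date), (name), (sex),                -- every single column
  ()                                    -- grand total
)
```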
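rollup syntax
The ROLLUP operator generates all subtotal levels for the given columns: n+1 grouping sets for n columns, one for each prefix of the column list. For example, rollup(A,B) aggregates by (A,B), (A), and ().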
select date,name,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id
union all
select '李四'as name,'2021-04-09' as date,'100'as id
union all
select '趙四'as name,'2021-04-16' as date,'200'as id
union all
select '張三'as name,'2021-03-10'as date,'300'as id
union all
select '李四'as name,'2020-01-01'as date,'150'as id
)a
group by rollup(date,name) -- order matters: this differs from group by rollup(name,date)
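The effect of column order can be seen by swapping the two columns. A sketch, again assuming the sample rows are moved into a with clause:

```sql
with a as (
  select '張三' as name, '2021-04-11' as date, '100' as id
  union all
  select '李四' as name, '2021-04-09' as date, '100' as id
  union all
  select '趙四' as name, '2021-04-16' as date, '200' as id
  union all
  select '張三' as name, '2021-03-10' as date, '300' as id
  union all
  select '李四' as name, '2020-01-01' as date, '150' as id
)
select date, name, sum(cast(id as double)) as num
from a
-- Should match grouping sets((name,date),(name),()):
-- subtotals are per name, not per date.
group by rollup(name, date)
```

With these rows, rollup(name,date) returns 9 rows (5 name and date pairs, 3 per-name subtotals, 1 grand total), while rollup(date,name) returns 11 rows (5 pairs, 5 per-date subtotals, 1 grand total).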
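Combining multiple grouping expressions
Multiple grouping expressions in the same query are interpreted with cross-product semantics. The ALL and DISTINCT quantifiers determine whether duplicate grouping sets each produce distinct output rows. The default is ALL, which keeps every duplicate grouping set.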
select date,name,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id
union all
select '李四'as name,'2021-04-09' as date,'100'as id
union all
select '趙四'as name,'2021-04-16' as date,'200'as id
union all
select '張三'as name,'2021-03-10'as date,'300'as id
union all
select '李四'as name,'2020-01-01'as date,'150'as id
)a
group by distinct rollup(date,name),cube(date,name)
-- equivalent to group by grouping sets((date),(date,name),(name),())
select date,name,sex,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id,'男'as sex
union all
select '李四'as name,'2021-04-09' as date,'100'as id,'男'as sex
union all
select '趙四'as name,'2021-04-16' as date,'200'as id,'女'as sex
union all
select '張三'as name,'2021-03-10'as date,'300'as id,'男'as sex
union all
select '李四'as name,'2020-01-01'as date,'150'as id,'男'as sex
)a
group by all rollup(date,name),cube(date,sex)
is equivalent to
select date,name,sex,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id,'男'as sex
union all
select '李四'as name,'2021-04-09' as date,'100'as id,'男'as sex
union all
select '趙四'as name,'2021-04-16' as date,'200'as id,'女'as sex
union all
select '張三'as name,'2021-03-10'as date,'300'as id,'男'as sex
union all
select '李四'as name,'2020-01-01'as date,'150'as id,'男'as sex
)a
group by grouping sets((date,name,sex),(date,name),(date,name,sex),(date,name),(date,sex),(date),(date,sex),(date),(date,sex),(date),(sex),())
rollup over 2 columns yields 2+1=3 grouping sets and cube over 2 columns yields 2^2=4, so combining rollup and cube yields 3*4=12 grouping sets in total.
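For comparison, here is a sketch of the same combination with distinct instead of all. rollup(date,name) contributes (date,name), (date), and (); cube(date,sex) contributes (date,sex), (date), (sex), and (). Merging every pair gives 12 grouping sets, of which only 6 are different, so distinct should keep just those 6.

```sql
with a as (
  select '張三' as name, '2021-04-11' as date, '100' as id, '男' as sex
  union all
  select '李四' as name, '2021-04-09' as date, '100' as id, '男' as sex
  union all
  select '趙四' as name, '2021-04-16' as date, '200' as id, '女' as sex
  union all
  select '張三' as name, '2021-03-10' as date, '300' as id, '男' as sex
  union all
  select '李四' as name, '2020-01-01' as date, '150' as id, '男' as sex
)
select date, name, sex, sum(cast(id as double)) as num
from a
-- Should match grouping sets(
--   (date,name,sex),(date,name),(date,sex),(date),(sex),())
group by distinct rollup(date, name), cube(date, sex)
```

With all, each duplicated grouping set produces its own copy of the rows, so the same subtotal appears more than once in the output.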
having syntax
having is generally used together with group by to filter which groups are returned.
select name,sum(cast(id as double))as num
from
(
select '張三'as name,'2021-04-11' as date,'100'as id
union all
select '李四'as name,'2021-04-09' as date,'100'as id
union all
select '趙四'as name,'2021-04-16' as date,'200'as id
union all
select '張三'as name,'2021-03-10'as date,'300'as id
union all
select '李四'as name,'2020-01-01'as date,'150'as id
)a
group by name
having sum(cast(id as double))>210 -- keep only groups where sum(id)>210
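Because having is evaluated after aggregation, the same filter can be expressed as a where clause on top of the grouped result. A sketch of that equivalent form (the outer alias t is an assumption for illustration):

```sql
select name, num
from (
  -- aggregate first
  select name, sum(cast(id as double)) as num
  from
  (
  select '張三' as name, '2021-04-11' as date, '100' as id
  union all
  select '李四' as name, '2021-04-09' as date, '100' as id
  union all
  select '趙四' as name, '2021-04-16' as date, '200' as id
  union all
  select '張三' as name, '2021-03-10' as date, '300' as id
  union all
  select '李四' as name, '2020-01-01' as date, '150' as id
  ) a
  group by name
) t
-- then filter the aggregated rows
where num > 210
-- Expected rows (row order is not guaranteed):
-- 張三 400.0, 李四 250.0 (趙四 sums to 200.0 and is filtered out)
```

A where clause placed inside the inner query would instead filter individual rows before they are summed, which is a different result.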