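The sample dataset is as follows. Each query below builds the same five rows inline with UNION ALL:
name  date        id
张三  2021-04-11  100
李四  2021-04-09  100
赵四  2021-04-16  200
张三  2021-03-10  300
李四  2020-01-01  150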
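group by syntax
Aggregation over a dataset is mainly done with the group by clause. A typical example is grouping by name and summing the id values of each person.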
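As a rough illustration of what "group by name, sum id" computes on the sample rows, here is a minimal Python sketch. It only models the result of the SQL, not how Presto executes it; `ROWS` and `sum_id_by_name` are hypothetical names introduced for this example.

```python
from collections import defaultdict

# The five sample rows from this post as (name, date, id); id stays a string,
# just as '100' etc. are strings in the SQL.
ROWS = [
    ("张三", "2021-04-11", "100"),
    ("李四", "2021-04-09", "100"),
    ("赵四", "2021-04-16", "200"),
    ("张三", "2021-03-10", "300"),
    ("李四", "2020-01-01", "150"),
]

def sum_id_by_name(rows):
    """Model of: select name, sum(cast(id as double)) as num ... group by name."""
    totals = defaultdict(float)
    for name, _date, id_ in rows:
        totals[name] += float(id_)  # plays the role of cast(id as double)
    return dict(totals)

print(sum_id_by_name(ROWS))  # {'张三': 400.0, '李四': 250.0, '赵四': 200.0}
```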
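When a single SQL statement needs more complex aggregation, Presto provides the GROUPING SETS, CUBE and ROLLUP syntax.
A complex grouping operation is usually equivalent to a UNION ALL of simple group by queries, one per grouping set. However, this equivalence does not hold when the source of the data being aggregated is non-deterministic.
grouping sets syntax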
select date, name, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '赵四' as name, '2021-04-16' as date, '200' as id
union all
select '张三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by grouping sets ((date), (date, name), (name))
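The query above uses grouping sets to sum id separately by date, by (date, name), and by name. Within each grouping set, the selected columns that are not part of that set are returned as NULL.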
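To make the "union of simple group bys" equivalence concrete, the sketch below evaluates each grouping set as its own group by and concatenates the results (UNION ALL), filling columns outside the set with `None` to stand in for NULL. This is a conceptual model under that assumption, not Presto's actual execution strategy; `grouping_sets_sum` and `ROWS` are hypothetical names.

```python
from collections import defaultdict

# The five sample rows used in this post; id stays a string as in the SQL.
ROWS = [
    {"name": "张三", "date": "2021-04-11", "id": "100"},
    {"name": "李四", "date": "2021-04-09", "id": "100"},
    {"name": "赵四", "date": "2021-04-16", "id": "200"},
    {"name": "张三", "date": "2021-03-10", "id": "300"},
    {"name": "李四", "date": "2020-01-01", "id": "150"},
]

def grouping_sets_sum(rows, grouping_sets, columns=("date", "name")):
    """Model GROUP BY GROUPING SETS as a UNION ALL of simple GROUP BYs.

    For every grouping set, rows are grouped on that set's columns and
    float(id) is summed; output columns outside the set become None (NULL).
    """
    output = []
    for gset in grouping_sets:
        totals = defaultdict(float)
        for row in rows:
            totals[tuple(row[c] for c in gset)] += float(row["id"])  # cast(id as double)
        for key, num in totals.items():
            values = dict(zip(gset, key))
            output.append(tuple(values.get(c) for c in columns) + (num,))
    return output

result = grouping_sets_sum(ROWS, [("date",), ("date", "name"), ("name",)])
for row in result:
    print(row)
```

Because every date in the sample is distinct, the (date) and (date, name) sets each give 5 rows and the (name) set gives 3, so 13 rows in total.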
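cube syntax
The cube operator generates every possible grouping set for the given columns, that is, every subset of them: 2^n grouping sets for n columns. For example, cube(A,B) aggregates by (A,B), (A), (B) and ().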
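The 2^n count follows from cube taking every subset of its columns. A minimal Python sketch of that expansion (the `cube` helper here is a hypothetical name, only mimicking the grouping sets Presto would generate):

```python
from itertools import combinations

def cube(*cols):
    """Expand CUBE(cols) into grouping sets: every subset of cols, 2**n in total,
    listed from the full column list down to the empty set."""
    return [subset for k in range(len(cols), -1, -1) for subset in combinations(cols, k)]

print(cube("date", "name"))  # [('date', 'name'), ('date',), ('name',), ()]
```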
select date, name, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '赵四' as name, '2021-04-16' as date, '200' as id
union all
select '张三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by cube(date, name) -- equivalent to group by grouping sets ((date, name), (date), (name), ())
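rollup syntax
The rollup operator generates all possible subtotals for a given list of columns in a hierarchical way: its grouping sets are the prefixes of the column list, from the full list down to the empty set, so n columns yield n+1 grouping sets. For example, rollup(A,B) aggregates by (A,B), (A) and ().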
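Since rollup only takes prefixes, its expansion is a simple slice loop. The sketch below (a hypothetical `rollup` helper modelling the grouping sets, not Presto code) also shows why column order matters, which the next query's comment points out.

```python
def rollup(*cols):
    """Expand ROLLUP(cols) into grouping sets: each prefix of cols, from the
    full list down to the empty set, giving n + 1 sets for n columns."""
    return [cols[:k] for k in range(len(cols), -1, -1)]

print(rollup("date", "name"))  # [('date', 'name'), ('date',), ()]
print(rollup("name", "date"))  # [('name', 'date'), ('name',), ()]
```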
select date, name, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '赵四' as name, '2021-04-16' as date, '200' as id
union all
select '张三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by rollup(date, name) -- equivalent to grouping sets ((date, name), (date), ()); order matters: rollup(name, date) gives ((name, date), (name), ())
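Combining multiple grouping expressions
Multiple grouping expressions in the same query are interpreted with cross-product semantics: each resulting grouping set is the union of one grouping set taken from each expression. The ALL and DISTINCT quantifiers determine whether duplicate grouping sets each produce their own output rows. The default is ALL, which keeps every duplicate grouping set; DISTINCT keeps only one copy of each.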
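The cross-product rule and the ALL/DISTINCT switch can be modelled in a few lines. In this sketch, the hypothetical `combine` helper picks one grouping set from each element, merges their columns, and, when `distinct=True`, drops sets already seen (compared regardless of column order). It reproduces the equivalence stated for the first query below; it is an illustration of the semantics, not Presto's implementation.

```python
from itertools import combinations, product

def rollup(*cols):
    # Prefixes of cols, longest first: n + 1 grouping sets.
    return [cols[:k] for k in range(len(cols), -1, -1)]

def cube(*cols):
    # Every subset of cols: 2**n grouping sets.
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]

def combine(*elements, distinct=False):
    """Cross-product semantics: pick one grouping set from each element and
    union their columns. ALL (default) keeps duplicate sets; DISTINCT keeps the
    first occurrence of each set, compared regardless of column order."""
    result, seen = [], set()
    for picks in product(*elements):
        merged = tuple(dict.fromkeys(col for gset in picks for col in gset))
        key = frozenset(merged)
        if distinct and key in seen:
            continue
        seen.add(key)
        result.append(merged)
    return result

print(combine(rollup("date", "name"), cube("date", "name"), distinct=True))
# [('date', 'name'), ('date',), ('name',), ()]
print(len(combine(rollup("date", "name"), cube("date", "name"))))  # 12
```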
select date, name, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '赵四' as name, '2021-04-16' as date, '200' as id
union all
select '张三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by distinct rollup(date, name), cube(date, name)
-- equivalent to group by grouping sets ((date, name), (date), (name), ())
select date, name, sex, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id, '男' as sex
union all
select '李四' as name, '2021-04-09' as date, '100' as id, '男' as sex
union all
select '赵四' as name, '2021-04-16' as date, '200' as id, '女' as sex
union all
select '张三' as name, '2021-03-10' as date, '300' as id, '男' as sex
union all
select '李四' as name, '2020-01-01' as date, '150' as id, '男' as sex
) a
group by all rollup(date, name), cube(date, sex)
This is equivalent to:
select date, name, sex, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id, '男' as sex
union all
select '李四' as name, '2021-04-09' as date, '100' as id, '男' as sex
union all
select '赵四' as name, '2021-04-16' as date, '200' as id, '女' as sex
union all
select '张三' as name, '2021-03-10' as date, '300' as id, '男' as sex
union all
select '李四' as name, '2020-01-01' as date, '150' as id, '男' as sex
) a
group by grouping sets ((date, name, sex), (date, name), (date, name, sex), (date, name), (date, sex), (date), (date, sex), (date), (date, sex), (date), (sex), ())
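rollup has 2 columns here, so it yields 2+1=3 grouping sets; cube also has 2 columns, so it yields 2^2=4 grouping sets. Combined, they produce 3*4=12 grouping sets. Under ALL the duplicates are kept (for example, (date) and (date, sex) each appear three times), so the rows for those groups appear once per copy in the output.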
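The 3*4 arithmetic and the duplicate counts can be checked mechanically. This sketch (hypothetical helpers again, modelling only the grouping-set expansion) pairs every rollup set with every cube set under ALL and counts how often each resulting set occurs.

```python
from collections import Counter
from itertools import combinations, product

def rollup(*cols):
    return [cols[:k] for k in range(len(cols), -1, -1)]  # n + 1 prefixes

def cube(*cols):
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]  # 2**n subsets

r, c = rollup("date", "name"), cube("date", "sex")
# GROUP BY ALL keeps every pair, so the number of grouping sets is len(r) * len(c).
sets_all = [frozenset(a) | frozenset(b) for a, b in product(r, c)]
print(len(r), len(c), len(sets_all))  # 3 4 12

# How many times each distinct grouping set occurs among the 12.
multiplicity = Counter(sets_all)
for gset, n in multiplicity.items():
    print(sorted(gset), n)
```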
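having syntax
The having clause is generally used together with group by to control which groups are selected. It is evaluated after the groups and their aggregates are computed, and it removes the groups that do not satisfy the condition.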
select name, sum(cast(id as double)) as num
from
(
select '张三' as name, '2021-04-11' as date, '100' as id
union all
select '李四' as name, '2021-04-09' as date, '100' as id
union all
select '赵四' as name, '2021-04-16' as date, '200' as id
union all
select '张三' as name, '2021-03-10' as date, '300' as id
union all
select '李四' as name, '2020-01-01' as date, '150' as id
) a
group by name
having sum(cast(id as double)) > 210 -- keep only the groups (names) whose sum(id) > 210
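To show that having filters whole groups after aggregation rather than individual rows, here is a minimal Python model of the query above. `ROWS` and `sum_id_by_name_having` are hypothetical names; the threshold 210 comes from the query.

```python
from collections import defaultdict

# The five sample rows as (name, date, id).
ROWS = [
    ("张三", "2021-04-11", "100"),
    ("李四", "2021-04-09", "100"),
    ("赵四", "2021-04-16", "200"),
    ("张三", "2021-03-10", "300"),
    ("李四", "2020-01-01", "150"),
]

def sum_id_by_name_having(rows, threshold):
    """Model of: ... group by name having sum(cast(id as double)) > threshold.
    Aggregate first, then drop whole groups that fail the condition."""
    totals = defaultdict(float)
    for name, _date, id_ in rows:
        totals[name] += float(id_)
    return {name: num for name, num in totals.items() if num > threshold}

print(sum_id_by_name_having(ROWS, 210))  # {'张三': 400.0, '李四': 250.0}
```

赵四's group (total 200) is dropped. Note that no single input row of 李四 exceeds 210, yet the group survives, because the condition is checked on the aggregate.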