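1. ROW_NUMBER
- Syntax
row_number() over(partition by [col1, col2…] order by [col1[asc|desc], col2[asc|desc]…])
- Description
Computes the row number of each row within its partition, starting from 1.
- Parameters
partition by [col1, col2..]: the columns that define the window partitions.
order by col1[asc|desc], col2[asc|desc]: the sort order within each partition, which determines the order in which row numbers are assigned.
- Return value
Returns a BIGINT.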
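- Use for deduplication
Partition by the deduplication key, order by a column that decides which row to keep, and keep only the first row of each partition (dedup_col and sort_col are placeholders):
SELECT * FROM ( SELECT * , ROW_NUMBER() OVER (PARTITION BY t.dedup_col ORDER BY t.sort_col DESC) AS rn FROM xxx t ) p WHERE p.rn = 1;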
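The deduplication pattern above can be tried out with a minimal sketch. It runs the query in SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for the target engine; the orders table, its columns, and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: several versions of a row may exist per user_id.
conn.execute("CREATE TABLE orders (user_id INTEGER, order_id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, "2019-01-01"), (1, 101, "2019-03-01"), (2, 200, "2019-02-01")],
)

# Keep only the most recently updated row for each user_id.
deduped = conn.execute("""
    SELECT user_id, order_id, updated_at
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
        FROM orders t
    ) p
    WHERE p.rn = 1
    ORDER BY user_id
""").fetchall()
print(deduped)
```

Ordering by a column other than the partition key is what makes the choice of the surviving row well defined.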
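2. LAG
- Syntax
lag(expr,Bigint offset, default) over(partition by [col1, col2…] [order by [col1[asc|desc], col2[asc|desc]…]])
- Description
Returns the value of expr from the row that lies offset rows before the current row within the partition. If the current row number is rn, the value is taken from row rn-offset. If no such row exists, the default is returned, which is NULL unless another default is given.
- Parameters
expr: any type.
offset: a BIGINT constant with offset>0. STRING and DOUBLE inputs are implicitly converted to BIGINT.
default: a constant returned when the offset falls outside the partition. Defaults to NULL.
partition by [col1, col2..]: the columns that define the window partitions.
order by col1[asc|desc], col2[asc|desc]: the sort order within each partition, which determines which rows count as preceding.
- Return value
Same type as expr.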
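3. LEAD
- Syntax
lead(expr,Bigint offset, default) over(partition by [col1, col2…] [order by [col1[asc|desc], col2[asc|desc]…]])
- Description
Returns the value of expr from the row that lies offset rows after the current row within the partition. If the current row number is rn, the value is taken from row rn+offset. If no such row exists, the default is returned, which is NULL unless another default is given.
- Parameters
expr: any type.
offset: optional. A BIGINT constant with offset>0. STRING, DECIMAL, and DOUBLE inputs are implicitly converted to BIGINT.
default: optional. A constant returned when the offset falls outside the partition.
partition by [col1, col2..]: the columns that define the window partitions.
order by col1[asc|desc], col2[asc|desc]: the sort order within each partition, which determines which rows count as following.
- Return value
Same type as expr.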
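The rn-offset and rn+offset rules for LAG and LEAD can be checked with a small sketch. It uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in engine; the table t and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical four-row table; rn plays the role of the row number.
conn.execute("CREATE TABLE t (rn INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

rows = conn.execute("""
    SELECT val,
           LAG(val, 2, 'none') OVER (ORDER BY rn) AS two_back,  -- value at rn - 2, 'none' if out of range
           LEAD(val, 1)        OVER (ORDER BY rn) AS next_val   -- value at rn + 1, NULL if out of range
    FROM t
    ORDER BY rn
""").fetchall()
print(rows)
```

The first two rows have no row two positions back, so they get the explicit default; the last row has no successor and gets NULL (None in Python).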
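Window clause (frame specification):
- PRECEDING: rows before the current row
- FOLLOWING: rows after the current row
- CURRENT ROW: the current row
- UNBOUNDED: the partition boundary. UNBOUNDED PRECEDING means starting from the first row of the partition; UNBOUNDED FOLLOWING means ending at the last row of the partition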
select name, orderdate, cost,
  sum(cost) over() as sample1, -- sum over all rows
  sum(cost) over(partition by name) as sample2, -- partition by name, sum within each partition
  sum(cost) over(partition by name order by orderdate) as sample3, -- partition by name, running total within each partition
  sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, -- from the partition start to the current row; same as sample3 when orderdate is unique within a name
  sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, -- the current row and the row before it
  sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6, -- the current row, the row before it, and the row after it
  sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7 -- the current row and all following rows
from t_window;
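To see what each frame covers, here is a sketch that runs four of the frames on a tiny t_window table. SQLite (3.25 or later) through Python's sqlite3 module stands in for the target engine, and the rows are made up; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_window (name TEXT, orderdate TEXT, cost INTEGER)")
# Invented sample rows: three orders for jack, one for tony.
conn.executemany(
    "INSERT INTO t_window VALUES (?, ?, ?)",
    [("jack", "2019-01-01", 10), ("jack", "2019-01-02", 20),
     ("jack", "2019-01-03", 30), ("tony", "2019-01-01", 5)],
)

frames = conn.execute("""
    SELECT name, orderdate,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS to_current,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS prev_and_current,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS to_end
    FROM t_window
    ORDER BY name, orderdate
""").fetchall()
for row in frames:
    print(row)
```

A frame never crosses a partition boundary, which is why tony's single row yields 5 in every column.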
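Use cases
I. Aggregation
1. Total store sales for 2019 (sum)
sum computes a total. With an empty over() it sums over all rows, so every row shows the same result, 5800.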
select a.*, sum(sale)over() as total_sale from sale_detial a


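2. Total store sales per level for 2019 (sum)
Partitioning by level produces one total per level, so all rows with the same level show the same value.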
select a.*, sum(sale)over() as total_sale, sum(sale)over(partition by level) as level_sale from sale_detial a


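3. Cumulative store sales per level for 2019, with cities ordered by sales descending (sum)
With order by and no explicit rows between, the window defaults to the range from the first row of the partition through the current row, including any rows that tie with the current row on the sort key. The sum therefore accumulates within each level.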
select a.*, sum(sale)over() as total_sale, sum(sale)over(partition by level) as level_sale, sum(sale)over(partition by level order by sale desc) as level_cum_sale from sale_detial a
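The three sums can be compared side by side in a short sketch, which also contrasts the default frame with an explicit rows between frame when two cities tie. SQLite (3.25 or later) via Python's sqlite3 module is used as a stand-in engine, and the rows of sale_detial are invented (so the total is not the 5800 mentioned earlier).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data; B and C tie on sale within the same level.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 300), ("B", "high", 200), ("C", "high", 200), ("D", "low", 100)],
)

sums = conn.execute("""
    SELECT city,
           SUM(sale) OVER () AS total_sale,
           SUM(sale) OVER (PARTITION BY level) AS level_sale,
           -- default frame: tied rows are added together
           SUM(sale) OVER (PARTITION BY level ORDER BY sale DESC) AS cum_default,
           -- explicit row-based frame, with city as a tiebreaker
           SUM(sale) OVER (PARTITION BY level ORDER BY sale DESC, city
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_rows
    FROM sale_detial
    ORDER BY level, sale DESC, city
""").fetchall()
for row in sums:
    print(row)
```

With the default frame, B and C both show 700 because each one's window already includes the other; the row-based frame gives 500 and then 700.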


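4. Total number of cities with sales and number of cities per level for 2019 (count)
count() counts rows. count(distinct city) can be used to count unique values where the engine supports distinct in window aggregates. With partition by, the count is computed within each partition.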
select a.*, count(city)over() as total_city, count(city)over(partition by level) as level_city from sale_detial a
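A small sketch of the counting behaviour, with one city appearing twice. It runs in SQLite (3.25 or later) via Python's sqlite3 module; SQLite does not accept distinct inside a window aggregate, so the distinct count is computed here with a scalar subquery instead, which is a substitute for the count(distinct city) form mentioned above. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data; city A appears under two levels.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 300), ("B", "high", 200), ("A", "low", 100)],
)

counts = conn.execute("""
    SELECT city, level,
           COUNT(city) OVER () AS total_city,                    -- counts rows, duplicates included
           COUNT(city) OVER (PARTITION BY level) AS level_city,  -- counts rows within each level
           (SELECT COUNT(DISTINCT city) FROM sale_detial) AS distinct_city
    FROM sale_detial
    ORDER BY level, city
""").fetchall()
for row in counts:
    print(row)
```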


5. Average store sales per city and average sales per level for 2019 (avg)
avg is used in much the same way as sum.
select a.*, avg(sale)over() as avg_sale, avg(sale)over(partition by level) as level_avg_sale from sale_detial a


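6. Running average and moving average for 2019, with cities ordered by sales descending (avg)
Specifying the window with rows between makes it possible to compute a moving average. Without it, avg with order by yields the average of the rows up to the current row.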
select a.*, avg(sale)over(order by sale desc) as avg_sale, avg(sale)over(order by sale desc rows between 1 preceding and 1 following) as avg_sale_1 from sale_detial a
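The difference between the running average and the three-row moving average shows up clearly on four rows. This is a sketch on invented data, run in SQLite (3.25 or later) through Python's sqlite3 module as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data with distinct sale values, so the ordering is unambiguous.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 400), ("B", "high", 300), ("C", "low", 200), ("D", "low", 100)],
)

averages = conn.execute("""
    SELECT city,
           AVG(sale) OVER (ORDER BY sale DESC) AS running_avg,
           AVG(sale) OVER (ORDER BY sale DESC
                           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
    FROM sale_detial
    ORDER BY sale DESC
""").fetchall()
for row in averages:
    print(row)
```

At the first and last rows the moving window holds only two rows, so the average is taken over two values rather than three.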


7. Highest city sales overall and lowest sales per level for 2019 (max/min)
max/min are used in the same way as sum.
select a.*, max(sale)over() as max_sale, min(sale)over(partition by level) as level_min_sale from sale_detial a


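II. Ranking
1. Ranking cities by store sales for 2019 (row_number, rank, dense_rank)
row_number: numbers the rows of each partition sequentially, starting from 1.
rank: the rank of each row within its partition. Tied rows share a rank and leave gaps in the ranks that follow.
dense_rank: the rank of each row within its partition. Tied rows share a rank and leave no gaps.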
select a.*, row_number()over(order by sale desc) as row_number, rank()over(order by sale desc) as rank, dense_rank()over(order by sale desc) as dense_rank from sale_detial a
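The three functions only differ when there are ties, so the sketch below uses invented data in which two cities have the same sales. It runs in SQLite (3.25 or later) via Python's sqlite3 module as a stand-in engine; city is added as a tiebreaker for row_number so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data; B and C tie on sale.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 300), ("B", "high", 200), ("C", "low", 200), ("D", "low", 100)],
)

ranks = conn.execute("""
    SELECT city, sale,
           ROW_NUMBER() OVER (ORDER BY sale DESC, city) AS rn,   -- always 1, 2, 3, ...
           RANK()       OVER (ORDER BY sale DESC)       AS rk,   -- ties share a rank, gap afterwards
           DENSE_RANK() OVER (ORDER BY sale DESC)       AS drk   -- ties share a rank, no gap
    FROM sale_detial
    ORDER BY sale DESC, city
""").fetchall()
for row in ranks:
    print(row)
```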


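III. Extremes
1. Cities with the highest and lowest store sales, and the lowest-selling city per level, for 2019 (first_value, last_value)
first_value returns the first value in the window after partitioning and sorting; last_value returns the last value.
Because the default window ends at the current row, the result of last_value changes as the window grows, so last_value is usually avoided. To use it, extend the window to cover all rows of the partition.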
select a.*, first_value(city)over(order by sale desc) as max_city, first_value(city)over(order by sale asc) as min_city, last_value(city)over(order by sale desc) as min_city_1, last_value(city)over(partition by level order by sale desc rows between unbounded preceding and unbounded following) as level_min_city from sale_detial a
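The last_value pitfall is easy to reproduce. The sketch below, on invented data and with SQLite (3.25 or later) via Python's sqlite3 module as a stand-in engine, compares last_value under the default window with last_value over the whole partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data with distinct sale values.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 300), ("B", "high", 200), ("C", "low", 100)],
)

extremes = conn.execute("""
    SELECT city,
           FIRST_VALUE(city) OVER (ORDER BY sale DESC) AS max_city,
           -- default window ends at the current row, so this just echoes the current city
           LAST_VALUE(city)  OVER (ORDER BY sale DESC) AS last_default,
           -- window widened to the whole partition: the true lowest-selling city
           LAST_VALUE(city)  OVER (ORDER BY sale DESC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS min_city
    FROM sale_detial
    ORDER BY sale DESC
""").fetchall()
for row in extremes:
    print(row)
```

An alternative that avoids widening the window is first_value with the opposite sort order, as the min_city column of the query above does.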


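IV. Shifting
1. For each city, the cities ranked one place before and one place after it by sales within its level, for 2019 (lag, lead)
lag/lead read the value a given number of rows before or after the current row under the sort order. They take three arguments: the column to read, the number of rows to shift, and the value to return when no such row exists (NULL by default).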
select a.*, lag(city,1,null)over(partition by level order by sale desc) as lag_city, lead(city,1,'0')over(partition by level order by sale desc) as lead_city from sale_detial a
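How the defaults kick in at partition boundaries can be seen by running the same pattern on a few invented rows. SQLite (3.25 or later) through Python's sqlite3 module is used here as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, level TEXT, sale INTEGER)")
# Invented data: two cities in level 'high', one in level 'low'.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?, ?)",
    [("A", "high", 300), ("B", "high", 200), ("C", "low", 100)],
)

shifted = conn.execute("""
    SELECT city,
           LAG(city, 1, NULL) OVER (PARTITION BY level ORDER BY sale DESC) AS lag_city,
           LEAD(city, 1, '0') OVER (PARTITION BY level ORDER BY sale DESC) AS lead_city
    FROM sale_detial
    ORDER BY level, sale DESC
""").fetchall()
for row in shifted:
    print(row)
```

City C is alone in its level, so it has neither a predecessor nor a successor: lag falls back to NULL and lead to '0', even though other levels contain more rows.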


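V. Bucketing
1. Bucketing by store sales overall and by sales within each level for 2019 (ntile)
ntile(n) splits the ordered rows of a partition into n slices and returns the slice number of the current row. ntile does not support rows between. If the rows do not divide evenly, the extra rows go to the lowest-numbered slices, one each.
ntile is powerful: selecting a given fraction of the data used to be hard. ntile distributes the rows of an ordered partition into the specified number of groups, numbered from 1, and returns for each row the number of the group it belongs to.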
select a.*, ntile(3) over(order by sale desc) as total_part, ntile(2)over(partition by level order by sale desc) as level_part from sale_detial a
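The uneven-split rule and the "top fraction" use can be illustrated with seven rows and three slices. This is a sketch on invented data, reduced to two columns, run in SQLite (3.25 or later) via Python's sqlite3 module as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detial (city TEXT, sale INTEGER)")
# Invented data: seven cities with distinct sale values.
conn.executemany(
    "INSERT INTO sale_detial VALUES (?, ?)",
    [("A", 700), ("B", 600), ("C", 500), ("D", 400), ("E", 300), ("F", 200), ("G", 100)],
)

rows = conn.execute("""
    SELECT city, NTILE(3) OVER (ORDER BY sale DESC) AS part
    FROM sale_detial
    ORDER BY sale DESC
""").fetchall()

buckets = [part for _, part in rows]                     # slice number of each row
top_third = [city for city, part in rows if part == 1]   # cities in the first slice
print(buckets, top_third)
```

Seven rows do not divide evenly by three, so the first slice receives three rows and the other two receive two each.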


Source: https://blog.csdn.net/cindy407/java/article/details/105394672
