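Window functions work very well for many statistical queries. This article walks through the most commonly used ones with examples.
Definition
Data preparation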
CREATE TABLE empsalary(
    depname varchar,
    empno bigint,
    salary int,
    enroll_date date
);
INSERT INTO empsalary VALUES('develop', 10, 5200, '2007/08/01');
INSERT INTO empsalary VALUES('sales', 1, 5000, '2006/10/01');
INSERT INTO empsalary VALUES('personnel', 5, 3500, '2007/12/10');
INSERT INTO empsalary VALUES('sales', 4, 4800, '2007/08/08');
INSERT INTO empsalary VALUES('sales', 6, 5500, '2007/01/02');
INSERT INTO empsalary VALUES('personnel', 2, 3900, '2006/12/23');
INSERT INTO empsalary VALUES('develop', 7, 4200, '2008/01/01');
INSERT INTO empsalary VALUES('develop', 9, 4500, '2008/01/01');
INSERT INTO empsalary VALUES('sales', 3, 4800, '2007/08/01');
INSERT INTO empsalary VALUES('develop', 8, 6000, '2006/10/01');
INSERT INTO empsalary VALUES('develop', 11, 5200, '2007/08/15');
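Usage
1. row_number(): returns the row number. When the compared values are tied, the numbers neither repeat nor skip, i.e. it returns 1, 2, 3, 4, 5, ... and never 1, 2, 2, 4, ...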
select row_number() over(),* from empsalary limit 2;

select row_number() over(),* from empsalary limit 2 offset 2;

-- Partition by depname and order by salary; note that rows with the same salary still get consecutive, distinct row numbers
select row_number() over(partition by depname order by salary),* from empsalary;
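To make the tie behavior concrete, here is a minimal sketch that runs the same kind of query on a handful of the rows above. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL (an assumption for portability; it needs SQLite 3.25 or newer). The extra `empno` sort key is also an addition, there only so that the order among tied salaries is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(depname, empno, salary) AS (
        VALUES ('develop', 7, 4200), ('develop', 10, 5200),
               ('develop', 11, 5200), ('sales', 3, 4800), ('sales', 4, 4800)
    )
    SELECT depname, empno, salary,
           -- empno is a hypothetical tie-breaker added for a stable result
           row_number() OVER (PARTITION BY depname ORDER BY salary, empno) AS rn
    FROM emp
    ORDER BY depname, rn
""").fetchall()

for row in rows:
    print(row)
# ('develop', 7, 4200, 1)
# ('develop', 10, 5200, 2)
# ('develop', 11, 5200, 3)
# ('sales', 3, 4800, 1)
# ('sales', 4, 4800, 2)
```

The two 5200 rows get 2 and 3, and numbering restarts at 1 for each depname.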

2. rank(): returns the rank. Tied values share the same rank and leave a gap after it, i.e. it returns 1, 2, 2, 4, ...
select rank() over(partition by depname order by salary),* from empsalary;

3. dense_rank(): returns the rank. Tied values share the same rank but no gap follows, i.e. it returns 1, 2, 2, 3, ...
select dense_rank() over(partition by depname order by salary),* from empsalary;
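The three numbering functions are easiest to tell apart side by side. The sketch below uses the salaries of the develop department and, as an assumption, runs on SQLite (3.25+) through Python's sqlite3 rather than on PostgreSQL. The `empno` tie-breaker on row_number() is added only to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary) AS (
        VALUES (10, 5200), (7, 4200), (9, 4500), (8, 6000), (11, 5200)
    )
    SELECT empno, salary,
           row_number() OVER (ORDER BY salary, empno) AS row_num,
           rank()       OVER (ORDER BY salary)        AS rnk,
           dense_rank() OVER (ORDER BY salary)        AS dense_rnk
    FROM emp
    ORDER BY salary, empno
""").fetchall()

for row in rows:
    print(row)
# (7, 4200, 1, 1, 1)
# (9, 4500, 2, 2, 2)
# (10, 5200, 3, 3, 3)
# (11, 5200, 4, 3, 3)
# (8, 6000, 5, 5, 4)
```

After the tie at 5200, rank() jumps to 5 while dense_rank() continues with 4.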

4. percent_rank(): returns the relative rank of the current row within its partition, computed as (rank - 1) / (total rows in the partition - 1)
select percent_rank() over(partition by depname order by salary),* from empsalary;
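The formula can be checked by hand on five salaries. This is a small sketch, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for PostgreSQL. With 5 rows the denominator is 4, so the ranks 1, 2, 3, 3, 5 become 0, 0.25, 0.5, 0.5, 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary) AS (
        VALUES (10, 5200), (7, 4200), (9, 4500), (8, 6000), (11, 5200)
    )
    SELECT salary,
           rank()         OVER (ORDER BY salary) AS rnk,
           percent_rank() OVER (ORDER BY salary) AS pct
    FROM emp
    ORDER BY salary
""").fetchall()

for salary, rnk, pct in rows:
    # Recompute (rank - 1) / (rows - 1) next to the database's answer
    print(salary, rnk, pct, (rnk - 1) / (len(rows) - 1))
```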

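5. cume_dist(): returns the cumulative distribution, i.e. (number of partition rows that precede the current row or are tied with it) / (total rows in the partition)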
select ROUND((cume_dist() over(partition by depname order by salary))::NUMERIC,2) AS cume_dist,* from empsalary;
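A quick sketch of how ties affect cume_dist(), again assuming SQLite 3.25+ through Python's sqlite3 instead of PostgreSQL. Rounding is done in Python here, so the PostgreSQL-specific `::NUMERIC` cast is not needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary) AS (
        VALUES (10, 5200), (7, 4200), (9, 4500), (8, 6000), (11, 5200)
    )
    SELECT salary, cume_dist() OVER (ORDER BY salary) AS cd
    FROM emp
    ORDER BY salary
""").fetchall()

for salary, cd in rows:
    print(salary, round(cd, 2))
# 4200 0.2
# 4500 0.4
# 5200 0.8
# 5200 0.8
# 6000 1.0
```

Both 5200 rows report 0.8 because 4 of the 5 rows have a salary less than or equal to 5200.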

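6. ntile(number of buckets): distributes the rows of each partition as evenly as possible across the given number of buckets and returns the bucket number of each row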
select ntile(3) over(partition by depname order by salary),* from empsalary;
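When the rows do not divide evenly, the earlier buckets get the extra rows. A minimal sketch with 5 rows and 3 buckets, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary) AS (
        VALUES (10, 5200), (7, 4200), (9, 4500), (8, 6000), (11, 5200)
    )
    SELECT empno, salary, ntile(3) OVER (ORDER BY salary) AS bucket
    FROM emp
    ORDER BY salary, empno
""").fetchall()

for row in rows:
    print(row)
# (7, 4200, 1)
# (9, 4500, 1)
# (10, 5200, 2)
# (11, 5200, 2)
# (8, 6000, 3)
```

Five rows in three buckets give bucket sizes 2, 2 and 1.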

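7. lag(value any [, offset integer [, default any ]]): returns value from the row that is offset rows away from the current row within the partition. A positive offset reads an earlier row and a negative offset a later row; offset defaults to 1. If there is no such row, default is returned (null when omitted). With a positive offset, each salary therefore appears in the result shifted down by offset rows; a negative offset shifts it the other way.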
select lag(salary,1,null) over(partition by depname order by enroll_date),* from empsalary;

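8. lead(value any [, offset integer [, default any ]]): returns value from the row that is offset rows away from the current row within the partition. A positive offset reads a later row and a negative offset an earlier row. If there is no such row, default is returned.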
select lead(salary,1,2) over(partition by depname order by enroll_date),* from empsalary;
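lag() and lead() are mirror images, which shows best when both appear in one result. The sketch below reuses the two calls from the queries above on four develop rows with distinct dates. It is run on SQLite 3.25+ via Python's sqlite3, which is an assumption made so the example needs no PostgreSQL server, and dates are written as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary, enroll_date) AS (
        VALUES (10, 5200, '2007-08-01'), (7, 4200, '2008-01-01'),
               (8, 6000, '2006-10-01'), (11, 5200, '2007-08-15')
    )
    SELECT empno, salary,
           lag(salary, 1, NULL) OVER (ORDER BY enroll_date) AS prev_salary,
           lead(salary, 1, 2)   OVER (ORDER BY enroll_date) AS next_salary
    FROM emp
    ORDER BY enroll_date
""").fetchall()

for row in rows:
    print(row)
# (8, 6000, None, 5200)
# (10, 5200, 6000, 5200)
# (11, 5200, 5200, 4200)
# (7, 4200, 5200, 2)
```

The first row has no predecessor, so lag() returns null; the last row has no successor, so lead() falls back to its default of 2.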

9. first_value(value any): returns value from the first row of the window frame
select first_value(salary) over(partition by depname order by enroll_date),* from empsalary;

10. last_value(value any): returns value from the last row of the window frame
select last_value(salary) over(partition by depname order by enroll_date),* from empsalary;

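At first glance the result looks wrong. The reason is the default window frame: when order by is present, the frame runs from the start of the partition only up to the current row (including any rows tied with it in the ordering), so last_value returns the value of the current row or its last peer instead of the last row of the partition. Widening the frame to cover the whole partition makes it return the true last value: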
select last_value(salary) over(partition by depname order by enroll_date range between unbounded preceding and unbounded following),* FROM empsalary;
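The effect of the frame is easy to see with first_value() and both variants of last_value() in one result. This is a sketch on four develop rows with distinct dates, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary, enroll_date) AS (
        VALUES (10, 5200, '2007-08-01'), (7, 4200, '2008-01-01'),
               (8, 6000, '2006-10-01'), (11, 5200, '2007-08-15')
    )
    SELECT empno, salary,
           first_value(salary) OVER (ORDER BY enroll_date) AS first_sal,
           -- default frame: ends at the current row
           last_value(salary)  OVER (ORDER BY enroll_date) AS last_default,
           -- explicit frame: the whole partition
           last_value(salary)  OVER (ORDER BY enroll_date
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full
    FROM emp
    ORDER BY enroll_date
""").fetchall()

for row in rows:
    print(row)
# (8, 6000, 6000, 6000, 4200)
# (10, 5200, 6000, 5200, 4200)
# (11, 5200, 6000, 5200, 4200)
# (7, 4200, 6000, 4200, 4200)
```

With the default frame, last_value() simply echoes the current row's salary; with the full frame it returns 4200 for every row.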

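11. nth_value(value any, nth integer): returns value from the nth row of the window frame, counting from 1, or null if the frame has no such row. For example, nth_value(salary, 2) returns the salary of the second row in the frame.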
select nth_value(salary,2) over(partition by depname order by enroll_date),* from empsalary;
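nth_value() depends on the frame in the same way as last_value(). The sketch below compares the default frame with a full-partition frame; it assumes SQLite 3.25+ via Python's sqlite3 rather than PostgreSQL, and the four rows are a subset of the develop department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

rows = conn.execute("""
    WITH emp(empno, salary, enroll_date) AS (
        VALUES (10, 5200, '2007-08-01'), (7, 4200, '2008-01-01'),
               (8, 6000, '2006-10-01'), (11, 5200, '2007-08-15')
    )
    SELECT empno, salary,
           nth_value(salary, 2) OVER (ORDER BY enroll_date) AS second_default,
           nth_value(salary, 2) OVER (ORDER BY enroll_date
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_full
    FROM emp
    ORDER BY enroll_date
""").fetchall()

for row in rows:
    print(row)
# (8, 6000, None, 5200)
# (10, 5200, 5200, 5200)
# (11, 5200, 5200, 5200)
# (7, 4200, 5200, 5200)
```

On the first row the default frame holds only one row, so there is no second value and the result is null.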

12. When several window functions use the same window, a named window keeps the query short
select sum(salary) over w,avg(salary) over w,* from empsalary window w as (partition by depname order by enroll_date);
This is equivalent to:
select sum(salary) over(partition by depname order by enroll_date),avg(salary) over(partition by depname order by enroll_date),* from empsalary;
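The equivalence of the two spellings can be checked directly. This is a minimal sketch on four rows from two departments, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for PostgreSQL. The helper name `CTE` is only there to avoid repeating the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

# Shared sample data (hypothetical helper string, not part of the article)
CTE = """
    WITH emp(depname, empno, salary, enroll_date) AS (
        VALUES ('develop', 8, 6000, '2006-10-01'), ('develop', 10, 5200, '2007-08-01'),
               ('sales', 1, 5000, '2006-10-01'), ('sales', 6, 5500, '2007-01-02')
    )
"""

named = conn.execute(CTE + """
    SELECT depname, empno, sum(salary) OVER w, avg(salary) OVER w
    FROM emp
    WINDOW w AS (PARTITION BY depname ORDER BY enroll_date)
    ORDER BY depname, enroll_date
""").fetchall()

inline = conn.execute(CTE + """
    SELECT depname, empno,
           sum(salary) OVER (PARTITION BY depname ORDER BY enroll_date),
           avg(salary) OVER (PARTITION BY depname ORDER BY enroll_date)
    FROM emp
    ORDER BY depname, enroll_date
""").fetchall()

for row in named:
    print(row)
# ('develop', 8, 6000, 6000.0)
# ('develop', 10, 11200, 5600.0)
# ('sales', 1, 5000, 5000.0)
# ('sales', 6, 10500, 5250.0)
print(named == inline)  # True
```

Because the window has an order by, sum() and avg() are running totals and running averages within each department.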

