Window Function Usage Examples


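Window functions are well suited to many statistical and reporting queries. This article walks through the most commonly used ones with examples. The queries use PostgreSQL syntax.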

Definition

  A window function performs a calculation across a set of table rows that are somehow related to the current row.

Data preparation

 

CREATE TABLE empsalary(
  depname varchar,
  empno bigint,
  salary int,
  enroll_date date
);
INSERT INTO empsalary VALUES('develop',10, 5200, '2007/08/01');
INSERT INTO empsalary VALUES('sales', 1, 5000, '2006/10/01');
INSERT INTO empsalary VALUES('personnel', 5, 3500, '2007/12/10');
INSERT INTO empsalary VALUES('sales', 4, 4800, '2007/08/08');
INSERT INTO empsalary VALUES('sales', 6, 5500, '2007/01/02');
INSERT INTO empsalary VALUES('personnel', 2, 3900, '2006/12/23');
INSERT INTO empsalary VALUES('develop', 7, 4200, '2008/01/01');
INSERT INTO empsalary VALUES('develop', 9, 4500, '2008/01/01');
INSERT INTO empsalary VALUES('sales', 3, 4800, '2007/08/01');
INSERT INTO empsalary VALUES('develop', 8, 6000, '2006/10/01');
INSERT INTO empsalary VALUES('develop', 11, 5200, '2007/08/15');

 

Examples

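1.row_number(): returns the number of the current row within its partition. Rows with equal ordering values still get distinct, consecutive numbers, so it returns 1, 2, 3, 4, 5, ... and never 1, 2, 2, 4, ...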

 

select row_number() over(),* from empsalary limit 2;

 

 

 

select row_number() over(),* from empsalary limit 2 offset 2;

 

-- Partition by depname and order by salary; rows with equal salary still get consecutive row numbers
select row_number() over(partition by depname order by salary),* from empsalary;

 

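2.rank(): returns the rank of the current row. Rows with equal ordering values share a rank, and a gap follows the tie, so it returns 1, 2, 2, 4, ...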

 

select rank() over(partition by depname order by salary),* from empsalary;

 

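3.dense_rank(): returns the rank of the current row. Rows with equal ordering values share a rank, but no gap follows the tie, so it returns 1, 2, 2, 3, ...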

select dense_rank() over(partition by depname order by salary),* from empsalary;
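To see the three numbering functions side by side, the sketch below runs them over the four sales rows from the sample data, two of which tie at 4800. It is only an illustration: it assumes Python's built-in sqlite3 module backed by SQLite 3.25 or newer as a local stand-in for the database used in this article, and the column aliases rn, rnk and dense_rnk are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database (3.25+) stands in for the article's server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary(depname TEXT, empno INTEGER, salary INTEGER)")
# The four 'sales' rows from the sample data; two of them tie at 4800.
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?)",
    [("sales", 1, 5000), ("sales", 4, 4800), ("sales", 6, 5500), ("sales", 3, 4800)],
)

ranking_rows = conn.execute(
    """
    SELECT salary,
           row_number() OVER (PARTITION BY depname ORDER BY salary) AS rn,
           rank()       OVER (PARTITION BY depname ORDER BY salary) AS rnk,
           dense_rank() OVER (PARTITION BY depname ORDER BY salary) AS dense_rnk
    FROM empsalary
    ORDER BY salary, rn
    """
).fetchall()

for row in ranking_rows:
    print(row)
# (4800, 1, 1, 1)
# (4800, 2, 1, 1)
# (5000, 3, 3, 2)
# (5500, 4, 4, 3)
```

The tie at 4800 is where the three columns diverge: row_number keeps counting, rank skips 2 after the tie, and dense_rank continues with 2.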

 

 

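4.percent_rank(): returns the relative rank of the current row within its partition, computed as (rank - 1) / (rows in partition - 1). The result ranges from 0 to 1.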

 

select percent_rank() over(partition by depname order by salary),* from empsalary;

 

 

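5.cume_dist(): returns the cumulative distribution, computed as (number of partition rows that precede or tie with the current row) / (total rows in the partition)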

select ROUND((cume_dist() over(partition by depname order by salary))::NUMERIC,2) AS cume_dist,* from empsalary;
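The formulas behind percent_rank() and cume_dist() can be checked by hand. The following is a minimal pure-Python sketch, not the database's implementation; the helper names percent_rank and cume_dist are hypothetical and simply mirror the SQL functions. It is applied to the five develop salaries from the sample data.

```python
def percent_rank(values, value):
    """(rank - 1) / (rows - 1), where rank is 1 + the number of smaller values."""
    if len(values) == 1:
        return 0.0  # a single-row partition yields 0
    rank = 1 + sum(1 for v in values if v < value)
    return (rank - 1) / (len(values) - 1)


def cume_dist(values, value):
    """(rows that precede or tie with value) / (rows in the partition)."""
    return sum(1 for v in values if v <= value) / len(values)


# Salaries of the 'develop' department, ordered by salary.
develop_salaries = [4200, 4500, 5200, 5200, 6000]

percent_ranks = [percent_rank(develop_salaries, s) for s in develop_salaries]
cume_dists = [cume_dist(develop_salaries, s) for s in develop_salaries]

print(percent_ranks)  # [0.0, 0.25, 0.5, 0.5, 1.0]
print(cume_dists)     # [0.2, 0.4, 0.8, 0.8, 1.0]
```

The two rows at 5200 share rank 3, so both get (3 - 1) / (5 - 1) = 0.5, while cume_dist counts both of them and gives 4 / 5 = 0.8.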

 

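6.ntile(num_buckets): splits the rows of each partition into the given number of buckets as evenly as possible and returns the bucket number, starting at 1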

 

select ntile(3) over(partition by depname order by salary),* from empsalary;
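How "as evenly as possible" plays out is easiest to see with row counts that do not divide evenly. The sketch below assumes the usual rule that the leftover rows go to the first buckets; the helper name ntile_buckets is hypothetical and only models the bucket numbers, not the database's code.

```python
def ntile_buckets(num_rows, num_buckets):
    """Return the bucket number for each of num_rows ordered rows."""
    base, extra = divmod(num_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile_buckets(5, 3))  # develop has 5 rows   -> [1, 1, 2, 2, 3]
print(ntile_buckets(4, 3))  # sales has 4 rows     -> [1, 1, 2, 3]
print(ntile_buckets(2, 3))  # personnel has 2 rows -> [1, 2]
```

With fewer rows than buckets, as in personnel, some bucket numbers are simply never used.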

 

 

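7.lag(value any [, offset integer [, default any ]]): returns value evaluated at the row that lies offset rows before the current row within the partition. A positive offset looks at earlier rows and a negative offset at later rows; if there is no such row, default is returned. Put another way, with a positive offset each salary value shows up offset rows further down in the result column, and with a negative offset it shows up further up.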

select lag(salary,1,null) over(partition by depname order by enroll_date),* from empsalary;

 

 

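8.lead(value any [, offset integer [, default any ]]): returns value evaluated at the row that lies offset rows after the current row within the partition. A positive offset looks at later rows and a negative offset at earlier rows; if there is no such row, default is returned.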

select lead(salary,1,2) over(partition by depname order by enroll_date),* from empsalary;
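The effect of lag and lead is clearer when both appear next to the original column. The sketch below runs the two calls from this section over the sales rows only. It assumes Python's sqlite3 module with SQLite 3.25 or newer as a stand-in database, and it stores enroll_date as ISO text (for example '2006-10-01') so that it sorts chronologically in SQLite; both choices are assumptions made for the example.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) as a stand-in; dates kept as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsalary(depname TEXT, empno INTEGER, salary INTEGER, enroll_date TEXT)"
)
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?, ?)",
    [
        ("sales", 1, 5000, "2006-10-01"),
        ("sales", 4, 4800, "2007-08-08"),
        ("sales", 6, 5500, "2007-01-02"),
        ("sales", 3, 4800, "2007-08-01"),
    ],
)

lag_lead_rows = conn.execute(
    """
    SELECT enroll_date,
           salary,
           lag(salary, 1, NULL) OVER (PARTITION BY depname ORDER BY enroll_date) AS prev_salary,
           lead(salary, 1, 2)   OVER (PARTITION BY depname ORDER BY enroll_date) AS next_salary
    FROM empsalary
    ORDER BY enroll_date
    """
).fetchall()

for row in lag_lead_rows:
    print(row)
# ('2006-10-01', 5000, None, 5500)
# ('2007-01-02', 5500, 5000, 4800)
# ('2007-08-01', 4800, 5500, 4800)
# ('2007-08-08', 4800, 4800, 2)
```

The first row has no predecessor, so lag falls back to its default (NULL, shown as None); the last row has no successor, so lead returns its default of 2.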

 

 

9.first_value(value any): returns value evaluated at the first row of the window frame

 

select first_value(salary) over(partition by depname order by enroll_date),* from empsalary;

 

 

10.last_value(value any): returns value evaluated at the last row of the window frame

select last_value(salary) over(partition by depname order by enroll_date),* from empsalary;

 

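The result looks wrong at first glance: each row seems to return its own salary instead of the last one in the department. The cause is the default frame. When ORDER BY is present, the frame runs from the start of the partition up to the current row (including any rows that share its enroll_date), so the "last" row of the frame is the current row or one of its peers. Widening the frame to the whole partition makes last_value return the partition's last value: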

select last_value(salary) over(partition by depname order by enroll_date range between unbounded preceding and unbounded following),* FROM empsalary;
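The difference between the two frames can be checked side by side. The sketch below evaluates last_value over the sales rows with the default frame and with the widened frame. It assumes Python's sqlite3 module with SQLite 3.25 or newer as a stand-in database and ISO-formatted date strings; the aliases default_frame and full_frame are made up for the example.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) as a stand-in; dates kept as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsalary(depname TEXT, empno INTEGER, salary INTEGER, enroll_date TEXT)"
)
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?, ?)",
    [
        ("sales", 1, 5000, "2006-10-01"),
        ("sales", 4, 4800, "2007-08-08"),
        ("sales", 6, 5500, "2007-01-02"),
        ("sales", 3, 4800, "2007-08-01"),
    ],
)

frame_rows = conn.execute(
    """
    SELECT enroll_date,
           salary,
           -- Default frame: partition start up to the current row.
           last_value(salary) OVER (PARTITION BY depname ORDER BY enroll_date) AS default_frame,
           -- Explicit frame: the whole partition.
           last_value(salary) OVER (
               PARTITION BY depname ORDER BY enroll_date
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM empsalary
    ORDER BY enroll_date
    """
).fetchall()

for row in frame_rows:
    print(row)
# ('2006-10-01', 5000, 5000, 4800)
# ('2007-01-02', 5500, 5500, 4800)
# ('2007-08-01', 4800, 4800, 4800)
# ('2007-08-08', 4800, 4800, 4800)
```

With the default frame the third column just repeats each row's own salary; with the full-partition frame every row reports 4800, the salary of the most recently enrolled sales employee.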

 

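11.nth_value(value any, nth integer): returns value evaluated at the nth row of the window frame, counting from 1, or NULL if the frame has no such row. For example, nth_value(salary,2) returns the salary of the second row in the frame.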

 

select nth_value(salary,2) over(partition by depname order by enroll_date),* from empsalary;
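Because nth_value also works on the window frame, the default frame affects it too: the first row of each partition has only one row in its frame, so there is no second row to return. The sketch below shows this on the sales rows, again assuming Python's sqlite3 module with SQLite 3.25 or newer as a stand-in database and ISO-formatted date strings.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) as a stand-in; dates kept as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsalary(depname TEXT, empno INTEGER, salary INTEGER, enroll_date TEXT)"
)
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?, ?)",
    [
        ("sales", 1, 5000, "2006-10-01"),
        ("sales", 4, 4800, "2007-08-08"),
        ("sales", 6, 5500, "2007-01-02"),
        ("sales", 3, 4800, "2007-08-01"),
    ],
)

nth_rows = conn.execute(
    """
    SELECT enroll_date,
           nth_value(salary, 2) OVER (PARTITION BY depname ORDER BY enroll_date) AS second_salary
    FROM empsalary
    ORDER BY enroll_date
    """
).fetchall()

for row in nth_rows:
    print(row)
# ('2006-10-01', None)
# ('2007-01-02', 5500)
# ('2007-08-01', 5500)
# ('2007-08-08', 5500)
```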

 

 

12.When several window functions share the same window definition, a named window (the WINDOW clause) shortens the query

 

select sum(salary) over w,avg(salary) over w,* from empsalary window w as (partition by depname order by enroll_date);
This is equivalent to:
select sum(salary) over(partition by depname order by enroll_date),avg(salary) over(partition by depname order by enroll_date),* from empsalary;
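As a quick check that the two spellings really are interchangeable, the sketch below runs both over the develop rows and compares the results. It assumes Python's sqlite3 module with SQLite 3.25 or newer as a stand-in database and ISO-formatted date strings; the ORDER BY enroll_date, empno at the end is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) as a stand-in; dates kept as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsalary(depname TEXT, empno INTEGER, salary INTEGER, enroll_date TEXT)"
)
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?, ?)",
    [
        ("develop", 10, 5200, "2007-08-01"),
        ("develop", 7, 4200, "2008-01-01"),
        ("develop", 9, 4500, "2008-01-01"),
        ("develop", 8, 6000, "2006-10-01"),
        ("develop", 11, 5200, "2007-08-15"),
    ],
)

# Named window: the definition is written once and reused.
named_rows = conn.execute(
    """
    SELECT empno, sum(salary) OVER w, avg(salary) OVER w
    FROM empsalary
    WINDOW w AS (PARTITION BY depname ORDER BY enroll_date)
    ORDER BY enroll_date, empno
    """
).fetchall()

# Inline windows: the same definition repeated for each function.
inline_rows = conn.execute(
    """
    SELECT empno,
           sum(salary) OVER (PARTITION BY depname ORDER BY enroll_date),
           avg(salary) OVER (PARTITION BY depname ORDER BY enroll_date)
    FROM empsalary
    ORDER BY enroll_date, empno
    """
).fetchall()

print(named_rows == inline_rows)      # True
print([r[1] for r in named_rows])     # [6000, 11200, 16400, 25100, 25100]
```

The running sum jumps straight to 25100 for both employees enrolled on 2008-01-01: they are peers under ORDER BY enroll_date, so the default frame includes both of them at once.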

 

 

 

 

 

