OVER(PARTITION BY)函數用法

本文轉載自查看原文 2016-04-18 16:24 9821

OVER(PARTITION BY)函數介紹

開窗函數 Oracle從8.1.6開始提供分析函數，分析函數用於計算基於組的某種聚合值，它和聚合函數的不同之處是：對於每個組返回多行，而聚合函數對於每個組只返回一行。

開窗函數指定了分析函數工作的數據窗口大小，這個數據窗口大小可能會隨着行的變化而變化，舉例如下： 1：over后的寫法： over（order by salary）按照salary排序進行累計，order by是個默認的開窗函數 over（partition by deptno）按照部門分區

over（partition by deptno order by salary）

2：開窗的窗口范圍： over（order by salary range between 5 preceding and 5 following）：窗口范圍為當前行數據幅度減5加5后的范圍內的。

舉例：

--sum(s)over(order by s range between 2 preceding and 2 following) 表示加2或2的范圍內的求和

select name,class,s, sum(s)over(order by s range between 2 preceding and 2 following) mm from t2 adf 3 45 45 --45加2減2即43到47，但是s在這個范圍內只有45 asdf 3 55 55 cfe 2 74 74 3dd 3 78 158 --78在76到80范圍內有78，80，求和得158 fda 1 80 158 gds 2 92 92 ffd 1 95 190 dss 1 95 190 ddd 3 99 198

gf 3 99 198

over（order by salary rows between 5 preceding and 5 following）：窗口范圍為當前行前后各移動5行。

舉例：

--sum(s)over(order by s rows between 2 preceding and 2 following)表示在上下兩行之間的范圍內 select name,class,s, sum(s)over(order by s rows between 2 preceding and 2 following) mm from t2 adf 3 45 174 （45+55+74=174） asdf 3 55 252 （45+55+74+78=252） cfe 2 74 332 （74+55+45+78+80=332） 3dd 3 78 379 （78+74+55+80+92=379） fda 1 80 419 gds 2 92 440 ffd 1 95 461 dss 1 95 480 ddd 3 99 388 gf 3 99 293

over（order by salary range between unbounded preceding and unbounded following）或者

over（order by salary rows between unbounded preceding and unbounded following）：窗口不做限制

3、與over函數結合的幾個函數介紹

row_number()over()、rank()over()和dense_rank()over()函數的使用

下面以班級成績表t2來說明其應用

t2表信息如下： cfe        2        74 dss        1        95 ffd        1        95 fda        1        80 gds        2        92 gf         3        99 ddd        3        99 adf        3        45 asdf       3        55 3dd        3        78
select * from                                                                          (                                                                               select name,class,s,rank()over(partition by class order by s desc) mm from t2     )                                                                               where mm=1；得到的結果是: dss        1        95        1 ffd        1        95        1 gds        2        92        1 gf         3        99        1 ddd        3        99        1
注意：     1.在求第一名成績的時候，不能用row_number()，因為如果同班有兩個並列第一，row_number()只返回一個結果; select * from                                                                          (                                                                               select name,class,s,row_number()over(partition by class order by s desc) mm from t2     )                                                                               where mm=1； 1        95        1 --95有兩名但是只顯示一個 2        92        1 3        99        1 --99有兩名但也只顯示一個
    2.rank()和dense_rank()可以將所有的都查找出來：如上可以看到采用rank可以將並列第一名的都查找出來；      rank()和dense_rank()區別：      --rank()是跳躍排序，有兩個第二名時接下來就是第四名； select name,class,s,rank()over(partition by class order by s desc) mm from t2 dss        1        95        1 ffd        1        95        1 fda        1        80        3 --直接就跳到了第三 gds        2        92        1 cfe        2        74        2 gf         3        99        1 ddd        3        99        1 3dd        3        78        3 asdf       3        55        4 adf        3        45        5      --dense_rank()l是連續排序，有兩個第二名時仍然跟着第三名 select name,class,s,dense_rank()over(partition by class order by s desc) mm from t2 dss        1        95        1 ffd        1        95        1 fda        1        80        2 --連續排序（仍為2） gds        2        92        1 cfe        2        74        2 gf         3        99        1 ddd        3        99        1 3dd        3        78        2 asdf       3        55        3 adf        3        45        4
--sum()over（）的使用
select name,class,s, sum(s)over(partition by class order by s desc) mm from t2 --根據班級進行分數求和 dss        1        95        190 --由於兩個95都是第一名，所以累加時是兩個第一名的相加 ffd        1        95        190 fda        1        80        270 --第一名加上第二名的 gds        2        92        92 cfe        2        74        166 gf         3        99        198 ddd        3        99        198 3dd        3        78        276 asdf       3        55        331 adf        3        45        376

first_value() over()和last_value() over()的使用

--找出這三條電路每條電路的第一條記錄類型和最后一條記錄類型

SELECT opr_id,res_type, first_value(res_type) over(PARTITION BY opr_id ORDER BY res_type) low, last_value(res_type) over(PARTITION BY opr_id ORDER BY res_type rows BETWEEN unbounded preceding AND unbounded following) high FROM rm_circuit_route WHERE opr_id IN ('000100190000000000021311','000100190000000000021355','000100190000000000021339') ORDER BY opr_id;

注：rows BETWEEN unbounded preceding AND unbounded following 的使用

--取last_value時不使用rows BETWEEN unbounded preceding AND unbounded following的結果

SELECT opr_id,res_type, first_value(res_type) over(PARTITION BY opr_id ORDER BY res_type) low, last_value(res_type) over(PARTITION BY opr_id ORDER BY res_type) high FROM rm_circuit_route WHERE opr_id IN ('000100190000000000021311','000100190000000000021355','000100190000000000021339') ORDER BY opr_id;

如下圖可以看到，如果不使用

rows BETWEEN unbounded preceding AND unbounded following，取出的last_value由於與res_type進行進行排列，因此取出的電路的最后一行記錄的類型就不是按照電路的范圍提取了，而是以res_type為范圍進行提取了。

在first_value和last_value中ignore nulls的使用

數據如下：

取出該電路的第一條記錄，加上ignore nulls后，如果第一條是判斷的那個字段是空的，則默認取下一條，結果如下所示：

--lag() over()函數用法（取出前n行數據） lag(expresstion,<offset>,<default>) with a as (select 1 id,'a' name from dual union select 2 id,'b' name from dual union select 3 id,'c' name from dual union select 4 id,'d' name from dual union select 5 id,'e' name from dual ) select id,name, lag(id,1,'')over(order by name) from a;
--lead() over()函數用法（取出后N行數據）

lead(expresstion,<offset>,<default>) with a as (select 1 id,'a' name from dual union select 2 id,'b' name from dual union select 3 id,'c' name from dual union select 4 id,'d' name from dual union select 5 id,'e' name from dual ) select id,name,lead(id,1,'')over(order by name) from a;
--ratio_to_report(a)函數用法 Ratio_to_report() 括號中就是分子，over() 括號中就是分母
with a as (select 1 a from dual            union all select 1 a from dual            union all select 1 a from dual            union all select 2 a from dual            union all select 3 a from dual            union all select 4 a from dual            union all select 4 a from dual            union all select 5 a from dual            ) select a, ratio_to_report(a)over(partition by a) b from a order by a;
with a as (select 1 a from dual            union all select 1 a from dual            union all select 1 a from dual            union all select 2 a from dual            union all select 3 a from dual            union all select 4 a from dual            union all select 4 a from dual            union all select 5 a from dual            ) select a, ratio_to_report(a)over() b from a --分母缺省就是整個占比 order by a;
with a as (select 1 a from dual            union all select 1 a from dual            union all select 1 a from dual            union all select 2 a from dual            union all select 3 a from dual            union all select 4 a from dual            union all select 4 a from dual            union all select 5 a from dual            ) select a, ratio_to_report(a)over() b from a group by a order by a;--分組后的占比

percent_rank用法

計算方法：所在組排名序號-1除以該組所有的行數-1，如下所示自己計算的pr1與通過percent_rank函數得到的值是一樣的： SELECT a.deptno, a.ename, a.sal, a.r, b.n, (a.r-1)/(n-1) pr1, percent_rank() over(PARTITION BY a.deptno ORDER BY a.sal) pr2 FROM (SELECT deptno, ename, sal, rank() over(PARTITION BY deptno ORDER BY sal) r --計算出在組中的排名序號 FROM emp ORDER BY deptno, sal) a, (SELECT deptno, COUNT(1) n FROM emp GROUP BY deptno) b --按部門計算每個部門的所有成員數 WHERE a.deptno = b.deptno;

cume_dist函數

計算方法：所在組排名序號除以該組所有的行數，但是如果存在並列情況，則需加上並列的個數-1，如下所示自己計算的pr1與通過percent_rank函數得到的值是一樣的： SELECT a.deptno, a.ename, a.sal, a.r, b.n, c.rn, (a.r + c.rn - 1) / n pr1, cume_dist() over(PARTITION BY a.deptno ORDER BY a.sal) pr2 FROM (SELECT deptno, ename, sal, rank() over(PARTITION BY deptno ORDER BY sal) r FROM emp ORDER BY deptno, sal) a, (SELECT deptno, COUNT(1) n FROM emp GROUP BY deptno) b, (SELECT deptno, r, COUNT(1) rn,sal FROM (SELECT deptno,sal, rank() over(PARTITION BY deptno ORDER BY sal) r FROM emp) GROUP BY deptno, r,sal ORDER BY deptno) c --c表就是為了得到每個部門員工工資的一樣的個數 WHERE a.deptno = b.deptno AND a.deptno = c.deptno(+) AND a.sal = c.sal;

percentile_cont函數

含義：輸入一個百分比（該百分比就是按照percent_rank函數計算的值），返回該百分比位置的平均值如下，輸入百分比為0.7，因為0.7介於0.6和0.8之間，因此返回的結果就是0.6對應的sal的1500加上0.8對應的sal的1600平均 SELECT ename, sal, deptno, percentile_cont(0.7) within GROUP(ORDER BY sal) over(PARTITION BY deptno) "Percentile_Cont", percent_rank() over(PARTITION BY deptno ORDER BY sal) "Percent_Rank" FROM emp WHERE deptno IN (30, 60);

若輸入的百分比為0.6，則直接0.6對應的sal值，即1500 SELECT ename, sal, deptno, percentile_cont(0.6) within GROUP(ORDER BY sal) over(PARTITION BY deptno) "Percentile_Cont", percent_rank() over(PARTITION BY deptno ORDER BY sal) "Percent_Rank" FROM emp WHERE deptno IN (30, 60);

PERCENTILE_DISC函數

功能描述：返回一個與輸入的分布百分比值相對應的數據值，分布百分比的計算方法見函數CUME_DIST，如果沒有正好對應的數據值，就取大於該分布值的下一個值。注意：本函數與PERCENTILE_CONT的區別在找不到對應的分布值時返回的替代值的計算方法不同
SAMPLE：下例中0.7的分布值在部門30中沒有對應的Cume_Dist值，所以就取下一個分布值0.83333333所對應的SALARY來替代
SELECT ename, sal, deptno, percentile_disc(0.7) within GROUP(ORDER BY sal) over(PARTITION BY deptno) "Percentile_Disc", cume_dist() over(PARTITION BY deptno ORDER BY sal) "Cume_Dist" FROM emp WHERE deptno IN (30, 60);

詳細參考http://www.cnblogs.com/lanzi/archive/2010/10/26/1861338.html

--select max(t.check_date),t.user_id from attendance t group by t.user_id;
insert into attendance_day
(id, user_id, check_day, gps_x, gpx_y)
select seq_attendance.nextval, t.user_id, t.check_date, t.gps_x, t.gpx_y
--from attendance t
    from (select m.*,
                 row_number() over(partition by m.user_id order by m.check_date desc) rn
            from attendance m) t
   where rn = 1;

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 OVER(PARTITION BY)函數介紹 Hive 窗口函數之 lead() over(partition by ) 和 lag() over(partition by ) row_number() over (partition by order by)的用法關於rank() over(partition by A order by B)的用法 oracle 高級用法, decode 排序, over(partition by x order by y) 開窗函數 sum() over( partition by order by ) oracle中partition by分組排序函數用法 ROW_NUMBER() OVER(PARTITION BY) hive 分組排序函數 row_number() over(partition by " " order by " "desc