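I. Overview

How analytic functions work

Syntax:

FUNCTION_NAME(<argument>, …) OVER (<PARTITION BY expr, …> <ORDER BY expr <ASC | DESC> <NULLS FIRST | NULLS LAST>> <windowing clause>)

PARTITION BY clause: divides the rows into groups; the function is evaluated separately within each group.
ORDER BY clause: orders the rows within each group.
Windowing clause: selects the rows around the current row (the window) that the function works on. If it is omitted and an ORDER BY is present, it defaults to RANGE UNBOUNDED PRECEDING, that is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without an ORDER BY, the window is the whole partition.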
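1. Range window (RANGE WINDOW)
RANGE N PRECEDING is valid only when the sort column is numeric or a date. The window is based on values, not on row positions. With ascending order it contains every row whose sort-column value lies between (current value - N) and the current value. With descending order it contains every row whose value lies between the current value and (current value + N). The current row and any rows tied with it are always included, so the window depends on the ORDER BY clause.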
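Below is a minimal sketch of how a range window differs from a row window, run through Python's built-in sqlite3 module. SQLite is an assumption here: it stands in for Oracle because it accepts the same RANGE/ROWS frame syntax for this case, and numeric RANGE offsets need SQLite 3.28 or later. The table t and its column x are hypothetical. The gap between 2 and 4 is what makes the two window types disagree. The ROWS column previews the row window described next.

```python
import sqlite3

# Hypothetical one-column table; distinct values keep the ROWS ordering deterministic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (4,), (5,), (9,)])

rows = conn.execute("""
    SELECT x,
           -- range window: every row whose x lies in [x - 1, x]
           SUM(x) OVER (ORDER BY x RANGE 1 PRECEDING) AS range_sum,
           -- row window: the current row plus the one physical row before it
           SUM(x) OVER (ORDER BY x ROWS 1 PRECEDING) AS rows_sum,
           -- BETWEEN form: every row whose x lies in [x - 1, x + 1]
           SUM(x) OVER (ORDER BY x RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS range_both
    FROM t
    ORDER BY x
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 1, 3)
# (2, 3, 3, 3)
# (4, 4, 6, 9)
# (5, 9, 9, 9)
# (9, 9, 14, 9)
```

At x = 4 the range window holds only 4, because no value falls between 3 and 4. The row window still reaches back to the 2.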
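2. Row window (ROW WINDOW)
ROWS N PRECEDING selects the current row and the N physical rows before it.

Both window types also accept the BETWEEN … AND form, for example RANGE BETWEEN m PRECEDING AND n FOLLOWING.

Functions

AVG(<DISTINCT | ALL> expr): average of the expression over a group or window.
CORR(expr1, expr2): correlation coefficient of the two expressions, COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2)). It ranges from -1 (negative correlation) to 1 (positive correlation); 0 means uncorrelated.
COUNT(<DISTINCT> <* | expr>): count.
COVAR_POP(expr1, expr2): population covariance.
COVAR_SAMP(expr1, expr2): sample covariance.
CUME_DIST: cumulative distribution, i.e. the relative position of the row in its group. It is computed as (rows ordered at or before the current row, including rows tied with it) / (rows in the group) and returns a value greater than 0 and at most 1.
DENSE_RANK: rank of the row in the ordering (used with ORDER BY). Equal values get the same rank (NULLs count as equal) and no ranks are skipped.
FIRST_VALUE(expr): the value of expr in the first row of the window.
LAG(expr, <offset>, <default>): reads a preceding row. offset is a positive integer giving the number of rows back (default 1). default is the value returned when that row falls outside the partition (for example, the first row has no preceding row).
LAST_VALUE(expr): the value of expr in the last row of the window.
LEAD(expr, <offset>, <default>): reads a following row. offset and default work as for LAG (for example, the last row has no following row).
MAX(expr): maximum.
MIN(expr): minimum.
NTILE(expr): splits the ordered rows of a group into expr buckets numbered 1 to expr. For example, with expr = 4 the group is split into four buckets numbered 1 to 4. If the rows cannot be divided evenly, the extra rows go one each to the lowest-numbered buckets.
PERCENT_RANK: similar to CUME_DIST, computed as (RANK - 1) / (rows in the group - 1).
RANK: relative rank. Ties share a rank, and the ranks that follow a tie are skipped.
RATIO_TO_REPORT(expr): expr / SUM(expr) over the partition.
ROW_NUMBER: sequential number of the row within its ordered group, starting at 1.
STDDEV(expr): standard deviation.
STDDEV_POP(expr): population standard deviation.
STDDEV_SAMP(expr): sample standard deviation.
SUM(expr): sum.
VAR_POP(expr): population variance.
VAR_SAMP(expr): sample variance.
VARIANCE(expr): variance.
REGR_xxxx(expr1, expr2): linear regression functions. expr1 is the dependent variable and expr2 the independent variable; pairs in which either value is NULL are ignored.
REGR_SLOPE: slope of the regression line, COVAR_POP(expr1, expr2) / VAR_POP(expr2).
REGR_INTERCEPT: y-intercept of the regression line, AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2).
REGR_COUNT: number of non-null number pairs used to fit the regression line.
REGR_R2: coefficient of determination of the regression line:
  if VAR_POP(expr2) = 0, return NULL;
  if VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0, return 1;
  if VAR_POP(expr1) > 0 and VAR_POP(expr2) != 0, return POWER(CORR(expr1, expr2), 2).
REGR_AVGX: average of the independent variable expr2 after removing null pairs, i.e. AVG(expr2).
REGR_AVGY: average of the dependent variable expr1 after removing null pairs, i.e. AVG(expr1).
REGR_SXX: REGR_COUNT(expr1, expr2) * VAR_POP(expr2).
REGR_SYY: REGR_COUNT(expr1, expr2) * VAR_POP(expr1).
REGR_SXY: REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2).

First, create the table and insert the test data: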

```sql
create table students
(id       number(15,0),
 area     varchar2(10),
 stu_type varchar2(2),
 score    number(20,2));

insert into students values(1, '111', 'g', 80 );
insert into students values(1, '111', 'j', 80 );
insert into students values(1, '222', 'g', 89 );
insert into students values(1, '222', 'g', 68 );
insert into students values(2, '111', 'g', 80 );
insert into students values(2, '111', 'j', 70 );
insert into students values(2, '222', 'g', 60 );
insert into students values(2, '222', 'j', 65 );
insert into students values(3, '111', 'g', 75 );
insert into students values(3, '111', 'j', 58 );
insert into students values(3, '222', 'g', 58 );
insert into students values(3, '222', 'j', 90 );
insert into students values(4, '111', 'g', 89 );
insert into students values(4, '111', 'j', 90 );
insert into students values(4, '222', 'g', 90 );
insert into students values(4, '222', 'j', 89 );
commit;
```
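With the test table in place, here is a quick sketch that checks the NTILE, PERCENT_RANK and CUME_DIST definitions from the function list against the same 16 rows. It assumes SQLite 3.25 or later, via Python's sqlite3 module, as a stand-in for Oracle. SQLite accepts the DDL above unchanged and implements these three functions with the same definitions.

```python
import sqlite3

# The 16 test rows from the insert statements: (id, area, stu_type, score).
STUDENTS = [
    (1, "111", "g", 80), (1, "111", "j", 80), (1, "222", "g", 89), (1, "222", "g", 68),
    (2, "111", "g", 80), (2, "111", "j", 70), (2, "222", "g", 60), (2, "222", "j", 65),
    (3, "111", "g", 75), (3, "111", "j", 58), (3, "222", "g", 58), (3, "222", "j", 90),
    (4, "111", "g", 89), (4, "111", "j", 90), (4, "222", "g", 90), (4, "222", "j", 89),
]

conn = sqlite3.connect(":memory:")
# Same DDL as above; SQLite accepts the Oracle type names.
conn.execute("create table students (id number(15,0), area varchar2(10), "
             "stu_type varchar2(2), score number(20,2))")
conn.executemany("insert into students values (?, ?, ?, ?)", STUDENTS)

# NTILE(3) over 16 rows: 16 = 3 * 5 + 1, so bucket 1 takes the one extra row.
buckets = conn.execute("""
    SELECT bucket, COUNT(*)
    FROM (SELECT NTILE(3) OVER (ORDER BY score DESC) AS bucket FROM students)
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
print(buckets)  # [(1, 6), (2, 5), (3, 5)]

# Within id = 1 (scores 89, 80, 80, 68):
# PERCENT_RANK = (RANK - 1) / (rows - 1); CUME_DIST = (rows up to the last tied row) / rows.
dist = conn.execute("""
    SELECT score,
           RANK()         OVER (ORDER BY score DESC) AS rnk,
           PERCENT_RANK() OVER (ORDER BY score DESC) AS pct_rank,
           CUME_DIST()    OVER (ORDER BY score DESC) AS cume
    FROM students
    WHERE id = 1
    ORDER BY score DESC
""").fetchall()
for score, rnk, pct_rank, cume in dist:
    print(score, rnk, round(pct_rank, 4), cume)
# 89 1 0.0 0.25
# 80 2 0.3333 0.75
# 80 2 0.3333 0.75
# 68 4 1.0 1.0
```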
II. Practical applications

1. Grouped totals

1) GROUP BY clause

```sql
-- A. GROUPING SETS

select id, area, stu_type, sum(score) score
from students
group by grouping sets((id, area, stu_type), (id, area), id)
order by id, area, stu_type;

/*-------- Understanding GROUPING SETS
select a, b, c, sum( d ) from t
group by grouping sets ( a, b, c )

is equivalent to

select * from (
  select a, null, null, sum( d ) from t group by a
  union all
  select null, b, null, sum( d ) from t group by b
  union all
  select null, null, c, sum( d ) from t group by c
)
*/

-- B. ROLLUP

select id, area, stu_type, sum(score) score
from students
group by rollup(id, area, stu_type)
order by id, area, stu_type;

/*-------- Understanding ROLLUP
select a, b, c, sum( d )
from t
group by rollup(a, b, c);

is equivalent to

select * from (
  select a, b, c, sum( d ) from t group by a, b, c
  union all
  select a, b, null, sum( d ) from t group by a, b
  union all
  select a, null, null, sum( d ) from t group by a
  union all
  select null, null, null, sum( d ) from t
)
*/

-- C. CUBE

select id, area, stu_type, sum(score) score
from students
group by cube(id, area, stu_type)
order by id, area, stu_type;

/*-------- Understanding CUBE
select a, b, c, sum( d ) from t
group by cube( a, b, c)

is equivalent to

select a, b, c, sum( d ) from t
group by grouping sets(
  ( a, b, c ),
  ( a, b ), ( a ), ( b, c ),
  ( b ), ( a, c ), ( c ),
  () )
*/

-- D. GROUPING
/* In the results above, every subtotal row shows NULL in the columns it was
   aggregated over. So how do we tell which columns a row summarizes?
   GROUPING(col) returns 1 when col has been aggregated away in that row
   (a subtotal or grand total row) and 0 otherwise. */

select decode(grouping(id), 1, 'all id', id) id,
       decode(grouping(area), 1, 'all area', to_char(area)) area,
       decode(grouping(stu_type), 1, 'all_stu_type', stu_type) stu_type,
       sum(score) score
from students
group by cube(id, area, stu_type)
order by id, area, stu_type;
```
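SQLite has no ROLLUP, CUBE or GROUPING SETS, so this is a plain-Python sketch of the expansion the comments above describe. ROLLUP keeps the prefixes of the column list and CUBE keeps every subset. Each grouping set is then aggregated separately and the results are concatenated, as with UNION ALL. The helper names rollup, cube and sum_by_grouping_sets are hypothetical. The flags tuple only imitates GROUPING(); this is not Oracle's implementation.

```python
from itertools import combinations

# The 16 test rows from the insert statements: (id, area, stu_type, score).
STUDENTS = [
    (1, "111", "g", 80), (1, "111", "j", 80), (1, "222", "g", 89), (1, "222", "g", 68),
    (2, "111", "g", 80), (2, "111", "j", 70), (2, "222", "g", 60), (2, "222", "j", 65),
    (3, "111", "g", 75), (3, "111", "j", 58), (3, "222", "g", 58), (3, "222", "j", 90),
    (4, "111", "g", 89), (4, "111", "j", 90), (4, "222", "g", 90), (4, "222", "j", 89),
]
COLS = ("id", "area", "stu_type")


def rollup(cols):
    """ROLLUP(a, b, c) -> grouping sets (a, b, c), (a, b), (a), ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube(cols):
    """CUBE(a, b, c) -> every subset of the columns, 2**n grouping sets."""
    return [s for n in range(len(cols), -1, -1) for s in combinations(cols, n)]


def sum_by_grouping_sets(rows, sets):
    """One GROUP BY per grouping set, results concatenated (the UNION ALL form).

    Columns outside a set come back as None, like the NULLs in Oracle's output.
    The flags tuple mimics GROUPING(col): 1 means the column was aggregated away.
    """
    result = []
    for gs in sets:
        flags = tuple(0 if c in gs else 1 for c in COLS)
        totals = {}
        for row in rows:
            key = tuple(v if c in gs else None for c, v in zip(COLS, row[:3]))
            totals[key] = totals.get(key, 0) + row[3]
        result.extend(key + (flags, total) for key, total in totals.items())
    return result


rolled = sum_by_grouping_sets(STUDENTS, rollup(COLS))
cubed = sum_by_grouping_sets(STUDENTS, cube(COLS))
print(len(rolled), len(cubed))  # 28 44
print(rolled[-1])               # (None, None, None, (1, 1, 1), 1231)
```

On the test data, ROLLUP(id, area, stu_type) yields 15 + 8 + 4 + 1 = 28 rows and CUBE yields 44. The last ROLLUP row is the grand total.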
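III. Using the OVER() function

1. Ranking: DENSE_RANK(), ROW_NUMBER()

1) Ties allowed, no gaps in the ranks: DENSE_RANK(), producing ranks such as 1 2 2 3 4 4 4 5 6 …
Rank score within each ID: dense_rank() over(partition by id order by score desc)
Rank score across all rows: dense_rank() over(order by score desc)

```sql
select id, area, score,
       dense_rank() over(partition by id order by score desc) rank_in_id,
       dense_rank() over(order by score desc) rank_overall
from students
order by id, area;
```

2) No ties, every row gets a distinct number: ROW_NUMBER(), producing 1 2 3 4 5 6 … (rows with equal scores are numbered in no guaranteed order)
Rank score within each ID: row_number() over(partition by id order by score desc)
Rank score across all rows: row_number() over(order by score desc)

```sql
select id, area, score,
       row_number() over(partition by id order by score desc) rank_in_id,
       row_number() over(order by score desc) rank_overall
from students
order by id, area;
```

3) Ties allowed, and the ranks after a tie are skipped: RANK(), producing ranks such as 1 2 2 4 5 5 5 8 …
Rank score within each ID: rank() over(partition by id order by score desc)
Rank score across all rows: rank() over(order by score desc)

```sql
select id, area, score,
       rank() over(partition by id order by score desc) rank_in_id,
       rank() over(order by score desc) rank_overall
from students
order by id, area;
```

4) Rank analysis with CUME_DIST(): highest rank among the current row's ties / total number of rows
Function: cume_dist() over(order by id)

```sql
select id, area, score,
       cume_dist() over(order by id) a,                          -- by id: highest rank among rows with this id / total rows
       cume_dist() over(partition by id order by score desc) b,  -- within each id: highest rank among rows with this score / rows in the id
       row_number() over (order by id) record_no
from students
order by id, area;
```

5) Using CUME_DIST() for ranks that allow ties and skip numbers, where tied rows take the larger rank of the tie, producing ranks such as 2 2 3 5 5 7 7 8 …
Rank score within each ID: cume_dist() over(partition by id order by score desc) * sum(1) over(partition by id)
Rank score across all rows: cume_dist() over(order by score desc) * sum(1) over()

```sql
select id, area, score,
       sum(1) over() as total_rows,
       sum(1) over(partition by id) as rows_in_id,
       (cume_dist() over(partition by id order by score desc)) * (sum(1) over(partition by id)) rank_in_id,
       (cume_dist() over(order by score desc)) * (sum(1) over()) rank_overall
from students
order by id, area;
```

2. Grouped statistics: SUM(), MAX(), AVG(), RATIO_TO_REPORT()

```sql
select id, area,
       sum(1) over() as total_rows,
       sum(1) over(partition by id) as rows_in_id,
       sum(score) over() as grand_total,
       sum(score) over(partition by id) as sum_by_id,
       sum(score) over(order by id) as running_sum_by_id,
       sum(score) over(partition by id, area) as sum_by_id_area,
       sum(score) over(partition by id order by area) as running_sum_by_area_in_id,
       max(score) over() as max_all,
       max(score) over(partition by id) as max_by_id,
       max(score) over(order by id) as running_max_by_id,
       max(score) over(partition by id, area) as max_by_id_area,
       max(score) over(partition by id order by area) as running_max_by_area_in_id,
       avg(score) over() as avg_all,
       avg(score) over(partition by id) as avg_by_id,
       avg(score) over(order by id) as running_avg_by_id,
       avg(score) over(partition by id, area) as avg_by_id_area,
       avg(score) over(partition by id order by area) as running_avg_by_area_in_id,
       ratio_to_report(score) over() as share_of_all,
       ratio_to_report(score) over(partition by id) as share_of_id,
       score
from students;
```

The "running" columns use ORDER BY without a windowing clause. The window therefore runs from the start of the partition to the current row and includes every row tied with it, so all rows with the same id (or the same area within an id) get the same cumulative value. RATIO_TO_REPORT returns a fraction between 0 and 1; multiply by 100 for a percentage.

3. LAG(col, n, default), LEAD(col, n, default): fetch the value from n rows before or after
Value from an earlier row: lag(score, n, x) over(order by id)
Value from a later row: lead(score, n, x) over(order by id)
Parameters: n is the number of rows to move, x is the value returned when no such row exists, and id is the sort column.

```sql
select id, lag(score, 1, 0) over(order by id) lg, score from students;
select id, lead(score, 1, 0) over(order by id) ld, score from students;
```

Several rows share each id, so the order of rows within the same id is not guaranteed. This also means the "previous" score is not guaranteed either; add further sort columns if a repeatable result is needed.

4. FIRST_VALUE(), LAST_VALUE()
First value in the window: first_value(score) over(order by id)
Last value in the window: last_value(score) over(order by id)

```sql
select id, first_value(score) over(order by id) fv, score from students;
select id, last_value(score) over(order by id) lv, score from students;
```

With ORDER BY and no windowing clause, the window ends at the current row and the rows tied with it. LAST_VALUE therefore returns the last score among the rows with the current id, not the last row of the table. To get the latter, specify the window explicitly: over(order by id rows between unbounded preceding and unbounded following).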
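The sketch below runs ranking, running-total, LAG/LEAD and FIRST_VALUE/LAST_VALUE queries against the same test data through Python's sqlite3 module. It assumes SQLite 3.25 or later as a stand-in for Oracle; SQLite supports these functions but has no RATIO_TO_REPORT. To keep the output deterministic, the LAG/LEAD and FIRST_VALUE/LAST_VALUE queries run over per-id totals rather than individual rows. That is a choice made for this example, not something taken from the queries above.

```python
import sqlite3

# The 16 test rows from the insert statements: (id, area, stu_type, score).
STUDENTS = [
    (1, "111", "g", 80), (1, "111", "j", 80), (1, "222", "g", 89), (1, "222", "g", 68),
    (2, "111", "g", 80), (2, "111", "j", 70), (2, "222", "g", 60), (2, "222", "j", 65),
    (3, "111", "g", 75), (3, "111", "j", 58), (3, "222", "g", 58), (3, "222", "j", 90),
    (4, "111", "g", 89), (4, "111", "j", 90), (4, "222", "g", 90), (4, "222", "j", 89),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table students (id number(15,0), area varchar2(10), "
             "stu_type varchar2(2), score number(20,2))")
conn.executemany("insert into students values (?, ?, ?, ?)", STUDENTS)

# 1) RANK vs DENSE_RANK vs ROW_NUMBER, plus the CUME_DIST * count trick,
#    inside id = 1 (scores 89, 80, 80, 68). Sorting by rn makes the output stable.
ranks = conn.execute("""
    SELECT score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS drnk,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
           CUME_DIST()  OVER (ORDER BY score DESC) * COUNT(*) OVER () AS max_rank
    FROM students
    WHERE id = 1
    ORDER BY score DESC, rn
""").fetchall()
print(ranks)
# [(89, 1, 1, 1, 1.0), (80, 2, 2, 2, 3.0), (80, 2, 2, 3, 3.0), (68, 4, 3, 4, 4.0)]

# 2) "Running" sum ordered by id: the default RANGE window includes every row
#    with the same id, so all rows of one id get the same cumulative total.
running = conn.execute(
    "SELECT id, SUM(score) OVER (ORDER BY id) FROM students"
).fetchall()
print(sorted(set(running)))  # [(1, 317), (2, 592), (3, 873), (4, 1231)]

# 3) LAG/LEAD and FIRST_VALUE/LAST_VALUE over per-id totals (one row per id,
#    so the ordering is unambiguous).
per_id = conn.execute("""
    SELECT id, total,
           LAG(total, 1, 0)   OVER (ORDER BY id) AS prev_total,
           LEAD(total, 1, 0)  OVER (ORDER BY id) AS next_total,
           FIRST_VALUE(total) OVER (ORDER BY id) AS first_total,
           LAST_VALUE(total)  OVER (ORDER BY id) AS last_default,
           LAST_VALUE(total)  OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING) AS last_full
    FROM (SELECT id, SUM(score) AS total FROM students GROUP BY id)
    ORDER BY id
""").fetchall()
for row in per_id:
    print(row)
# (1, 317, 0, 275, 317, 317, 358)
# (2, 275, 317, 281, 317, 275, 358)
# (3, 281, 275, 358, 317, 281, 358)
# (4, 358, 281, 0, 317, 358, 358)
```

The ranking rows show 1 2 2 4 for RANK, 1 2 2 3 for DENSE_RANK, 1 2 3 4 for ROW_NUMBER and 1 3 3 4 for the CUME_DIST trick. The last_default column just repeats total, because the default window stops at the current row. The last_full column sees the whole set.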