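The Oracle data warehouse offers many very practical functions. I had come across them over the years but never actually used them: the examples made sense when I read them, and I forgot them almost immediately afterwards. Today 岑敏強 ran into a scenario that calls for over, so we worked through it together. At first glance I had a vague feeling that some analytic function would be needed, but I did not expect it to be over.
Requirement: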
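The table t_zxxm_fy has three columns: n_lxsbh, n_fyje and n_mqzt. One n_lxsbh can correspond to several n_fyje values, each row carrying its own n_mqzt. We need the sum of n_fyje over several n_lxsbh values, but each n_lxsbh may contribute the n_fyje of only one record: the rows of that n_lxsbh have to be sorted by n_mqzt in some order, and the first row is the one that counts.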
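Put plainly: group the matching rows by n_lxsbh (as group by would), sort each group by n_mqzt, and take the n_fyje value from the first row of each group. This is clearly hard to do in a single statement of standard SQL. I tried, and the SQL I ended up with was a complicated mix of grouping, nesting and subqueries that still did not fully work. To see the data involved, run the following SQL: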
select n_lxsbh, n_fyje, n_mqzt
  from t_zxxm_fy f
 where n_fyje is not null  -- exclude rows where n_fyje is null
   and n_fyje <> 0         -- exclude rows where n_fyje is 0
   and n_lxsbh in (-999999999975037, -999999999937175, -999999999937891,
                   -999999999937695, -999999999937289)  -- the lxsbh values to total
 order by n_lxsbh, n_mqzt desc
-----
     N_LXSBH            N_FYJE       N_MQZT
 1   -999999999975037   1680774.28   6 (selected)
 2   -999999999975037    966394.92   5
 3   -999999999975037   1276123.64   5
 4   -999999999975037   1119030.00   5
 5   -999999999937891    123727.27   3 (selected)
 6   -999999999937695      2000.00   1 (selected)
 7   -999999999937289     66137.81   3 (selected)
 8   -999999999937175     81186.10   7 (selected)
 9   -999999999937175     23739.54   2
10   -999999999937175     96500.00   2
11   -999999999937175   8798789.00   2
12   -999999999937175     94200.00   2
13   -999999999937175    195914.00   2
In other words, the result set we finally want is:
     N_LXSBH            N_FYJE       N_MQZT
 1   -999999999975037   1680774.28   6 (selected)
 5   -999999999937891    123727.27   3 (selected)
 6   -999999999937695      2000.00   1 (selected)
 7   -999999999937289     66137.81   3 (selected)
 8   -999999999937175     81186.10   7 (selected)
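Here is the SQL that produces the result set above (it is not the SQL the business case finally needs; that one is at the end of this article):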
select *
  from (select t_zxxm_fy.n_lxsbh,
               t_zxxm_fy.n_fyje,
               row_number() over(partition by n_lxsbh order by n_mqzt desc) rn
          from (select n_lxsbh, n_fyje, n_mqzt
                  from t_zxxm_fy f
                 where n_lxsbh in (-999999999975037, -999999999937289, -999999999937695,
                                   -999999999937891, -999999999937175)
                 order by n_lxsbh, n_mqzt desc) t_zxxm_fy)
 where rn = 1;
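The query above can be tried out without an Oracle instance. The following is only a sketch under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later is needed for window functions), loads the thirteen rows from the listing above into an in-memory table, and drops the redundant inner order by. The function name top_row_per_group is made up for this illustration.

```python
import sqlite3

# Sample rows copied from the listing above: (n_lxsbh, n_fyje, n_mqzt).
ROWS = [
    (-999999999975037, 1680774.28, 6),
    (-999999999975037, 966394.92, 5),
    (-999999999975037, 1276123.64, 5),
    (-999999999975037, 1119030.00, 5),
    (-999999999937891, 123727.27, 3),
    (-999999999937695, 2000.00, 1),
    (-999999999937289, 66137.81, 3),
    (-999999999937175, 81186.10, 7),
    (-999999999937175, 23739.54, 2),
    (-999999999937175, 96500.00, 2),
    (-999999999937175, 8798789.00, 2),
    (-999999999937175, 94200.00, 2),
    (-999999999937175, 195914.00, 2),
]


def top_row_per_group(rows):
    """Return (n_lxsbh, n_fyje) of the row with the highest n_mqzt in each group."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
    conn.execute(
        "create table t_zxxm_fy (n_lxsbh integer, n_fyje real, n_mqzt integer)"
    )
    conn.executemany("insert into t_zxxm_fy values (?, ?, ?)", rows)
    sql = """
        select n_lxsbh, n_fyje
          from (select n_lxsbh, n_fyje,
                       row_number() over (partition by n_lxsbh
                                          order by n_mqzt desc) rn
                  from t_zxxm_fy)
         where rn = 1
         order by n_lxsbh
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_row_per_group(ROWS):
        print(row)
```

With this sample data the highest n_mqzt in every group is unique, so the five rows returned are the same ones marked as selected in the listing.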
Explanation: since this kind of statistic cannot avoid grouping, we start with a subquery:
(select n_lxsbh, n_fyje, n_mqzt
   from t_zxxm_fy f
  where n_lxsbh in (-999999999975037, -999999999937289, -999999999937695,
                    -999999999937891, -999999999937175)
  order by n_lxsbh, n_mqzt desc) t_zxxm_fy
This innermost query restricts the result set, which reduces the number of rows the outer query has to process. Then comes the outer query:
select t_zxxm_fy.n_lxsbh,
       t_zxxm_fy.n_fyje,
       row_number() over(partition by n_lxsbh order by n_mqzt desc) rn
  from (xxxxxxxx)
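This applies over to that result set. Put simply, partition by n_lxsbh splits the rows into groups and order by n_mqzt desc sorts the rows inside each group. It is similar to group by, but there is a difference: group by returns only one row per group, whereas partition by keeps every row and merely groups them by the given column (this is easiest to appreciate by trying it). The grouped rows look like the first result set above. The row_number function then gives each row a row number within its group, much like rownum. For example, if I remove the rn = 1 restriction from the query above: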
select *
  from (select t_zxxm_fy.n_lxsbh,
               t_zxxm_fy.n_fyje,
               row_number() over(partition by n_lxsbh order by n_mqzt desc) rn
          from (select n_lxsbh, n_fyje, n_mqzt
                  from t_zxxm_fy f
                 where n_lxsbh in (-999999999975037, -999999999937289, -999999999937695,
                                   -999999999937891, -999999999937175)
                 order by n_lxsbh, n_mqzt desc) t_zxxm_fy)
---
     N_LXSBH            N_FYJE       RN
 1   -999999999975037   1680774.28   1
 2   -999999999975037    966394.92   2
 3   -999999999975037   1276123.64   3
 4   -999999999975037   1119030.00   4
 5   -999999999937891    123727.27   1
 6   -999999999937695      2000.00   1
 7   -999999999937289     66137.81   1
 8   -999999999937175     81186.10   1
 9   -999999999937175    195914.00   2
10   -999999999937175     94200.00   3
11   -999999999937175   8798789.00   4
12   -999999999937175     96500.00   5
13   -999999999937175     23739.54   6
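Every group above has been given its own row number rn, which group by does not seem able to do. To take the first row of each group, filter on rn = 1. Note that rows which tie on n_mqzt are numbered in no guaranteed order: the rows of -999999999937175 with n_mqzt = 2 come out in a different order here than in the first listing.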
That completes the statistic. Pretty powerful, isn't it?
Next comes performance. The final statistics SQL was written like this:
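The difference between group by and partition by described above can be seen side by side in a small sketch. It assumes SQLite (through Python's sqlite3 module, version 3.25 or later) as a stand-in for Oracle, and the four toy rows and the function name compare_group_by_and_partition_by are invented for the illustration; n_mqzt is unique within each group so the numbering is deterministic.

```python
import sqlite3

# Hypothetical toy data: (n_lxsbh, n_fyje, n_mqzt).
ROWS = [(1, 10.0, 6), (1, 20.0, 5), (1, 30.0, 4), (2, 40.0, 3)]


def compare_group_by_and_partition_by(rows):
    """Run the same grouping once with group by and once with partition by."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
    conn.execute(
        "create table t_zxxm_fy (n_lxsbh integer, n_fyje real, n_mqzt integer)"
    )
    conn.executemany("insert into t_zxxm_fy values (?, ?, ?)", rows)

    # group by collapses each group into a single row.
    grouped = conn.execute(
        "select n_lxsbh, count(*) from t_zxxm_fy group by n_lxsbh order by n_lxsbh"
    ).fetchall()

    # partition by keeps every row and numbers it inside its group.
    numbered = conn.execute(
        """
        select n_lxsbh, n_fyje,
               row_number() over (partition by n_lxsbh
                                  order by n_mqzt desc) rn
          from t_zxxm_fy
         order by n_lxsbh, rn
        """
    ).fetchall()
    conn.close()
    return grouped, numbered


if __name__ == "__main__":
    grouped, numbered = compare_group_by_and_partition_by(ROWS)
    print(grouped)   # one row per n_lxsbh
    print(numbered)  # all four rows, each with its rn
```

The group by query returns two rows, one per n_lxsbh, while the partition by query returns all four rows with rn restarting at 1 in each group.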
select (select sum(n_fyje)
          -- distinct keeps one row per n_lxsbh; without it first_value repeats
          -- on every row of the group and the sum counts it once per row
          from (select distinct fy.n_lxsbh,
                       first_value(fy.n_fyje) over(partition by fy.n_lxsbh
                           order by decode(N_MQZT, 3, 1, 7, 2, 6, 3, 1, 4)) as n_fyje
                  from t_zxxm_fy fy)
         where n_lxsbh in (select n_lxsbh
                             from T_ZXXM_LXS t
                            where t.n_mainxmbh = lxs.n_lxsbh)) as N_YGSSCB
  from T_ZXXM_LXS lxs
 where lxs.n_lxsbh = -999999999975037;
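The core of the final statement, first_value ordered by a decode priority and then summed, can be checked on toy data. This is a sketch under assumptions, not the production query: SQLite (Python's sqlite3, version 3.25 or later) stands in for Oracle, decode is rewritten as a case expression, and the rows and the function name sum_first_values are invented. One point worth checking is that first_value returns a value on every row of its partition, so the sketch sums once without and once with distinct.

```python
import sqlite3

# Hypothetical toy data: (n_lxsbh, n_fyje, n_mqzt).
ROWS = [(1, 100.0, 6), (1, 200.0, 3), (1, 300.0, 5), (2, 50.0, 7), (2, 70.0, 1)]

# Stand-in for decode(N_MQZT, 3, 1, 7, 2, 6, 3, 1, 4). Oracle's decode yields
# NULL for unlisted values and sorts NULLs last in ascending order; SQLite
# sorts NULLs first, so an explicit fallback rank of 99 is used instead.
PRIORITY = (
    "case n_mqzt when 3 then 1 when 7 then 2 when 6 then 3 when 1 then 4 "
    "else 99 end"
)


def sum_first_values(rows, distinct):
    """Sum the highest-priority n_fyje of each n_lxsbh group."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
    conn.execute(
        "create table t_zxxm_fy (n_lxsbh integer, n_fyje real, n_mqzt integer)"
    )
    conn.executemany("insert into t_zxxm_fy values (?, ?, ?)", rows)
    keyword = "distinct" if distinct else ""
    sql = f"""
        select sum(n_fyje)
          from (select {keyword} n_lxsbh,
                       first_value(n_fyje) over (partition by n_lxsbh
                                                 order by {PRIORITY}) as n_fyje
                  from t_zxxm_fy)
    """
    total = conn.execute(sql).fetchone()[0]
    conn.close()
    return total


if __name__ == "__main__":
    print(sum_first_values(ROWS, distinct=True))   # each group counted once
    print(sum_first_values(ROWS, distinct=False))  # each group counted per row
```

In this data group 1 picks 200.0 (n_mqzt = 3 has the top priority) and group 2 picks 50.0 (n_mqzt = 7), so the distinct variant totals 250.0, while the variant without distinct totals 700.0 because the picked value is repeated on every row of its group.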