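Commonly used window functions
Data analysis is a routine part of the job, and before analyzing anything we have to get the raw data into the shape we need. The format in which data is stored is not always the format we want, so some processing on top of it is required. This article covers several commonly used approaches: the row-numbering function (row_number()), the ranking functions (rank(), dense_rank()), aggregate functions (the common statistical functions), and the offset functions (lag(), lead(), first_value(), last_value()).
The data source is the sample data added at the end of the previous article. The window functions used at the end of that article are explained in detail here.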
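Row-numbering function
Row_Number(): numbers the rows according to a given rule, producing 1, 2, 3, 4, and so on. The function must be followed by over(), and over() must specify the ordering column in the form order by [column]. The resulting numbers are always consecutive. A partition can also be added: without partition by [column] the numbering runs over the whole query result, and with it the numbering restarts within each partition.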
- Find the most recent transaction record for every account
--An ordinary T-SQL group by can clearly do this
select actid,max(trandate) as trandate from transactions group by actid
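--Once the remaining columns are added, the result is clearly no longer what we want: each tranid is unique, so every row forms its own group and the statement below returns every row in the table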
select actid,tranid,val,max(trandate) as trandate from transactions group by actid,tranid,val
--There are other approaches, but they are more complicated to write and are not covered here. Below is how the window function row_number() does it
with c as(
select actid,tranid,val,trandate,
ROW_NUMBER() OVER(partition by actid order by trandate desc)
as rownum from transactions
)
select actid,tranid,val,trandate from c where rownum=1
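To try the query without a SQL Server instance, here is a minimal sketch that runs the same statement through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions), and the five sample rows are hypothetical stand-ins for the data from the previous article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical sample rows, not the data from the previous article.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 100.0, "2023-01-01"),
    (1, 2, -50.0, "2023-01-05"),
    (1, 3, 75.0, "2023-01-03"),
    (2, 4, 20.0, "2023-02-01"),
    (2, 5, 30.0, "2023-02-10"),
])

# Number each account's rows from newest to oldest, then keep row 1.
latest = conn.execute("""
    with c as (
        select actid, tranid, val, trandate,
               row_number() over (partition by actid order by trandate desc) as rownum
        from transactions
    )
    select actid, tranid, val, trandate from c where rownum = 1 order by actid
""").fetchall()

for row in latest:
    print(row)
```

Unlike the group by version, every column of the winning row comes back together, because the filter is applied to whole rows rather than to groups.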
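- The five most recent transactions for each account
Following the query above, only the final filter needs to change, to where rownum<=5
- The five largest transactions for each account
First partition by actid, then order by val in descending order, and finally keep the rows with rownum<=5.
The modified query is as follows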
with c as(
select actid,tranid,val,trandate,
ROW_NUMBER() OVER(partition by actid order by val desc)
as rownum from transactions
)
select actid,tranid,val,trandate from c where rownum<=5
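Both "top N per account" variants differ only in the order by inside over() and in the rownum filter. The sketch below makes that explicit with a small helper; top_n_per_account is a hypothetical name, the rows are made-up sample data, and the SQL runs on SQLite (3.25+) through Python's sqlite3 rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical sample rows, not the data from the previous article.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 100.0, "2023-01-01"),
    (1, 2, -50.0, "2023-01-05"),
    (1, 3, 75.0, "2023-01-03"),
    (1, 4, 10.0, "2023-01-07"),
    (2, 5, 20.0, "2023-02-01"),
    (2, 6, 30.0, "2023-02-10"),
])

def top_n_per_account(order_by, n):
    """Return (actid, tranid) for the first n rows of each account under the given ordering."""
    # order_by is interpolated into the SQL text, so pass only trusted, fixed strings.
    sql = f"""
        with c as (
            select actid, tranid,
                   row_number() over (partition by actid order by {order_by}) as rownum
            from transactions
        )
        select actid, tranid from c where rownum <= ? order by actid, rownum
    """
    return conn.execute(sql, (n,)).fetchall()

largest = top_n_per_account("val desc", 2)           # two largest per account
most_recent = top_n_per_account("trandate desc", 2)  # two newest per account
print(largest)
print(most_recent)
```

With N set to 5 and the real table, the two calls correspond to the two queries described above.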
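In general, row_number() is the most frequently used of the window functions, and its use cases are varied.
For example, before SQL Server 2012 introduced offset / fetch, it was the usual tool for paging. It is also used to generate sequences and to renumber rows,
as in the previous article, where it was used to write the virtual-table function and to modify the order numbers
- Paging: to reduce the load on the database, a UI usually fetches a fixed number of rows per request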
declare
@pagesize int =150, --number of rows shown per page
@currpage int = 500; --which page to return
--treat the whole table as the data source
with c as(
select actid,tranid,val,trandate,
row_number() over(order by (select null)) as rownum
from transactions
)
-- combine a top query with a range filter on rownum to produce the page
select top (@pagesize) actid,tranid,val,trandate
from c where rownum>(@currpage-1)*@pagesize and rownum<@currpage*@pagesize+1
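The rownum range filter can be checked against an offset-style query on a small table. This is a sketch under a few assumptions: it runs on SQLite through Python's sqlite3, it uses LIMIT/OFFSET as SQLite's counterpart of offset / fetch, and it numbers rows by tranid instead of (select null) so that the pages are deterministic. The ten rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Ten hypothetical rows with tranid 1..10.
conn.executemany(
    "insert into transactions values (?, ?, ?, ?)",
    [(1, i, i * 10.0, "2023-01-%02d" % i) for i in range(1, 11)],
)

pagesize, currpage = 3, 2  # small values so the page is easy to read

# Paging with row_number(): keep the rows whose number falls inside the page.
page = [r[0] for r in conn.execute("""
    with c as (
        select tranid, row_number() over (order by tranid) as rownum
        from transactions
    )
    select tranid from c
    where rownum > (:currpage - 1) * :pagesize and rownum <= :currpage * :pagesize
    order by rownum
""", {"pagesize": pagesize, "currpage": currpage})]

# The same page with LIMIT/OFFSET; the offset is computed in Python.
page_offset = [r[0] for r in conn.execute(
    "select tranid from transactions order by tranid limit ? offset ?",
    (pagesize, (currpage - 1) * pagesize),
)]

print(page, page_offset)
```

Ordering by a real column matters here: with order by (select null) the numbering is arbitrary, so the same row could show up on two different pages across requests.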
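Ranking functions
The ranking functions are used in much the same way as row_number(), but the results they produce differ.
rank(): non-consecutive. When the ordering column contains duplicate values, the tied rows get the same rank and the next rank skips ahead. When the ordering values are unique, the result matches row_number().
dense_rank(): consecutive. When the ordering column contains duplicate values, the tied rows get the same rank and the next rank follows on without a gap.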
rank()
select actid,tranid,val,trandate,
ROW_NUMBER() over(order by val) as rownum,
RANK() over(order by val) as rnk
from transactions
order by val
offset 0 rows fetch first 1000 rows only;
dense_rank()
--dense_rank
select actid,tranid,val,trandate,
ROW_NUMBER() over(order by val) as rownum,
RANK() over(order by val) as rnk,
DENSE_RANK() over(order by val) as dense_rnk
from transactions
order by val
offset 0 rows fetch first 1000 rows only;
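The difference between the three functions is easiest to see on a tiny table with one tie. The following sketch runs on SQLite (3.25+) through Python's sqlite3, and the four val values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical rows: the value 20 appears twice to create a tie.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 10.0, "2023-01-01"),
    (1, 2, 20.0, "2023-01-02"),
    (1, 3, 20.0, "2023-01-03"),
    (1, 4, 30.0, "2023-01-04"),
])

ranks = conn.execute("""
    select val,
           row_number() over (order by val) as rownum,
           rank()       over (order by val) as rnk,
           dense_rank() over (order by val) as dense_rnk
    from transactions
    order by val, rownum
""").fetchall()

for row in ranks:
    print(row)
# (10.0, 1, 1, 1)
# (20.0, 2, 2, 2)
# (20.0, 3, 2, 2)
# (30.0, 4, 4, 3)
```

After the tie, rank() jumps to 4 while dense_rank() continues with 3. row_number() still hands out distinct numbers, but which of the two tied rows gets 2 and which gets 3 is not defined.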
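Aggregate and offset functions
Aggregate functions
Within each partition the rows are processed one by one in order, and the aggregate is updated at each row, giving a running maximum, minimum, and sum.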
select * ,
max(val) over(partition by actid order by tranid) as max_val,
min(val) over(partition by actid order by tranid) as min_val,
sum(val) over(partition by actid order by tranid) as sum_val
from transactions
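The running behaviour comes from the order by inside over(): when no frame is specified, the window covers the rows from the start of the partition up to the current row. Here is a small sketch on SQLite (3.25+) through Python's sqlite3, with hypothetical rows, that shows the values changing row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical sample rows, not the data from the previous article.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 100.0, "2023-01-01"),
    (1, 2, -50.0, "2023-01-05"),
    (1, 3, 75.0, "2023-01-03"),
    (2, 4, 20.0, "2023-02-01"),
    (2, 5, 30.0, "2023-02-10"),
])

running = conn.execute("""
    select actid, tranid,
           max(val) over (partition by actid order by tranid) as max_val,
           min(val) over (partition by actid order by tranid) as min_val,
           sum(val) over (partition by actid order by tranid) as sum_val
    from transactions
    order by actid, tranid
""").fetchall()

for row in running:
    print(row)
# (1, 1, 100.0, 100.0, 100.0)
# (1, 2, 100.0, -50.0, 50.0)
# (1, 3, 100.0, -50.0, 125.0)
# (2, 4, 20.0, 20.0, 20.0)
# (2, 5, 30.0, 20.0, 50.0)
```

Dropping the order by from over() would instead give every row the aggregate over its whole partition.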
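Offset functions
Lag(): the value from the previous row; null if there is no such row
Lead(): the value from the next row; null by default if there is no such row. Both functions accept an optional offset and default value
With one argument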
select *,
LAG(val) over(partition by actid order by tranid,trandate) as pre_value,
LEAD (val) over(partition by actid order by tranid,trandate) as next_value
from transactions
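As the result shows, the offset defaults to 1 row, and null is returned when no row is found.
With two arguments: the first is the column to read and the second is the number of rows to offset by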
select *,
LAG(val,3) over(partition by actid order by tranid,trandate) as pre_value,
LEAD (val,3) over(partition by actid order by tranid,trandate) as next_value
from transactions
With a default value: rows that would otherwise be null return 0.00 instead
select *,
LAG(val,3,0.00) over(partition by actid order by tranid,trandate) as pre_value,
LEAD (val,3,0.00) over(partition by actid order by tranid,trandate) as next_value
from transactions
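The three call forms can be compared side by side on a few rows. The sketch below uses SQLite (3.25+) through Python's sqlite3, with an offset of 2 instead of 3 so that the small hypothetical sample still shows non-default values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical rows: account 1 has four transactions, account 2 has one.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 10.0, "2023-01-01"),
    (1, 2, 20.0, "2023-01-02"),
    (1, 3, 30.0, "2023-01-03"),
    (1, 4, 40.0, "2023-01-04"),
    (2, 5, 99.0, "2023-02-01"),
])

offsets = conn.execute("""
    select tranid,
           lag(val)          over (partition by actid order by tranid) as pre_value,
           lead(val)         over (partition by actid order by tranid) as next_value,
           lag(val, 2, 0.0)  over (partition by actid order by tranid) as pre_2,
           lead(val, 2, 0.0) over (partition by actid order by tranid) as next_2
    from transactions
    order by tranid
""").fetchall()

for row in offsets:
    print(row)
# (1, None, 20.0, 0.0, 30.0)
# (2, 10.0, 30.0, 0.0, 40.0)
# (3, 20.0, 40.0, 10.0, 0.0)
# (4, 30.0, None, 20.0, 0.0)
# (5, None, None, 0.0, 0.0)
```

The single row of account 2 never sees values from account 1, because the offset is counted inside the partition only.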
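first_value(): the first value in the partition
last_value(): the last value in the partition. Because the default frame of an ordered window ends at the current row, the frame has to be extended to unbounded following, as in the query below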
select *,
first_value(val) over(partition by actid order by tranid,trandate) as first_value,
last_value(val) over(partition by actid order by tranid,trandate
rows between current row and unbounded following
) as last_value
from transactions
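The reason for the explicit rows between clause shows up when last_value() is run with and without it. This is a minimal sketch on SQLite (3.25+) through Python's sqlite3, with three hypothetical rows; the column names last_default and last_full are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical rows for a single account.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 10.0, "2023-01-01"),
    (1, 2, 20.0, "2023-01-02"),
    (1, 3, 30.0, "2023-01-03"),
])

frames = conn.execute("""
    select tranid,
           first_value(val) over (partition by actid order by tranid) as first_v,
           -- default frame: stops at the current row
           last_value(val)  over (partition by actid order by tranid) as last_default,
           -- explicit frame: runs to the end of the partition
           last_value(val)  over (partition by actid order by tranid
                                  rows between current row and unbounded following) as last_full
    from transactions
    order by tranid
""").fetchall()

for row in frames:
    print(row)
# (1, 10.0, 10.0, 30.0)
# (2, 10.0, 20.0, 30.0)
# (3, 10.0, 30.0, 30.0)
```

With the default frame, last_value() simply echoes the current row's value, which is rarely what is wanted.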
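Pivoting
Turning rows into columns makes the data easier to work with.
The statement below finds the IDs of each account's five largest transactions and turns them into columns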
with c as(
select actid,tranid,
row_number() over (partition by actid order by val desc) as rownum
from transactions
)
select * from c
pivot(max(tranid)
for rownum in([1],[2],[3],[4],[5])
)as p
order by actid
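The pivot operator is specific to T-SQL. The same rows-to-columns result can be sketched with conditional aggregation, swapped in here because it also runs on SQLite (3.25+) through Python's sqlite3. The rows are hypothetical, only three columns are produced to keep the sample small, and the column names r1 to r3 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (actid int, tranid int, val real, trandate text)")
# Hypothetical sample rows, not the data from the previous article.
conn.executemany("insert into transactions values (?, ?, ?, ?)", [
    (1, 1, 100.0, "2023-01-01"),
    (1, 2, -50.0, "2023-01-05"),
    (1, 3, 75.0, "2023-01-03"),
    (2, 4, 20.0, "2023-02-01"),
    (2, 5, 30.0, "2023-02-10"),
])

pivoted = conn.execute("""
    with c as (
        select actid, tranid,
               row_number() over (partition by actid order by val desc) as rownum
        from transactions
    )
    select actid,
           max(case when rownum = 1 then tranid end) as r1,
           max(case when rownum = 2 then tranid end) as r2,
           max(case when rownum = 3 then tranid end) as r3
    from c
    group by actid
    order by actid
""").fetchall()

for row in pivoted:
    print(row)
# (1, 1, 3, 2)
# (2, 5, 4, None)
```

Each case expression picks out the single tranid with the matching rownum, and max() collapses the group to that value. An account with fewer rows than columns gets null in the remaining positions, as with pivot.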
String concatenation
Concatenate the transaction IDs from the table above into a single output column
with c as(
select actid,tranid,
row_number() over (partition by actid order by val desc) as rownum
from transactions
)
select actid,concat([1],',',[2],',',[3],',',[4],',',[5]) as tranids
from c
pivot(max(tranid)
for rownum in([1],[2],[3],[4],[5])
)as p
order by actid