This article continues the series by introducing four more window functions.
lag
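lag(column_name, n, default): returns the value from the row n rows above the current row within the window. The first argument is the column name. The second argument is the number of rows to look back (optional, defaults to 1). The third argument is the default value, returned when there is no row n rows above; if it is not specified, null is returned.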
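To see the optional arguments in action, here is a minimal sketch that runs a lag query from Python. It uses SQLite's window functions (SQLite 3.25 or later) as a stand-in for Hive, on the assumption that both behave the same way for this simple case. The visits table and its rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for Hive here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (name text, visit_date text)")
conn.executemany(
    "insert into visits values (?, ?)",
    [
        ("a", "2020-01-01"),
        ("a", "2020-01-05"),
        ("a", "2020-01-09"),
        ("b", "2020-02-01"),
    ],
)

lag_rows = conn.execute(
    """
    select name,
           visit_date,
           -- n and default omitted: look back 1 row, NULL when there is none
           lag(visit_date) over (partition by name order by visit_date) as prev_1,
           -- look back 2 rows, fall back to 'none' when there is no such row
           lag(visit_date, 2, 'none') over (partition by name order by visit_date) as prev_2
    from visits
    order by name, visit_date
    """
).fetchall()

for row in lag_rows:
    print(row)
```

The first row of each partition has no previous row, so prev_1 is NULL (None in Python), and prev_2 only becomes a real date on the third row of partition a.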
lead
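lead is the opposite of lag: lead(column_name, n, default) returns the value from the row n rows below the current row within the window. The arguments have the same meaning as for lag.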
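The mirror-image behavior of lead can be sketched the same way. As an assumption, SQLite (3.25 or later) is used from Python in place of Hive, and the visits table is hypothetical sample data.

```python
import sqlite3

# SQLite stands in for Hive here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (name text, visit_date text)")
conn.executemany(
    "insert into visits values (?, ?)",
    [
        ("a", "2020-01-01"),
        ("a", "2020-01-05"),
        ("a", "2020-01-09"),
        ("b", "2020-02-01"),
    ],
)

lead_rows = conn.execute(
    """
    select name,
           visit_date,
           -- next row in the partition, NULL on the last row
           lead(visit_date) over (partition by name order by visit_date) as next_1,
           -- same lookup, but 'end' replaces the missing value on the last row
           lead(visit_date, 1, 'end') over (partition by name order by visit_date) as next_or_end
    from visits
    order by name, visit_date
    """
).fetchall()

for row in lead_rows:
    print(row)
```

Here it is the last row of each partition that has nothing to look at, so the default value shows up at the bottom of the partition rather than the top.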
first_value
first_value(column_name): after partitioning and sorting, returns the first value in the window up to the current row.
last_value
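last_value(column_name): after partitioning and sorting, returns the last value in the window up to the current row. With the default window frame, which ends at the current row, this is the current row's own value; to get the last value of the whole partition, extend the frame with rows between unbounded preceding and unbounded following.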
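The phrase "up to the current row" matters most for last_value. When order by is given without an explicit frame, the frame defaults to range between unbounded preceding and current row, so the frame's last row is the current row. The sketch below contrasts the default frame with an explicit full-partition frame. It assumes SQLite (3.25 or later) driven from Python as a stand-in for Hive, with a hypothetical visits table.

```python
import sqlite3

# SQLite stands in for Hive here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (name text, visit_date text)")
conn.executemany(
    "insert into visits values (?, ?)",
    [
        ("a", "2020-01-01"),
        ("a", "2020-01-05"),
        ("a", "2020-01-09"),
        ("b", "2020-02-01"),
    ],
)

frame_rows = conn.execute(
    """
    select name,
           visit_date,
           -- default frame: first row of the partition, same for every row
           first_value(visit_date) over (partition by name order by visit_date) as first_seen,
           -- default frame ends at the current row, so this equals visit_date
           last_value(visit_date) over (partition by name order by visit_date) as last_so_far,
           -- explicit frame covering the whole partition
           last_value(visit_date) over (
               partition by name order by visit_date
               rows between unbounded preceding and unbounded following
           ) as last_in_partition
    from visits
    order by name, visit_date
    """
).fetchall()

for row in frame_rows:
    print(row)
```

In the output, last_so_far simply repeats visit_date on every row, while last_in_partition stays fixed at the latest date of each partition.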
The following concrete example shows how they are used.
create table if not exists buy_info (
  name string,
  buy_date string,
  buy_num int
)
row format delimited fields terminated by '|';

select * from buy_info;
name | buy_date | buy_num |
zhangsan | 2020-02-23 | 21 |
zhangsan | 2020-03-12 | 34 |
zhangsan | 2020-04-15 | 12 |
zhangsan | 2020-05-12 | 51 |
lisi | 2020-03-16 | 12 |
lisi | 2020-03-21 | 24 |
lisi | 2020-07-12 | 41 |
lisi | 2020-07-27 | 32 |
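select name, buy_date, buy_num,
  lag(buy_date, 1, '1970-01-01') over (partition by name order by buy_date) as prev_date,
  lead(buy_date, 1, '2020-12-31') over (partition by name order by buy_date) as next_date,
  first_value(buy_date) over (partition by name order by buy_date) as first_date,
  last_value(buy_date) over (partition by name order by buy_date rows between unbounded preceding and unbounded following) as last_date
from buy_info;
The query returns the following result. The explicit frame on last_value is what makes last_date the final purchase date of each partition rather than the current row's date.
name | buy_date | buy_num | prev_date | next_date | first_date | last_date |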
zhangsan | 2020-02-23 | 21 | 1970-01-01 | 2020-03-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-03-12 | 34 | 2020-02-23 | 2020-04-15 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-04-15 | 12 | 2020-03-12 | 2020-05-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-05-12 | 51 | 2020-04-15 | 2020-12-31 | 2020-02-23 | 2020-05-12 |
lisi | 2020-03-16 | 12 | 1970-01-01 | 2020-03-21 | 2020-03-16 | 2020-07-27 |
lisi | 2020-03-21 | 24 | 2020-03-16 | 2020-07-12 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-12 | 41 | 2020-03-21 | 2020-07-27 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-27 | 32 | 2020-07-12 | 2020-12-31 | 2020-03-16 | 2020-07-27 |