Hive Window Functions in Detail, Part 3: lag, lead, first_value, last_value


This article continues the series with four more window functions.

lag

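lag(column_name, n, default): returns the value of column_name from the nth row above the current row within the window. The first argument is the column name. The second is the offset n (optional, default 1). The third is the default value, returned when there is no row n rows above the current one (the offset falls before the start of the window); if it is omitted, null is returned.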
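The three call forms can be compared side by side. This is a minimal sketch that assumes the buy_info table (name, buy_date, buy_num) from this article's example; the column aliases are made up for illustration.

```sql
-- Assumes the buy_info table (name, buy_date, buy_num) from this article's example.
select name, buy_date, buy_num,
       lag(buy_num)       over(partition by name order by buy_date) as prev_num,      -- offset 1, default null
       lag(buy_num, 2)    over(partition by name order by buy_date) as prev2_num,     -- offset 2, default null
       lag(buy_num, 2, 0) over(partition by name order by buy_date) as prev2_or_zero  -- offset 2, default 0
from buy_info;

-- For zhangsan (buy_num 21, 34, 12, 51 in buy_date order):
--   prev_num:      NULL, 21, 34, 12
--   prev2_num:     NULL, NULL, 21, 34
--   prev2_or_zero: 0, 0, 21, 34
```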

lead

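lead is the opposite of lag: lead(column_name, n, default) returns the value from the nth row below the current row within the window. Its arguments have the same meaning as those of lag.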
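One common use of lead is comparing a row with the one that follows it. The sketch below assumes the buy_info table from this article's example; the aliases next_num and change_to_next and the subquery alias t are illustrative names. The window function is computed in a subquery so that the outer query only does plain arithmetic.

```sql
-- Assumes the buy_info table (name, buy_date, buy_num) from this article's example.
select name, buy_date, buy_num, next_num,
       next_num - buy_num as change_to_next   -- null on the last row of each partition
from (
  select name, buy_date, buy_num,
         lead(buy_num) over(partition by name order by buy_date) as next_num
  from buy_info
) t;

-- For zhangsan (buy_num 21, 34, 12, 51 in buy_date order):
--   next_num:       34, 12, 51, NULL
--   change_to_next: 13, -22, 39, NULL
```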

first_value

first_value(column_name): after partitioning and ordering, returns the first value in the window up to the current row.
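As a small illustration, first_value can serve as a per-partition baseline. This sketch assumes the buy_info table from this article's example; first_num, diff_from_first and the subquery alias t are illustrative names.

```sql
-- Assumes the buy_info table (name, buy_date, buy_num) from this article's example.
select name, buy_date, buy_num, first_num,
       buy_num - first_num as diff_from_first   -- change relative to the first purchase
from (
  select name, buy_date, buy_num,
         first_value(buy_num) over(partition by name order by buy_date) as first_num
  from buy_info
) t;

-- For zhangsan (buy_num 21, 34, 12, 51 in buy_date order):
--   first_num:       21, 21, 21, 21
--   diff_from_first: 0, 13, -9, 30
```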

last_value

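last_value(column_name): after partitioning and ordering, returns the last value in the window up to the current row. With the default window frame, which ends at the current row, this is the current row's own value when the ordering values are unique. To get the last value of the whole partition, extend the frame with rows between unbounded preceding and unbounded following.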
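The effect of the window frame on last_value is easiest to see by running both variants together. This is a sketch under the assumption that the buy_info table from this article's example is used and that Hive applies its documented default frame (range between unbounded preceding and current row) when order by is present; the aliases are illustrative.

```sql
-- Assumes the buy_info table (name, buy_date, buy_num) from this article's example.
select name, buy_date,
       -- default frame: ends at the current row
       last_value(buy_date) over(partition by name order by buy_date) as last_so_far,
       -- explicit frame: covers the whole partition
       last_value(buy_date) over(partition by name order by buy_date
                                 rows between unbounded preceding and unbounded following) as last_in_partition
from buy_info;

-- For zhangsan:
--   last_so_far:       2020-02-23, 2020-03-12, 2020-04-15, 2020-05-12  (same as buy_date)
--   last_in_partition: 2020-05-12 on every row
```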

 

The following concrete example shows how they are used.

create table if not exists buy_info (
    name     string,
    buy_date string,
    buy_num  int
)
row format delimited fields terminated by '|';
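The article does not show how the table was populated. One possible way, sketched here as an assumption, is an insert ... values statement (available in Hive 0.14 and later) using the eight rows listed in this article; loading a '|'-delimited file would work as well.

```sql
-- Populate buy_info with the sample rows from this article.
-- Assumes Hive 0.14+ (insert ... values support).
insert into table buy_info values
  ('zhangsan', '2020-02-23', 21),
  ('zhangsan', '2020-03-12', 34),
  ('zhangsan', '2020-04-15', 12),
  ('zhangsan', '2020-05-12', 51),
  ('lisi',     '2020-03-16', 12),
  ('lisi',     '2020-03-21', 24),
  ('lisi',     '2020-07-12', 41),
  ('lisi',     '2020-07-27', 32);
```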


select * from buy_info;
name      buy_date    buy_num
zhangsan  2020-02-23  21
zhangsan  2020-03-12  34
zhangsan  2020-04-15  12
zhangsan  2020-05-12  51
lisi      2020-03-16  12
lisi      2020-03-21  24
lisi      2020-07-12  41
lisi      2020-07-27  32

select name, buy_date, buy_num,
       lag(buy_date, 1, '1970-01-01') over(partition by name order by buy_date) as prev_date,
       lead(buy_date, 1, '2020-12-31') over(partition by name order by buy_date) as next_date,
       first_value(buy_date) over(partition by name order by buy_date) as first_date,
       last_value(buy_date) over(partition by name order by buy_date
                                 rows between unbounded preceding and unbounded following) as last_date
from buy_info;

Note that last_value is given an explicit frame covering the whole partition. With the default frame, which ends at the current row, it would return the current row's buy_date instead of the last purchase date.

The query result is as follows

name      buy_date    buy_num  prev_date   next_date   first_date  last_date
zhangsan  2020-02-23  21       1970-01-01  2020-03-12  2020-02-23  2020-05-12
zhangsan  2020-03-12  34       2020-02-23  2020-04-15  2020-02-23  2020-05-12
zhangsan  2020-04-15  12       2020-03-12  2020-05-12  2020-02-23  2020-05-12
zhangsan  2020-05-12  51       2020-04-15  2020-12-31  2020-02-23  2020-05-12
lisi      2020-03-16  12       1970-01-01  2020-03-21  2020-03-16  2020-07-27
lisi      2020-03-21  24       2020-03-16  2020-07-12  2020-03-16  2020-07-27
lisi      2020-07-12  41       2020-03-21  2020-07-27  2020-03-16  2020-07-27
lisi      2020-07-27  32       2020-07-12  2020-12-31  2020-03-16  2020-07-27
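As a further illustration that is not part of the original example, the previous purchase date obtained with lag can be fed into Hive's built-in datediff(enddate, startdate) to get the number of days between consecutive purchases. The aliases prev_buy_date, days_since_prev and t are illustrative names.

```sql
-- Assumes the buy_info table (name, buy_date, buy_num) from this article's example.
select name, buy_date,
       datediff(buy_date, prev_buy_date) as days_since_prev   -- null for each customer's first purchase
from (
  select name, buy_date,
         lag(buy_date) over(partition by name order by buy_date) as prev_buy_date
  from buy_info
) t;

-- days_since_prev in buy_date order:
--   zhangsan: NULL, 18, 34, 27
--   lisi:     NULL, 5, 113, 15
```

No default is passed to lag here, so the first row of each partition yields null rather than a gap measured from an artificial date such as 1970-01-01.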

