Hive 窗口函数详细介绍3 lag,lead,first_value,last_value


这篇文章继续介绍4个窗口函数。

lag

lag(column_name,n,default):用于统计窗口内往上第n行的值,第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上n行为null时,取默认值,若不指定,则为null)

lead

lead与lag想法,lead(column_name,n,default)用于统计窗口内向下取n行的值

first_value

first_value()  取分组排序后,截止到当前行的第一个值

last_value

last_value() 取分组排序后,截止到当前行的最后一个值

 

下面通过具体的实例介绍它们的用法

create  table  if   not    exists   buy_info  (
    
 name   string,
 buy_date   string,
 buy_num   int
)
row  format  delimited  fields   terminated  by  '|';


select  *  from   buy_info;
name buy_date buy_num
zhangsan 2020-02-23 21
zhangsan 2020-03-12 34
zhangsan 2020-04-15 12
zhangsan 2020-05-12 51
lisi 2020-03-16 12
lisi 2020-03-21 24
lisi 2020-07-12 41
lisi 2020-07-27

32

 

 

 

 

 

 

 

 

 

 

 

 
 
select   name , buy_date,buy_num,
lag(buy_date,1,'1970-01-01')  over(partition  by  name  order  by  buy_date)   as last_date, lead(buy_date,1,'2020-12-31') over(partition by name order by buy_date) as next_date, first_value () over(partition by name order by buy_date) as first_date, last_value() over(partition by name order by buy_date) as last_date from buy_info; 
 

 

查询结果如下

name buy_date buy_num last_date next_date first_date last_date
zhangsan 2020-02-23 21 1970-01-01 2020-03-12 2020-02-23 2020-05-12
zhangsan 2020-03-12 34 2020-02-23 2020-04-15 2020-02-23 2020-05-12
zhangsan 2020-04-15 12 2020-03-12 2020-05-12 2020-02-23 2020-05-12
zhangsan 2020-05-12 51 2020-04-15 2020-12-31 2020-02-23 2020-05-12
lisi 2020-03-16 12 1970-01-01 2020-03-21 2020-03-16 2020-07-27
lisi 2020-03-21 24 2020-03-16 2020-07-12 2020-03-16 2020-07-27
lisi 2020-07-12 41 2020-03-21 2020-07-27 2020-03-16 2020-07-27
lisi 2020-07-27 32 2020-07-12 2020-12-31 2020-03-16 2020-07-27


免责声明!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系本站邮箱yoyou2525@163.com删除。



 
粤ICP备18138465号  © 2018-2025 CODEPRJ.COM