Hive window (analytic) functions: row_number and others


 

1. Ranking and deduplication

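The results below are for one partition of five rows in which the 2nd and 3rd rows have the same col2 value.

row_number() over(partition by col1 order by col2) as rn

Result: 1,2,3,4,5 (every row gets a distinct, consecutive number, even when col2 ties)

rank() over(partition by col1 order by col2) as rk

Result: 1,2,2,4,5 (tied rows share a rank, and the next rank skips ahead)

dense_rank() over(partition by col1 order by col2) as ds_rk

Result: 1,2,2,3,4 (tied rows share a rank, and the next rank does not skip)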
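To make the tie handling concrete, here is a small Python model of the three ranking functions over a single partition that is already sorted by the ORDER BY column. It is only a sketch of the semantics, not how Hive computes them; the helper name `ranks` and the sample dates are hypothetical.

```python
def ranks(sorted_keys):
    """Return (row_number, rank, dense_rank) lists for one partition.

    Assumes sorted_keys is already ordered by the ORDER BY column (col2).
    """
    rn, rk, drk = [], [], []
    for i, key in enumerate(sorted_keys):
        rn.append(i + 1)  # row_number: position in the partition, ties still get distinct numbers
        if i > 0 and key == sorted_keys[i - 1]:
            rk.append(rk[-1])    # tie: reuse the previous rank
            drk.append(drk[-1])  # tie: reuse the previous dense rank
        else:
            rk.append(i + 1)     # rank jumps to the row position, leaving a gap after ties
            drk.append(drk[-1] + 1 if drk else 1)  # dense_rank increases by exactly one


    return rn, rk, drk


# Hypothetical partition of five rows; the 2nd and 3rd tie on col2.
rn, rk, drk = ranks(["2019-01-01", "2019-01-05", "2019-01-05", "2019-02-01", "2019-03-01"])
print(rn)   # [1, 2, 3, 4, 5]
print(rk)   # [1, 2, 2, 4, 5]
print(drk)  # [1, 2, 2, 3, 4]
```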

select 
        order_id,
        departure_date,
        row_number() over(partition by order_id order by departure_date) as rn,  -- sequential numbering, ties get distinct numbers
        rank() over(partition by order_id order by departure_date) as rk,        -- ties share a rank; the next rank is skipped
        dense_rank() over(partition by order_id order by departure_date) as d_rk -- ties share a rank; the next rank is not skipped
  from ord_test 
 where order_id=410341346
;
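The heading also mentions deduplication. A common Hive pattern is to compute row_number() in a subquery and keep only the rows with rn = 1 in an outer query, because a window function cannot be referenced directly in a WHERE clause. The Python sketch below models that "first row per partition" logic; `dedup_first` and the sample rows are hypothetical. Unlike Hive, where row_number orders tied rows arbitrarily, Python's stable sort keeps tied rows in input order.

```python
from itertools import groupby
from operator import itemgetter


def dedup_first(rows, part_key, order_key):
    """Keep the row that row_number() would number 1 in each partition.

    Models: row_number() over(partition by part_key order by order_key) ... where rn = 1
    """
    result = []
    # Sort by the partition key, then by the ORDER BY key, like the window's partition + order.
    ordered = sorted(rows, key=itemgetter(part_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(part_key)):
        result.append(next(group))  # the first row of each partition has rn = 1
    return result


# Hypothetical sample rows (not real data from ord_test).
rows = [
    {"order_id": 1, "departure_date": "2019-03-01"},
    {"order_id": 1, "departure_date": "2019-01-01"},
    {"order_id": 2, "departure_date": "2019-02-01"},
    {"order_id": 2, "departure_date": "2019-02-01"},
]
for row in dedup_first(rows, "order_id", "departure_date"):
    print(row)
# {'order_id': 1, 'departure_date': '2019-01-01'}
# {'order_id': 2, 'departure_date': '2019-02-01'}
```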


 

2. Accessing values from other rows

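lag(col3,n,DEFAULT) over(partition by col1 order by col2) as up
Returns the value of col3 from the row n rows above the current row within the window. The first argument is the column; the second is the offset n (optional, default 1); the third is the default value, returned when there is no row n rows above because the offset runs past the start of the partition (optional; if omitted, NULL is returned). If that row exists but its value is NULL, NULL is returned, not the default.

lead(col3,n,DEFAULT) over(partition by col1 order by col2) as down
Returns the value of col3 from the row n rows below the current row within the window. The arguments mirror lag: the column, the offset n (optional, default 1), and the default value, returned when the offset runs past the end of the partition (optional; if omitted, NULL is returned).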
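The offset and default rules can be modeled in a few lines of Python over one sorted partition, with None standing in for NULL. The helpers `lag` and `lead` here are hypothetical stand-ins, not a Hive API. The last call shows that a NULL stored in the source row is returned as-is, while the default only fills positions where no such row exists.

```python
def lag(values, n=1, default=None):
    """Model lag(col, n, default) over one partition already sorted by the ORDER BY column."""
    # Take the value n rows above; use default only when that row does not exist.
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead(values, n=1, default=None):
    """Model lead(col, n, default) over one partition already sorted by the ORDER BY column."""
    # Take the value n rows below; use default only when that row does not exist.
    return [values[i + n] if i + n < len(values) else default for i in range(len(values))]


dates = ["d1", "d2", "d3", "d4"]
print(lag(dates))               # [None, 'd1', 'd2', 'd3']
print(lead(dates, 2, "none"))   # ['d3', 'd4', 'none', 'none']
print(lag([1, None, 3], 1, 0))  # [0, 1, None]  (the stored NULL is not replaced)
```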

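first_value(col3) over(partition by col1 order by col2) as fv
Returns the first value of col3 in the sorted partition, taken over the rows from the start of the partition up to the current row.

last_value(col3) over(partition by col1 order by col2) as lv
Returns the last value of col3 over the rows from the start of the partition up to the current row. When ORDER BY is given without a window clause, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so this is usually the current row's own value (or that of its last tie on col2), not the last value of the whole partition.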
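The difference between the default frame and an explicit full frame is easiest to see in a small Python model. It assumes the rows are already sorted by the ORDER BY key and treats rows with an equal key as peers, as a RANGE frame does; the helper names and sample data are hypothetical.

```python
def first_last_default_frame(rows):
    """Model first_value/last_value with the default frame used when ORDER BY is present:
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    rows: list of (order_key, value), already sorted by order_key.
    Returns one (first_value, last_value) pair per row.
    """
    out = []
    for key, _ in rows:
        # A RANGE frame ends at the last peer: the last row whose order_key equals the current one.
        end = max(j for j, (k, _) in enumerate(rows) if k == key)
        out.append((rows[0][1], rows[end][1]))
    return out


def first_last_full_frame(rows):
    """Model ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: the frame is the whole partition."""
    return [(rows[0][1], rows[-1][1]) for _ in rows]


# Hypothetical partition: the 2nd and 3rd rows tie on the ORDER BY key.
part = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
print(first_last_default_frame(part))  # [('a', 'a'), ('a', 'c'), ('a', 'c'), ('a', 'd')]
print(first_last_full_frame(part))     # [('a', 'd'), ('a', 'd'), ('a', 'd'), ('a', 'd')]
```

The default-frame output shows why last_value without a frame clause tracks the current row (or its last tie) instead of returning the partition's final row.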

select 
       order_id,
       departure_date,
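       first_value(departure_date) over(partition by order_id order by add_time) as fv,  -- first row of the partition
       last_value(departure_date) over(partition by order_id order by add_time
                                       rows between unbounded preceding and unbounded following) as lv  -- last row of the partition; needs the explicit full frame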
  from ord_test
 where order_id=410341346
;

select 
       order_id,
       departure_date,
       lead(departure_date,1) over(partition by order_id order by departure_date) as down_1, -- value from the next row
       lag(departure_date,1) over(partition by order_id order by departure_date) as up_1     -- value from the previous row
  from ord_test
 where order_id=410341346
;


 

