Hive Window Functions


Window functions

1. Function reference

OVER(): specifies the data window that the analytic function operates on; the size of this window may change from row to row.

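CURRENT ROW: the current row

n PRECEDING: the n rows before the current row

n FOLLOWING: the n rows after the current row

UNBOUNDED: the boundary of the partition. UNBOUNDED PRECEDING means the window starts at the first row; UNBOUNDED FOLLOWING means it extends to the last row.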

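LAG(col,n): the value of col from the n-th row before the current row

LEAD(col,n): the value of col from the n-th row after the current row

 

NTILE(n): distributes the rows of an ordered partition into n numbered groups, with numbering starting at 1, and returns for each row the number of the group it belongs to. Note: n must be of type int.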

2. Data preparation (fields: name,orderdate,cost)

jack,2017-01-01,10

tony,2017-01-02,15

jack,2017-02-03,23

tony,2017-01-04,29

jack,2017-01-05,46

jack,2017-04-06,42

tony,2017-01-07,50

jack,2017-01-08,55

mart,2017-04-08,62

mart,2017-04-09,68

neil,2017-05-10,12

mart,2017-04-11,75

neil,2017-06-12,80

mart,2017-04-13,94

 

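3. Requirements

(1) Find the customers who made a purchase in April 2017, and the total number of such customers

(2) Show each customer's purchase details together with the monthly purchase total

(3) For the scenario above, accumulate cost in date order

(4) Show the date of each customer's previous purchase

(5) Show the orders that fall in the earliest 20% by time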

4. Create a local file business.txt and put the data in it

[atguigu@hadoop102 datas]$ vi business.txt

5. Create the Hive table and load the data

create table business(

name string,

orderdate string,

cost int

) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

 

load data local inpath "/opt/module/datas/business.txt" into table business;
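
A quick sanity check after loading can catch delimiter or path mistakes early. The sketch below assumes the load succeeded and that business.txt holds exactly the 14 sample rows from section 2 with no blank lines between them. A blank line in the file would be loaded as an extra row with NULL fields.

```sql
-- Expect 14 if every sample row was loaded and nothing else.
select count(*) as row_cnt from business;

-- Rows whose cost could not be parsed as int show up here (expect none).
select * from business where cost is null;
```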

6. Run a query for each requirement

(1) Find the customers who made a purchase in April 2017, and the total number of such customers

select name,count(*) over ()

from business 

where substring(orderdate,1,7) = '2017-04'

group by name;
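
With the sample data, only jack and mart have orders in April 2017, so the query above should return two rows, each carrying the total 2. This works because of evaluation order: group by first collapses the filtered rows to one row per customer, and count(*) over() then counts those grouped rows. An equivalent formulation that makes the two steps explicit could look like this (the alias total_customers is introduced here for illustration):

```sql
-- Step 1 (inner query): one row per customer who bought in April 2017.
-- Step 2 (outer query): count those rows with an empty window, i.e. the whole result set.
select name, count(*) over() as total_customers
from (
    select distinct name
    from business
    where substring(orderdate,1,7) = '2017-04'
) t;
-- Expected with the sample data: (jack, 2) and (mart, 2)
```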

(2) Show each customer's purchase details together with the monthly purchase total

select name,orderdate,cost,sum(cost) over(partition by month(orderdate))

from business;
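
Note that month(orderdate) partitions by calendar month only, so each total covers all customers and would also merge the same month from different years. If the intent is a monthly total per customer, one possible variant (an assumption about the requirement, not the original query) partitions by customer and by year-month:

```sql
-- Assumed variant: monthly total per customer, keyed by the 'yyyy-MM' prefix.
-- The alias month_total is illustrative.
select name, orderdate, cost,
       sum(cost) over(partition by name, substring(orderdate,1,7)) as month_total
from business;
```

For jack's three January 2017 orders (10, 46, 55) this variant gives 111 on each row, whereas the query above gives 205, the January total across jack and tony.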

(3) For the scenario above, accumulate cost in date order

select name,orderdate,cost,

sum(cost) over() as sample1, --sum over all rows

sum(cost) over(partition by name) as sample2, --partition by name, sum within each partition

sum(cost) over(partition by name order by orderdate) as sample3, --partition by name, running total within each partition

sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4, --same as sample3: aggregate from the start of the partition to the current row

sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --current row plus the one row before it

sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6, --current row plus one row before and one row after

sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --current row plus all rows after it

from business;
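
To see what each frame does, it helps to work through a single customer by hand. The sketch below restricts the same windows to jack (so partition by name is not needed) and lists the values that follow from the sample data. The column aliases are illustrative.

```sql
select orderdate, cost,
       sum(cost) over(order by orderdate) as running_total,
       sum(cost) over(order by orderdate rows between 1 PRECEDING and current row) as prev_plus_cur,
       sum(cost) over(order by orderdate rows between 1 PRECEDING and 1 FOLLOWING) as neighbors,
       sum(cost) over(order by orderdate rows between current row and UNBOUNDED FOLLOWING) as remaining
from business
where name = 'jack'
order by orderdate;
-- orderdate   cost  running_total  prev_plus_cur  neighbors  remaining
-- 2017-01-01   10        10             10            56        176
-- 2017-01-05   46        56             56           111        166
-- 2017-01-08   55       111            101           124        120
-- 2017-02-03   23       134             78           120         65
-- 2017-04-06   42       176             65            65         42
```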

(4) Show the date of each customer's previous purchase

select name,orderdate,cost,

lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate) as time1,

lag(orderdate,2) over(partition by name order by orderdate) as time2

from business;
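
LEAD, listed in section 1 but not used above, works the same way in the opposite direction. A minimal sketch (aliases are illustrative) that shows each customer's next purchase date and the gap in days since the previous one, using Hive's datediff(enddate, startdate):

```sql
select name, orderdate, cost,
       -- next purchase date; NULL on the customer's last order (no default given)
       lead(orderdate,1) over(partition by name order by orderdate) as next_time,
       -- days since the previous purchase; NULL on the customer's first order
       datediff(orderdate, lag(orderdate,1) over(partition by name order by orderdate)) as days_since_prev
from business;
```

For jack's order on 2017-01-05, for example, next_time is 2017-01-08 and days_since_prev is 4.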

(5) Show the orders that fall in the earliest 20% by time

select * from (

    select name,orderdate,cost, ntile(5) over(order by orderdate) sorted

    from business

) t

where sorted = 1;
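
Because the table has 14 rows, ntile(5) cannot split them evenly. Under the usual NTILE rule, where the earlier groups receive the extra rows, the groups should hold 3, 3, 3, 3 and 2 rows, so the query above would return the three earliest orders. A quick way to confirm the group sizes on your own Hive version (the alias rows_in_group is illustrative):

```sql
-- Count how many rows land in each ntile group.
select sorted, count(*) as rows_in_group
from (
    select ntile(5) over(order by orderdate) as sorted
    from business
) t
group by sorted;
```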

