Window Functions
1. Function reference
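OVER(): specifies the data window that the analytic function operates on; the size of this window may change from row to row
CURRENT ROW: the current row
n PRECEDING: the n rows before the current row
n FOLLOWING: the n rows after the current row
UNBOUNDED: the boundary of the partition; UNBOUNDED PRECEDING means starting from the first row, UNBOUNDED FOLLOWING means up to the last row
LAG(col,n): the value of col from the n-th row before the current row
LEAD(col,n): the value of col from the n-th row after the current row
NTILE(n): distributes the rows of an ordered partition into n groups. The groups are numbered starting from 1, and for each row NTILE returns the number of the group the row belongs to. Note: n must be of type int.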
2. Data preparation: name, orderdate, cost
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
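3. Requirements
(1) Find the customers who made a purchase in April 2017 and the total number of such customers
(2) Show each customer's purchase details together with the monthly purchase total
(3) Building on the scenario above, accumulate cost by date
(4) Show each customer's previous purchase date
(5) Show the orders that fall in the earliest 20% by date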
4. Create a local file business.txt and enter the data above
[atguigu@hadoop102 datas]$ vi business.txt
5. Create the Hive table and load the data
create table business(
    name string,
    orderdate string,
    cost int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/module/datas/business.txt" into table business;
6. Queries for each requirement
(1) Find the customers who made a purchase in April 2017 and the total number of such customers
select name, count(*) over ()
from business
where substring(orderdate,1,7) = '2017-04'
group by name;
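Query (1) works because GROUP BY is evaluated before the window function, so count(*) over () counts the grouped rows (one per customer) rather than the underlying orders. A minimal sketch that makes this order explicit with a subquery; the aliases t and customer_cnt are illustrative and not part of the original:

```sql
-- Inner query: one row per customer who bought in April 2017.
-- Outer query: the empty OVER () window spans all of those rows,
-- so every row carries the total number of such customers.
select t.name,
       count(*) over () as customer_cnt
from (
    select name
    from business
    where substring(orderdate, 1, 7) = '2017-04'
    group by name
) t;
```

With the sample data only jack and mart have April orders, so both rows should show a count of 2.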
(2) Show each customer's purchase details together with the monthly purchase total
select name, orderdate, cost,
    sum(cost) over(partition by month(orderdate))
from business;
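Note that month(orderdate) returns only the month number, so orders from the same month of different years would land in one partition. The sample data covers 2017 only, so the result is unaffected here. If the table spanned several years, one possible variant is to partition by the year-month prefix instead; the alias month_total is illustrative:

```sql
-- Partition by the 'yyyy-MM' prefix so that, for example,
-- 2017-04 and 2018-04 are totalled separately.
select name, orderdate, cost,
       sum(cost) over (partition by substring(orderdate, 1, 7)) as month_total
from business;
```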
(3) Building on the scenario above, accumulate cost by date
select name, orderdate, cost,
    sum(cost) over() as sample1, -- sum over all rows
    sum(cost) over(partition by name) as sample2, -- partition by name, sum within each partition
    sum(cost) over(partition by name order by orderdate) as sample3, -- partition by name, running total within each partition
    sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, -- same result as sample3 here: aggregate from the start of the partition to the current row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, -- current row plus the previous row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6, -- current row plus the previous row and the next row
    sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7 -- current row plus all following rows
from business;
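When ORDER BY is given without an explicit frame, Hive's documented default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A RANGE frame treats rows with the same orderdate as peers and includes all of them in the current row's total, whereas a ROWS frame counts physical rows. The two differ only when a partition contains duplicate orderdate values, which the sample data does not. A sketch for comparing them side by side; the aliases are illustrative:

```sql
select name, orderdate, cost,
       -- implicit default frame
       sum(cost) over (partition by name order by orderdate) as default_frame,
       -- the default frame written out explicitly
       sum(cost) over (partition by name order by orderdate
                       range between unbounded preceding and current row) as range_frame,
       -- physical-row frame: differs from the two above only on tied orderdate values
       sum(cost) over (partition by name order by orderdate
                       rows between unbounded preceding and current row) as rows_frame
from business;
```

On the sample data all three columns should be identical; they would diverge if a customer had two orders on the same day.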
(4) Show each customer's previous purchase date
select name, orderdate, cost,
    lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate) as time1,
    lag(orderdate,2) over(partition by name order by orderdate) as time2
from business;
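LEAD is the mirror image of LAG: it looks forward instead of backward. As a small sketch, the same pattern can return each customer's next purchase date. The alias next_time and the default value '9999-12-31' are arbitrary placeholders chosen for illustration:

```sql
-- For each order, fetch the orderdate of the same customer's next order.
-- The third argument is returned when there is no following row;
-- without it, LEAD returns NULL for the last order of each customer.
select name, orderdate, cost,
       lead(orderdate, 1, '9999-12-31') over (partition by name order by orderdate) as next_time
from business;
```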
(5) Show the orders that fall in the earliest 20% by date
select *
from (
    select name, orderdate, cost,
        ntile(5) over(order by orderdate) sorted
    from business
) t
where sorted = 1;
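NTILE(5) splits the 14 sample rows into groups of 3, 3, 3, 3 and 2, so group 1 holds 3 rows, which is slightly more than 20% of the table. If a strict cutoff is preferred, CUME_DIST() is one possible alternative: it returns the fraction of rows at or before the current row in the window order. A sketch under that assumption; the alias pct is illustrative:

```sql
-- Keep only the rows whose cumulative position is within the first 20%.
select name, orderdate, cost
from (
    select name, orderdate, cost,
           cume_dist() over (order by orderdate) as pct
    from business
) t
where pct <= 0.2;
```

On the sample data this should keep the 2 earliest orders (2/14 is about 0.14), since the third order already sits at 3/14, about 0.21. Which behaviour is wanted depends on whether "first 20%" should round up or down.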