Hive: use JOIN sparingly; prefer UNION ALL + GROUP BY + aggregate functions


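Joins in Hive are generally less efficient than joins in a traditional relational SQL database, so a common practice in Hive is to build one wide table rather than join many small tables at query time.
A Hive JOIN works, but it tends to be slow.
Below is an example where UNION ALL + GROUP BY + an aggregate function replaces a join.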

- Requirement: the total payment amount and total refund amount for each user in 2019

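-- Approach 1: UNION ALL + GROUP BY
select a.user_name,
       sum(a.total_amount)  as total_amount,
       sum(a.refund_amount) as refund_amount
from (
    -- payments per user; the refund column is padded with 0
    select user_name,
           sum(pay_amount) as total_amount,
           0 as refund_amount
    from user_trade
    where year(dt) = 2019
    group by user_name
    union all
    -- refunds per user; the payment column is padded with 0
    select user_name,
           0 as total_amount,
           sum(refund_amount) as refund_amount
    from user_refund
    where year(dt) = 2019
    group by user_name
) a
group by a.user_name;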
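
To see why the zero padding works, here is a small sketch that runs the same pattern on inline data. The CTE names `trade` and `refund` and the sample rows are hypothetical stand-ins for `user_trade` and `user_refund`, and the sketch assumes a Hive version that supports CTEs and SELECT without a FROM clause. For brevity it skips the per-branch pre-aggregation and sums once in the outer query.

```sql
-- Hypothetical sample rows standing in for user_trade and user_refund.
with trade as (
    select 'alice' as user_name, 100 as pay_amount
    union all
    select 'alice' as user_name, 50 as pay_amount
    union all
    select 'bob' as user_name, 30 as pay_amount
),
refund as (
    select 'alice' as user_name, 20 as refund_amount
    union all
    select 'carol' as user_name, 5 as refund_amount
)
select t.user_name,
       sum(t.total_amount)  as total_amount,
       sum(t.refund_amount) as refund_amount
from (
    -- each branch fills the column it does not own with 0,
    -- so both branches have the same three columns
    select user_name, pay_amount as total_amount, 0 as refund_amount from trade
    union all
    select user_name, 0 as total_amount, refund_amount from refund
) t
group by t.user_name;
```

With these rows, alice gets 150 paid and 20 refunded, bob gets 30 and 0, and carol gets 0 and 5 (row order is not guaranteed). Users who appear in only one source still show up, which is the same coverage a full join gives.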

 

-- Approach 2: full join (a join also works, but is less efficient)

select coalesce(a.user_name, b.user_name) as user_name,
       if(a.total_amount is null, 0, a.total_amount) as total_amount,
       if(b.refund_amount is null, 0, b.refund_amount) as refund_amount
from (
    -- payments per user
    select user_name,
           sum(pay_amount) as total_amount
    from user_trade
    where year(dt) = 2019
    group by user_name
) a
full join (
    -- refunds per user
    select user_name,
           sum(refund_amount) as refund_amount
    from user_refund
    where year(dt) = 2019
    group by user_name
) b
on a.user_name = b.user_name;

 

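PS: a note on the coalesce() function. coalesce() returns its first non-NULL argument. In the full join above, a user who appears in only one of the two subqueries has a NULL user_name on the other side, so coalesce(a.user_name, b.user_name) picks whichever side has a value.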
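
A minimal sketch of coalesce() on literal values, assuming a Hive version that allows SELECT without a FROM clause. The column aliases are made up for illustration.

```sql
select coalesce(null, 'bob')           as name_from_right, -- left side is NULL, so 'bob' is returned
       coalesce('alice', null)         as name_from_left,  -- first argument is non-NULL, so 'alice' is returned
       coalesce(cast(null as int), 0)  as amount_or_zero;  -- NULL amount is replaced by 0
```

The last column shows that coalesce(x, 0) gives the same result as if(x is null, 0, x), so either form can be used to default a missing amount to zero.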

 

