Hive中的Predicate Pushdown Rules(謂詞下推規則)


引用:https://blog.csdn.net/strongyoung88/article/details/81156271

 

謂詞下推概念

謂詞下推 Predicate Pushdown(PPD):簡而言之,就是在不影響結果的情況下,盡量將過濾條件提前執行。謂詞下推后,過濾條件在map端執行,減少了map端的輸出,降低了數據在集群上傳輸的量,節約了集群的資源,也提升了任務的性能。

PPD 配置

PPD控制參數:hive.optimize.ppd

  • Default Value: true
  • Added In: Hive 0.4.0

 

Push:謂詞下推,可以理解為被優化
Not Push:謂詞沒有下推,可以理解為沒有被優化

 

實驗

實驗結果列表形式:

Pushed or Not SQL
Pushed select ename,dept_name from E join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Pushed select ename,dept_name from E join D on E.dept_id = D.dept_id where E.eid='HZ001';
Pushed select ename,dept_name from E join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Pushed select ename,dept_name from E join D on E.dept_id = D.dept_id where D.dept_id='D001';
Not Pushed select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Pushed select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Pushed select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Not Pushed select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
Pushed select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Not Pushed select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Not Pushed select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Pushed select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
Not Pushed select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Not Pushed select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Not Pushed select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Not Pushed select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where D.dept_id='D001';

實驗結果表格形式:

  Join(inner join) Left Outer Join Right Outer Join Full Outer Join
Left Table Right Table Left Table Right Table Left Table Right Table Left Table Right Table
Join Predicate Pushed Pushed Not Pushed Pushed Pushed Not Pushed Not Pushed Not Pushed
Where Predicate Pushed Pushed Pushed Not Pushed Not Pushed Pushed Not Pushed Not Pushed

此表實際上就是上述PPD規則表

結論

1、對於Join(Inner Join)、Full outer Join,條件寫在on后面,還是where后面,性能上面沒有區別;
2、對於Left outer Join ,右側的表寫在on后面、左側的表寫在where后面,性能上有提高;
3、對於Right outer Join,左側的表寫在on后面、右側的表寫在where后面,性能上有提高;
4、當條件分散在兩個表時,謂詞下推可按上述結論2和3自由組合,情況如下:

SQL 過濾時機
select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001' and D.dept_id = 'D001'); dept_id在map端過濾,eid在reduce端過濾
select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id = 'D001') where E.eid='HZ001'; dept_id,eid都在map端過濾
select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001') where D.dept_id = 'D001'; dept_id,eid都在reduce端過濾
select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id ) where E.eid='HZ001' and D.dept_id = 'D001'; dept_id在reduce端過濾,eid在map端過濾

注意:如果在表達式中含有不確定函數,整個表達式的謂詞將不會被pushed,例如

select a.* 
from a join b on a.id = b.id
where a.ds = '2019-10-09' and a.create_time = unix_timestamp();

因為unix_timestamp是不確定函數,在編譯的時候無法得知,所以,整個表達式不會被pushed,即ds='2019-10-09'也不會被提前過濾。類似的不確定函數還有rand()等。

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM