Slow SQL Caused by Redundant Data from Joins

This article is a repost. 2017-09-01 22:06 · postgresql

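While working on a business task I ran into a slow query caused by several joins: the data volume was small, yet the query was very slow. I found a blog post on the subject and used it to solve the problem.

I won't record the business details here; the problem is reproduced below using the content of that blog post.

Original SQL: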

select    
distinct abc.pro_col1, abc.col3    
from    
t0 p    
INNER JOIN t1 abc   
  on p.id=abc.par_col2  
inner join t2 s   
  on  s.col3=abc.col3    
inner join t3 po   
  on  po.id=s.col4   
where p.state=2 and po.state=3   
order by abc.pro_col1, abc.col3;

The SQL above is equivalent to:

select
distinct abc.pro_col1, abc.col3    
from t0 p, t1 abc, t2 s, t3 po 
where p.id=abc.par_col2 
and s.col3=abc.col3 
and po.id=s.col4
and p.state=2 and po.state=3 
order by abc.pro_col1, abc.col3;

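Analysis and optimization:

Semantically, this SQL runs through several JOINs and then takes the distinct values of two columns from one of the tables.

However, every join can produce redundant rows: each row is repeated once for every matching row on the other side, so the intermediate result set keeps growing.

Suggested change: have each JOIN emit only distinct values, which removes the redundancy before it is carried into the next join. In other words, because chaining several JOINs makes the result set larger and larger (much like a Cartesian product), apply the filter conditions as early as possible.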

select
distinct t2.pro_col1, t2.col3 from
(
    select
    distinct t1.pro_col1, t1.col3, s.col4 from
    (
        select
        distinct abc.pro_col1, abc.col3 from
        t1 abc INNER JOIN t0 p
        on (p.id = abc.par_col2 and p.state=2)
    ) t1
    inner join t2 s
    on (s.col3 = t1.col3)
) t2
inner join t3 po
on (po.id = t2.col4 and po.state=3)
order by t2.pro_col1, t2.col3;

An example:

postgres=# create table rt1(id int, info text);  
CREATE TABLE  
postgres=# create table rt2(id int, info text);  
CREATE TABLE  
postgres=# create table rt3(id int, info text);  
CREATE TABLE  
postgres=# create table rt4(id int, info text);  
CREATE TABLE  
  
postgres=# insert into rt1 select generate_series(1,1000),'test';  
INSERT 0 1000  
postgres=# insert into rt2 select 1,'test' from generate_series(1,1000);  
INSERT 0 1000  
postgres=# insert into rt3 select 1,'test' from generate_series(1,1000);  
INSERT 0 1000  
postgres=# insert into rt4 select 1,'test' from generate_series(1,1000);  
INSERT 0 1000
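
With this data, the growth of the intermediate result can be checked directly. The sketch below assumes the four tables are chained on id (rt1.id = rt2.id, rt2.id = rt3.id, rt3.id = rt4.id); the text does not state the join columns, so that choice is an assumption made for illustration.

```sql
-- Assumption: the tables are joined on id; the join columns are not given in the text.
-- rt1 holds ids 1..1000, while rt2, rt3 and rt4 each hold 1000 rows with id = 1.

-- Only rt1.id = 1 finds a match, and it matches all 1000 rows of rt2.
select count(*)
from rt1
join rt2 on rt1.id = rt2.id;
-- count: 1000

-- Each of those 1000 rows matches all 1000 rows of rt3.
select count(*)
from rt1
join rt2 on rt1.id = rt2.id
join rt3 on rt2.id = rt3.id;
-- count: 1000000

-- Joining rt4 the same way would multiply by 1000 again (1000 * 1000 * 1000 rows),
-- even though the join output contains a single distinct rt1.id.
```

Every extra join multiplies the row count by the number of matching rows, while the number of distinct values the query finally returns stays the same.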

Comparison:
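
A baseline query for the comparison could look like the following. This is a sketch under the assumption that the four tables are joined on id, which the text does not state. Note that explain analyze actually executes the statement, so with the data above it may run for a long time.

```sql
-- Baseline sketch (assumed join columns): join everything first,
-- then de-duplicate once at the very end.
explain analyze
select distinct rt1.id
from rt1
join rt2 on rt1.id = rt2.id
join rt3 on rt2.id = rt3.id
join rt4 on rt3.id = rt4.id;
```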

Optimized query:
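
Applying the per-join de-duplication described earlier to the rt1..rt4 example tables might look like the sketch below. The join columns (id) are an assumption, since the text does not state them.

```sql
-- Optimized sketch (assumed join columns): de-duplicate after every join,
-- so each stage passes on one row per distinct id instead of every matching pair.
explain analyze
select distinct t2.id
from (
    select distinct t1.id
    from (
        select distinct rt1.id
        from rt1
        join rt2 on rt1.id = rt2.id
    ) t1
    join rt3 on t1.id = rt3.id
) t2
join rt4 on t2.id = rt4.id;
```

With the example data, each inner distinct collapses its 1000 joined rows to a single row before the next join runs, so no stage has to process more than 1000 joined rows.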

The execution times make it clear that the optimized query is not just a little faster; the difference is dramatic.
