A Quick Look at One SQL Optimization Rule in TiDB: Left Outer Join Elimination

This article is a repost. 2019-11-24 17:03

Let's look at how one piece of TiDB is implemented: the elimination of a left outer join.

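Optimizing a SELECT statement generally works like this:

The logical optimization phase has many rules derived from relational algebra. The logical plan (LogicalPlan) tree is passed through each rule in turn, and each rule tries to rewrite it into an equivalent but cheaper plan.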
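To make the idea of passing a plan tree through a list of rules concrete, here is a minimal sketch. The `Plan` and `Rule` types, the toy `dropNoop` rule, and `logicalOptimize` are all hypothetical names invented for illustration; they are not TiDB's actual types or signatures.

```go
package main

import "fmt"

// Plan is a hypothetical, heavily simplified logical plan node.
type Plan struct {
	Op       string
	Children []*Plan
}

// Rule is a hypothetical stand-in for a logical optimization rule:
// it takes a plan tree and returns an equivalent, possibly rewritten, tree.
type Rule interface {
	Optimize(p *Plan) *Plan
}

// dropNoop is a toy rule that removes "Noop" nodes with exactly one child.
type dropNoop struct{}

func (r dropNoop) Optimize(p *Plan) *Plan {
	// Rewrite the children first, then decide about the current node.
	for i, c := range p.Children {
		p.Children[i] = r.Optimize(c)
	}
	if p.Op == "Noop" && len(p.Children) == 1 {
		return p.Children[0]
	}
	return p
}

// logicalOptimize feeds the plan through every rule in order.
func logicalOptimize(p *Plan, rules []Rule) *Plan {
	for _, r := range rules {
		p = r.Optimize(p)
	}
	return p
}

// render prints a plan as Op(child, child, ...).
func render(p *Plan) string {
	s := p.Op
	if len(p.Children) > 0 {
		s += "("
		for i, c := range p.Children {
			if i > 0 {
				s += ", "
			}
			s += render(c)
		}
		s += ")"
	}
	return s
}

func main() {
	plan := &Plan{Op: "Projection", Children: []*Plan{
		{Op: "Noop", Children: []*Plan{{Op: "DataSource"}}},
	}}
	fmt.Println(render(logicalOptimize(plan, []Rule{dropNoop{}}))) // Projection(DataSource)
}
```

Each rule only has to preserve the meaning of the plan, so rules can be developed and tested independently and then chained.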

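Let's look at one of these rules: outerJoinEliminator

TiDB is an excellent open-source project, and its code comments are just as good. The comment below states the conditions under which a Left Outer Join can drop its right table: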

// tryToEliminateOuterJoin will eliminate outer join plan base on the following rules
// 1. outer join elimination: For example left outer join, if the parent only use the
//    columns from left table and the join key of right table(the inner table) is a unique
//    key of the right table. the left outer join can be eliminated.
// 2. outer join elimination with duplicate agnostic aggregate functions: For example left outer join.
//    If the parent only use the columns from left table with 'distinct' label. The left outer join can
//    be eliminated.

We only discuss the first case here; for the second case, please read the source code.

Let's construct a query that satisfies the first case:

Left table:

　　create table t1 (
　　　　id int primary key not null auto_increment,
　　　　a int,
　　　　b int
　　);

Right table:

　　create table t2 (
　　　　id int primary key not null auto_increment,
　　　　a int,
　　　　b int
　　);

Query:

　　select t1.id, t1.a from t1 left join t2 on t1.id = t2.id;

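Let's look at the logical plan before the optimization rule is applied.

The plan is structured as follows:

　　The top operator is a Projection, which selects the two columns t1.id and t1.a;

　　Below it is a Join operator of type LeftOuterJoin;

　　Below the join, the left child is the OuterPlan (the left table) and the right child is the InnerPlan (the right table);

　　The left operator scans the data of t1, and the right operator scans the data of t2;

　　Each lower operator returns its data to the operator above it, and this is how the plan gets executed;

　　　　Note: data flows from the bottom up, a bit like a volcanic eruption; this execution model is known as the Volcano model (also called the iterator model);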
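The pull-based flow described above can be sketched with a tiny iterator interface. This is only an illustration of the classic Volcano idea: `Row`, `Operator`, `TableScan`, `Projection`, and `drain` are hypothetical names, and the sketch does not reproduce TiDB's real executor.

```go
package main

import "fmt"

// Row is a hypothetical row: column name -> value.
type Row map[string]int

// Operator is the classic iterator interface: each call to Next pulls one
// row from the operator, which in turn pulls from its child.
type Operator interface {
	Next() (Row, bool) // the bool is false once the operator is exhausted
}

// TableScan emits the rows of an in-memory table one at a time.
type TableScan struct {
	rows []Row
	pos  int
}

func (s *TableScan) Next() (Row, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// Projection keeps only the listed columns of each child row.
type Projection struct {
	child Operator
	cols  []string
}

func (p *Projection) Next() (Row, bool) {
	r, ok := p.child.Next()
	if !ok {
		return nil, false
	}
	out := make(Row, len(p.cols))
	for _, c := range p.cols {
		out[c] = r[c]
	}
	return out, true
}

// drain pulls rows from the root operator until it is exhausted.
func drain(root Operator) []Row {
	var out []Row
	for {
		r, ok := root.Next()
		if !ok {
			return out
		}
		out = append(out, r)
	}
}

func main() {
	scan := &TableScan{rows: []Row{
		{"id": 1, "a": 10, "b": 100},
		{"id": 2, "a": 20, "b": 200},
	}}
	root := &Projection{child: scan, cols: []string{"id", "a"}}
	for _, r := range drain(root) {
		fmt.Println(r["id"], r["a"]) // prints "1 10" then "2 20"
	}
}
```

The caller only ever talks to the root operator; rows travel upward one `Next` call at a time.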

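The main logic lives here:

　　outerJoinEliminator::doOptimize

　　　　This is a recursive procedure: it keeps collecting parentCols and tries to eliminate every LeftOuterJoin or RightOuterJoin it encounters;

　　　　For a LeftOuterJoin it tries to eliminate the right table; for a RightOuterJoin it tries to eliminate the left table;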

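Since our plan contains only a Projection operator and a LeftOuterJoin operator, the call sequence is roughly:

　　* Collect the columns used by the Projection

　　* Examine the LeftOuterJoin below it

　　　　* Get the columns of the left table: outerPlan.Schema().Columns

　　　　* Check whether every column used by the parent Projection comes from the left table: o.isColsAllFromOuterTable(parentCols, outerUniqueIDs)

　　　　* Get the join columns on the inner side: innerJoinKeys := o.extractInnerJoinKeys(p, innerChildIdx); here this is t2.id of the right table

　　　　* Check whether the join columns contain a unique key of the right table (here the primary key t2.id): o.isInnerJoinKeysContainUniqueKey(innerPlan, innerJoinKeys)

　　　　* If both conditions hold, replace the LeftOuterJoin with its left (outer) child;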
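The steps above can be sketched on a toy plan tree. Everything in this block is a simplified assumption made for illustration: the `Plan` struct, string column names, and the helpers `allFromOuter`, `joinKeysCoverUniqueKey`, `tryEliminate`, and `optimize` are hypothetical and only loosely mirror the TiDB functions named in the list.

```go
package main

import "fmt"

// Plan is a hypothetical, simplified plan node (not TiDB's LogicalPlan).
type Plan struct {
	Op         string     // e.g. "Projection", "LeftOuterJoin", "RightOuterJoin"
	Cols       []string   // output columns; for a Projection, the columns it uses
	UniqueKeys [][]string // unique keys of a data source
	InnerKeys  []string   // join nodes only: join-key columns on the inner side
	Children   []*Plan    // join nodes: [left, right]
}

func contains(set []string, col string) bool {
	for _, c := range set {
		if c == col {
			return true
		}
	}
	return false
}

// allFromOuter reports whether every parent column comes from the outer side.
func allFromOuter(parentCols, outerCols []string) bool {
	for _, c := range parentCols {
		if !contains(outerCols, c) {
			return false
		}
	}
	return true
}

// joinKeysCoverUniqueKey reports whether some unique key of the inner plan
// is fully contained in the inner join keys.
func joinKeysCoverUniqueKey(inner *Plan, innerKeys []string) bool {
	for _, key := range inner.UniqueKeys {
		covered := len(key) > 0
		for _, c := range key {
			if !contains(innerKeys, c) {
				covered = false
				break
			}
		}
		if covered {
			return true
		}
	}
	return false
}

// tryEliminate returns the outer child if the outer join can be dropped.
func tryEliminate(p *Plan, parentCols []string) (*Plan, bool) {
	innerIdx := 1 // LeftOuterJoin: the right child is the inner side
	switch p.Op {
	case "LeftOuterJoin":
	case "RightOuterJoin":
		innerIdx = 0
	default:
		return p, false
	}
	outer, inner := p.Children[1-innerIdx], p.Children[innerIdx]
	if !allFromOuter(parentCols, outer.Cols) || !joinKeysCoverUniqueKey(inner, p.InnerKeys) {
		return p, false
	}
	return outer, true
}

// optimize walks the tree top-down, passing each node's columns to its children.
func optimize(p *Plan, parentCols []string) *Plan {
	for {
		next, ok := tryEliminate(p, parentCols)
		if !ok {
			break
		}
		p = next
	}
	for i, c := range p.Children {
		p.Children[i] = optimize(c, p.Cols)
	}
	return p
}

func main() {
	t1 := &Plan{Op: "DataSource(t1)", Cols: []string{"t1.id", "t1.a", "t1.b"}, UniqueKeys: [][]string{{"t1.id"}}}
	t2 := &Plan{Op: "DataSource(t2)", Cols: []string{"t2.id", "t2.a", "t2.b"}, UniqueKeys: [][]string{{"t2.id"}}}
	join := &Plan{Op: "LeftOuterJoin", Cols: append(append([]string{}, t1.Cols...), t2.Cols...),
		InnerKeys: []string{"t2.id"}, Children: []*Plan{t1, t2}}
	proj := &Plan{Op: "Projection", Cols: []string{"t1.id", "t1.a"}, Children: []*Plan{join}}
	out := optimize(proj, nil)
	fmt.Println(out.Op, "->", out.Children[0].Op) // Projection -> DataSource(t1)
}
```

If the projection also used a column such as t2.a, `allFromOuter` would fail and the join would stay in place.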

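Here is the transformation:

The part of the plan shown in gray in the figure above is eliminated;

leaving the following plan:

In the end, the example SQL above is equivalent to the following statement: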

　　select t1.id, t1.a from t1;

Interested readers can explore the logic for the other cases in which a left outer join can be eliminated; we won't cover them here.

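Logical optimization is generally called RBO (rule-based optimization);

　　Its rules rest on equivalence derivations and proofs in relational algebra;

　　Most databases (for example MySQL, Oracle, and SQL Server) have similar logical optimization rules, so they can be studied side by side;

Physical optimization is generally called CBO (cost-based optimization);

　　Physical optimization rules are not necessarily the same across databases, since each may tailor them to the way it stores data and indexes;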
