MySQL: optimizing subqueries with semi-join


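Semi-join optimization was introduced in MySQL 5.6.5. A semi-join is typically used for IN / EXISTS subqueries: for each key value of the outer row source, it returns as soon as it finds the first matching key value in the inner row source, and does not go on to look at the inner row source's remaining key values. (In MySQL 5.6 and 5.7 the semi-join transformation applies to IN and = ANY subqueries, like the one in the example below.)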
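
As a minimal sketch of that definition (not MySQL's actual implementation), here is a nested-loop semi-join in Python over in-memory rows shaped like the test tables below; the `break` is the "stop at the first match" step. The names `semi_join`, `CLASS` and `ROSTER`, and the rows for class 1, class 4 and the student numbers, are hypothetical and only serve the illustration.

```python
# Hypothetical sample data modeled on the test tables below:
# class(class_num, class_name) and roster(class_num, student_num).
# Only class 2, class 3 and the roster class_num values 2, 3, 3 are implied
# by the article's output; class 1, class 4 and the student numbers are made up.
CLASS = [(1, "class 1"), (2, "class 2"), (3, "class 3"), (4, "class 4")]
ROSTER = [(2, 1001), (3, 1002), (3, 1003)]


def semi_join(outer, inner, outer_key, inner_key):
    """Return each outer row at most once if it has a match in inner.

    For every outer row, scan the inner rows and stop at the first match,
    so duplicate keys on the inner side never duplicate outer rows.
    """
    result = []
    for o in outer:
        for i in inner:
            if outer_key(o) == inner_key(i):
                result.append(o)
                break  # first match found: skip the remaining inner rows
    return result


rows = semi_join(CLASS, ROSTER, lambda c: c[0], lambda r: r[0])
print(rows)  # [(2, 'class 2'), (3, 'class 3')]
```

Class 3 appears only once even though roster holds two rows for it, because the inner scan stops at the first one.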

Test environment

mysql> desc class;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| class_num  | int(11)     | NO   | PRI | NULL    |       |
| class_name | varchar(20) | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> desc roster;
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| class_num   | int(11) | YES  |     | NULL    |       |
| student_num | int(11) | YES  |     | NULL    |       |
+-------------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

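The roster table records each student's number and the classroom the student belongs to. Several students can be in the same classroom, so the class_num column contains duplicate values.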

The class table records each classroom and its class name; class_num is unique (it is the primary key).

Suppose we want to find which classes have at least one student. A plain inner join returns duplicates:

mysql>  SELECT class.class_num, class.class_name FROM class INNER JOIN roster WHERE class.class_num = roster.class_num;       
+-----------+------------+
| class_num | class_name |
+-----------+------------+
|         2 | class 2    |
|         3 | class 3    |
|         3 | class 3    |
+-----------+------------+
3 rows in set (0.00 sec)
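
To make the duplicates concrete, the Python sketch below models the inner join as a nested loop: it emits one row per matching (class, roster) pair, so class 3 appears once per student. A DISTINCT step then has to deduplicate the already-built result. The helper names and the rows not visible in the output above (class 1, class 4, student numbers) are hypothetical.

```python
# Hypothetical rows: class 2/3 and the roster class_num values 2, 3, 3 match
# the article's output; everything else is invented for illustration.
CLASS = [(1, "class 1"), (2, "class 2"), (3, "class 3"), (4, "class 4")]
ROSTER = [(2, 1001), (3, 1002), (3, 1003)]


def inner_join(classes, roster):
    """Nested-loop inner join on class_num: one output row per matching pair."""
    return [
        (c_num, c_name)
        for (c_num, c_name) in classes
        for (r_num, _student) in roster
        if c_num == r_num
    ]


def distinct(rows):
    """Order-preserving DISTINCT: drop rows that were already emitted."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


joined = inner_join(CLASS, ROSTER)
print(joined)            # [(2, 'class 2'), (3, 'class 3'), (3, 'class 3')]
print(distinct(joined))  # [(2, 'class 2'), (3, 'class 3')]
```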

The duplicates could be removed with DISTINCT, but that costs performance, so we get the result with a subquery instead:

mysql>  SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROM roster);       
+-----------+------------+
| class_num | class_name |
+-----------+------------+
|         2 | class 2    |
|         3 | class 3    |
+-----------+------------+
2 rows in set (0.00 sec)

The optimizer actually rewrites the subquery as a semi-join:

mysql> explain SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROM roster);
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                             |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
|  1 | SIMPLE      | roster | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Start temporary                                                   |
|  1 | SIMPLE      | class  | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |    4 |    25.00 | Using where; End temporary; Using join buffer (Block Nested Loop) |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                      |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`class`.`class_num` AS `class_num`,`test`.`class`.`class_name` AS `class_name` from `test`.`class` semi join (`test`.`roster`) where (`test`.`class`.`class_num` = `test`.`roster`.`class_num`) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
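
Start temporary and End temporary in the Extra column indicate that a temporary table is used to remove duplicate values (the Duplicate Weedout strategy).
If select_type is MATERIALIZED and the table column shows <subqueryN>, the temporary table is instead being used for materialization.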
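
A rough Python sketch of Duplicate Weedout, following the join order in the EXPLAIN output above (roster first, then class): run the semi-join as an ordinary join and drop any joined row whose class key has already been emitted. Assumptions: the data is hypothetical, a Python `set` stands in for MySQL's internal temporary table, and class is looked up through a dict rather than scanned through the join buffer as the real plan does.

```python
# Hypothetical data; only class 2/3 and the roster class_num values 2, 3, 3
# come from the article's output.
CLASS = {1: "class 1", 2: "class 2", 3: "class 3", 4: "class 4"}  # class_num -> class_name
ROSTER = [(2, 1001), (3, 1002), (3, 1003)]                       # (class_num, student_num)


def duplicate_weedout(roster, classes):
    """Sketch of Duplicate Weedout in the EXPLAIN join order.

    roster is read first ("Start temporary"); each row is joined to class and
    the class primary key is recorded in a temporary set. A joined row whose
    key is already in the set is a duplicate and is dropped ("End temporary").
    """
    weedout_table = set()  # stands in for MySQL's internal temporary table
    result = []
    for class_num, _student in roster:
        if class_num not in classes:    # no matching class row: the join drops it
            continue
        if class_num in weedout_table:  # already emitted: weed out the duplicate
            continue
        weedout_table.add(class_num)
        result.append((class_num, classes[class_num]))
    return result


print(duplicate_weedout(ROSTER, CLASS))  # [(2, 'class 2'), (3, 'class 3')]
```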



 

If a subquery meets the criteria (reference: http://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html#semi-joins), MySQL converts it to a semi-join and makes a cost-based choice among the following strategies:

  • Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.

  • Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

  • FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

  • LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

  • Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
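
The three strategies that do not need a separate weedout pass can be sketched in Python on the same toy data. This is an illustration of the ideas in the list above, not MySQL code: `first_match` stops scanning roster at the first hit, `loose_scan` simulates reading roster through an index on class_num (an assumption: the test table has no such index, so sorting stands in for index order) and keeps one row per value group, and `materialize` builds a duplicate-free set that plays the role of the indexed temporary table. All names and the extra rows are hypothetical.

```python
from itertools import groupby

# Hypothetical data: class 2/3 and the roster class_num values 2, 3, 3 follow
# the article's output; the rest is invented.
CLASS = {1: "class 1", 2: "class 2", 3: "class 3", 4: "class 4"}  # class_num -> class_name
ROSTER = [(3, 1002), (2, 1001), (3, 1003)]                       # (class_num, student_num)


def first_match(classes, roster):
    """FirstMatch: any() short-circuits, so the roster scan stops at the first hit."""
    return [(num, name) for num, name in classes.items()
            if any(r_num == num for r_num, _ in roster)]


def loose_scan(classes, roster):
    """LooseScan: read roster in class_num order (sorting simulates an index)
    and use only the first entry of each value group for the lookup."""
    ordered = sorted(roster, key=lambda r: r[0])
    result = []
    for class_num, _group in groupby(ordered, key=lambda r: r[0]):
        if class_num in classes:
            result.append((class_num, classes[class_num]))
    return result


def materialize(classes, roster):
    """Materialization: a set of roster.class_num values acts as the
    duplicate-free temporary table, then each class row is looked up in it."""
    temp_table = {r_num for r_num, _ in roster}
    return [(num, name) for num, name in classes.items() if num in temp_table]


for strategy in (first_match, loose_scan, materialize):
    print(strategy.__name__, strategy(CLASS, ROSTER))
```

All three return the same two classes; they differ only in which side drives the loop and where the duplicates are discarded, which is what the cost-based choice weighs.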

Rather than risk mistranslating them, the strategy descriptions above are quoted verbatim from the documentation.

The semijoin flag of the optimizer_switch system variable controls whether semi-joins can be used; it is on by default in MySQL 5.6.
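
optimizer_switch is a comma-separated list of flag=on/off pairs. The small Python helpers below are a hedged sketch, assuming you already fetched the value (e.g. with SELECT @@optimizer_switch) as a string: one parses it into a dict, the other builds a session-level SET statement. The function names and the sample string are hypothetical; the real value lists many more flags. Flags not mentioned in a SET value keep their current setting, so the statement only needs the flags you want to change.

```python
def parse_optimizer_switch(value):
    """Turn an optimizer_switch string ('flag=on,flag=off,...') into {flag: bool}."""
    flags = {}
    for item in value.split(","):
        name, _, state = item.strip().partition("=")
        flags[name] = state == "on"
    return flags


def set_switch_statement(flags):
    """Build a SET statement that changes only the given flags."""
    body = ",".join(f"{name}={'on' if on else 'off'}" for name, on in flags.items())
    return f"SET SESSION optimizer_switch = '{body}';"


# Hypothetical excerpt of SELECT @@optimizer_switch output.
sample = "index_merge=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on"
print(parse_optimizer_switch(sample)["semijoin"])  # True
print(set_switch_statement({"semijoin": False}))   # SET SESSION optimizer_switch = 'semijoin=off';
```

Running the EXPLAIN above again after issuing such a statement is one way to compare the plan with and without the semi-join rewrite.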

 

