MySQL中的semi-join


http://www.linuxidc.com/Linux/2015-05/117523.htm

1. 背景介紹
什么是semi-join?
所謂的semi-join是指semi-join子查詢。 當一張表在另一張表找到匹配的記錄之后,半連接(semi-jion)返回第一張表中的記錄。與條件連接相反,即使在右節點中找到幾條匹配的記錄,左節點 的表也只會返回一條記錄。另外,右節點的表一條記錄也不會返回。半連接通常使用IN  或 EXISTS 作為連接條件。 該子查詢具有如下結構:

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

即在where條件的“IN”中的那個子查詢。
這種查詢的特點是我們只關心outer_table中與semi-join相匹配的記錄。
換句話說,最后的結果集是在outer_tables中的,而semi-join的作用只是對outer_tables中的記錄進行篩選。這也是我們進行 semi-join優化的基礎,即我們只需要從semi-join中獲取到最少量的足以對outer_tables記錄進行篩選的信息就足夠了。
所謂的最少量,體現到優化策略上就是如何去重。
以如下語句為例:

select * from Country 
where 
  Country.Code in 
(select City.country 
                   from City 
                   where City.Population>1*1000*1000);

當中的semi-join: “

select City.country 
                   from City 
                   where City.Population>1*1000*1000

” 可能返回的結果集如下: China(Beijin), China(Shanghai), France(Paris)...
我們可以看到這里有2個China,分別來至2條城市記錄Beijin和Shanghai, 但實際上我們只需要1個China就足夠對outer_table 

Country進行篩選了。所以我們需要去重。

2. Mysql支持的Semi-join策略

Mysql支持的semi-join策略主要有5個,它們分別為:
1. DuplicateWeedout: 使用臨時表對semi-join產生的結果集去重。
Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

 


對應的匹配條件為:

 


 



2. FirstMatch: 只選用內部表的第1條與外表匹配的記錄。
 
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

 


對應的匹配條件為:


3. LooseScan: 把inner-table數據基於索引進行分組,取每組第一條數據進行匹配。
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

 


對應的匹配條件為:

 


4. Materializelookup: 將inner-table去重固化成臨時表,遍歷outer-table,然后在固化表上去尋找匹配。
Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
對應的匹配條件:

 




5. MaterializeScan: 將inner-table去重固化成臨時表,遍歷固化表,然后在outer-table上尋找匹配。

 


對應的條件:

 

 
Each of these strategies except Duplicate Weedout can be enabled or disabled using the  optimizer_switch system variable. The  semijoin flag controls whether semi-joins are used. If it is set to  on, the  firstmatchloosescan, and  materialization flags enable finer control over the permitted semi-join strategies. These flags are  on by default.

The use of semi-join strategies is indicated in EXPLAIN output as follows:

  • Semi-joined tables show up in the outer select. EXPLAIN EXTENDED plus SHOW WARNINGS shows the rewritten query, which displays the semi-join structure. From this you can get an idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.

  • Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporaryin the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary will have their rowid in the temporary table.

  • FirstMatch(tbl_name) in the Extra column indicates join shortcutting.

  • LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.

  • As of MySQL 5.6.7, temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.

    Before MySQL 5.6.7, temporary table use for materialization is indicated in the Extra column byMaterialize if a single table is used, or by Start materialize and End materialize if multiple tables are used. If Scan is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
                    index_merge_sort_union=on,
                    index_merge_intersection=on,
                    engine_condition_pushdown=on,
                    index_condition_pushdown=on,
                    mrr=on,mrr_cost_based=on,
                    block_nested_loop=on,batched_key_access=off,
                    materialization=on,semijoin=on,loosescan=on,
                    firstmatch=on,
                    subquery_materialization_cost_based=on,
                    use_index_extensions=on


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM