http://www.linuxidc.com/Linux/2015-05/117523.htm
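1. Background
What is a semi-join?
A semi-join here means a semi-join subquery. When a row in one table has a match in another table, the semi-join returns the row from the first table. Unlike a conditional (ordinary) join, the left-hand table returns each row only once even if several matching rows are found on the right-hand side, and no rows from the right-hand table are returned at all. Semi-joins are usually written with IN or EXISTS as the join condition. Such a subquery has the following structure: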
SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
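That is, the subquery inside the IN of the WHERE clause.
The defining feature of this kind of query is that we only care about the rows in outer_tables that match the semi-join.
In other words, the final result set comes from outer_tables, and the semi-join only filters the rows of outer_tables. This is the basis for semi-join optimization: we only need to fetch from the semi-join the minimum amount of information that is sufficient to filter the outer_tables rows.
In terms of optimization strategy, this "minimum amount" comes down to how duplicates are removed.
Take the following statement as an example: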
select * from Country where Country.Code in (select City.country from City where City.Population>1*1000*1000);
The semi-join in this statement,
select City.country from City where City.Population>1*1000*1000
might return a result set such as: China(Beijing), China(Shanghai), France(Paris)...
Here China appears twice, once for each of the city rows Beijing and Shanghai, but a single China is enough to filter the rows of outer_tables.
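The duplicate-China effect can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table contents are invented sample rows (only the table and column names come from the example statement). It compares an ordinary join with the IN subquery.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE City (Name TEXT, country TEXT, Population INTEGER);
    INSERT INTO Country VALUES ('CHN', 'China'), ('FRA', 'France'), ('ISL', 'Iceland');
    INSERT INTO City VALUES
        ('Beijing', 'CHN', 21000000),
        ('Shanghai', 'CHN', 24000000),
        ('Paris', 'FRA', 2100000),
        ('Reykjavik', 'ISL', 130000);
""")

# Ordinary inner join: one output row per matching city, so China appears twice.
inner_join_rows = conn.execute("""
    SELECT Country.Name FROM Country
    JOIN City ON City.country = Country.Code
    WHERE City.Population > 1*1000*1000
    ORDER BY Country.Name
""").fetchall()

# Semi-join (IN subquery): each country appears at most once.
semi_join_rows = conn.execute("""
    SELECT Country.Name FROM Country
    WHERE Country.Code IN (
        SELECT City.country FROM City WHERE City.Population > 1*1000*1000)
    ORDER BY Country.Name
""").fetchall()

print(inner_join_rows)
print(semi_join_rows)
```

The join lists China once per qualifying city, while the IN form lists it once in total. Closing that gap is the deduplication work that a semi-join strategy has to perform.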
2. Semi-join strategies supported by MySQL
MySQL supports five main semi-join strategies:
1. DuplicateWeedout: use a temporary table to remove duplicates from the result set produced by the semi-join.

Applicable conditions: [figure omitted]

2. FirstMatch: use only the first inner-table row that matches the outer-table row.

Applicable conditions: [figure omitted]

3. LooseScan: group the inner-table rows by an index and use the first row of each group for matching.

Applicable conditions: [figure omitted]

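4. MaterializeLookup: deduplicate the inner table and materialize it into a temporary table, scan the outer table, and look up matches in the materialized table.
Applicable conditions: [figure omitted]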

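5. MaterializeScan: deduplicate the inner table and materialize it into a temporary table, scan the materialized table, and look up matches in the outer table.

Applicable conditions: [figure omitted]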
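To make the differences between the five strategies concrete, here is a rough Python sketch of each one over in-memory lists. It is an illustration of the ideas only, not how the MySQL executor is implemented. The function names and sample rows are invented for this example, and a sort stands in for an index.

```python
from itertools import groupby

# Invented sample data: outer rows are (code, name), inner rows are (city, country_code).
countries = [("CHN", "China"), ("FRA", "France"), ("ISL", "Iceland")]
big_cities = [("Beijing", "CHN"), ("Shanghai", "CHN"), ("Paris", "FRA")]

def duplicate_weedout(outer, inner):
    """Join as usual, then drop repeated outer rows via a 'temporary table' (a set)."""
    seen, result = set(), []
    for _city, code in inner:
        for row in outer:
            if row[0] == code and row not in seen:
                seen.add(row)
                result.append(row)
    return result

def first_match(outer, inner):
    """For each outer row, stop scanning the inner table at the first match."""
    # any() short-circuits, so the inner scan ends as soon as one match is found.
    return [row for row in outer if any(code == row[0] for _city, code in inner)]

def loose_scan(outer, inner):
    """Read the inner table in key order and use one representative per key group."""
    ordered = sorted(inner, key=lambda r: r[1])  # stands in for an index on the key
    result = []
    for code, _group in groupby(ordered, key=lambda r: r[1]):
        result.extend(row for row in outer if row[0] == code)
    return result

def materialize_lookup(outer, inner):
    """Build a deduplicated temp table from the inner side, then probe it per outer row."""
    materialized = {code for _city, code in inner}
    return [row for row in outer if row[0] in materialized]

def materialize_scan(outer, inner):
    """Build the deduplicated temp table, scan it, and look up matches in the outer table."""
    materialized = sorted({code for _city, code in inner})
    outer_index = {row[0]: row for row in outer}  # assumes the outer key is unique
    return [outer_index[code] for code in materialized if code in outer_index]

for strategy in (duplicate_weedout, first_match, loose_scan,
                 materialize_lookup, materialize_scan):
    print(strategy.__name__, strategy(countries, big_cities))
```

All five functions return the same rows for this data. What differs is which table drives the loop and where the duplicates are eliminated.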

The strategies are controlled through the optimizer_switch system variable. The semijoin flag controls whether semi-joins are used. If it is set to on, the firstmatch, loosescan, and materialization flags enable finer control over the permitted semi-join strategies. These flags are on by default.
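As a small illustration of how these flags might be toggled, the sketch below builds a SET optimizer_switch statement from Python keyword arguments. The helper name build_optimizer_switch_sql is hypothetical, and it only knows the four flags named in the paragraph above. The generated statement uses the standard flag=on/off list syntax of optimizer_switch.

```python
# The four semi-join related flags mentioned in the text.
SEMIJOIN_FLAGS = ("semijoin", "firstmatch", "loosescan", "materialization")

def build_optimizer_switch_sql(**flags):
    """Return a SET statement for the given flags, e.g. loosescan=False."""
    parts = []
    for name, enabled in flags.items():
        if name not in SEMIJOIN_FLAGS:
            raise ValueError(f"unexpected flag: {name}")
        parts.append(f"{name}={'on' if enabled else 'off'}")
    return "SET optimizer_switch='" + ",".join(parts) + "';"

# Disable two strategies, for instance to compare plans while testing.
print(build_optimizer_switch_sql(loosescan=False, materialization=False))
```

Flags that are not listed in the statement keep their current values, so only the strategies under test need to be named.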
The use of semi-join strategies is indicated in EXPLAIN output as follows:
- Semi-joined tables show up in the outer select. EXPLAIN EXTENDED plus SHOW WARNINGS shows the rewritten query, which displays the semi-join structure. From this you can get an idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.
- Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporary in the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary will have their rowid in the temporary table.
- FirstMatch(tbl_name) in the Extra column indicates join shortcutting.
- LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.
- As of MySQL 5.6.7, temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.
  Before MySQL 5.6.7, temporary table use for materialization is indicated in the Extra column by Materialize if a single table is used, or by Start materialize and End materialize if multiple tables are used. If Scan is present, no temporary table index is used for table reads. Otherwise, an index lookup is used.
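The indicators listed above can be checked mechanically. The following sketch assumes each EXPLAIN row has already been fetched as a Python dict keyed by column name. The function name semijoin_markers and the sample rows are made up for illustration, and only the MySQL 5.6.7 and later materialization markers are handled.

```python
import re

def semijoin_markers(row):
    """Return the semi-join indicators found in one EXPLAIN row (a dict)."""
    extra = row.get("Extra") or ""
    markers = []
    # Duplicate Weedout brackets the affected rows with Start/End temporary.
    if "Start temporary" in extra or "End temporary" in extra:
        markers.append("DuplicateWeedout")
    match = re.search(r"FirstMatch\(([^)]*)\)", extra)
    if match:
        markers.append(f"FirstMatch on {match.group(1)}")
    match = re.search(r"LooseScan\((\d+)\.\.(\d+)\)", extra)
    if match:
        markers.append(f"LooseScan key parts {match.group(1)}..{match.group(2)}")
    # MySQL 5.6.7+: select_type MATERIALIZED, or a table named <subqueryN>.
    is_subquery_table = re.fullmatch(r"<subquery\d+>", row.get("table") or "")
    if row.get("select_type") == "MATERIALIZED" or is_subquery_table:
        markers.append("Materialization")
    return markers

# Hand-written rows for illustration only, not captured from a real server.
sample_rows = [
    {"select_type": "SIMPLE", "table": "City", "Extra": "Using where; Start temporary"},
    {"select_type": "SIMPLE", "table": "City", "Extra": "Using where; FirstMatch(Country)"},
    {"select_type": "MATERIALIZED", "table": "City", "Extra": "Using where"},
]
for row in sample_rows:
    print(row["table"], semijoin_markers(row))
```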
mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
index_merge_sort_union=on,
index_merge_intersection=on,
engine_condition_pushdown=on,
index_condition_pushdown=on,
mrr=on,mrr_cost_based=on,
block_nested_loop=on,batched_key_access=off,
materialization=on,semijoin=on,loosescan=on,
firstmatch=on,
subquery_materialization_cost_based=on,
use_index_extensions=on
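The value shown above is a comma-separated list of flag=on/off pairs, so it is easy to inspect programmatically. A minimal sketch follows, assuming the string has already been read from the server. The helper name parse_optimizer_switch is hypothetical, and the input is the output shown above joined into one string.

```python
def parse_optimizer_switch(value):
    """Turn 'a=on,b=off,...' into {'a': True, 'b': False, ...}."""
    flags = {}
    for item in value.replace("\n", "").split(","):
        item = item.strip()
        if item:
            name, _, state = item.partition("=")
            flags[name.strip()] = (state.strip() == "on")
    return flags

# The @@optimizer_switch value from the output above, joined into one string.
switch = (
    "index_merge=on,index_merge_union=on,"
    "index_merge_sort_union=on,"
    "index_merge_intersection=on,"
    "engine_condition_pushdown=on,"
    "index_condition_pushdown=on,"
    "mrr=on,mrr_cost_based=on,"
    "block_nested_loop=on,batched_key_access=off,"
    "materialization=on,semijoin=on,loosescan=on,"
    "firstmatch=on,"
    "subquery_materialization_cost_based=on,"
    "use_index_extensions=on"
)

flags = parse_optimizer_switch(switch)
semijoin_related = ("semijoin", "firstmatch", "loosescan", "materialization")
print({name: flags[name] for name in semijoin_related})
```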