MySQL JOIN: Implementation, Optimization, and the NLJ Algorithm

Reposted article, 2017-07-11 15:45

Case study:

select
    c.*
from
    hotel_info_original c 
left join
    hotel_info_collection h 
on
    c.hotel_type=h.hotel_type 
and
    c.hotel_id =h.hotel_id 
where
    h.hotel_id is null

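This SQL statement finds the records that exist in table c but not in table h. A left join seemed a natural fit (it returns every row of the left table, and for rows with no match in the right table the right-hand columns come back as null), but the query turned out to be very slow. First, the query plan: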

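In the plan, rows is the number of rows that must be scanned at this step for each row produced by the previous step. This query therefore has to examine about 35773*8134 rows, roughly 291 million.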

In EXPLAIN output, the table listed in the first row is the driving table.
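
The "in c but not in h" pattern from the query can be tried out on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than MySQL; the row values are made up for illustration, and only the table and column names come from the query above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel_info_original   (hotel_id INTEGER, hotel_type INTEGER);
    CREATE TABLE hotel_info_collection (hotel_id INTEGER, hotel_type INTEGER);
    INSERT INTO hotel_info_original   VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO hotel_info_collection VALUES (1, 1), (3, 2);
""")

# LEFT JOIN keeps every row of c; where h has no match its columns are NULL,
# so "h.hotel_id IS NULL" keeps only the rows of c that are missing from h.
missing = conn.execute("""
    SELECT c.hotel_id, c.hotel_type
    FROM hotel_info_original c
    LEFT JOIN hotel_info_collection h
           ON c.hotel_type = h.hotel_type AND c.hotel_id = h.hotel_id
    WHERE h.hotel_id IS NULL
""").fetchall()

print(missing)
```

With this sample data only the row (2, 1) is returned, since it is the only row of hotel_info_original without a counterpart in hotel_info_collection.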

The NLJ algorithm

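NLJ stands for Nested Loop Join: scan one table (the outer table, also called the driving table) and, for each row read, use the index on the join column to look up matching rows in the other table (the inner table). The inner table is usually the one with an index. The outer table is usually the small one, small not only relative to the other tables but also in absolute row count, and it does not need an index. Because every row returned by the outer table has to be matched against the inner table, the result set returned overall should not be too large (more than about 10,000 rows is a poor fit).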

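Driving table: in nested loop joins and hash joins, the driving table is the first table from which data is fetched; its rows are then used to fetch data from the other tables step by step until all rows satisfying the conditions have been found. The driving table is not necessarily a whole table. It may be a data set, that is, the subset of rows of some table that satisfy the conditions, which then serves as the data source for joining the other tables. Strictly speaking this subset is the real driving table, but for brevity the table from which the subset was obtained is often called the driving table. When we say the driving table must be small, we mean that the subset obtained by applying the conditions must be small, not that the physical table itself must be small. A large table that yields a small subset can just as well be called the driving table.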

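With three or more tables, the NLJ algorithm first produces the result set of the first two tables, then uses that result set as the outer data and loops over it to look up rows in the third table.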

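A simple nested-loop join (NLJ) algorithm reads rows from the first table one at a time in a loop, passing each row to a nested loop that processes the next table in the join. This process is repeated until all remaining tables have been joined. Assume that tables t1, t2, and t3 are joined using the following join types: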

Table   Join Type
t1      range
t2      ref
t3      ALL

With the simple NLJ algorithm, the join is processed like this:

for each row in t1 matching range {
    for each row in t2 matching reference key {
        for each row in t3 {
            if row satisfies join conditions,
                send to client
        }
    }
}

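Because the NLJ algorithm passes rows from the outer loops to the inner loops one at a time, the tables in the inner loops are scanned many times.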
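
To make the repeated scanning concrete, here is a rough Python sketch of the simple NLJ loop that also counts how often the innermost table is read in full. It is a simplification: the tables are plain lists of integers, the "matching range" and "matching reference key" filters are left out, and the function name and sample data are hypothetical.

```python
def simple_nlj(t1, t2, t3, cond):
    """Naive nested-loop join; returns (matches, number of full passes over t3)."""
    matches = []
    t3_scans = 0
    for r1 in t1:                # outermost (driving) table
        for r2 in t2:
            t3_scans += 1        # every (r1, r2) pair triggers a full pass over t3
            for r3 in t3:
                if cond(r1, r2, r3):
                    matches.append((r1, r2, r3))
    return matches, t3_scans

# Made-up sample data: each "row" is just its join value.
t1 = [1, 2, 3]
t2 = [1, 2]
t3 = [1, 2, 3, 4]
rows, scans = simple_nlj(t1, t2, t3, lambda a, b, c: a == b == c)
print(rows, scans)
```

With 3 rows in t1 and 2 rows in t2, t3 is scanned 3 * 2 = 6 times even though only two row combinations match.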

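It follows that for a condition such as on a.id = b.aid, the driving table cannot use an index on the join column for this condition; that index serves the driven table.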
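
The following sketch illustrates why the index matters on the driven side. It assumes a is the driving table and uses a Python dictionary as a stand-in for an index on b.aid; the table contents and helper names are hypothetical.

```python
from collections import defaultdict

# Made-up rows: a is the driving table, b the driven table.
a = [{"id": 1}, {"id": 2}, {"id": 3}]
b = [{"aid": 1, "v": "x"}, {"aid": 3, "v": "y"}, {"aid": 3, "v": "z"}]

# Stand-in for a secondary index on b.aid: join value -> matching rows.
index_on_b_aid = defaultdict(list)
for row in b:
    index_on_b_aid[row["aid"]].append(row)

def index_nlj(driving, index):
    """For each driving row, probe the driven table's index on the join column."""
    result = []
    for outer in driving:                          # a is simply read row by row
        for inner in index.get(outer["id"], []):   # index probe instead of scanning b
            result.append((outer["id"], inner["v"]))
    return result

joined = index_nlj(a, index_on_b_aid)
print(joined)
```

The driving table is only iterated, so an index on a.id would not help with this condition, while the index on b.aid turns each inner lookup into a single probe.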

The BNL algorithm

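BNL stands for Block Nested-Loop Join, a variant implemented by MySQL. It buffers the rows read in the outer loops so that the driven table has to be read fewer times, which improves performance.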

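Memory used by join operations (join_buffer_size): applications often need to join two or more tables. For some of these joins (access types ALL/index), MySQL uses a Join Buffer to reduce the number of times the driven table is read and so improve performance (for the join algorithms themselves, see "Basic Implementation Principles of Join in MySQL"). When the Join Buffer is too small, MySQL does not spill it to a disk file. Instead it joins the result set currently in the Join Buffer with the table being joined, empties the buffer, writes the next part of the result set into it, and repeats. The driven table then has to be read multiple times, which multiplies I/O and lowers efficiency.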

for each row in t1 matching range {
    for each row in t2 matching reference key {
        store used columns from t1, t2 in join buffer
        if buffer is full {
            for each row in t3 {
                for each t1, t2 combination in join buffer {
                    if row satisfies join conditions,
                        send to client
                }
            }
            empty buffer
        }
    }
}

if buffer is not empty {
    for each row in t3 {
        for each t1, t2 combination in join buffer {
            if row satisfies join conditions,
                send to client
        }
    }
}

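The process above works as follows:
    1. Store the join results of t1 and t2 in the buffer until it is full;
    2. Scan t3 and, for each row, loop over the buffer to find matching rows and send them to the client;
    3. Empty the buffer;
    4. Repeat the steps above until the buffer no longer fills up;
    5. Process the data remaining in the buffer by repeating step 2.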
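
The steps above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not MySQL's implementation: the buffer capacity is counted in rows rather than bytes, the t1/t2 combinations are represented by a single list, and all names and data are hypothetical.

```python
def block_nlj(outer_rows, inner_rows, cond, buffer_capacity):
    """Block nested-loop join; returns (matches, number of passes over inner_rows).

    Assumption: buffer_capacity is a row count here, whereas MySQL sizes
    the join buffer in bytes via join_buffer_size.
    """
    matches, scans, buffer = [], 0, []

    def flush():
        nonlocal scans
        scans += 1
        for inner in inner_rows:          # one pass over the driven table ...
            for outer in buffer:          # ... compared against every buffered row
                if cond(outer, inner):
                    matches.append((outer, inner))
        buffer.clear()                    # step 3: empty the buffer

    for outer in outer_rows:
        buffer.append(outer)              # step 1: fill the buffer
        if len(buffer) == buffer_capacity:
            flush()                       # step 2: scan the driven table once
    if buffer:                            # step 5: leftover rows
        flush()
    return matches, scans

outer = list(range(10))      # stands in for the t1/t2 combinations
inner = [0, 3, 3, 9, 42]     # stands in for t3
m_small, scans_small = block_nlj(outer, inner, lambda o, i: o == i, buffer_capacity=4)
m_big, scans_big = block_nlj(outer, inner, lambda o, i: o == i, buffer_capacity=100)
print(scans_small, scans_big)
```

With a capacity of 4 rows the ten outer rows need three passes over the inner table; with a capacity of 100 a single pass is enough, and both runs produce the same matches.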

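Let S be the size of each stored t1, t2 combination and C the number of combinations. The number of times t3 is scanned is then:

    (S * C)/join_buffer_size + 1

As join_buffer_size grows, t3 is scanned fewer times. Once join_buffer_size is large enough to hold all the rows produced by joining t1 and t2, t3 is scanned only once.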
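
As a worked example of the formula, the small function below evaluates it for a made-up workload. The sizes are hypothetical, and reading the division as integer division is an assumption of this sketch.

```python
def t3_scan_count(s_bytes, c_combinations, join_buffer_size):
    """Scan count per the formula (S * C) / join_buffer_size + 1.

    Assumption: the division is treated as integer division.
    """
    return (s_bytes * c_combinations) // join_buffer_size + 1

# Hypothetical workload: 10,000 combinations of 100 bytes each.
scans_256k = t3_scan_count(100, 10_000, 262_144)    # 256 KiB buffer
scans_1m = t3_scan_count(100, 10_000, 1_048_576)    # 1 MiB buffer
print(scans_256k, scans_1m)
```

Here S * C is 1,000,000 bytes, so the 256 KiB buffer yields 4 scans of t3, while the 1 MiB buffer holds everything at once and yields a single scan.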

Example 1:

mysql> show create table c;  
+-------+---------------------------------------------------------------------------------------------------------------------+  
| Table | Create Table                                                                                                        |  
+-------+---------------------------------------------------------------------------------------------------------------------+  
| c     | CREATE TABLE `c` (  
  `id` int(11) NOT NULL,  
  `name` varchar(100) DEFAULT NULL  
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |  
+-------+---------------------------------------------------------------------------------------------------------------------+  
1 row in set (0.00 sec)  
  
mysql> show create table d;  
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+  
| Table | Create Table                                                                                                                                    |  
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+  
| d     | CREATE TABLE `d` (  
  `id` int(11) NOT NULL,  
  `score` int(11) DEFAULT NULL,  
  `stuid` int(11) DEFAULT NULL  
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |  
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+  
1 row in set (0.00 sec)  
  
mysql> explain select c.id,d.score from c,d where c.id=d.stuid;  
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |  
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL |   42 |                                |  
|  1 | SIMPLE      | d     | ALL  | NULL          | NULL | NULL    | NULL |   61 | Using where; Using join buffer |  
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  
2 rows in set (0.00 sec)

MySQL chooses different execution strategies depending on the conditions. For tables c and d above, with their current structure, EXPLAIN shows c driving d, because c is the smaller table.

If we add an index on c.id, MySQL switches to d driving c.

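[This is because the inner loop of the Nested Loop Join can now use the index on c to speed up lookups in c. Every bit of speed gained in the inner lookup is a comparatively large gain for the join as a whole.]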

mysql> alter table c add index(id);  
Query OK, 0 rows affected (0.94 sec)  
Records: 0  Duplicates: 0  Warnings: 0  
  
mysql> explain select c.id,d.score from c,d where c.id=d.stuid;  
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+  
| id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra       |  
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+  
|  1 | SIMPLE      | d     | ALL  | NULL          | NULL | NULL    | NULL         |   61 |             |  
|  1 | SIMPLE      | c     | ref  | id            | id   | 4       | test.d.stuid |    1 | Using index |  
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------------+  
2 rows in set (0.00 sec)
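
A quick way to compare the two plans is to multiply the rows estimates of each EXPLAIN line. The sketch below does just that; it is only a back-of-the-envelope figure, not the optimizer's actual cost model, and the function name is hypothetical.

```python
from math import prod

def estimated_row_combinations(rows_per_step):
    """Multiply the `rows` estimates of each EXPLAIN line, outermost first.

    Assumption: a rough work estimate only, not MySQL's cost model.
    """
    return prod(rows_per_step)

no_index = estimated_row_combinations([42, 61])          # c drives d, no index
with_index = estimated_row_combinations([61, 1])         # d drives c via the index on c.id
case_study = estimated_row_combinations([35773, 8134])   # the slow query from the case study
print(no_index, with_index, case_study)
```

The estimate drops from 2562 row combinations to 61 once the index on c.id is available, and the same arithmetic gives 290977582 for the case-study query.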

Example 2:

Table structure:

create table `user_group` (
    `user_id` int(11) NOT NULL,
    `group_id` int(11) not null,
    `user_type` int(11) not null,
    `gmt_create` datetime not null,
    `gmt_modified` datetime not null,
    `status` varchar(16) not null,
    key `idx_user_group_uid` (`user_id`)
) engine=innodb default charset=utf8;

create table `group_message` (
    `id` int(11) not null auto_increment,
    `gmt_create` datetime not null,
    `gmt_modified` datetime not null,
    `group_id` int(11) not null,
    `user_id` int(11) not null,
    `author` varchar(32) not null,
    `subject` varchar(128) not null,
    primary key (`id`),
    key `idx_group_message_author_subject` (`author`,`subject`(16)),
    key `idx_group_message_author` (`author`),
    key `idx_group_message_gid_uid` (`group_id`,`user_id`)
) engine=innodb auto_increment=97 default charset=utf8;

create table `group_message_content` (
    `group_msg_id` int(11) not null,
    `content` text NOT NULL,
    KEY `idx_group_message_content_msg_id` (`group_msg_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Query:

explain
    select
        m.subject msg_subject,
        c.content msg_content
    from
        user_group g,
        group_message m,
        group_message_content c
    where
        g.user_id = 1
    and
        m.group_id = g.group_id
    and
        c.group_msg_id = m.id\G

Result:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: user_group_gid_ind,user_group_uid_ind,user_group_gid_uid_ind
key: user_group_uid_ind
key_len: 4
ref: const
rows: 2
Extra:

*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: PRIMARY,idx_group_message_gid_uid
key: idx_group_message_gid_uid
key_len: 4
ref: example.g.group_id
rows: 3
Extra:

*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: idx_group_message_content_msg_id
key: idx_group_message_content_msg_id
key_len: 4
ref: example.m.id
rows: 2
Extra:

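The MySQL Query Optimizer chose user_group as the driving table. It first uses the user_id condition we supplied to do an index ref lookup with a const value through the index user_group_uid_ind on that table. It then takes the group_id column of the rows filtered from user_group as the condition for a looped lookup in group_message. Finally, it takes the id of group_message from the combined result of user_group and group_message and compares it with group_msg_id in group_message_content, again in a loop, to produce the final result. Nothing special: each later table uses the result set of the previous one as its condition.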
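
The chained lookups described above can be sketched as three nested index probes. In this illustration Python dictionaries stand in for the indexes, the sample rows are made up, and only the table and column names come from the schema above.

```python
from collections import defaultdict

def build_index(rows, column):
    """Stand-in for a secondary index: column value -> list of rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index

# Made-up sample rows (assumption: not taken from the original data).
user_group = [{"user_id": 1, "group_id": 10}, {"user_id": 2, "group_id": 20}]
group_message = [{"id": 100, "group_id": 10, "subject": "hello"},
                 {"id": 101, "group_id": 20, "subject": "other"}]
group_message_content = [{"group_msg_id": 100, "content": "hi all"},
                         {"group_msg_id": 101, "content": "unrelated"}]

g_by_user = build_index(user_group, "user_id")
m_by_group = build_index(group_message, "group_id")
c_by_msg = build_index(group_message_content, "group_msg_id")

result = []
for g in g_by_user.get(1, []):                    # g.user_id = 1 (ref: const)
    for m in m_by_group.get(g["group_id"], []):   # m.group_id = g.group_id
        for c in c_by_msg.get(m["id"], []):       # c.group_msg_id = m.id
            result.append((m["subject"], c["content"]))

print(result)
```

Each level only probes an index with a value produced by the level above it, which mirrors the three ref rows in the EXPLAIN output.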

Now drop the index idx_group_message_content_msg_id on group_message_content and see what happens:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: idx_user_group_uid
key: idx_user_group_uid
key_len: 4
ref: const
rows: 2
Extra:

*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: PRIMARY,idx_group_message_gid_uid
key: idx_group_message_gid_uid
key_len: 4
ref: example.g.group_id
rows: 3
Extra:

*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 96
Extra: Using where; Using join buffer

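The access type for group_message_content has changed from ref to ALL, and the Extra column of the last row has gone from empty to Using where; Using join buffer. The change from ref to ALL is easy to understand: there is no longer a usable index, so a full table scan is required. Using where appears because, with a full table scan, the content values we need can only be obtained by filtering the table's rows with the where condition. But what is the Using join buffer that follows?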

MySQL has a configurable parameter, join_buffer_size, and what we see here is the buffer area sized by that parameter being used. So why was it not used in the earlier execution plan?

The Join Buffer can be used only when the join type is ALL (as in this example), index, range, or index_merge. Before we dropped the index on the group_msg_id column of group_message_content, the join was of type ref, so the execution plan showed no use of the Join Buffer.

Join optimization:

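    Drive the large result set with the small one, keeping the total number of Nested Loop iterations in the join as low as possible;
    Optimize the inner loop of the Nested Loop first: it runs more often than any other part of the loop, so a small gain per iteration adds up to a large gain overall;
    Create an index on the join column of the driven table;
    When no index can be created on the join column of the driven table, set a sufficiently large join_buffer_size.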

References:

    http://www.jasongj.com/2015/03/07/Join1/ # strongly recommended reading; this article does not fully cover the Hash Join, Merge Join, and other analyses found there

　　http://www.cnblogs.com/weizhenlu/p/5970392.html

　　http://blog.csdn.net/ghsau/article/details/43762027

　　http://database.51cto.com/art/200904/117947.htm

　　http://blog.csdn.net/ys_565137671/article/details/6361730
