Using join buffer (Block Nested Loop)
MySQL table join algorithms
After the index is used
mysql> explain SELECT a.custid, b.score, b.xcreditscore, b.lrscore FROM( SELECT DISTINCT custid FROM sync.`credit_apply` WHERE SUBSTR(createtime, 1, 10) >= '2019-12-15' AND rejectrule = 'xxx') a JOIN (select * from sync.`credit_creditchannel`) b ON a.custid = b.custid;
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
| id | select_type | table                | partitions | type  | possible_keys               | key                         | key_len | ref      | rows    | filtered | Extra       |
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>           | NULL       | ALL   | NULL                        | NULL                        | NULL    | NULL     |  167831 |   100.00 | NULL        |
|  1 | PRIMARY     | credit_creditchannel | NULL       | ref   | credit_creditchannel_custId | credit_creditchannel_custId | 43      | a.custid |       1 |   100.00 | Using where |
|  2 | DERIVED     | credit_apply         | NULL       | index | index2                      | index2                      | 518     | NULL     | 1678311 |    10.00 | Using where |
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
3 rows in set (0.08 sec)

mysql> explain SELECT a.custid, b.score, b.xcreditscore, b.lrscore FROM( SELECT DISTINCT custid FROM sync.`credit_apply` WHERE SUBSTR(createtime, 1, 10) >= '2019-12-15' AND rejectrule = 'xxx') a LEFT JOIN (select * from sync.`credit_creditchannel`) b ON a.custid = b.custid;
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
| id | select_type | table                | partitions | type  | possible_keys               | key                         | key_len | ref      | rows    | filtered | Extra       |
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>           | NULL       | ALL   | NULL                        | NULL                        | NULL    | NULL     |  167831 |   100.00 | NULL        |
|  1 | PRIMARY     | credit_creditchannel | NULL       | ref   | credit_creditchannel_custId | credit_creditchannel_custId | 43      | a.custid |       1 |   100.00 | Using where |
|  2 | DERIVED     | credit_apply         | NULL       | index | index2                      | index2                      | 518     | NULL     | 1678311 |    10.00 | Using where |
+----+-------------+----------------------+------------+-------+-----------------------------+-----------------------------+---------+----------+---------+----------+-------------+
3 rows in set (0.07 sec)
Before the index is used
mysql> explain SELECT a.custid, b.score, b.xcreditscore, b.lrscore FROM( SELECT DISTINCT custid FROM sync.`credit_apply` WHERE SUBSTR(createtime, 1, 10) >= '2019-12-15' AND rejectrule = 'xxx') a LEFT JOIN (select * from sync.`credit_creditchannel`) b ON a.custid = b.custid;
+----+-------------+----------------------+------------+-------+---------------+--------+---------+------+---------+----------+----------------------------------------------------+
| id | select_type | table                | partitions | type  | possible_keys | key    | key_len | ref  | rows    | filtered | Extra                                              |
+----+-------------+----------------------+------------+-------+---------------+--------+---------+------+---------+----------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2>           | NULL       | ALL   | NULL          | NULL   | NULL    | NULL |  158107 |   100.00 | NULL                                               |
|  1 | PRIMARY     | credit_creditchannel | NULL       | ALL   | NULL          | NULL   | NULL    | NULL |  450770 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | credit_apply         | NULL       | index | index2        | index2 | 518     | NULL | 1581075 |    10.00 | Using where                                        |
+----+-------------+----------------------+------------+-------+---------------+--------+---------+------+---------+----------+----------------------------------------------------+
rows in set (0.06 sec)
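Nested-Loop Join (NLJ) algorithm
The NLJ algorithm uses the result set of the driving (outer) table as the basis of a loop. It reads that result set one row at a time, uses each row as the filter condition for querying the next table, and then merges the results.
When more than two tables are joined, the result set of the preceding tables becomes the loop data: each of its rows is matched against the next table in the join, and the resulting rows are returned to the client.
The pseudocode for Nested-Loop is as follows: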
for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}
Because the NLJ algorithm passes rows one at a time from outer loops to inner loops, tables processed in the inner loops typically are read many times.
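In other words, because the plain Nested-Loop algorithm passes only one row at a time into the inner loop, the inner loop runs once for every row in the outer result set. If the inner table has an index on the join column, the scan cost is O(Rn); without an index it is O(Rn*Sn), where Rn and Sn are the row counts of the outer table R and the inner table S. If the inner table S has many rows, Simple Nested-Loop Join scans it many times and performs very poorly.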
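The gap between O(Rn) and O(Rn*Sn) can be illustrated with a small simulation. This is only a sketch in Python, not how MySQL is implemented: the tables are plain lists of dicts, the "index" is an ordinary dict built on the join column, and the function names nlj_full_scan and nlj_indexed, like the sample data, are made up for this example.

```python
from collections import defaultdict

def nlj_full_scan(outer, inner, key):
    """Simple nested-loop join: scan the whole inner table for every outer row."""
    result, rows_read = [], 0
    for o in outer:
        for i in inner:              # one full inner scan per outer row
            rows_read += 1
            if o[key] == i[key]:
                result.append((o, i))
    return result, rows_read

def nlj_indexed(outer, inner, key):
    """Nested-loop join using a lookup structure on the inner join column."""
    index = defaultdict(list)        # stand-in for a secondary index (assumption)
    for i in inner:
        index[i[key]].append(i)
    result, lookups = [], 0
    for o in outer:
        lookups += 1                 # one index lookup per outer row
        for i in index.get(o[key], []):
            result.append((o, i))
    return result, lookups

# Hypothetical data: 100 outer rows, 1000 inner rows, one match per outer row.
outer = [{"custid": n} for n in range(100)]
inner = [{"custid": n, "score": n * 2} for n in range(1000)]

full_result, full_reads = nlj_full_scan(outer, inner, "custid")
idx_result, idx_lookups = nlj_indexed(outer, inner, "custid")
print(full_reads, idx_lookups)
```

With this data the unindexed join touches Rn*Sn = 100,000 inner rows, while the indexed variant performs Rn = 100 lookups and returns the same rows.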
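Block Nested-Loop Join (BNL) algorithm
The BNL algorithm stores rows of the outer loop's result set in the join buffer and compares each row of the inner loop against every record in the buffer, which reduces the number of times the inner loop has to run.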
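For example, suppose the outer loop produces 100 rows. With NLJ the inner table is scanned 100 times. With BNL, rows read from the outer table are placed in the join buffer 10 at a time, and each row of the inner table is compared against all 10 buffered rows in a single pass. The inner table therefore has to be scanned only 10 times, a reduction of 9/10. This is how BNL significantly reduces the number of scans of the inner table.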
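The 100-row example can be reproduced with a short sketch. It assumes a buffer measured in rows (buffer_rows), whereas the real join buffer is sized in bytes through join_buffer_size; the function name block_nested_loop_join and the sample tables are hypothetical.

```python
def block_nested_loop_join(outer, inner, key, buffer_rows):
    """Join outer and inner on key, buffering buffer_rows outer rows at a time."""
    result, inner_scans = [], 0
    for start in range(0, len(outer), buffer_rows):
        join_buffer = outer[start:start + buffer_rows]  # fill the buffer
        inner_scans += 1                                 # one inner scan per buffer load
        for i in inner:
            for o in join_buffer:   # compare the inner row with every buffered row
                if o[key] == i[key]:
                    result.append((o, i))
    return result, inner_scans

# Hypothetical data matching the example: 100 outer rows.
outer = [{"custid": n} for n in range(100)]
inner = [{"custid": n, "score": n * 2} for n in range(1000)]

rows_bnl, scans_bnl = block_nested_loop_join(outer, inner, "custid", buffer_rows=10)
rows_nlj, scans_nlj = block_nested_loop_join(outer, inner, "custid", buffer_rows=1)
print(scans_bnl, scans_nlj)
```

A buffer of 10 rows gives 10 scans of the inner table; a buffer of 1 row degenerates to plain NLJ with 100 scans. Both runs find the same 100 matches, only in a different order.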
For the query described earlier, the join with a join buffer works roughly as follows:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}
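If S is the total size of the columns from t1 and t2 stored for each combination and C is the number of t1, t2 combinations, then the number of times table t3 is scanned is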
(S * C)/join_buffer_size + 1
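The number of t3 scans decreases as join_buffer_size grows, up to the point where the join buffer can hold all t1, t2 combinations. Beyond that point, increasing join_buffer_size no longer makes the query any faster.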
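To see the plateau, the formula can be evaluated for a few buffer sizes. This is a rough sketch: the values of S and C are invented, the helper name t3_scans is hypothetical, and integer division is an assumption about how the fraction should be rounded.

```python
def t3_scans(s, c, join_buffer_size):
    """Approximate number of t3 scans: (S * C) / join_buffer_size + 1."""
    return (s * c) // join_buffer_size + 1   # integer division is an assumption

S = 64        # hypothetical bytes stored per t1, t2 combination
C = 100_000   # hypothetical number of t1, t2 combinations

sizes = [256 * 1024, 1024 * 1024, 8 * 1024 * 1024, 16 * 1024 * 1024]
scans = [t3_scans(S, C, size) for size in sizes]
for size, n in zip(sizes, scans):
    print(size, n)
```

With these numbers the scan count falls from 25 to 7 to 1 as the buffer grows, and stays at 1 once the buffer is larger than S * C, so doubling it again from 8 MB to 16 MB gains nothing.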
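Key points about how MySQL uses the join buffer:
1. The join_buffer_size variable determines the size of the buffer.
2. The join buffer can be used only when the join type is ALL, index, or range.
3. A buffer is allocated for each join that can be buffered, so a single query may end up using several join buffers.
4. No join buffer is allocated for the first nonconst table, even if its access type is ALL or index.
5. The join buffer is allocated before the join is executed and freed when the query finishes.
6. Only the columns involved in the join are stored in the join buffer, not whole rows.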
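In MySQL 5.6 and later, the block_nested_loop flag of the optimizer_switch system variable controls whether the optimizer may use BNL. It is on by default. If it is set to off, the optimizer chooses the NLJ algorithm when selecting a join method.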