MySQL表結構變更,不可不知的Metadata Lock


在線上進行DDL操作時,相對於其可能帶來的系統負載,其實,我們最擔心的還是MDL其可能導致的阻塞問題。

一旦DDL操作因獲取不到MDL被阻塞,后續其它針對該表的其它操作都會被阻塞。典型如下,如阻塞稍久的話,我們會看到Threads_running飆升,CPU告警。

mysql> show processlist;
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
| Id | User            | Host      | db        | Command | Time | State                           | Info                               |
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
|  4 | event_scheduler | localhost | NULL      | Daemon  |  122 | Waiting on empty queue          | NULL                               |
|  9 | root            | localhost | NULL      | Sleep   |   57 |                                 | NULL                               |
| 12 | root            | localhost | employees | Query   |   40 | Waiting for table metadata lock | alter table slowtech.t1 add c1 int |
| 13 | root            | localhost | employees | Query   |   35 | Waiting for table metadata lock | select * from slowtech.t1          |
| 14 | root            | localhost | employees | Query   |   30 | Waiting for table metadata lock | select * from slowtech.t1          |
| 15 | root            | localhost | employees | Query   |   19 | Waiting for table metadata lock | select * from slowtech.t1          |
| 16 | root            | localhost | employees | Query   |   10 | Waiting for table metadata lock | select * from slowtech.t1          |
| 17 | root            | localhost | employees | Query   |    0 | starting                        | show processlist                   |
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
8 rows in set (0.00 sec)

如果發生在線上,無疑會影響到業務。所以,一般建議將DDL操作放到業務低峰期做,其實有兩方面的考慮,1. 避免對系統負載產生較大影響。2. 減少DDL被阻塞的概率。

 

MDL引入的背景

MDL是MySQL 5.5.3引入的,主要用於解決兩個問題,

 

RR事務隔離級別下不可重復讀的問題

如下所示,演示環境,MySQL 5.5.0。

session1> begin;
Query OK, 0 rows affected (0.00 sec)

session1> select * from t1;
+------+------+
| id  | name |
+------+------+
|    1 | a    |
|    2 | b    |
+------+------+
2 rows in set (0.00 sec)

session2> alter table t1 add c1 int;
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

session1> select * from t1;
Empty set (0.00 sec)

session1> commit;
Query OK, 0 rows affected (0.00 sec)

session1> select * from t1;
+------+------+------+
| id  | name | c1  |
+------+------+------+
|    1 | a    | NULL |
|    2 | b    | NULL |
+------+------+------+
2 rows in set (0.00 sec)

可以看到,雖然是RR隔離級別,但在開啟事務的情況下,第二次查詢卻沒有結果。

 

主從復制問題

包括主從數據不一致,主從復制中斷等。
如下面的主從數據不一致。

session1> create table t1(id int,name varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

session1> begin;
Query OK, 0 rows affected (0.00 sec)

session1> insert into t1 values(1,'a');
Query OK, 1 row affected (0.00 sec)

session2> truncate table t1;
Query OK, 0 rows affected (0.46 sec)

session1> commit;
Query OK, 0 rows affected (0.35 sec)

session1> select * from t1;
Empty set (0.00 sec)

 

再來看看從庫的結果

session1> select * from slowtech.t1;
+------+------+------+
| id   | name | c1   |
+------+------+------+
|    1 | a    | NULL |
+------+------+------+
1 row in set (0.00 sec)

 

看看binlog的內容,可以看到,truncate操作記錄在前,insert操作記錄在后。

# at 7140
#180714 19:32:14 server id 1  end_log_pos 7261    Query    thread_id=31    exec_time=0    error_code=0
SET TIMESTAMP=1531567934/*!*/;
create table t1(id int,name varchar(10)) engine=innodb
/*!*/;

# at 7261
#180714 19:32:30 server id 1  end_log_pos 7333    Query    thread_id=32    exec_time=0    error_code=0
SET TIMESTAMP=1531567950/*!*/;
BEGIN
/*!*/;
# at 7333
#180714 19:32:30 server id 1  end_log_pos 7417    Query    thread_id=32    exec_time=0    error_code=0
SET TIMESTAMP=1531567950/*!*/;
truncate table t1
/*!*/;
# at 7417
#180714 19:32:30 server id 1  end_log_pos 7444    Xid = 422
COMMIT/*!*/;

# at 7444
#180714 19:32:34 server id 1  end_log_pos 7516    Query    thread_id=31    exec_time=0    error_code=0
SET TIMESTAMP=1531567954/*!*/;
BEGIN
/*!*/;
# at 7516
#180714 19:32:24 server id 1  end_log_pos 7611    Query    thread_id=31    exec_time=0    error_code=0
SET TIMESTAMP=1531567944/*!*/;
insert into t1 values(1,'a')
/*!*/;
# at 7611
#180714 19:32:34 server id 1  end_log_pos 7638    Xid = 421
COMMIT/*!*/;

 

如果會話2執行的是drop table操作,還會導致主從中斷。

有意思的是,如果會話2執行的是alter table操作,其依舊會被阻塞,阻塞時間受innodb_lock_wait_timeout參數限制。

mysql> show processlist;
+----+------+-----------+----------+---------+------+-------------------+---------------------------+
| Id | User | Host      | db       | Command | Time | State             | Info                      |
+----+------+-----------+----------+---------+------+-------------------+---------------------------+
| 54 | root | localhost | NULL     | Query   |    0 | NULL              | show processlist          |
| 58 | root | localhost | slowtech | Sleep   | 1062 |                   | NULL                      |
| 60 | root | localhost | slowtech | Query   |   11 | copy to tmp table | alter table t1 add c1 int |
+----+------+-----------+----------+---------+------+-------------------+---------------------------+
3 rows in set (0.00 sec)

 

MDL的基本概念

首先,看看官方的說法,

To ensure transaction serializability, the server must not permit one session to perform a data definition language (DDL) statement on a table that is used in an uncompleted explicitly or implicitly started transaction in another session.

The server achieves this by acquiring metadata locks on tables used within a transaction and deferring release of those locks until the transaction ends.

A metadata lock on a table prevents changes to the table's structure.

This locking approach has the implication that a table that is being used by a transaction within one session cannot be used in DDL statements by other sessions until the transaction ends.

 

從上面的描述可以看到,

1. MDL出現的初衷就是為了保護一個處於事務中的表的結構不被修改。

2. 這里提到的事務包括兩類,顯式事務和AC-NL-RO(auto-commit non-locking read-only)事務。顯式事務包括兩類:1. 關閉AutoCommit下的操作,2. 以begin或start transaction開始的操作。AC-NL-RO可理解為AutoCommit開啟下的select操作。

3. MDL是事務級別的,只有在事務結束后才會釋放。在此之前,其實也有類似的保護機制,只不過是語句級別的。

 

需要注意的是,MDL不僅僅適用於表,同樣也適用於其它對象,如下表所示,其中,"等待狀態"對應的是"show processlist"中的State。

 

 

為了提高數據庫的並發度,MDL被細分為了11種類型。

  • MDL_INTENTION_EXCLUSIVE

  • MDL_SHARED

  • MDL_SHARED_HIGH_PRIO

  • MDL_SHARED_READ

  • MDL_SHARED_WRITE

  • MDL_SHARED_WRITE_LOW_PRIO

  • MDL_SHARED_UPGRADABLE

  • MDL_SHARED_READ_ONLY

  • MDL_SHARED_NO_WRITE

  • MDL_SHARED_NO_READ_WRITE

  • MDL_EXCLUSIVE

常用的有MDL_SHARED_READ,MDL_SHARE D_WRITE及MDL_EXCLUSIVE,其分別用於SELECT操作,DML操作及DDL操作。其它類型的對應操作可參考源碼sql/mdl.h。

 

對於MDL_EXCLUSIVE,官方的解釋是,

  /*
    An exclusive metadata lock.
    A connection holding this lock can modify both table's metadata and data.
    No other type of metadata lock can be granted while this lock is held.
    To be used for CREATE/DROP/RENAME TABLE statements and for execution of
    certain phases of other DDL statements.
  */

簡而言之,MDL_EXCLUSIVE是獨占鎖,在其持有期間是不允許其它類型的MDL被授予,自然也包括SELECT和DML操作。

這也就是為什么DDL操作被阻塞時,后續其它操作也會被阻塞。

 

關於MDL的補充

1. MDL的最大等待時間由lock_wait_timeout參數決定,其默認值為31536000(365天)。在使用工具進行DDL操作時,這個值就不太合理。事實上,pt-online-schema-change和gh-ost對其就進行了相應的調整,其中,前者60s,后者3s。

2. 如果一個SQL語法上有效,但執行時報錯,如,列名不存在,其同樣會獲取MDL鎖,直到事務結束才釋放。


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM