A module in a recent project reproduced a MySQL deadlock reliably. This article records why the deadlock occurs and how to resolve it.
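1. Background
1.1 Table locks and row locks
- Table locks
Table locking is the most basic locking strategy in MySQL and the one with the lowest overhead. A table lock locks the entire table. Before any write (insert/delete/update) a user must acquire the write lock, and write locks block one another; a reader can acquire a read lock only when no write lock is held, and read locks do not block one another.
- Row locks (InnoDB only)
Row-level locks give the best support for concurrency, at the cost of the highest locking overhead. They are implemented only in the storage engine, not in the MySQL server layer, which knows nothing about how the storage engine implements them.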
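1.2 Row locks in brief
InnoDB lock modes are: intention shared/exclusive locks (IS/IX), shared locks (S), exclusive locks (X) and the auto-increment lock (AI). Of these, IS/IX and AI are taken at the table level; S and X can be taken on rows.
Row locks are further divided by scenario into Next-Key Lock, Gap Lock, Record Lock and insert intention gap lock. Each type locks a different range: a record lock locks only the record itself, a gap lock locks the interval between records, and a Next-Key Lock locks a record together with the gap before it. The ranges covered by each type are roughly as shown in the figure below.
The markers for each lock type in the deadlock log are:
- Record lock (LOCK_REC_NOT_GAP): lock_mode X locks rec but not gap
- Gap lock (LOCK_GAP): lock_mode X locks gap before rec
- Next-key lock (LOCK_ORDINARY): lock_mode X
- Insert intention lock (LOCK_INSERT_INTENTION): lock_mode X locks gap before rec insert intention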
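For comparison with the log markers above, the same lock types can also be inspected as rows of a table. This is a minimal sketch, assuming MySQL 8.0, where `performance_schema.data_locks` is available (it does not exist in 5.7). It is meant to be run from a separate connection while other transactions hold locks.

```sql
-- Assumes MySQL 8.0+: performance_schema.data_locks lists the locks that
-- InnoDB transactions currently hold or are waiting for.
select engine_transaction_id as trx_id,
       object_name           as table_name,
       index_name,
       lock_type,    -- TABLE or RECORD
       lock_mode,    -- e.g. IX, X, "X,REC_NOT_GAP", "X,GAP"
       lock_status,  -- GRANTED or WAITING
       lock_data     -- for locks on the primary key: the key value
from performance_schema.data_locks
order by engine_transaction_id;
```

In this table a next-key lock appears as plain `X`, a record lock as `X,REC_NOT_GAP` and a gap lock as `X,GAP`, which lines up with the first three log markers listed above.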
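1.3 Row locking examples
InnoDB stores tables in a clustered index: the leaf nodes of the primary-key B+ tree hold the primary key together with the full row. The leaf nodes of an InnoDB secondary index hold the primary key value, so a lookup through a secondary index has to go back to the clustered index with the primary key it found.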
update user set age = 10 where id = 49;
update user set age = 10 where name = 'Tom';
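(1) The first statement looks up by primary key, so it only needs a write lock (X lock) on the primary-key record id=49;
(2) The second statement looks up through a secondary index: it first takes a write lock on the name='Tom' index entry, then uses the primary key it obtained (id=49 here) to take a write lock on the primary-key record.
As shown in the figure below:
The above covers a single row. For multiple rows: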
update user set age = 10 where id > 49;
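Execution steps:
(1) MySQL Server reads the first record that satisfies the where condition; the InnoDB engine returns the row and locks it;
(2) MySQL Server issues the update request for that row and updates it;
(3) Steps (1) and (2) repeat until every record that satisfies the condition has been modified.
As shown in the figure below: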
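The article does not show the definition of the user table. The sketch below assumes a hypothetical schema (an id primary key, a secondary index on name, and three sample rows) only so that the update statements in this section can be run inside one transaction and their locks observed from another connection.

```sql
-- Hypothetical schema: only the columns id, name and age are implied by the
-- statements in this section; types, index name and rows are assumptions.
create table user
(
    id   int primary key,
    name varchar(32) not null,
    age  int not null,
    key idx_user_name (name)  -- secondary index used by "where name = 'Tom'"
);
insert into user (id, name, age) values (49, 'Tom', 20), (50, 'Bob', 30), (51, 'Amy', 40);

start transaction;
-- primary-key lookup: X lock on the clustered index record id=49
update user set age = 10 where id = 49;
-- secondary-index lookup: X lock on the name='Tom' entry, then on id=49
update user set age = 10 where name = 'Tom';
-- range on the primary key: rows are read, locked and updated one by one
update user set age = 10 where id > 49;
rollback;
```

Keeping the transaction open (delaying the rollback) leaves the locks in place long enough to look at them with the lock monitor.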
2. Preparation
2.1 Create and initialize the table
create table dead_lock_test
(
id int auto_increment
primary key,
v1 int not null,
v2 int not null
);
insert into dead_lock_test (v1,v2) value (1,1);
insert into dead_lock_test (v1,v2) value (2,2);
insert into dead_lock_test (v1,v2) value (3,3);
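Note that the table has only the primary-key index. In addition, the default storage engine is InnoDB and the transaction isolation level is RR (repeatable read, which, compared with RC, addresses phantom reads).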
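The preconditions stated above (InnoDB, repeatable read, primary-key index only) matter for the rest of the analysis, so it may be worth confirming them before reproducing the problem. A small sketch, assuming MySQL 5.7.20 or later for the `@@transaction_isolation` variable:

```sql
-- Isolation level of the current session; expected to be REPEATABLE-READ here.
select @@transaction_isolation;

-- Storage engine of the test table; expected to be InnoDB.
select engine
from information_schema.tables
where table_schema = database()
  and table_name = 'dead_lock_test';

-- Indexes on the table; only PRIMARY should be listed at this point.
show index from dead_lock_test;
```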
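2.2 Enable lock monitoring
Enable the MySQL lock monitor with the following statements:
# enable
set GLOBAL innodb_status_output=ON;
set GLOBAL innodb_status_output_locks=ON;
# disable
set GLOBAL innodb_status_output=OFF;
set GLOBAL innodb_status_output_locks=OFF;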
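As an optional addition to the monitor switches, the sketch below checks their current values and turns on `innodb_print_all_deadlocks`, a standard InnoDB variable. Whether to enable it is a judgment call; it is not part of the original setup.

```sql
-- Current values of the two monitor switches (1 = ON, 0 = OFF).
select @@global.innodb_status_output, @@global.innodb_status_output_locks;

-- Optional: write every detected deadlock to the MySQL error log, not only
-- the latest one that "show engine innodb status" displays.
set GLOBAL innodb_print_all_deadlocks=ON;
```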
3. Reproducing the scenario
Open two database connections and run the following SQL statements in each:
# session1
start transaction ;
insert into dead_lock_test (v1,v2) value (4,4);
delete from dead_lock_test where v1 = 4 and v2 = 4;
commit;
# session2
start transaction;
insert into dead_lock_test (v1,v2) value (5,5);
delete from dead_lock_test where v1 = 5 and v2 = 5;
commit;
Please do not ask why a transaction with just two statements inserts and then deletes instead of simply rolling back (I do not know why it was written this way either).
The transactions execute in the following order:
session1 | session2 | stage | remarks |
---|---|---|---|
start transaction; | start transaction; | | |
insert into dead_lock_test (v1,v2) value (4,4); | do nothing | | succeeds |
do nothing | insert into dead_lock_test (v1,v2) value (5,5); | stage1 | succeeds |
delete from dead_lock_test where v1 = 4 and v2 = 4; | do nothing | stage2 | session1 blocks |
do nothing | delete from dead_lock_test where v1 = 5 and v2 = 5; | stage3 | session2 reports a deadlock |
3.1 stage1
Run show engine innodb status; an excerpt of the transaction section follows:
------------
TRANSACTIONS
------------
Trx id counter 91328
Purge done for trx's n:o < 91327 undo n:o < 0 state: running but idle
History list length 19
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 91327, ACTIVE 37 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 15668, query id 3147 localhost 127.0.0.1 root
TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock mode IX
---TRANSACTION 91322, ACTIVE 44 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 23, OS thread handle 22788, query id 3103 localhost 127.0.0.1 root
TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock mode IX
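This excerpt shows the current transactions. Two transactions are running, with trx ids 91322 and 91327.
TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock mode IX: an IX lock has been taken on the dead_lock_test table.
Transaction 91322 belongs to session1 and transaction 91327 belongs to session2.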
3.2 stage2
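After running delete from dead_lock_test where v1 = 4 and v2 = 4; the current transaction is blocked.
Run show engine innodb status; an excerpt of the transaction section follows.
Because the output is long, the explanations are added as comments directly inside it.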
------------
TRANSACTIONS
------------
Trx id counter 91332
Purge done for trx's n:o < 91332 undo n:o < 0 state: running but idle
History list length 21
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 91327, ACTIVE 58 sec
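* 2 lock struct(s): the lock list of transaction 91327 has length 2 (each node is one lock structure held by the transaction, such as a table lock or a record lock); here it is the table lock (IX) plus one row lock (record lock);
* 1 row lock(s): the number of row locks held by this transaction;
* undo log entries 1: the number of undo log entries of this transaction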
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 15668, query id 3147 localhost 127.0.0.1 root
* TABLE LOCK: the table lock (IX) held by this transaction
TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock mode IX
* RECORD LOCKS: the row lock held by this transaction (lock_mode X locks rec but not gap)
* space id 92: the id of the tablespace that contains dead_lock_test
* page no 4: the page number where the record is located
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X locks rec but not gap
* Row lock details: heap no=6
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;; * hex 80000005: the locked record has id=5
1: len 6; hex 0000000164bf; asc d ;; * hex 0000000164bf: transaction ID;
2: len 7; hex 81000000b20110; asc ;; * hex 81000000b20110: rollback pointer;
3: len 4; hex 80000005; asc ;; * hex 80000005: value of the v1 column;
4: len 4; hex 80000005; asc ;; * hex 80000005: value of the v2 column;
---TRANSACTION 91322, ACTIVE 65 sec fetching rows
* tables in use 1: one table is in use;
* locked 1: one table lock is held
mysql tables in use 1, locked 1
* LOCK WAIT: transaction 91322 is waiting for a lock; the other fields are explained above
LOCK WAIT 5 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
MySQL thread id 23, OS thread handle 22788, query id 3199 localhost 127.0.0.1 root updating
* The SQL statement transaction 91322 is currently executing
/* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 4 and v2 = 4
* The lock transaction 91322 is waiting for
------- TRX HAS BEEN WAITING 8 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
* The row lock transaction 91322 is waiting for (on the record with primary key 5, held by transaction 91327)
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 0000000164bf; asc d ;;
2: len 7; hex 81000000b20110; asc ;;
3: len 4; hex 80000005; asc ;;
4: len 4; hex 80000005; asc ;;
------------------
* The following lists the locks transaction 91322 holds and the lock it is trying to acquire, starting with the table intention lock (IX)
TABLE LOCK table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock mode IX
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X
* Row lock (lock_mode X, i.e. next-key) on the record with primary key 1
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0000000164b3; asc d ;;
2: len 7; hex 81000000ad0110; asc ;;
3: len 4; hex 80000001; asc ;;
4: len 4; hex 80000001; asc ;;
* Row lock (lock_mode X, i.e. next-key) on the record with primary key 2
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 0000000164b4; asc d ;;
2: len 7; hex 82000000ad0110; asc ;;
3: len 4; hex 80000002; asc ;;
4: len 4; hex 80000002; asc ;;
* Row lock (lock_mode X, i.e. next-key) on the record with primary key 3
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc ;;
1: len 6; hex 0000000164b9; asc d ;;
2: len 7; hex 81000000b00110; asc ;;
3: len 4; hex 80000003; asc ;;
4: len 4; hex 80000003; asc ;;
* Record lock: locks the record itself (the lock created when the record was inserted)
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 4; hex 80000004; asc ;;
1: len 6; hex 0000000164ba; asc d ;;
2: len 7; hex 020000011a03cb; asc ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000004; asc ;;
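* Gap lock: locks the gap before the record (created by the delete; it takes effect under RR and mainly serves to prevent phantom reads)
* Note that InnoDB does not physically remove a deleted record; it marks it as deleted (to be overwritten later), so a delete can be thought of as similar to an update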
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X locks gap before rec
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 4; hex 80000004; asc ;;
1: len 6; hex 0000000164ba; asc d ;;
2: len 7; hex 020000011a03cb; asc ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000004; asc ;;
* The lock transaction 91322 is trying to acquire (held by transaction 91327)
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 0000000164bf; asc d ;;
2: len 7; hex 81000000b20110; asc ;;
3: len 4; hex 80000005; asc ;;
4: len 4; hex 80000005; asc ;;
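(1) As the annotations above show, when transaction 91322 attempts the delete it locks every record in the table.
This is because the delete condition is v1 = 4 and v2 = 4, and no index exists on the v1 and v2 columns.
Because no index can be used to identify the primary keys, the statement scans the whole dead_lock_test table and locks every record it reads (this can be tuned by configuration, for example with the READ COMMITTED isolation level, so that locks on records that do not match the where condition are released as the scan proceeds).
(2) While transaction 91322 is locking the records of dead_lock_test, it finds that the record id=5 already carries a record lock held by transaction 91327, so it can only wait for transaction 91327 to release that lock.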
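Both points can be cross-checked from a third connection while session1 is still blocked. This is a sketch under two assumptions: the `sys` schema is installed (the default in MySQL 5.7 and 8.0), and the table still has only its primary-key index.

```sql
-- (1) No index on v1/v2: the plan should show a full scan
--     (type = ALL, key = NULL), so every row that is read gets locked.
explain delete from dead_lock_test where v1 = 4 and v2 = 4;

-- (2) Who is waiting for whom: one row per blocked lock request.
select waiting_trx_id,
       waiting_query,
       waiting_lock_mode,
       blocking_trx_id,
       blocking_lock_mode,
       locked_index
from sys.innodb_lock_waits;
```

With the scenario in the state described above, the second query would be expected to report transaction 91322 waiting on the PRIMARY index with 91327 as the blocker.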
3.3 stage3
After running delete from dead_lock_test where v1 = 5 and v2 = 5; the terminal prints:
[2021-05-13 15:33:29] [40001][1213] Deadlock found when trying to get lock; try restarting transaction
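Run show engine innodb status; an excerpt of the deadlock section follows.
Because the content is long, the explanation is not listed separately; see the annotations inside the output.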
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-05-13 17:27:09 0xca4
*** (1) TRANSACTION:
* The locks held by transaction 91322 were explained in detail in stage2 and are not repeated here
TRANSACTION 91322, ACTIVE 78 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
MySQL thread id 23, OS thread handle 22788, query id 3199 localhost 127.0.0.1 root updating
/* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 4 and v2 = 4
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0000000164b3; asc d ;;
2: len 7; hex 81000000ad0110; asc ;;
3: len 4; hex 80000001; asc ;;
4: len 4; hex 80000001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 0000000164b4; asc d ;;
2: len 7; hex 82000000ad0110; asc ;;
3: len 4; hex 80000002; asc ;;
4: len 4; hex 80000002; asc ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc ;;
1: len 6; hex 0000000164b9; asc d ;;
2: len 7; hex 81000000b00110; asc ;;
3: len 4; hex 80000003; asc ;;
4: len 4; hex 80000003; asc ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91322 lock_mode X waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 0000000164bf; asc d ;;
2: len 7; hex 81000000b20110; asc ;;
3: len 4; hex 80000005; asc ;;
4: len 4; hex 80000005; asc ;;
*** (2) TRANSACTION:
TRANSACTION 91327, ACTIVE 71 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 15668, query id 3237 localhost 127.0.0.1 root updating
/* ApplicationName=DataGrip 2021.1.1 */ delete from dead_lock_test where v1 = 5 and v2 = 5
*** (2) HOLDS THE LOCK(S):
* Transaction 91327 holds the record lock on record id=5, which transaction 91322 is waiting for
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 0000000164bf; asc d ;;
2: len 7; hex 81000000b20110; asc ;;
3: len 4; hex 80000005; asc ;;
4: len 4; hex 80000005; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
* Transaction 91327 is waiting for a row lock; the lock is explained below
RECORD LOCKS space id 92 page no 4 n bits 72 index PRIMARY of table `igw_proxy_rule_management`.`dead_lock_test` trx id 91327 lock_mode X waiting
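* Transaction 91327 is waiting for the lock on record id=1 (the delete cannot use an index, so it tries to lock every record in the table, but transaction 91322 holds the locks on id=1/2/3/4, which completes the deadlock condition)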
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0000000164b3; asc d ;;
2: len 7; hex 81000000ad0110; asc ;;
3: len 4; hex 80000001; asc ;;
4: len 4; hex 80000001; asc ;;
*** WE ROLL BACK TRANSACTION (2)
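From the above:
(1) When transaction 91322 runs its delete, it tries to lock every record in the table, and the lock on record id=5 is held by transaction 91327;
(2) When transaction 91327 runs its delete, it also tries to lock every record in the table and finds that the locks on records id=1/2/3/4 are held by transaction 91322;
(3) At this point transactions 91322 and 91327 are waiting for each other, and the deadlock is formed.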
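The hex fields in the record dumps above can be decoded to confirm which transaction inserted which row. A small sketch using only built-in functions; the column aliases are made up for illustration.

```sql
-- Field 1 of each dumped record is the id of the transaction that last
-- modified the row, stored as a 6-byte number.
select conv('0000000164bf', 16, 10) as trx_of_row_5,  -- 91327 (session2)
       conv('0000000164ba', 16, 10) as trx_of_row_4;  -- 91322 (session1)

-- InnoDB stores signed INT columns with the sign bit flipped,
-- so subtracting 2^31 recovers the value.
select cast(conv('80000005', 16, 10) as signed) - 2147483648 as id_value;  -- 5
```

This matches the mapping used throughout: the row id=5 was written by 91327 and the row id=4 by 91322.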
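4. Solution
4.1 Add an index
As the analysis above shows, the where condition of the delete cannot use an index, so MySQL tries to lock every record in the table, which produces the deadlock.
It is enough to create a composite index on the v1 and v2 columns to narrow the range of conflicting records.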
create index dead_lock_test_v2_v1_index on dead_lock_test (v1, v2);
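The index is not unique here. If several transactions look up through the index and the records they lock overlap, the deadlock can still be reproduced easily.
However, the business side currently guarantees that inserted data contains no overlapping records within a short time window, and the table already contains some duplicate data, so a unique index is not used.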
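To check that the delete actually picks up the new index, the plan can be inspected again. This is only a sketch; the exact access type the optimizer reports may vary with the MySQL version and the table contents.

```sql
-- The composite index should now be listed next to PRIMARY.
show index from dead_lock_test;

-- The plan should name dead_lock_test_v2_v1_index in the key column
-- instead of showing a full scan (type = ALL, key = NULL).
explain delete from dead_lock_test where v1 = 4 and v2 = 4;
```

Repeating the two-session steps from section 3 afterwards is a direct way to confirm that the two deletes no longer block each other.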
4.2 Final approach
Add the index to the table. The insert-then-delete inside the transaction is replaced by a rollback.
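A minimal sketch of what the rollback variant could look like, assuming the row is only needed temporarily inside the transaction; the intermediate business statements are not shown in the article and are left as a placeholder comment.

```sql
start transaction;
insert into dead_lock_test (v1,v2) value (4,4);
-- ... statements that need the temporary row (not shown in the article) ...
rollback;  -- undoes the insert, so no delete statement is needed
```

Because no delete runs, the transaction never scans the table for v1/v2 matches and only touches the row it inserted.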