1. Monitoring logs
Monitoring surfaced the exception below. The stack trace that follows it shows which SQL statement hit the deadlock.
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.***.***.im.service.platform.dao.impl.ImMessageDaoImpl.insert(ImMessageDaoImpl.java:50)
at com.***.***.im.service.platform.service.impl.ImMessageServiceImpl.saveNewSessionMessage(ImMessageServiceImpl.java:543)
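After reading the code the log points to, it seemed unlikely that the deadlock came from the same transaction running concurrently with itself.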
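The error message itself suggests restarting the transaction. As a stopgap while the root cause is investigated, the caller can retry the whole unit of work. The sketch below is a minimal illustration, not the project's code: `DeadlockRetry`, `isDeadlock` and `run` are hypothetical names. It relies on MySQL reporting deadlocks as error 1213 with SQLSTATE 40001, and it assumes the supplied work begins and commits its own transaction on every call, since MySQL has already rolled back the failed one.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper: re-runs a transactional unit of work when MySQL reports a deadlock. */
final class DeadlockRetry {
    // MySQL reports a deadlock as error 1213 (ER_LOCK_DEADLOCK) with SQLSTATE 40001.
    static final int ER_LOCK_DEADLOCK = 1213;
    static final String SQLSTATE_DEADLOCK = "40001";

    /** Returns true if the exception, or any of its causes, is a MySQL deadlock error. */
    static boolean isDeadlock(Throwable t) {
        // Walk the cause chain, because DAO layers often wrap the driver exception.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SQLException) {
                SQLException e = (SQLException) cur;
                if (e.getErrorCode() == ER_LOCK_DEADLOCK
                        || SQLSTATE_DEADLOCK.equals(e.getSQLState())) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Runs the work, retrying on deadlock until maxAttempts is reached.
     * Assumption: each call of work starts and commits a fresh transaction.
     */
    static <T> T run(Callable<T> work, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                // Give up on non-deadlock errors or when the attempts are used up.
                if (attempt >= maxAttempts || !isDeadlock(e)) {
                    throw e;
                }
                Thread.sleep(10L * attempt); // short linear backoff before retrying
            }
        }
    }
}
```

Retrying only hides the symptom. The lock-ordering problem analysed in the following sections still needs to be fixed.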
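2. Check the isolation level
select @@tx_isolation; -- isolation level of the current session
select @@global.tx_isolation; -- global isolation level
Business code may rely on the default isolation level, which is the global one, or it may set the isolation level for the current transaction explicitly. We use the default, RR (REPEATABLE READ). On MySQL 8.0 the variable is named transaction_isolation instead of tx_isolation.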
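Because the application may override the level per connection or per transaction, it can help to check it from the Java side as well. The following is a small sketch under that assumption: `IsolationLevels`, `mysqlName` and `describe` are hypothetical names, while `Connection.getTransactionIsolation()` and the `TRANSACTION_*` constants are standard JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper for logging the isolation level a JDBC connection is using. */
final class IsolationLevels {

    /** Maps a java.sql.Connection isolation constant to the name MySQL prints for it. */
    static String mysqlName(int jdbcLevel) {
        switch (jdbcLevel) {
            case Connection.TRANSACTION_READ_UNCOMMITTED:
                return "READ-UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:
                return "READ-COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:
                return "REPEATABLE-READ";
            case Connection.TRANSACTION_SERIALIZABLE:
                return "SERIALIZABLE";
            default:
                return "UNKNOWN(" + jdbcLevel + ")";
        }
    }

    /** Reads the isolation level currently configured on the given connection. */
    static String describe(Connection connection) throws SQLException {
        return mysqlName(connection.getTransactionIsolation());
    }
}
```

Calling `IsolationLevels.describe(connection)` inside the transactional method and logging the result shows whether the code really runs at the level the global setting suggests.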
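3. Check the most recent deadlock detected by InnoDB
Ask the DBA to look at the database behind the service that deadlocked and at the deadlock log recorded by InnoDB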
show engine innodb status;
The most recent deadlock log returned was:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-01 23:32:49 0x7f6306adb700
*** (1) TRANSACTION:
TRANSACTION 23734694036, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1136, 25 row lock(s)
MySQL thread id 7109502, OS thread handle 140046693021440, query id 5270358204 172.31.21.66 im_w1 updating
update im_servicer_session
set unread_count=0
where session_id=142298 and servicer_id=8708
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694036
lock_mode X locks rec but not gap waiting
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
0: len 8; hex 00000000000006a4; asc ;;
1: len 6; hex 000586b2b07f; asc ;;
2: len 7; hex 27000002141d37; asc ' 7;;
3: len 8; hex 0000000000022bda; asc + ;;
4: len 8; hex 0000000000002204; asc " ;;
5: len 1; hex 00; asc ;;
6: len 5; hex 9943c20000; asc C ;;
7: len 1; hex 00; asc ;;
8: len 4; hex 00000003; asc ;;
9: len 5; hex 99a2c37642; asc vB;;
10: len 5; hex 99a2c37830; asc x0;;
*** (2) TRANSACTION:
TRANSACTION 23734694015, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 7108183, OS thread handle 140063290537728, query id 5270358482 172.31.35.143 im_w1 update
insert into im_message_0_34
( chat_id,
message_type,
message,
house_id,
send_time,
send_status,
receive_status,
show_type )
values ( '4NzP0DZO7wngS5YiGFcJTKu0L2Xrhan7zpbBBO/1KdQ=',
0,
'嗯嗯',
106874,
'2019-04-01 23:32:48.113',
0,
1,
0 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694015
lock_mode X locks rec but not gap
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
0: len 8; hex 00000000000006a4; asc ;;
1: len 6; hex 000586b2b07f; asc ;;
2: len 7; hex 27000002141d37; asc ' 7;;
3: len 8; hex 0000000000022bda; asc + ;;
4: len 8; hex 0000000000002204; asc " ;;
5: len 1; hex 00; asc ;;
6: len 5; hex 9943c20000; asc C ;;
7: len 1; hex 00; asc ;;
8: len 4; hex 00000003; asc ;;
9: len 5; hex 99a2c37642; asc vB;;
10: len 5; hex 99a2c37830; asc x0;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5388 page no 1531 n bits 264 index idx_chat_id of table `im`.`im_message_0_34` trx id 23734694015
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 110 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 344f69384254415559786c496a483947657577705071365a3764794f546e; asc 4Oi8BTAUYxlIjH9GeuwpPq6Z7dyOTn; (total 44 bytes);
1: len 8; hex 00000000000069a0; asc i ;;
*** WE ROLL BACK TRANSACTION (2)
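The log shows that the lock both transactions contend for on im_servicer_session is a plain record-level exclusive lock (lock_mode X locks rec but not gap), not a gap lock. The only gap-related entry is the insert intention lock that transaction (2) is waiting for on index idx_chat_id of im_message_0_34. Transaction (1) is blocked on the statement that updates the session, which turned out to be the unread-count reset in the mark-as-read flow. Transaction (2) is blocked on the statement that saves a message. The code for the two deadlocked transactions is as follows: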
TRANSACTION 23734694015 (save message; holds the im_servicer_session row lock, waits on the insert)
// update the session time
imServicerSessionService.updateSessionTime(sessionVo.getSessionId(), EnumServicerSessionState.IN_SESSION);
// ... a long-running request
if (md.getMessageId() != null && md.getMessageId() > 0) {
    logger.info("update message");
    imMessageDao.update(md);
} else {
    imMessageDao.insert(md);
}
TRANSACTION 23734694036 (mark as read; waits on the im_servicer_session row lock)
if (LoginUserUtil.isServicer()) {
    // mark the servicer's messages as read, then reset the session's unread count
    imMessageDao.markServicerMessageRead(chatId, baseSubTable.getTableName(), houseId, loginInfo.getAccountId());
    imServicerSessionService.resetUnreadCount(imSessionVoList.get(0).getSessionId(), loginInfo.getAccountId());
}
4. Session timeline
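The interleaving implied by the deadlock log can be modelled as a wait-for graph: each transaction waits for a resource, and each resource has a holder. The sketch below is illustrative only, and `WaitForGraph` and its methods are hypothetical names. Three of the four facts come straight from the log. The fourth is an assumption, because the log does not print what transaction 23734694036 holds: it is taken to hold a lock on the idx_chat_id range of im_message_0_34, acquired by its earlier mark-as-read update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of "who waits for whom" in the deadlock log. */
final class WaitForGraph {
    static final String SESSION_ROW = "im_servicer_session PRIMARY, heap no 148";
    static final String MESSAGE_GAP = "im_message_0_34 idx_chat_id, gap before heap no 110";

    private final Map<String, String> holderOf = new HashMap<>();   // resource -> holding trx
    private final Map<String, String> waitingFor = new HashMap<>(); // trx -> awaited resource

    void holds(String trx, String resource) { holderOf.put(resource, trx); }

    void waits(String trx, String resource) { waitingFor.put(trx, resource); }

    /** Follows "waits for the holder of" edges from start; returns the cycle or an empty list. */
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String cur = start;
        while (cur != null && seen.add(cur)) {
            path.add(cur);
            String resource = waitingFor.get(cur);
            cur = (resource == null) ? null : holderOf.get(resource);
        }
        if (cur == null) {
            return new ArrayList<>(); // the chain ended, so nothing waits in a circle
        }
        return new ArrayList<>(path.subList(path.indexOf(cur), path.size()));
    }

    /** Builds the graph for the deadlock reported on 2019-04-01 23:32:49. */
    static WaitForGraph fromDeadlockLog() {
        WaitForGraph g = new WaitForGraph();
        g.holds("23734694015", SESSION_ROW);  // from "(2) HOLDS THE LOCK(S)"
        g.waits("23734694036", SESSION_ROW);  // from "(1) WAITING FOR THIS LOCK"
        g.waits("23734694015", MESSAGE_GAP);  // from "(2) WAITING FOR THIS LOCK"
        g.holds("23734694036", MESSAGE_GAP);  // ASSUMPTION: not printed in the log
        return g;
    }

    public static void main(String[] args) {
        System.out.println(fromDeadlockLog().findCycle("23734694036"));
    }
}
```

With the assumed edge in place, each transaction waits for a lock the other holds, which is the cycle InnoDB broke by rolling back transaction (2).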
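5. Solutions
- A deadlock can be resolved by removing one of the conditions it needs in order to occur. The easiest is to change the order in which resources are acquired. In this case, the two SQL statements in the mark-as-read transaction (markServicerMessageRead and resetUnreadCount) can be swapped, because neither depends on the other
- Next, avoid long transactions: keep each transaction as short as possible and its scope as small as possible. Long transactions reduce concurrency and delay more SQL queries
- Is it really necessary to wrap the whole method in a transaction? Wherever a transaction is not needed, leave it out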
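The first suggestion, acquiring resources in one consistent order, can be illustrated without a database. In the sketch below two `ReentrantLock` objects stand in for the session row and the message index range. This is an analogy for InnoDB row locks, not a model of them, and `LockOrdering` and its methods are hypothetical names. Both code paths take the session lock first and the message lock second, which is what swapping the two statements in the mark-as-read transaction achieves.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo: two code paths that lock the same resources in the same order. */
final class LockOrdering {
    // Stand-ins for the two contended resources in the deadlock log.
    private final ReentrantLock sessionRow = new ReentrantLock();
    private final ReentrantLock messageIndexRange = new ReentrantLock();
    final AtomicInteger completed = new AtomicInteger();

    /** Models saving a message: update the session first, then insert the message. */
    void saveMessage() {
        lockSessionThenMessages();
    }

    /** Models mark-as-read after the swap: reset the unread count first, then mark messages. */
    void markRead() {
        lockSessionThenMessages();
    }

    private void lockSessionThenMessages() {
        sessionRow.lock();
        try {
            messageIndexRange.lock();
            try {
                completed.incrementAndGet(); // the "work" done while both locks are held
            } finally {
                messageIndexRange.unlock();
            }
        } finally {
            sessionRow.unlock();
        }
    }

    /** Runs both paths concurrently; returns true if every task finished in time. */
    static boolean runConcurrently(int rounds) throws InterruptedException {
        LockOrdering demo = new LockOrdering();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < rounds; i++) {
            pool.execute(demo::saveMessage);
            pool.execute(demo::markRead);
        }
        pool.shutdown();
        boolean finished = pool.awaitTermination(10, TimeUnit.SECONDS);
        return finished && demo.completed.get() == 2 * rounds;
    }
}
```

If `markRead` took the message lock first and the session lock second, as the original statement order does, the two threads could each hold one lock while waiting for the other.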