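While developing payment reconciliation, the following deadlock suddenly appeared in production when persisting the reconciliation data.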
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve com.imxiaomai.pay.service.reconcile.PayReconcileDao.insert-Inline
### The error occurred while setting parameters
### SQL: insert into pay_reconcile ( trade_time, shop_no, out_trade_no, out_order_no, total_fee, cost, system_id, payment_category, create_time, direction, refund_original_order_no ) select ?, ?, ?, ?, ?, ?, ?, ?, now(), ?, ? FROM dual WHERE NOT EXISTS ( SELECT 1 FROM pay_reconcile where out_trade_no=? AND `out_order_no`=? )
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; SQL []; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction[com.imxiaomai.pay.service.reconcile.GenericReconcileHandler:run]
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
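On first analysis, this SQL inserts a single row and MySQL (InnoDB) locks at row level, so a deadlock should not occur. We then asked the DBA to check the database's deadlock log and found the following entry.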
018-01-18 11:03:15 2adb0dc40700TOO DEEP OR LONG SEARCH IN THE LOCK TABLE WAITS-FOR GRAPH, WE WILL ROLL BACK FOLLOWING TRANSACTION
*** TRANSACTION:
TRANSACTION 121106127, ACTIVE 0.041 sec setting auto-inc lock
mysql tables in use 2, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 1443086444, OS thread handle 0x2adb0dc40700, query id 195863600 10.171.134.40 bldhq executing
insert into pay_reconcile
( trade_time,
shop_no,
out_trade_no,
out_order_no,
total_fee,
cost,
system_id,
payment_category,
create_time,
direction,
refund_original_order_no )
select '2018-01-17 16:20:13.0',
'XM0003',
'4200000069201801175791162100',
'118301180117162014',
2120,
13,
1,
1,
now(),
1,
null
FROM dual
WHERE NOT EXISTS (
SELECT 1 FROM pay_reconcile where out_trade_no='4200000069201801175791162100'
AND `out_order_no`='118301180117162014'
)
*** WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bldshop`.`pay_reconcile` trx id 121106127 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)
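Cause: a thread pool (10 threads) was inserting the data, and the wait queue for the auto-increment column grew too long, which in turn made the transactions queue for too long. The log shows this directly: the transaction holds 0 row locks and is waiting for the table-level AUTO-INC lock, and InnoDB gave up searching the waits-for graph ("TOO DEEP OR LONG SEARCH") and rolled the transaction back as a deadlock. With the default settings, MySQL's auto-increment column takes a table-level lock for this kind of statement.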
【1】About innodb_autoinc_lock_mode
innodb_autoinc_lock_mode has three possible values:
1. 0 means traditional
2. 1 means consecutive
3. 2 means interleaved
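【1.1】traditional (innodb_autoinc_lock_mode=0) mode:
1. It provides backward compatibility.
2. In this mode every INSERT-like statement acquires a table-level AUTO-INC lock when the statement starts and releases it only when the statement ends. Note that the lock is held per statement, not per transaction; a transaction may contain one or more statements.
3. It guarantees that value allocation is predictable, consecutive, and repeatable, so an insert statement replayed on a slave generates the same values as on the master (it is safe for statement-based replication).
4. Because the AUTO-INC lock is held until the end of the statement in this mode, concurrent inserts suffer.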
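【1.2】consecutive (innodb_autoinc_lock_mode=1) mode:
1. This mode optimizes simple inserts. Because the number of rows a simple insert adds is known up front, MySQL can allocate the required consecutive values in one go for that statement. Overall this is still replication-safe (it is safe for statement-based replication).
2. This is MySQL's default mode (before MySQL 8.0, where the default became 2). Its advantage is that a simple insert does not hold the AUTO-INC lock until the end of the statement; the lock is released as soon as the statement has obtained its values. Statements whose row count is not known in advance, such as the INSERT ... SELECT above, still hold the table-level AUTO-INC lock until the statement ends.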
【1.3】interleaved (innodb_autoinc_lock_mode=2) mode:
1. This mode has no table-level AUTO-INC lock at all, so it performs best. Its drawback is that the auto_increment values a single statement receives may not be consecutive.
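To make the difference between a statement-level lock and interleaved allocation concrete, here is a toy simulation in Java. It is only a sketch and not how InnoDB is implemented: the class `AutoIncDemo`, its two methods, and the fixed round-robin schedule used to model concurrency are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of auto-increment allocation; hypothetical, not InnoDB's implementation. */
public class AutoIncDemo {

    /** Statement-level lock: each statement takes all of its values before the next one starts. */
    static List<List<Long>> statementLevelLock(int statements, int rowsPerStatement) {
        long counter = 1;
        List<List<Long>> result = new ArrayList<>();
        for (int s = 0; s < statements; s++) {
            List<Long> ids = new ArrayList<>();
            for (int r = 0; r < rowsPerStatement; r++) {
                ids.add(counter++);
            }
            result.add(ids);
        }
        return result;
    }

    /** Interleaved: statements take one value at a time, modelled here as a round-robin schedule. */
    static List<List<Long>> interleaved(int statements, int rowsPerStatement) {
        long counter = 1;
        List<List<Long>> result = new ArrayList<>();
        for (int s = 0; s < statements; s++) {
            result.add(new ArrayList<>());
        }
        for (int r = 0; r < rowsPerStatement; r++) {
            for (int s = 0; s < statements; s++) {
                result.get(s).add(counter++);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(statementLevelLock(2, 3)); // [[1, 2, 3], [4, 5, 6]]
        System.out.println(interleaved(2, 3));        // [[1, 3, 5], [2, 4, 6]]
    }
}
```

In the first case every statement sees a gap-free run of values at the cost of making the other statement wait; in the second nobody waits, but each statement's values are no longer consecutive.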
【2】If your binary log format is mixed or row, any of the three values is replication-safe for you. Since MySQL now recommends the row format, when binlog_format is not statement it is best to set innodb_autoinc_lock_mode=2 to get better performance.
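Solution:
1. Generate the unique ID in the service instead of using the database auto-increment ID (using Twitter's snowflake algorithm, a UUID-style variant).
2. Shrink the thread pool from 10 threads to 2 to reduce pressure on the database.
3. Pending production verification. (After a week of verification in production, no deadlock was observed.)
4. Deployment must set the startup parameters -Dinstance.group and -Dinstance.id, and both values must be in the range 0-31.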
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// StringUtils and NumberUtils are assumed to come from Apache Commons Lang 3.
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.math.NumberUtils;

// SerialNumber is the project's own interface; it is expected to declare long nextId().
public class Snowflake implements SerialNumber {
    // ============================== Fields ==============================
    /** Start epoch (2015-01-01) */
    private final long twepoch = 1420041600000L;

    /** Number of bits used by the worker id */
    private final long workerIdBits = 5L;

    /** Number of bits used by the datacenter id */
    private final long datacenterIdBits = 5L;

    /** Largest supported worker id, 31 (this shift trick quickly yields the largest value n bits can hold) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Largest supported datacenter id, 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of bits used by the sequence */
    private final long sequenceBits = 12L;

    /** The worker id is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The datacenter id is shifted left by 17 bits (12+5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence, 4095 here (0b111111111111=0xfff=4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker id (0~31) */
    private long workerId;

    /** Datacenter id (0~31) */
    private long datacenterId;

    /** Sequence within the current millisecond (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated id */
    private long lastTimestamp = -1L;

    // volatile is required for double-checked locking to be safe
    private static volatile Snowflake singleInstance;

    private static final String INSTANCE_GROUP = "instance.group";

    private static final String INSTANCE_ID = "instance.id";

    /** Returns the singleton, configured from -Dinstance.group and -Dinstance.id. */
    public static Snowflake create() {
        if (null == singleInstance) {
            synchronized (Snowflake.class) {
                if (null == singleInstance) {
                    int g = readId(INSTANCE_GROUP);
                    int i = readId(INSTANCE_ID);
                    // instance.id is the worker id, instance.group is the datacenter id
                    singleInstance = new Snowflake(i, g);
                }
            }
        }
        return singleInstance;
    }

    /** Reads a system property and checks that it is an integer from 0 to 31. */
    private static int readId(String property) {
        String value = System.getProperty(property);
        if (StringUtils.isBlank(value) || !NumberUtils.isDigits(value)) {
            throw new RuntimeException(property + " must be set to an integer from 0 to 31");
        }
        int parsed = Integer.parseInt(value);
        if (parsed < 0 || parsed >= 32) {
            throw new RuntimeException(property + " must be set to an integer from 0 to 31");
        }
        return parsed;
    }

    // ============================== Constructors ==============================
    /**
     * Constructor
     * @param workerId worker id (0~31)
     * @param datacenterId datacenter id (0~31)
     */
    private Snowflake(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ============================== Methods ==============================
    /**
     * Returns the next id (this method is thread-safe)
     * @return SnowflakeId
     */
    @Override
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the last id's timestamp, the system clock moved backwards; refuse to generate
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            // Same millisecond: advance the sequence
            sequence = (sequence + 1) & sequenceMask;
            // Sequence overflowed within this millisecond
            if (sequence == 0) {
                // Block until the next millisecond and take the new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            // Timestamp changed: reset the sequence
            sequence = 0L;
        }

        // Remember the timestamp of this id
        lastTimestamp = timestamp;

        // Shift the parts and OR them together into a 64-bit id
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    /**
     * Blocks until the next millisecond and returns the new timestamp
     * @param lastTimestamp timestamp of the last generated id
     * @return current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    /** Uniqueness check: 50 threads generate 1,000,000 ids; the map should end up with 1,000,000 keys. */
    public static void main(String[] args) throws InterruptedException {
        Snowflake snowflake = new Snowflake(1, 1);
        ExecutorService executorService = Executors.newFixedThreadPool(50);
        int total = 1000000;
        Map<Long, Integer> map = new ConcurrentHashMap<>(total);
        for (int i = 0; i < total; ++i) {
            executorService.execute(() -> map.put(snowflake.nextId(), 0));
        }
        // Wait for the tasks to finish instead of busy-polling the map size
        executorService.shutdown();
        if (!executorService.awaitTermination(1, TimeUnit.MINUTES)) {
            System.out.println("timed out waiting for id generation");
        }
        System.out.println(map.size());
    }
}
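The bit layout used by nextId() (timestamp offset above bit 22, 5 bits of datacenter id, 5 bits of worker id, 12 bits of sequence) can be checked by packing and unpacking an id. The following is a minimal sketch under that layout; `SnowflakeLayout`, `compose`, and the four accessor methods are hypothetical names that do not exist in the original class, and only the constants are copied from it.

```java
/** Hypothetical helper that packs and unpacks ids using the same layout as Snowflake.nextId(). */
public class SnowflakeLayout {
    static final long TWEPOCH = 1420041600000L; // same start epoch as the Snowflake class
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    /** Packs the four fields exactly as nextId() does. */
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return ((timestampMillis - TWEPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    /** Wall-clock milliseconds at which the id was generated. */
    static long timestampMillis(long id) {
        return (id >> TIMESTAMP_SHIFT) + TWEPOCH;
    }

    static long datacenterId(long id) {
        return (id >> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1);
    }

    static long workerId(long id) {
        return (id >> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1);
    }

    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(1516244595000L, 3, 7, 42);
        System.out.println(timestampMillis(id)); // 1516244595000
        System.out.println(datacenterId(id));    // 3
        System.out.println(workerId(id));        // 7
        System.out.println(sequence(id));        // 42
    }
}
```

Because the timestamp occupies the high bits, ids from one instance sort by generation time, and two instances can only collide if they are started with the same -Dinstance.group and -Dinstance.id pair.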