InnoDB storage engine. The MySQL XA implementation is based on the X/Open CAE document Distributed Transaction Processing: The XA Specification.
Currently, among the MySQL Connectors, MySQL Connector/J 5.0.0 and higher supports XA directly, by means of a class interface that handles the XA SQL statement interface for you.
XA supports distributed transactions, that is, the ability to permit multiple separate transactional resources to participate in a global transaction. Transactional resources often are RDBMSs but may be other kinds of resources.
A global transaction involves several actions that are transactional in themselves, but that all must either complete successfully as a group, or all be rolled back as a group. In essence, this extends ACID properties “up a level” so that multiple ACID transactions can be executed in concert as components of a global operation that also has ACID properties. (However, for a distributed transaction, you must use the SERIALIZABLE isolation level to achieve ACID properties. It is enough to use REPEATABLE READ for a nondistributed transaction, but not for a distributed transaction.)
The most important point: when you use MySQL's XA to implement distributed transactions, you must use the SERIALIZABLE isolation level.
The MySQL implementation of XA enables a MySQL server to act as a Resource Manager (RM) that handles XA transactions within a global transaction. A client program that connects to the MySQL server acts as the Transaction Manager (TM).
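As a rough illustration of the client-as-TM role, the sketch below only builds the SQL text a client might send to one MySQL server for a single branch; it does not connect to anything. The XA and SET TRANSACTION statements are standard MySQL syntax, but the function names, the xid value and the `accounts` table are hypothetical and exist only for this example.

```python
def quote_xid(xid):
    # Quote the xid as a SQL string literal, doubling embedded single quotes.
    return "'" + xid.replace("'", "''") + "'"


def xa_branch_statements(xid, work_statements):
    """Return the statements for one XA branch, up to and including PREPARE."""
    quoted = quote_xid(xid)
    statements = [
        # The text above requires SERIALIZABLE for distributed transactions.
        "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE",
        f"XA START {quoted}",
    ]
    statements.extend(work_statements)      # the branch's own SQL
    statements.append(f"XA END {quoted}")
    statements.append(f"XA PREPARE {quoted}")
    return statements


def xa_finish_statement(xid, commit):
    """Return the second-phase statement chosen by the TM."""
    verb = "COMMIT" if commit else "ROLLBACK"
    return f"XA {verb} {quote_xid(xid)}"


if __name__ == "__main__":
    # Hypothetical table and xid, for illustration only.
    work = ["UPDATE accounts SET balance = balance - 10 WHERE id = 1"]
    for stmt in xa_branch_statements("order-42", work):
        print(stmt + ";")
    print(xa_finish_statement("order-42", commit=True) + ";")
```

A real TM would send the first list to every participating server, wait for each XA PREPARE to succeed, and only then send the finishing statement to all of them.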
The process for executing a global transaction uses two-phase commit (2PC). This takes place after the actions performed by the branches of the global transaction have been executed.
- In the first phase, all branches are prepared. That is, they are told by the TM to get ready to commit. Typically, this means each RM that manages a branch records the actions for the branch in stable storage. The branches indicate whether they are able to do this, and these results are used for the second phase.
- In the second phase, the TM tells the RMs whether to commit or roll back. If all branches indicated when they were prepared that they will be able to commit, all branches are told to commit. If any branch indicated when it was prepared that it will not be able to commit, all branches are told to roll back.
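Phase one is the prepare phase: the TM sends a prepare instruction to each RM, the RM does the work, and then reports success or failure back to the TM.
Phase two is the commit-or-rollback phase: if the TM receives a success message from every RM, it sends a commit instruction to the RMs; otherwise it sends a rollback instruction.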
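The two phases can be sketched as a small in-memory simulation. This is a minimal illustration of the voting logic only, with no durability, timeouts or failure recovery; `ResourceManager` and `two_phase_commit` are names invented for this example and are not part of any MySQL API.

```python
class ResourceManager:
    """Toy RM that votes in phase one and obeys the TM in phase two."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: a real RM would write the branch to stable storage here.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(resource_managers):
    """Act as the TM: collect every vote, then commit all or roll back all."""
    # Phase 1: every branch is asked to prepare, even after a "no" vote.
    votes = [rm.prepare() for rm in resource_managers]

    # Phase 2: the outcome is the same for every branch.
    if all(votes):
        for rm in resource_managers:
            rm.commit()
        return "committed"
    for rm in resource_managers:
        rm.rollback()
    return "rolled back"


if __name__ == "__main__":
    ok = [ResourceManager("db1"), ResourceManager("db2")]
    print(two_phase_commit(ok))
    mixed = [ResourceManager("db1"), ResourceManager("db2", can_commit=False)]
    print(two_phase_commit(mixed))
```

A single failed prepare is enough to roll back every branch, including the ones that voted yes.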
XA transaction support is limited to the InnoDB storage engine; no other engine supports XA distributed transactions.
For "external XA", a MySQL server acts as a Resource Manager and client programs act as Transaction Managers. For "internal XA", storage engines within a MySQL server act as RMs, and the server itself acts as a TM. Internal XA support is limited by the capabilities of individual storage engines. Internal XA is required for handling XA transactions that involve more than one storage engine. The implementation of internal XA requires that a storage engine support two-phase commit at the table handler level, and currently this is true only for InnoDB.
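MySQL's XA implementation comes in two forms, external XA and internal XA. External XA is what we normally mean by a distributed transaction. Internal XA is a kind of distributed transaction inside a single MySQL server: the server layer acts as the TM (the transaction coordinator) and the storage engines in that server act as RMs. A transaction that touches two InnoDB databases on the same MySQL server, that is, a cross-database transaction, falls into this category; only InnoDB qualifies because the other engines do not support XA.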
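3. An additional role of internal XA
XA splits transaction commit into two phases, and that design also solves the problem of keeping the binlog and the redo log consistent. This is the third role of MySQL's internal XA.
To support replication for non-transactional engines as well, MySQL introduced the binlog at the server layer. The binlog records the modifications made in every engine, so replication works for all engines. According to Taobao's Ding Qi, MySQL dropped its redo-based replication strategy in the 4.x series and introduced binlog-based replication.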
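2> InnoDB keeps a list of the transactions that are in the Prepare state and compares their xids with the xids recorded in the binlog. If a transaction's xid is present in the binlog, the transaction is committed; otherwise it is rolled back.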
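The recovery rule in this step amounts to a set membership test. The sketch below is a simplified model of that decision only, assuming the prepared xids and the binlog xids have already been collected; the function name and the integer xids are made up for the example and do not correspond to InnoDB internals.

```python
def resolve_prepared_transactions(prepared_xids, binlog_xids):
    """Split prepared transactions into those to commit and those to roll back.

    A prepared transaction whose xid reached the binlog is committed;
    one whose xid is missing from the binlog is rolled back.
    """
    in_binlog = set(binlog_xids)
    to_commit = [xid for xid in prepared_xids if xid in in_binlog]
    to_rollback = [xid for xid in prepared_xids if xid not in in_binlog]
    return to_commit, to_rollback


if __name__ == "__main__":
    # Hypothetical xids: 102 was prepared but never reached the binlog.
    commit, rollback = resolve_prepared_transactions([101, 102, 103], [100, 101, 103])
    print("commit:", commit)
    print("rollback:", rollback)
```

Because the binlog decides the outcome, the binlog and the engine agree after recovery on which transactions exist.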
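Binlog Group Commit splits the commit process into three stages:
1> flush stage: write each thread's binlog from its cache to the binlog file;
2> sync stage: fsync the binlog (if required). This is the most important step, because the binlogs of many threads are written to disk together;
3> commit stage: perform the engine-level transaction commit for each thread (the redo log does not need to be written here, because it was already written in the prepare phase). Only one thread operates in each stage at a time. (Splitting the work into three stages, each handled by a single thread, is a typical concurrency optimization.)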
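To make the benefit of the three stages concrete, here is a single-threaded toy model in which one pass over a queue of waiting transactions performs one fsync for the whole group. It is only a sketch of the staging idea: real group commit involves stage queues, leaders and followers across threads, and the class, attribute names and event strings here are all hypothetical.

```python
class BinlogGroupCommit:
    """Toy model: many queued transactions share a single binlog fsync."""

    def __init__(self):
        self.file_buffer = []   # written to the binlog file, not yet fsynced
        self.durable = []       # events that have been fsynced
        self.fsync_calls = 0
        self.committed = []     # thread ids committed in the engine

    def run(self, queued):
        """Process one group; queued is a list of (thread_id, cached_events)."""
        # Flush stage: copy every thread's binlog cache into the file.
        for _thread_id, events in queued:
            self.file_buffer.extend(events)

        # Sync stage: one fsync covers the whole group.
        self.durable.extend(self.file_buffer)
        self.file_buffer.clear()
        self.fsync_calls += 1

        # Commit stage: engine-level commit per thread, in queue order.
        for thread_id, _events in queued:
            self.committed.append(thread_id)


if __name__ == "__main__":
    gc = BinlogGroupCommit()
    gc.run([(1, ["a"]), (2, ["b", "c"]), (3, ["d"])])
    print(gc.fsync_calls, gc.durable, gc.committed)
```

Three transactions cost one fsync here instead of three, and they commit in the same order in which their events entered the binlog.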
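Taobao optimized binlog group commit further. The principle is as follows:
From the XA recovery logic we know that it is enough to guarantee that the redo log written by InnoDB Prepare has completed its write/sync before the Binlog is written. We therefore made a small change to the logic of the first stage of Group Commit, roughly as follows:
Step 1. InnoDB Prepare; record the current LSN in thd.
Step 2. Enter the flush stage of Group Commit; the Leader collects the queue and computes the largest LSN in it.
Step 3. Write/fsync the InnoDB redo log up to that LSN. (Note: this step is the redo log group write, because every redo log record with an LSN less than or equal to that value is written to ib_logfile[0|1] in one go.)
Step 4. Write the Binlog and carry on with the remaining work (sync Binlog, InnoDB commit, etc.).
In other words, the redo log write/sync is deferred until the flush stage of binlog group commit has begun, and it still happens before the binlog is written and synced.
Deferring the redo log write explicitly performs one group write of the redo log (redo log group write) and reduces contention on the redo log's log_sys->mutex.
That is, the redo log that belongs to a binlog group commit is group-written as well, so both the binlog and the redo log are optimized.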
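Steps 2 and 3 above can be sketched as follows: the leader takes the LSNs recorded at prepare time, finds the largest one, and flushes the redo log once up to that point. This is a toy model under the assumption that flushing to an LSN makes every smaller LSN durable as well; `RedoLog` and `leader_flush_stage` are invented names and the LSN values are arbitrary.

```python
class RedoLog:
    """Toy redo log that tracks how far it has been flushed."""

    def __init__(self):
        self.flushed_lsn = 0
        self.fsync_calls = 0

    def flush_up_to(self, lsn):
        # Flushing to an LSN covers every record with a smaller or equal LSN,
        # so nothing needs to be done if we are already past it.
        if lsn > self.flushed_lsn:
            self.flushed_lsn = lsn
            self.fsync_calls += 1


def leader_flush_stage(redo_log, queued_prepare_lsns):
    """Group-write the redo log for every transaction in the leader's queue."""
    max_lsn = max(queued_prepare_lsns)   # Step 2: largest LSN in the queue
    redo_log.flush_up_to(max_lsn)        # Step 3: one write/fsync for the group
    return max_lsn


if __name__ == "__main__":
    redo = RedoLog()
    # Hypothetical LSNs recorded by three transactions at prepare time.
    print(leader_flush_stage(redo, [120, 95, 140]))
    print(redo.flushed_lsn, redo.fsync_calls)
```

Without the deferral each of the three transactions would have flushed the redo log on its own during prepare; with it, one flush serves the whole queue.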
Official MySQL adopted Taobao's optimization in the 5.7.6 code. The corresponding release note reads:
When using InnoDB with binary logging enabled, concurrent transactions written in the InnoDB redo log are now grouped together before synchronizing to disk when innodb_flush_log_at_trx_commit is set to 1, which reduces the amount of synchronization operations. This can lead to improved performance.
5. The XA parameter innodb_support_xa
Command-Line Format | --innodb_support_xa
System Variable Name | innodb_support_xa
Variable Scope | Global, Session
Dynamic Variable | Yes
Permitted Values, Type | boolean
Permitted Values, Default | TRUE
Enables InnoDB support for two-phase commit (2PC) in XA transactions, causing an extra disk flush for transaction preparation. This setting is the default. The XA mechanism is used internally and is essential for any server that has its binary log turned on and is accepting changes to its data from more than one thread. If you turn it off, transactions can be written to the binary log in a different order from the one in which the live database is committing them. This can produce different data when the binary log is replayed in disaster recovery or on a replication slave. Do not turn it off on a replication master server unless you have an unusual setup where only one thread is able to change data.
For a server that is accepting data changes from only one thread, it is safe and recommended to turn off this option to improve performance for InnoDB tables. For example, you can turn it off on replication slaves where only the replication SQL thread is changing data.
You can also turn off this option if you do not need it for safe binary logging or replication, and you also do not use an external XA transaction manager.
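The innodb_support_xa parameter defaults to true, which enables XA. It costs one extra disk flush (the redo log flush in the prepare phase), but in general it should stay enabled rather than be turned off: turning it off lets the binlog write order diverge from the actual transaction commit order, which produces wrong data during crash recovery and during replication to a slave. If the log-bin parameter is enabled and more than one thread modifies the database, innodb_support_xa must be enabled.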
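The conditions quoted above reduce to a short rule of thumb, sketched below. This is only a restatement of the documentation's guidance as code, not a MySQL API; the function and parameter names are hypothetical.

```python
def xa_support_needed(binlog_enabled, writer_threads, uses_external_xa):
    """Return True when innodb_support_xa should stay enabled.

    Per the text above: keep it on when an external XA transaction manager
    is used, or when the binary log is on and more than one thread
    changes data.
    """
    if uses_external_xa:
        return True
    return binlog_enabled and writer_threads > 1


if __name__ == "__main__":
    # A replication master with many client threads.
    print(xa_support_needed(True, 16, False))
    # A slave where only the replication SQL thread changes data.
    print(xa_support_needed(True, 1, False))
```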
References:
1. http://www.csdn.net/article/2015-01-16/2823591 (Taobao's Ding Qi: How to climb out of MySQL's 10 big pitfalls)
2. http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_support_xa
3. http://dev.mysql.com/doc/refman/5.6/en/xa.html
4. http://dev.mysql.com/doc/refman/5.6/en/xa-restrictions.html