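Before discussing how to recover a slave, we need to understand two concepts:
GTID_EXECUTED: the set of transactions that have been executed on the server and recorded in its binary log.
GTID_PURGED: the set of transactions that have already been purged from the binary log. It is always a subset of GTID_EXECUTED.
Before going further, let's first look at how to create a new GTID-based slave.
With these two variables in mind, all we need to do is:
1. Take a backup from the master and record the value of gtid_executed at the time of the backup.
2. When restoring that backup on the new slave, set the slave's gtid_purged to the master's gtid_executed value recorded at backup time.
mysqldump can do both for us.
Current state of the master (3301):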
[zejin] 3301>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)
[zejin] 3301>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-13 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)
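The difference between the two sets is what the master can still serve from its binlog files. As a minimal sketch (the helper names parse_gtid_set and subtract are illustrative, not part of MySQL, and expanding intervals into Python sets is only sensible for small examples like this one), the values above work out as follows. On a live server the built-in GTID_SUBTRACT() function performs the same computation.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '22' has no end; treat it as 22-22.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def subtract(a, b):
    """GTIDs in a that are not in b (same idea as MySQL's GTID_SUBTRACT)."""
    out = {}
    for uuid, numbers in a.items():
        rest = numbers - b.get(uuid, set())
        if rest:
            out[uuid] = rest
    return out


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"
executed = parse_gtid_set(UUID + ":1-15")  # gtid_executed on master 3301
purged = parse_gtid_set(UUID + ":1-13")    # gtid_purged on master 3301

still_in_binlog = subtract(executed, purged)
print(sorted(still_in_binlog[UUID]))  # [14, 15]
```

Only transactions 14 and 15 remain in the master's binlogs, so a slave that needs anything earlier has to get it from a backup.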
Step 1: take a full backup with mysqldump
mysqldump --all-databases --single-transaction --triggers --routines --events --host=127.0.0.1 --port=3301 --user=root --password=123 > dump3301.sql
Open dump3301.sql and you will find the following statement:
SET @@GLOBAL.GTID_PURGED='a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15';
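This value is the gtid_executed of master 3301 at the time of the dump. mysqldump writes this statement by default when GTIDs are enabled on the server (its --set-gtid-purged option defaults to AUTO).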
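When the backup is scripted, it can be handy to confirm that the statement is present before restoring. The following is a small sketch rather than a robust parser: the function name gtid_purged_from_dump is made up for illustration, and the pattern assumes the 5.7-style statement shown above (other server versions may format the line differently).

```python
import re

# Matches the 5.7-style statement: SET @@GLOBAL.GTID_PURGED='...';
GTID_PURGED_RE = re.compile(
    r"SET\s+@@GLOBAL\.GTID_PURGED\s*=\s*'([^']*)'", re.IGNORECASE
)


def gtid_purged_from_dump(dump_text):
    """Return the GTID set recorded in a dump, or None if the statement is absent."""
    match = GTID_PURGED_RE.search(dump_text)
    if match is None:
        return None
    # Sets with several UUIDs may be wrapped across lines; drop the whitespace.
    return re.sub(r"\s+", "", match.group(1))


sample = (
    "-- MySQL dump (abridged sample)\n"
    "SET @@GLOBAL.GTID_PURGED='a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15';\n"
)
print(gtid_purged_from_dump(sample))
```

For a real dump, read the file contents and pass them in; a None result suggests the dump was taken without GTID information.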
Step 2: start a brand-new instance 3303; make sure enforce_gtid_consistency and gtid_mode=on are set in its configuration file
mysqld_safe --defaults-file=/home/mysql/my3303.cnf &
At this point the state of the new instance 3303 should look like this:
[(none)] 3303>show global variables like 'gtid_executed';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |       |
+---------------+-------+
1 row in set (0.01 sec)
[(none)] 3303>show global variables like 'gtid_purged';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged   |       |
+---------------+-------+
1 row in set (0.00 sec)
Step 3: import the backup file and check the variables:
mysql -uroot -h127.0.0.1 -p123 -P3303 < dump3301.sql
[(none)] 3303>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.02 sec)
[(none)] 3303>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)
Step 4: point the slave at the master with CHANGE MASTER TO
[zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
[zejin] 3303>start slave;
Query OK, 0 rows affected (0.00 sec)
[zejin] 3303>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.240
Master_User: repl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog57.000014
Read_Master_Log_Pos: 194
Relay_Log_File: zejin240-relay-bin.000002
Relay_Log_Pos: 365
Relay_Master_Log_File: binlog57.000014
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 194
Relay_Log_Space: 575
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3301
Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
Master_Info_File: /home/mysql/I3303/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
This completes adding a new slave to the GTID replication environment.
Suppose we now have one master with two slaves:
master(3301)
slave(3302)
slave(3303)
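Consider the following failure: for whatever reason, the master has purged some binlogs that a slave has not yet received (for example, the slave was stopped for a while and the master purged binlogs in the meantime).
The master is currently in this state: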
[zejin] 3301>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)
[zejin] 3301>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-20 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)
[zejin] 3301>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | chen |
|  2 | ok   |
|  3 | li   |
+----+------+
3 rows in set (0.00 sec)
On slave 3303 we see the following error:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
[zejin] 3303>show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.1.240
Master_User: repl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog57.000014
Read_Master_Log_Pos: 457
Relay_Log_File: zejin240-relay-bin.000003
Relay_Log_Pos: 4
Relay_Master_Log_File: binlog57.000014
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 457
Relay_Log_Space: 194
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3301
Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
Master_Info_File: /home/mysql/I3303/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 160809 17:25:39
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:16
Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-16
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
[zejin] 3303>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | li   |
|  2 | zhou |
+----+------+
2 rows in set (0.00 sec)
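Replication is broken and the data is no longer consistent.
Now let's see how to recover:
Because GTIDs are globally unique, and the other slave 3302 has already received the transactions that 3303 is missing, we can point 3303 at 3302, let it catch up, and then point it back at master 3301. (This only works if 3302 writes replicated transactions to its own binlog, i.e. log_slave_updates is enabled, and has not yet purged the binlogs containing the GTID transactions that 3303 never received from master 3301.)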
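The precondition can be checked before repointing. With MASTER_AUTO_POSITION = 1, a server can feed a slave only if everything that server has purged is already in the slave's executed set; otherwise the slave hits error 1236 as shown above. The sketch below applies that rule to the sets in this article. The gtid_purged value for 3302 is a hypothetical placeholder, since the article does not show it, and the helper names are illustrative. On a live server, MySQL's built-in GTID_SUBSET() function does the same comparison.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def can_serve(master_purged, slave_executed):
    """True if every GTID purged on the master was already executed on the slave."""
    purged = parse_gtid_set(master_purged)
    executed = parse_gtid_set(slave_executed)
    return all(
        numbers <= executed.get(uuid, set()) for uuid, numbers in purged.items()
    )


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"
slave_3303_executed = UUID + ":1-16"  # Executed_Gtid_Set on 3303, from the article

candidates = {
    "3301": UUID + ":1-20",  # gtid_purged on master 3301, from the article
    "3302": UUID + ":1-13",  # HYPOTHETICAL: 3302's gtid_purged is not shown
}

usable = [
    port for port, purged in candidates.items()
    if can_serve(purged, slave_3303_executed)
]
print(usable)  # ['3302']
```

Master 3301 is rejected because it has purged transactions 17-20, which 3303 never executed; 3302 qualifies under the assumed value.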
The procedure is as follows (run stop slave first if the replication threads are still running):
[zejin] 3303>change master to master_host='192.168.1.240',master_port=3302,master_user='repl',master_password='123',master_auto_position=1;
[zejin] 3303>start slave;
Query OK, 0 rows affected (0.03 sec)
[zejin] 3303>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.240
Master_User: repl
Master_Port: 3302
Connect_Retry: 60
Master_Log_File: binlog57.000007
Read_Master_Log_Pos: 1723
Relay_Log_File: zejin240-relay-bin.000002
Relay_Log_Pos: 1687
Relay_Master_Log_File: binlog57.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1723
Relay_Log_Space: 1937
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3302
Master_UUID: 5cee6f9f-5ab8-11e6-a081-000c29d4dc3f
Master_Info_File: /home/mysql/I3303/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:17-21
Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
[zejin] 3303>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | chen |
|  2 | ok   |
|  3 | li   |
+----+------+
3 rows in set (0.00 sec)
The data now fully matches the master. Once replication is healthy, switch 3303 back to master 3301.
[zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
[zejin] 3303>start slave;
Query OK, 0 rows affected (0.00 sec)
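The approach above relies on another slave having already received all of the master's binlogs. What if the topology is a plain master-slave pair (M-S) and the same problem occurs? There are two ways to recover.
The master's current state: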
[zejin] 3301>show global variables like '%gtid%';
+----------------------------------+-------------------------------------------+
| Variable_name                    | Value                                     |
+----------------------------------+-------------------------------------------+
| gtid_executed                    | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27 |
...
| gtid_purged                      | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-25 |
...
+----------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)
The slave's state:
[zejin] 3303>show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.1.240
Master_User: repl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog57.000016
Read_Master_Log_Pos: 729
Relay_Log_File: zejin240-relay-bin.000003
Relay_Log_Pos: 4
Relay_Master_Log_File: binlog57.000016
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 729
Relay_Log_Space: 194
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3301
Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
Master_Info_File: /home/mysql/I3303/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 160809 17:54:42
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22
Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-22
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
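This is the same kind of error as before. The recovery idea is:
Set gtid_purged on the slave to a value that covers everything the master has already purged, so that the slave only requests transactions the master still holds in its binlogs; then use a third-party consistency tool to bring the data back in sync.
We first need to run reset master on the slave to clear its GTID state; setting the variable directly fails with the following error: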
[zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";
ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
The correct steps are as follows (run on the slave):
[zejin] 3303>reset master;
Query OK, 0 rows affected (0.02 sec)
[zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";
Query OK, 0 rows affected (0.00 sec)
[zejin] 3303>start slave;
Query OK, 0 rows affected (0.00 sec)
[zejin] 3303>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.240
Master_User: repl
Master_Port: 3301
Connect_Retry: 60
Master_Log_File: binlog57.000018
Read_Master_Log_Pos: 728
Relay_Log_File: zejin240-relay-bin.000004
Relay_Log_Pos: 718
Relay_Master_Log_File: binlog57.000018
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 728
Relay_Log_Space: 968
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3301
Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f
Master_Info_File: /home/mysql/I3303/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22:27
Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
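Of course, the data is still inconsistent after this, because the skipped transactions were never applied on the slave. At that point pt-table-checksum and pt-table-sync can be used to restore consistency.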
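To scope the consistency check, it helps to know exactly which transactions the slave skipped: those covered by the new gtid_purged value that the slave had not executed before reset master. The sketch below computes that difference for the values in this article (the slave had executed 1-22, and gtid_purged was set to 1-26). The helper names are illustrative, and the second call is only a what-if comparison against the master's own gtid_purged of 1-25.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def skipped(new_purged, executed_before_reset):
    """Transactions declared as purged on the slave that it never actually applied."""
    purged = parse_gtid_set(new_purged)
    before = parse_gtid_set(executed_before_reset)
    out = {}
    for uuid, numbers in purged.items():
        missing = sorted(numbers - before.get(uuid, set()))
        if missing:
            out[uuid] = missing
    return out


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"
slave_executed = UUID + ":1-22"  # Executed_Gtid_Set on 3303 before reset master

# Value used in the article.
print(skipped(UUID + ":1-26", slave_executed)[UUID])  # [23, 24, 25, 26]

# What-if: the master's own gtid_purged, the smallest value that avoids error 1236.
print(skipped(UUID + ":1-25", slave_executed)[UUID])  # [23, 24, 25]
```

Under these assumptions, transactions 23 to 26 are the ones whose changes pt-table-sync has to repair; choosing 1-25 would have let the slave fetch transaction 26 from the master instead of skipping it.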
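There is another way: rebuild the slave. The procedure is the same as creating a new slave at the beginning of this article, but because the slave already holds GTID state, we must run reset master on it before restoring. The steps are:
Run on the slave: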
reset master;
source dump3301.sql;
change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;
start slave;
show slave status\G
This completes the recovery of the broken slave.

