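1. Common MySQL data recovery methods
MySQL data is generally recovered in one of three ways:
1. The officially recommended approach, based on a full backup plus binlog. The usual procedure is to restore the most recent full backup, then replay the binlog into the target database with mysqlbinlog --start-position=<start> --stop-position=<stop> binlog.000xxx | mysql -uroot -p -S <socket> <database>.
2. Recovery based on master-slave replication. The usual procedure is to restore the most recent full backup, attach the restored instance as a slave of the existing master, and use START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE = ..., MASTER_LOG_POS = ... to recover up to a position just before the failure.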
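As a rough illustration of the two methods above, the sketch below only prints the commands one would run; it does not execute them. The helper names binlog_replay_cmd and until_stmt are made up for this example, and the binlog name, positions, and socket path passed in are hypothetical placeholders, not values from this experiment.

```shell
#!/bin/sh
# Method 1: print (do not run) the command that replays one binlog range
# into an instance restored from a full backup.
# Arguments: binlog file, start position, stop position, socket path.
binlog_replay_cmd() {
    binlog=$1; start=$2; stop=$3; sock=$4
    printf 'mysqlbinlog --start-position=%s --stop-position=%s %s | mysql -uroot -p -S %s\n' \
        "$start" "$stop" "$binlog" "$sock"
}

# Method 2: print the statement that lets the SQL thread run up to a given
# master binlog position and then stop.
# Arguments: master binlog file, stop position.
until_stmt() {
    file=$1; pos=$2
    printf "START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$file" "$pos"
}

# Hypothetical placeholder values.
binlog_replay_cmd binlog.000012 154 4096 /tmp/mysql.sock
until_stmt binlog.000012 4096
```

Printing the commands first makes it easy to double-check the positions before anything touches the database.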
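This article tries a third approach: applying the original master's binlogs directly on the slave.
Approach:
A relay log and a binlog are essentially the same format, so can MySQL's own sql_thread be used to apply binlogs incrementally?
1) Initialize a new instance and restore the full backup.
2) Find the starting position in the first binlog file, and collect all the remaining binlogs.
3) Disguise the binlogs as relay logs and let the SQL thread apply them incrementally.
Use cases:
1. The most recent full backup is far from the point of failure, so recovery with the two methods above takes too long.
2. A dual-master keepalived cluster. Unlike MHA, keepalived has no log-completion mechanism, so a failure can lose data. If a failover to the slave happens while replication is badly delayed, the data ends up inconsistent and the missing log events have to be filled in.
2. Experiment steps
1. Set up master-slave replication (this experiment uses traditional position-based replication; GTID works as well)
M1: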
root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    3 |
|  3 |    2 |
|  4 |    3 |
|  5 |    6 |
|  6 |    7 |
|  7 |    9 |
| 10 | NULL |
| 11 |   10 |
+----+------+
9 rows in set (0.00 sec)
M2 (slave):
root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    3 |
|  3 |    2 |
|  4 |    3 |
|  5 |    6 |
|  6 |    7 |
|  7 |    9 |
| 10 | NULL |
| 11 |   10 |
+----+------+
9 rows in set (0.00 sec)
root@localhost:mysql3307.sock [restore]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 154
Relay_Log_File: M2-relay-bin.000004
Relay_Log_Pos: 371
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 624
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 13307
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
Record the slave's relay log information at this point:
[root@M2 data]# more M2-relay-bin.index
./M2-relay-bin.000003
./M2-relay-bin.000004
[root@M2 data]# more relay-log.info
7
./M2-relay-bin.000004
371
3307-binlog.000002
154
0
0
1
2. Use sysbench to simulate the data getting out of sync
[root@M1 logs]# mysqladmin create sbtest
[root@M1 sysbench]# sysbench --db-driver=mysql --mysql-host=m1 --mysql-port=3307 --mysql-user=sbtest --mysql-password='sbtest' /usr/share/sysbench/oltp_common.lua --tables=4 --table-size=100000 --threads=2 --time=60 --report-interval=10 prepare
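While the data is being loaded on the master, stop replication on the slave to create an inconsistency:
root@localhost:mysql3307.sock [mysql]>stop slave
3. After sysbench finishes, compare the data on the master and on the slave
Master: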
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest1;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest2;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
Slave:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|    67550 |
+----------+
1 row in set (0.06 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|    70252 |
+----------+
1 row in set (0.04 sec)
The master and the slave are clearly out of sync.
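To make the gap explicit, the per-table counts can be compared mechanically. The following is a minimal sketch, assuming the counts from each side have been saved to text files with one "<table> <count>" pair per line; the function name compare_counts and the file layout are assumptions made for this example. The sample numbers are the sbtest3 and sbtest4 counts from the query output above.

```shell
#!/bin/sh
# Compare per-table row counts collected on the master and on the slave.
# Each input file holds lines of the form "<table> <count>".
compare_counts() {
    master_file=$1; slave_file=$2
    # The first file loads the master counts; the second reports mismatches.
    awk 'NR == FNR { m[$1] = $2; next }
         ($1 in m) && m[$1] != $2 {
             printf "%s master=%s slave=%s missing=%d\n", $1, m[$1], $2, m[$1] - $2
         }' "$master_file" "$slave_file"
}

tmp=$(mktemp -d)
printf 'sbtest3 100000\nsbtest4 100000\n' > "$tmp/master_counts"
printf 'sbtest3 70252\nsbtest4 67550\n'   > "$tmp/slave_counts"
compare_counts "$tmp/master_counts" "$tmp/slave_counts"
# prints:
# sbtest3 master=100000 slave=70252 missing=29748
# sbtest4 master=100000 slave=67550 missing=32450
rm -rf "$tmp"
```

Row counts are only a coarse check; they show that rows are missing but not whether the existing rows match.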
4. Check the slave status at this point:
root@localhost:mysql3307.sock [(none)]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 76364214
Relay_Log_File: M2-relay-bin.000004
Relay_Log_Pos: 64490301
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 64490084
Relay_Log_Space: 76364861
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
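In the status output above, Read_Master_Log_Pos is ahead of Exec_Master_Log_Pos, which means part of the relay log was fetched but never applied. A small sketch for computing that backlog from SHOW SLAVE STATUS\G text follows; the function name relay_backlog is made up for this example, and the subtraction is only meaningful when Master_Log_File and Relay_Master_Log_File name the same binlog, as they do here.

```shell
#!/bin/sh
# Read "SHOW SLAVE STATUS\G" text on stdin and print how many bytes of the
# master binlog were fetched by the IO thread but not yet applied by the
# SQL thread. Assumes both positions refer to the same binlog file.
relay_backlog() {
    awk -F': *' '
        { gsub(/^ +/, "", $1) }                       # drop alignment padding
        $1 == "Read_Master_Log_Pos" { read_pos = $2 }
        $1 == "Exec_Master_Log_Pos" { exec_pos = $2 }
        END { print read_pos - exec_pos }'
}

# Values taken from the status output above.
relay_backlog <<'EOF'
Read_Master_Log_Pos: 76364214
Exec_Master_Log_Pos: 64490084
EOF
# prints 11874130
```

On a live server the same function could be fed with mysql -e 'SHOW SLAVE STATUS\G' | relay_backlog.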
Because the local relay log has not been fully applied, to keep the experiment accurate we first let the slave finish applying it by running start slave sql_thread.
Check again:
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 76364214
Relay_Log_File: M2-relay-bin.000005
Relay_Log_Pos: 4
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 76364214
Relay_Log_Space: 154
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
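The local relay log has now been fully applied. Record the latest relay log information:
[root@M2 data]# more relay-log.info
7
./M2-relay-bin.000005
4
3307-binlog.000002
76364214
0
0
1
0
0
1
This information is important: it shows that the slave has executed up to position 76364214 of the master's binlog 000002. Next we copy the master's binlogs over, present them as relay logs, and resume recovery from this position.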
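The relay-log.info file is just a list of values, one per line, so it helps to label them. The sketch below assumes the MySQL 5.7 file layout (line count, relay log name, relay log position, master log name, master log position, SQL delay, number of workers, id); the function name describe_relay_info is made up for this example, and other MySQL versions may lay the file out differently.

```shell
#!/bin/sh
# Label the first eight lines of a relay-log.info file.
# The labels assume the MySQL 5.7 layout; verify against your version.
describe_relay_info() {
    labels="number_of_lines relay_log_name relay_log_pos master_log_name master_log_pos sql_delay number_of_workers id"
    awk -v labels="$labels" '
        BEGIN { n = split(labels, label, " ") }
        NR <= n { printf "%-18s %s\n", label[NR], $0 }' "$1"
}

# Sample content copied from the file shown above.
info=$(mktemp)
printf '%s\n' 7 ./M2-relay-bin.000005 4 3307-binlog.000002 76364214 0 0 1 > "$info"
describe_relay_info "$info"
rm -f "$info"
```

The two lines that matter for this recovery are relay_log_name and relay_log_pos, which tell the SQL thread where to resume, and master_log_name with master_log_pos, which record how far into the master's binlog the slave has executed.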
5. Copy the binlogs to the target and disguise them as relay logs
Before copying, shut down the slave and add skip-slave-start to the cnf file so that the slave does not start replicating automatically after a restart.
[root@M2 data]# ll
total 185248
-rw-r----- 1 root root 461 Oct 24 17:14 3307-binlog.000001
-rw-r----- 1 root root 76364609 Oct 24 17:14 3307-binlog.000002
-rw-r----- 1 root root 203 Oct 24 17:14 3307-binlog.000003
-rw-r----- 1 root root 419 Oct 24 17:14 3307-binlog.000004
-rw-r----- 1 root root 164 Oct 24 17:14 3307-binlog.index
-rw-r----- 1 mysql mysql 56 Oct 24 15:08 auto.cnf
-rw-r----- 1 mysql mysql 4720 Oct 24 17:14 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Oct 24 17:14 ibdata1
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:14 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:11 ib_logfile1
-rw-r----- 1 mysql mysql 177 Oct 24 17:14 M2-relay-bin.000005
-rw-r----- 1 mysql mysql 22 Oct 24 17:11 M2-relay-bin.index
-rw-r----- 1 mysql mysql 122 Oct 24 17:14 master.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 mysql
-rw------- 1 root root 0 Oct 24 15:08 nohup.out
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 performance_schema
-rw-r----- 1 mysql mysql 68 Oct 24 17:14 relay-log.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 restore
drwxr-x--- 2 mysql mysql 4096 Oct 24 16:47 sbtest
drwxr-x--- 2 mysql mysql 12288 Oct 24 15:07 sys
-rw-r----- 1 mysql mysql 24 Oct 24 15:07 xtrabackup_binlog_pos_innodb
-rw-r----- 1 mysql mysql 577 Oct 24 15:07 xtrabackup_info
Copy the binlogs under relay log names:
[root@M2 data]# cp 3307-binlog.000001 relay.000001
[root@M2 data]# cp 3307-binlog.000002 relay.000002
[root@M2 data]# cp 3307-binlog.000003 relay.000003
[root@M2 data]# cp 3307-binlog.000004 relay.000004
Fix the file ownership:
[root@M2 data]# chown mysql.mysql -R *
Edit the relay log index file so that MySQL recognizes the new files:
[root@M2 data]# cat M2-relay-bin.index
./relay.000001
./relay.000002
./relay.000003
./relay.000004
Edit the relay-log.info file to tell MySQL which position to start applying from:
[root@M2 data]# cat relay-log.info
7
./relay.000002
76364214
3307-binlog.000002
76364214
0
0
1
0
0
1
Finally, start the sql_thread to begin the fast recovery:
start slave sql_thread
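The manual edits in this step can be collected into one helper. This is a minimal sketch, not the author's script: the function name disguise_binlogs and its argument order are made up for this example, and it assumes mysqld is stopped and the master's binlogs are already in the data directory. Because each relay file is a byte-for-byte copy of a binlog, the relay log position to resume from is the same number as the master binlog position.

```shell
#!/bin/sh
# Disguise copied master binlogs as relay logs inside a slave data directory.
# Assumes mysqld is stopped and the binlogs were already copied into DATADIR.
# The body runs in a subshell so the caller's working directory is unchanged.
disguise_binlogs() (
    datadir=$1      # slave data directory
    prefix=$2       # binlog base name, e.g. 3307-binlog
    index=$3        # relay log index file, e.g. M2-relay-bin.index
    start_file=$4   # numeric suffix of the binlog to resume from, e.g. 000002
    start_pos=$5    # position inside that binlog to resume from

    cd "$datadir" || exit 1

    # Copy every numbered binlog to a relay.* name and list it in the index.
    : > "$index"
    for f in "$prefix".[0-9]*; do
        [ -e "$f" ] || continue
        suffix=${f##*.}
        cp "$f" "relay.$suffix"
        echo "./relay.$suffix" >> "$index"
    done

    # Rewrite only line 2 (relay log name) and line 3 (relay log position)
    # of relay-log.info; every other line is kept as it was.
    awk -v name="./relay.$start_file" -v pos="$start_pos" '
        NR == 2 { print name; next }
        NR == 3 { print pos;  next }
        { print }' relay-log.info > relay-log.info.new &&
        mv relay-log.info.new relay-log.info
)

# Example call with the values from this experiment (not run here):
# disguise_binlogs /data/mysql/3307/data 3307-binlog M2-relay-bin.index 000002 76364214
```

After the helper runs, the files still need their ownership fixed with chown as shown above, and the SQL thread still has to be started by hand.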
6. Check that the data is consistent
Slave:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
The slave has recovered all of the missing data.
