MySQL: MHA and GTID


##==========================================##

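MySQL 5.6 introduced GTIDs to solve the problem of locating the correct binlog position during a master/slave switchover. MHA supports GTID-based replication starting from version 0.56. When a failover occurs, MHA checks whether the cluster can be switched using the GTID-based method.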

##==========================================##
Conditions for GTID-based failover:
1. GTID mode is enabled on all nodes (gtid_mode=ON).
2. Executed_Gtid_Set is not empty on any node.
3. At least one node uses Auto_Position=1.
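
The three conditions above can be expressed as a small check. The following is a minimal Python sketch, not MHA's actual Perl code: the `Node` class and `can_use_gtid_failover` function are hypothetical names, and the field values are assumed to come from `SELECT @@global.gtid_mode` and `SHOW SLAVE STATUS` on each server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical snapshot of one server's GTID-related state."""
    name: str
    gtid_mode: str                       # value of @@global.gtid_mode, e.g. "ON"
    executed_gtid_set: str               # Executed_Gtid_Set, may be empty
    auto_position: Optional[int] = None  # Auto_Position; None if not a slave


def can_use_gtid_failover(nodes: List[Node]) -> bool:
    """Return True only if all three conditions listed above hold."""
    if not nodes:
        return False
    # Condition 1: GTID mode is enabled on every node.
    if any(n.gtid_mode.upper() != "ON" for n in nodes):
        return False
    # Condition 2: Executed_Gtid_Set is non-empty on every node.
    if any(not n.executed_gtid_set.strip() for n in nodes):
        return False
    # Condition 3: at least one node replicates with Auto_Position=1.
    return any(n.auto_position == 1 for n in nodes)
```

If any condition fails, the check returns False, which corresponds to the cluster falling back to the non-GTID (binlog position based) switchover.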

##==========================================##
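GTID-based failover procedure:
1. If the candidate master does not have the latest relay log, it is connected to the slave that does have the latest relay log so that the missing events can be recovered.
2. If the cluster uses a Binlog Server, MHA tries to pull the missing binlog from the Binlog Server and apply it to the candidate master.
3. Once the candidate master has the latest data, it is promoted to the new master and the other slaves are pointed at it to synchronize their data. By default (--wait_until_gtid_in_sync=1) masterha_master_switch waits for the other slaves to catch up; passing --wait_until_gtid_in_sync=0 makes it skip that wait, which speeds up the switch.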
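
As an illustration of the last step, here is a minimal Python sketch that assembles a masterha_master_switch command line for a dead-master failover. The helper name `build_failover_command` is hypothetical, and the configuration path, host and port are taken from the log later in this post purely as example values. According to the MHA documentation, --wait_until_gtid_in_sync=1 (the default) waits for slaves to catch up and 0 skips the wait.

```python
from typing import List


def build_failover_command(conf: str,
                           dead_master_host: str,
                           dead_master_port: int = 3306,
                           wait_for_slaves: bool = True) -> List[str]:
    """Build an argv list for a manual (dead master) failover."""
    return [
        "masterha_master_switch",
        "--master_state=dead",
        f"--conf={conf}",
        f"--dead_master_host={dead_master_host}",
        f"--dead_master_port={dead_master_port}",
        # 1 = wait until slaves catch up with the new master's GTID set,
        # 0 = do not wait (faster switch).
        f"--wait_until_gtid_in_sync={1 if wait_for_slaves else 0}",
    ]


if __name__ == "__main__":
    cmd = build_failover_command("/etc/masterha/app1.cnf", "10.0.203.109",
                                 dead_master_port=3358, wait_for_slaves=False)
    print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` directly; it is only printed here so that nothing is executed by accident.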

##==========================================##
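During a GTID-based failover, MHA never tries to read the binlog from the original master for log recovery, regardless of whether the original master's OS is still reachable.
GTID-based MHA supports using a Binlog Server in the replication topology for log recovery, whereas non-GTID MHA ignores the Binlog Server.
In a GTID-based cluster, it is recommended not to use MHA for a "manual master/slave switchover", because that operation may lose part of the binlog on the original master.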

##==========================================##
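In non-GTID mode, MHA first runs Phase 3.1, obtaining the differential logs from the slave that has the latest binlog events, and then runs Phase 3.2, trying to fetch the latest binlog from the original master server.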
 


##==========================================##
In GTID mode, Phase 3.2 (trying to fetch the latest binlog from the original master server) is not performed.

Log from a GTID-based failover:

Sun Jul  8 23:35:21 2018 - [info] MHA::MasterMonitor version 0.56.
Sun Jul  8 23:35:21 2018 - [info] GTID failover mode = 1
Sun Jul  8 23:35:21 2018 - [info] Dead Servers:
Sun Jul  8 23:35:21 2018 - [info] Alive Servers:
Sun Jul  8 23:35:21 2018 - [info]   10.0.203.104(10.0.203.104:3358)
Sun Jul  8 23:35:21 2018 - [info]   10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:35:21 2018 - [info]   10.0.203.117(10.0.203.117:3358)
Sun Jul  8 23:35:21 2018 - [info] Alive Slaves:
Sun Jul  8 23:35:21 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:35:21 2018 - [info]     GTID ON
Sun Jul  8 23:35:21 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:35:21 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:35:21 2018 - [info]   10.0.203.117(10.0.203.117:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:35:21 2018 - [info]     GTID ON
Sun Jul  8 23:35:21 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:35:21 2018 - [info] Current Alive Master: 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:35:21 2018 - [info] Checking slave configurations..
Sun Jul  8 23:35:21 2018 - [info]  read_only=1 is not set on slave 10.0.203.104(10.0.203.104:3358).
Sun Jul  8 23:35:21 2018 - [info]  read_only=1 is not set on slave 10.0.203.117(10.0.203.117:3358).
Sun Jul  8 23:35:21 2018 - [info] Checking replication filtering settings..
Sun Jul  8 23:35:21 2018 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jul  8 23:35:21 2018 - [info]  Replication filtering check ok.
Sun Jul  8 23:35:21 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul  8 23:35:21 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jul  8 23:35:22 2018 - [info] HealthCheck: SSH to 10.0.203.109 is reachable.
Sun Jul  8 23:35:22 2018 - [info] 
10.0.203.109(10.0.203.109:3358) (current master)
 +--10.0.203.104(10.0.203.104:3358)
 +--10.0.203.117(10.0.203.117:3358)

Sun Jul  8 23:35:22 2018 - [warning] master_ip_failover_script is not defined.
Sun Jul  8 23:35:22 2018 - [warning] shutdown_script is not defined.
Sun Jul  8 23:35:22 2018 - [info] Set master ping interval 1 seconds.
Sun Jul  8 23:35:22 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Jul  8 23:35:22 2018 - [info] Starting ping health check on 10.0.203.109(10.0.203.109:3358)..
Sun Jul  8 23:35:22 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Sun Jul  8 23:35:58 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sun Jul  8 23:35:58 2018 - [info] Executing SSH check script: exit 0
Sun Jul  8 23:35:58 2018 - [info] HealthCheck: SSH to 10.0.203.109 is reachable.
Sun Jul  8 23:35:59 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Jul  8 23:35:59 2018 - [warning] Connection failed 2 time(s)..
Sun Jul  8 23:36:00 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Jul  8 23:36:00 2018 - [warning] Connection failed 3 time(s)..
Sun Jul  8 23:36:01 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Jul  8 23:36:01 2018 - [warning] Connection failed 4 time(s)..
Sun Jul  8 23:36:01 2018 - [warning] Master is not reachable from health checker!
Sun Jul  8 23:36:01 2018 - [warning] Master 10.0.203.109(10.0.203.109:3358) is not reachable!
Sun Jul  8 23:36:01 2018 - [warning] SSH is reachable.
Sun Jul  8 23:36:01 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Sun Jul  8 23:36:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul  8 23:36:01 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jul  8 23:36:01 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jul  8 23:36:01 2018 - [info] GTID failover mode = 1
Sun Jul  8 23:36:01 2018 - [info] Dead Servers:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] Alive Servers:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)
Sun Jul  8 23:36:01 2018 - [info] Alive Slaves:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] Checking slave configurations..
Sun Jul  8 23:36:01 2018 - [info]  read_only=1 is not set on slave 10.0.203.104(10.0.203.104:3358).
Sun Jul  8 23:36:01 2018 - [info]  read_only=1 is not set on slave 10.0.203.117(10.0.203.117:3358).
Sun Jul  8 23:36:01 2018 - [info] Checking replication filtering settings..
Sun Jul  8 23:36:01 2018 - [info]  Replication filtering check ok.
Sun Jul  8 23:36:01 2018 - [info] Master is down!
Sun Jul  8 23:36:01 2018 - [info] Terminating monitoring script.
Sun Jul  8 23:36:01 2018 - [info] Got exit code 20 (Master dead).
Sun Jul  8 23:36:01 2018 - [info] MHA::MasterFailover version 0.56.
Sun Jul  8 23:36:01 2018 - [info] Starting master failover.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] GTID failover mode = 1
Sun Jul  8 23:36:01 2018 - [info] Dead Servers:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] Checking master reachability via MySQL(double check)...
Sun Jul  8 23:36:01 2018 - [info]  ok.
Sun Jul  8 23:36:01 2018 - [info] Alive Servers:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)
Sun Jul  8 23:36:01 2018 - [info] Alive Slaves:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] Starting GTID based failover.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Jul  8 23:36:01 2018 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sun Jul  8 23:36:01 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Jul  8 23:36:01 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 3: Master Recovery Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000008:6689
Sun Jul  8 23:36:01 2018 - [info] Retrieved Gtid Set: 541e0f07-8047-11e8-8434-0800270b00d2:49-69
Sun Jul  8 23:36:01 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000008:6689
Sun Jul  8 23:36:01 2018 - [info] Retrieved Gtid Set: 541e0f07-8047-11e8-8434-0800270b00d2:49-69
Sun Jul  8 23:36:01 2018 - [info] Oldest slaves:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.117(10.0.203.117:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 3.3: Determining New Master Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] Searching new master from slaves..
Sun Jul  8 23:36:01 2018 - [info]  Candidate masters from the configuration file:
Sun Jul  8 23:36:01 2018 - [info]   10.0.203.104(10.0.203.104:3358)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Sun Jul  8 23:36:01 2018 - [info]     GTID ON
Sun Jul  8 23:36:01 2018 - [info]     Replicating from 10.0.203.109(10.0.203.109:3358)
Sun Jul  8 23:36:01 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jul  8 23:36:01 2018 - [info]  Non-candidate masters:
Sun Jul  8 23:36:01 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sun Jul  8 23:36:01 2018 - [info] New master is 10.0.203.104(10.0.203.104:3358)
Sun Jul  8 23:36:01 2018 - [info] Starting master failover..
Sun Jul  8 23:36:01 2018 - [info] 
From:
10.0.203.109(10.0.203.109:3358) (current master)
 +--10.0.203.104(10.0.203.104:3358)
 +--10.0.203.117(10.0.203.117:3358)

To:
10.0.203.104(10.0.203.104:3358) (new master)
 +--10.0.203.117(10.0.203.117:3358)
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info]  Waiting all logs to be applied.. 
Sun Jul  8 23:36:01 2018 - [info]   done.
Sun Jul  8 23:36:01 2018 - [info] Getting new master's binlog name and position..
Sun Jul  8 23:36:01 2018 - [info]  mysql-bin.000006:77499
Sun Jul  8 23:36:01 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.203.104', MASTER_PORT=3358, MASTER_AUTO_POSITION=1, MASTER_USER='replicater', MASTER_PASSWORD='xxx';
Sun Jul  8 23:36:01 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000006, 77499, 41d8a420-8047-11e8-8580-080027e837eb:1-92,
541e0f07-8047-11e8-8434-0800270b00d2:1-69
Sun Jul  8 23:36:01 2018 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sun Jul  8 23:36:01 2018 - [info] ** Finished master recovery successfully.
Sun Jul  8 23:36:01 2018 - [info] * Phase 3: Master Recovery Phase completed.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 4: Slaves Recovery Phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] -- Slave recovery on host 10.0.203.117(10.0.203.117:3358) started, pid: 5680. Check tmp log /var/log/masterha/app1/10.0.203.117_3358_20180708233601.log if it takes time..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] Log messages from 10.0.203.117 ...
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info]  Resetting slave 10.0.203.117(10.0.203.117:3358) and starting replication from the new master 10.0.203.104(10.0.203.104:3358)..
Sun Jul  8 23:36:01 2018 - [info]  Executed CHANGE MASTER.
Sun Jul  8 23:36:01 2018 - [info]  Slave started.
Sun Jul  8 23:36:01 2018 - [info]  gtid_wait(41d8a420-8047-11e8-8580-080027e837eb:1-92,
541e0f07-8047-11e8-8434-0800270b00d2:1-69) completed on 10.0.203.117(10.0.203.117:3358). Executed 0 events.
Sun Jul  8 23:36:01 2018 - [info] End of log messages from 10.0.203.117.
Sun Jul  8 23:36:01 2018 - [info] -- Slave on host 10.0.203.117(10.0.203.117:3358) started.
Sun Jul  8 23:36:01 2018 - [info] All new slave servers recovered successfully.
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] * Phase 5: New master cleanup phase..
Sun Jul  8 23:36:01 2018 - [info] 
Sun Jul  8 23:36:01 2018 - [info] Resetting slave info on the new master..
Sun Jul  8 23:36:01 2018 - [info]  10.0.203.104: Resetting slave info succeeded.
Sun Jul  8 23:36:01 2018 - [info] Master failover to 10.0.203.104(10.0.203.104:3358) completed successfully.
Sun Jul  8 23:36:01 2018 - [info] 

----- Failover Report -----

app1: MySQL Master failover 10.0.203.109(10.0.203.109:3358) to 10.0.203.104(10.0.203.104:3358) succeeded

Master 10.0.203.109(10.0.203.109:3358) is down!

Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Selected 10.0.203.104(10.0.203.104:3358) as a new master.
10.0.203.104(10.0.203.104:3358): OK: Applying all logs succeeded.
10.0.203.117(10.0.203.117:3358): OK: Slave started, replicating from 10.0.203.104(10.0.203.104:3358)
10.0.203.104(10.0.203.104:3358): Resetting slave info succeeded.
Master failover to 10.0.203.104(10.0.203.104:3358) completed successfully.
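
To confirm from a manager log which path a failover took, the phase headings can be extracted mechanically. The following is a minimal Python sketch; the function names are hypothetical, and the regular expression assumes the heading format seen in the log above (`* Phase N: ...` ending in two dots).

```python
import re
from typing import List

# Matches phase start lines such as "* Phase 3.1: Getting Latest Slaves Phase.."
# but not the "... Phase completed." lines, which end in a single dot.
PHASE_START = re.compile(r"\* Phase (\d+(?:\.\d+)?): (.+?)\.\.\s*$")


def started_phases(log_text: str) -> List[str]:
    """Return the phase numbers in the order they were started."""
    phases = []
    for line in log_text.splitlines():
        match = PHASE_START.search(line)
        if match:
            phases.append(match.group(1))
    return phases


def used_gtid_failover(log_text: str) -> bool:
    """True if the log contains MHA's 'Starting GTID based failover.' message."""
    return "Starting GTID based failover." in log_text


SAMPLE = """\
Sun Jul  8 23:36:01 2018 - [info] Starting GTID based failover.
Sun Jul  8 23:36:01 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jul  8 23:36:01 2018 - [info] * Phase 3: Master Recovery Phase..
Sun Jul  8 23:36:01 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Jul  8 23:36:01 2018 - [info] * Phase 3.3: Determining New Master Phase..
"""

if __name__ == "__main__":
    print(started_phases(SAMPLE))
    print("Phase 3.2 ran:", "3.2" in started_phases(SAMPLE))
```

Run against the full log above, the absence of "3.2" in the result reflects that the dead master's binlog was not read.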

##==========================================##
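During its checks, MHA reads the Auto_Position value from the SHOW SLAVE STATUS result to decide whether to set up replication with the master_auto_position parameter.

During a switchover in non-GTID mode, the CHANGE MASTER statement issued by MHA does not include master_auto_position=0. If the slave was previously configured with master_auto_position=1, that CHANGE MASTER fails and the switchover cannot complete.

For this reason, a cluster that switches in GTID mode cannot be converted to non-GTID switching simply by modifying Server.pm or DBHelper.pm.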
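
The CHANGE MASTER problem described above can be made concrete with a small statement builder. This is a minimal Python sketch with a hypothetical helper name, not code from MHA; the password clause is omitted on purpose. It assumes that a position-based CHANGE MASTER on a slave currently using auto-positioning must explicitly carry MASTER_AUTO_POSITION=0, otherwise MySQL rejects MASTER_LOG_FILE/MASTER_LOG_POS.

```python
from typing import Optional


def build_change_master(host: str, port: int, user: str, *,
                        log_file: Optional[str] = None,
                        log_pos: Optional[int] = None,
                        was_auto_position: bool = False) -> str:
    """Build a CHANGE MASTER statement (password clause omitted)."""
    parts = [f"MASTER_HOST='{host}'", f"MASTER_PORT={port}",
             f"MASTER_USER='{user}'"]
    if log_file is None:
        # GTID-based replication: the position is negotiated automatically.
        parts.append("MASTER_AUTO_POSITION=1")
    else:
        if log_pos is None:
            raise ValueError("log_pos is required with log_file")
        if was_auto_position:
            # Switch auto-positioning off explicitly before giving a
            # file/position pair; this is the clause the text says is missing.
            parts.append("MASTER_AUTO_POSITION=0")
        parts.append(f"MASTER_LOG_FILE='{log_file}'")
        parts.append(f"MASTER_LOG_POS={log_pos}")
    return "CHANGE MASTER TO " + ", ".join(parts) + ";"


if __name__ == "__main__":
    print(build_change_master("10.0.203.104", 3358, "replicater"))
    print(build_change_master("10.0.203.104", 3358, "replicater",
                              log_file="mysql-bin.000006", log_pos=77499,
                              was_auto_position=True))
```

The host, port, user, file and position are example values copied from the log above.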

##==========================================##

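The Executed_Gtid_Set on the master and the slaves may differ for some reason, for example:

1. Privileges were granted directly on a slave, so the slave holds binlog events that the master does not, but that binlog has since been purged for whatever reason.

2. The cluster was upgraded from a version without GTID to a version with GTID, and a slave served as the master for a period of time, but the logs from that period have been purged.

When a problem like the above exists, promoting the slave to master and configuring replication with master_auto_position=1 fails, because the new master cannot provide the required binlog events.

Workarounds:

1. Use RESET MASTER and SET GLOBAL gtid_purged='' so that all nodes have the same GTID set.

2. Change all replication links to position-based replication.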
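
Before promoting a slave, the mismatch described above can be detected by comparing Executed_Gtid_Set values. MySQL provides the built-in GTID_SUBTRACT() and GTID_SUBSET() functions for this; the following is a minimal Python sketch of the same subtraction for offline use. The function names are hypothetical, and expanding intervals into integer sets is an assumption that only suits small GTID sets.

```python
from typing import Dict, Set


def parse_gtid_set(text: str) -> Dict[str, Set[int]]:
    """Parse 'uuid:1-5:8,uuid2:1-3' into {uuid: {transaction ids}}."""
    result: Dict[str, Set[int]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single id such as "8" has no end; treat it as "8-8".
            ids.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a: str, b: str) -> Dict[str, Set[int]]:
    """Return the transactions present in set a but missing from set b."""
    left, right = parse_gtid_set(a), parse_gtid_set(b)
    diff: Dict[str, Set[int]] = {}
    for uuid, ids in left.items():
        missing = ids - right.get(uuid, set())
        if missing:
            diff[uuid] = missing
    return diff


if __name__ == "__main__":
    new_master = ("41d8a420-8047-11e8-8580-080027e837eb:1-92,\n"
                  "541e0f07-8047-11e8-8434-0800270b00d2:1-69")
    other_slave = "541e0f07-8047-11e8-8434-0800270b00d2:1-69"
    extra = gtid_subtract(new_master, other_slave)
    for uuid, ids in extra.items():
        print(uuid, min(ids), max(ids), len(ids))
```

A non-empty result means the first server has executed transactions that the second one has not. If the binlog holding those transactions has been purged, replication with master_auto_position=1 from the first server cannot start.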

##==========================================##

