Asynchronous master-slave replication architecture
master:
10.150.20.90 ed3jrdba90
slave:
10.150.20.97 ed3jrdba97
10.150.20.132 ed3jrdba132
manager:
10.150.20.95 ed3jrdba95
# Newly added VIP
vip: 10.150.20.200
System details of the four machines:
OS: CentOS 7.3
MySQL: 5.7.21
MHA: 0.58
NIC name: ens3
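MHA manager node
1: Configure the app1.cnf file
Add the path of master_ip_failover_script, the switchover script that MHA runs when the MySQL master fails.
# vi /etc/mysql_mha/app1.cnf
# Switchover script used during automatic failover
master_ip_failover_script=/usr/local/bin/master_ip_failover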
[root@dev05 ~]# cat /etc/mysql_mha/app1.cnf
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor
[server1]
hostname=10.150.20.90
port=33061
[server2]
hostname=10.150.20.97
port=33061
[server3]
hostname=10.150.20.132
port=33061
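Because each host is identified by its [serverN] section, a repeated header such as a second [server1] can make one host silently replace or merge into another, depending on how the config parser treats repeated sections. A minimal shell sketch for catching this before the manager is started; check_sections is a hypothetical helper, not an MHA command:

```bash
# check_sections FILE
# Print every [section] header that occurs more than once in an INI-style
# MHA config file. Returns 1 if duplicates were found, 0 otherwise.
# (Hypothetical helper for illustration; not part of MHA.)
check_sections() {
  awk '
    /^[[:space:]]*\[.*\][[:space:]]*$/ {
      # Normalise the header by trimming surrounding whitespace.
      sub(/^[[:space:]]+/, "")
      sub(/[[:space:]]+$/, "")
      count[$0]++
    }
    END {
      dup = 0
      for (s in count) {
        if (count[s] > 1) { print s; dup = 1 }
      }
      exit dup
    }
  ' "$1"
}

# Example:
# check_sections /etc/mysql_mha/app1.cnf || echo "fix duplicated sections first"
```

Run against a config containing two [server1] headers, the helper prints [server1] and returns a non-zero status.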
Edit the master_ip_failover script. Keepalived is not used here; the VIP is managed by the script itself:
# cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak
# vi /usr/local/bin/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,         $ssh_user,        $orig_master_host,
    $orig_master_ip,  $orig_master_port, $new_master_host,
    $new_master_ip,   $new_master_port
);

############################# Added section #############################
my $vip   = '10.150.20.200';    # virtual IP that follows the master
my $brdc  = '10.150.20.255';    # broadcast address of the VIP subnet
my $ifdev = 'ens3';             # NIC the VIP is bound to
my $key   = '1';                # alias suffix, giving the label ens3:1
# Bind the VIP, announce it with an unsolicited ARP reply, then flush iptables rules.
my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
# Remove the VIP from the old master.
my $ssh_stop_vip  = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
#########################################################################

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # MHA calls this in Phase 2 to take the VIP off the dead master.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # MHA calls this in Phase 3 to bring the VIP up on the new master.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        # Called by masterha_check_repl and by masterha_manager at startup.
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# Enable the VIP on the new master over SSH.
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master.
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
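Stripped of the MHA plumbing, the script runs one command over SSH on the new master (start) and one on the old master (stop/stopssh). The dry-run sketch below only prints those commands, using the same $vip, $brdc, $ifdev and $key values, so they can be reviewed or run by hand. vip_cmd is a hypothetical name, and unlike the script it leaves out the trailing iptables -F, which flushes all rules in the filter table on the new master:

```bash
# Values copied from master_ip_failover; adjust for your network.
VIP=10.150.20.200
BRDC=10.150.20.255
IFDEV=ens3
KEY=1

# vip_cmd start|stop
# Print (do not execute) the command the script would run on the target host.
# (Hypothetical helper for illustration.)
vip_cmd() {
  case "$1" in
    start)
      # Add the VIP as a labelled secondary address, then send one unsolicited
      # ARP reply so neighbours refresh their ARP caches.
      printf '/usr/sbin/ip addr add %s/24 brd %s dev %s label %s:%s;/usr/sbin/arping -q -A -c 1 -I %s %s\n' \
        "$VIP" "$BRDC" "$IFDEV" "$IFDEV" "$KEY" "$IFDEV" "$VIP"
      ;;
    stop)
      # Remove the labelled VIP from the interface.
      printf '/usr/sbin/ip addr del %s/24 dev %s label %s:%s\n' \
        "$VIP" "$IFDEV" "$IFDEV" "$KEY"
      ;;
    *)
      echo "usage: vip_cmd start|stop" >&2
      return 1
      ;;
  esac
}

# Example: send the start command to the new master by hand.
# ssh root@10.150.20.97 "$(vip_cmd start)"
```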
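After moving the VIP, always run arping, so that the other hosts on the subnet update their ARP caches and send traffic for the VIP to the new host's MAC address.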
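arping only announces the move to the neighbours; it is still worth confirming on each host which one currently holds the VIP, so that it is never left bound on two hosts at once. A minimal sketch that reads the one-line-per-address output of ip -o -4 addr show and succeeds only if the given address is present; has_vip is a hypothetical helper name:

```bash
# has_vip VIP
# Read `ip -o -4 addr show` output on stdin and succeed if VIP is one of the
# listed IPv4 addresses. (Hypothetical helper for illustration.)
has_vip() {
  # In `ip -o` output, field 3 is the family ("inet") and field 4 is ADDR/PREFIX.
  awk -v vip="$1" '
    $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example, run on a database host:
# ip -o -4 addr show dev ens3 | has_vip 10.150.20.200 && echo "VIP is bound here"
```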
Check SSH connectivity across the replication environment
# masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 14:43:27 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 14:43:27 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:43:27 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:43:27 2018 - [info] Starting SSH connection tests..
Wed Dec 12 14:43:28 2018 - [debug]
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.97(10.150.20.97:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.132(10.150.20.132:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:28 2018 - [debug]
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.90(10.150.20.90:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.132(10.150.20.132:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:29 2018 - [debug]
Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.90(10.150.20.90:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.97(10.150.20.97:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:29 2018 - [info] All SSH connection tests passed successfully.
Check the whole replication environment
# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 14:44:35 2018 - [info] Slaves settings check done.
Wed Dec 12 14:44:35 2018 - [info]
10.150.20.90(10.150.20.90:33061) (current master)
+--10.150.20.97(10.150.20.97:33061)
+--10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.97..
Wed Dec 12 14:44:35 2018 - [info] ok.
Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 14:44:35 2018 - [info] ok.
Wed Dec 12 14:44:35 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 14:44:35 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061
Wed Dec 12 14:44:35 2018 - [info] OK.
Wed Dec 12 14:44:35 2018 - [warning] shutdown_script is not defined.
Wed Dec 12 14:44:35 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
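Since the manager should only be started once both checks pass, their success lines can serve as a gate. A small sketch, assuming the wording shown in the outputs above stays the same in your MHA version; mha_checks_passed is a hypothetical helper that reads the combined output on stdin:

```bash
# mha_checks_passed
# Read the combined output of masterha_check_ssh and masterha_check_repl on
# stdin and succeed only if both success lines are present.
# (Hypothetical helper; it matches log text, so re-check it after upgrades.)
mha_checks_passed() {
  local out
  out=$(cat)
  case $out in
    *'All SSH connection tests passed successfully.'*) ;;
    *) return 1 ;;
  esac
  case $out in
    *'MySQL Replication Health is OK.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example gate before starting the manager:
# { masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
#   masterha_check_repl --conf=/etc/mysql_mha/app1.cnf; } 2>&1 | mha_checks_passed &&
#   echo "checks passed, safe to start masterha_manager"
```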
Start the MHA manager
# nohup masterha_manager --conf=/etc/mysql_mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mysql_mha/app1-manager.log 2>&1 &
Check the manager status
# masterha_check_status --conf=/etc/mysql_mha/app1.cnf
View the manager log
# tail -n 1000 -f /data/mysql_mha/app1-manager.log
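masterha_check_status prints a single line; after a failover it reports "app1 is stopped(2:NOT_RUNNING).", while a healthy manager reports a running state with code 0 (PING_OK). For scripting or monitoring, a minimal sketch that extracts just the CODE:LABEL pair; mha_status_code is a hypothetical helper, and the exact running-state wording is an assumption to verify against your own output:

```bash
# mha_status_code
# Read masterha_check_status output on stdin and print the parenthesised
# "CODE:LABEL" pair, e.g. "2:NOT_RUNNING". (Hypothetical helper.)
mha_status_code() {
  # Only a "(digits:UPPERCASE_LABEL)" group matches, so "(pid:...)" is skipped.
  sed -n 's/.*(\([0-9][0-9]*:[[:upper:]_]*\)).*/\1/p'
}

# Example:
# masterha_check_status --conf=/etc/mysql_mha/app1.cnf | mha_status_code
```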
Verify failover
On the master node qa05.010150020090.yz, bind the VIP:
[root@qa05 ~]# ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
[root@qa05 ~]# /usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200
# Unbind the VIP:
# ip addr del 10.150.20.200/24 dev ens3 label ens3:1
Simulate a failure: kill the mysqld process on qa05.010150020090.yz
[root@qa05 ~]# ps -ef | grep -i mysql
mysql 3114 1 0 Aug06 ? 00:00:51 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
root 15551 10466 0 Aug06 pts/1 00:00:00 mysql
root 25521 21213 0 03:52 pts/2 00:00:00 grep --color=auto -i mysql
[root@qa05 ~]# kill -9 27593 26101
Watch the MHA manager log output opened earlier:
[root@dev05 ~]# tail -n 1000 -f /data/mysql_mha/app1-manager.log

Wed Dec 12 14:54:10 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Dec 12 14:54:10 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Wed Dec 12 14:54:10 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Wed Dec 12 14:54:15 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:15 2018 - [warning] Connection failed 2 time(s)..
Wed Dec 12 14:54:20 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:20 2018 - [warning] Connection failed 3 time(s)..
Wed Dec 12 14:54:25 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:25 2018 - [warning] Connection failed 4 time(s)..
Wed Dec 12 14:54:25 2018 - [warning] Master is not reachable from health checker!
Wed Dec 12 14:54:25 2018 - [warning] Master 10.150.20.90(10.150.20.90:33061) is not reachable!
Wed Dec 12 14:54:25 2018 - [warning] SSH is reachable.
Wed Dec 12 14:54:25 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Dec 12 14:54:25 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 14:54:25 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:54:25 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:54:26 2018 - [info] GTID failover mode = 0
Wed Dec 12 14:54:26 2018 - [info] Dead Servers:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] Alive Servers:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:26 2018 - [info] Alive Slaves:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] Checking slave configurations..
Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 14:54:26 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 14:54:26 2018 - [info] Checking replication filtering settings..
Wed Dec 12 14:54:26 2018 - [info] Replication filtering check ok.
Wed Dec 12 14:54:26 2018 - [info] Master is down!
Wed Dec 12 14:54:26 2018 - [info] Terminating monitoring script.
Wed Dec 12 14:54:26 2018 - [info] Got exit code 20 (Master dead).
Wed Dec 12 14:54:26 2018 - [info] MHA::MasterFailover version 0.58.
Wed Dec 12 14:54:26 2018 - [info] Starting master failover.
Wed Dec 12 14:54:26 2018 - [info]
Wed Dec 12 14:54:26 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Dec 12 14:54:26 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] GTID failover mode = 0
Wed Dec 12 14:54:27 2018 - [info] Dead Servers:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] Checking master reachability via MySQL(double check)...
Wed Dec 12 14:54:27 2018 - [info] ok.
Wed Dec 12 14:54:27 2018 - [info] Alive Servers:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:27 2018 - [info] Alive Slaves:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] Starting Non-GTID based failover.
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Dec 12 14:54:27 2018 - [info] Executing master IP deactivation script:
Wed Dec 12 14:54:27 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --command=stopssh --ssh_user=root
Wed Dec 12 14:54:27 2018 - [info] done.
Wed Dec 12 14:54:27 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] * Phase 3: Master Recovery Phase..
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000006:154
Wed Dec 12 14:54:27 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000006:154
Wed Dec 12 14:54:27 2018 - [info] Oldest slaves:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Dec 12 14:54:27 2018 - [info]
Wed Dec 12 14:54:27 2018 - [info] Fetching dead master's binary logs..
Wed Dec 12 14:54:27 2018 - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:33061): save_binary_logs --command=save --start_file=mysql-bin.000006 --start_pos=154 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /data/mysql_mha/app1 if not exists.. ok.
Concat binary/relay logs from mysql-bin.000006 pos 154 to mysql-bin.000006 EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql_33061/logs/mysql-bin.000006 (pos starts 154, filesize 154). Skipping.
Binlog Checksum enabled
/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog has no effective data events.
Event not exists.
Wed Dec 12 14:54:28 2018 - [info] Additional events were not found from the orig master. No need to save.
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.3: Determining New Master Phase..
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Dec 12 14:54:28 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Dec 12 14:54:28 2018 - [info] Searching new master from slaves..
Wed Dec 12 14:54:28 2018 - [info] Candidate masters from the configuration file:
Wed Dec 12 14:54:28 2018 - [info] Non-candidate masters:
Wed Dec 12 14:54:28 2018 - [info] New master is 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:28 2018 - [info] Starting master failover..
Wed Dec 12 14:54:28 2018 - [info]
From:
10.150.20.90(10.150.20.90:33061) (current master)
+--10.150.20.97(10.150.20.97:33061)
+--10.150.20.132(10.150.20.132:33061)
To:
10.150.20.97(10.150.20.97:33061) (new master)
+--10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.5: Master Log Apply Phase..
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Dec 12 14:54:28 2018 - [info] Starting recovery on 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Dec 12 14:54:28 2018 - [info] done.
Wed Dec 12 14:54:28 2018 - [info] All relay logs were successfully applied.
Wed Dec 12 14:54:28 2018 - [info] Getting new master's binlog name and position..
Wed Dec 12 14:54:28 2018 - [info] mysql-bin.000010:2774
Wed Dec 12 14:54:28 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=33061, MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=2774, MASTER_USER='replicator', MASTER_PASSWORD='xxx';
Wed Dec 12 14:54:28 2018 - [info] Executing master IP activate script:
Wed Dec 12 14:54:28 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port=33061 --new_master_user='mha_monitor' --new_master_password=xxx
Set read_only=0 on the new master.
Creating app user on the new master..
Wed Dec 12 14:54:28 2018 - [info] OK.
Wed Dec 12 14:54:28 2018 - [info] ** Finished master recovery successfully.
Wed Dec 12 14:54:28 2018 - [info] * Phase 3: Master Recovery Phase completed.
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] * Phase 4: Slaves Recovery Phase..
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Dec 12 14:54:28 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:33061) started, pid: 25117. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
Wed Dec 12 14:54:29 2018 - [info]
Wed Dec 12 14:54:29 2018 - [info] Log messages from 10.150.20.132 ...
Wed Dec 12 14:54:29 2018 - [info]
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 12 14:54:29 2018 - [info] End of log messages from 10.150.20.132.
Wed Dec 12 14:54:29 2018 - [info] -- 10.150.20.132(10.150.20.132:33061) has the latest relay log events.
Wed Dec 12 14:54:29 2018 - [info] Generating relay diff files from the latest slave succeeded.
Wed Dec 12 14:54:29 2018 - [info]
Wed Dec 12 14:54:29 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Dec 12 14:54:29 2018 - [info]
Wed Dec 12 14:54:29 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) started, pid: 25119. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
Wed Dec 12 14:54:30 2018 - [info]
Wed Dec 12 14:54:30 2018 - [info] Log messages from 10.150.20.132 ...
Wed Dec 12 14:54:30 2018 - [info]
Wed Dec 12 14:54:29 2018 - [info] Starting recovery on 10.150.20.132(10.150.20.132:33061)..
Wed Dec 12 14:54:29 2018 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Dec 12 14:54:29 2018 - [info] done.
Wed Dec 12 14:54:29 2018 - [info] All relay logs were successfully applied.
Wed Dec 12 14:54:29 2018 - [info] Resetting slave 10.150.20.132(10.150.20.132:33061) and starting replication from the new master 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 14:54:29 2018 - [info] Executed CHANGE MASTER.
Wed Dec 12 14:54:29 2018 - [info] Slave started.
Wed Dec 12 14:54:30 2018 - [info] End of log messages from 10.150.20.132.
Wed Dec 12 14:54:30 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) succeeded.
Wed Dec 12 14:54:30 2018 - [info] All new slave servers recovered successfully.
Wed Dec 12 14:54:30 2018 - [info]
Wed Dec 12 14:54:30 2018 - [info] * Phase 5: New master cleanup phase..
Wed Dec 12 14:54:30 2018 - [info]
Wed Dec 12 14:54:30 2018 - [info] Resetting slave info on the new master..
Wed Dec 12 14:54:30 2018 - [info] 10.150.20.97: Resetting slave info succeeded.
Wed Dec 12 14:54:30 2018 - [info] Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
Wed Dec 12 14:54:30 2018 - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .
Wed Dec 12 14:54:30 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 10.150.20.90(10.150.20.90:33061) to 10.150.20.97(10.150.20.97:33061) succeeded
Master 10.150.20.90(10.150.20.90:33061) is down!
Check MHA Manager logs at dev05.010150020095.yz:/data/mysql_mha/app1-manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.150.20.90(10.150.20.90:33061)
The latest slave 10.150.20.97(10.150.20.97:33061) has all relay logs for recovery.
Selected 10.150.20.97(10.150.20.97:33061) as a new master.
10.150.20.97(10.150.20.97:33061): OK: Applying all logs succeeded.
10.150.20.97(10.150.20.97:33061): OK: Activated master IP address.
10.150.20.132(10.150.20.132:33061): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.150.20.132(10.150.20.132:33061): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:33061)
10.150.20.97(10.150.20.97:33061): Resetting slave info succeeded.
Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
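The coordinates needed to repoint the old master (or any other slave) appear in the line "All other slaves should start replication from here". A minimal sketch that pulls the last such file/position pair out of the manager log; new_master_coords is a hypothetical helper, and it assumes the non-GTID CHANGE MASTER wording shown above:

```bash
# new_master_coords MANAGER_LOG
# Print "BINLOG_FILE BINLOG_POS" from the last CHANGE MASTER hint that MHA
# wrote to the manager log (non-GTID failover). (Hypothetical helper.)
new_master_coords() {
  grep -o "MASTER_LOG_FILE='[^']*', MASTER_LOG_POS=[0-9]*" "$1" |
    tail -n 1 |
    sed "s/MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\)/\1 \2/"
}

# Example:
# new_master_coords /data/mysql_mha/app1-manager.log
```

For the failover above this yields mysql-bin.000010 2774.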
The log shows that the master has been switched to 10.150.20.97. At this point, the MHA manager process on the manager node has exited:
[root@dev05 ~]# masterha_check_status --conf=/etc/mysql_mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
On the new master qa06.010150020097.yz, the VIP is now bound to the ens3 NIC:
[root@qa06 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff
inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3
valid_lft forever preferred_lft forever
inet 10.150.20.200/24 brd 10.150.20.255 scope global secondary ens3:1
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:7f:36:38:fe brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
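At this point, app1.cnf on the MHA manager node has been rewritten as follows. Because the manager was started with --remove_dead_master_conf, the dead master's [server1] entry has been removed:
[root@dev05 ~]# cat /etc/mysql_mha/app1.cnf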
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor
[server2]
hostname=10.150.20.97
port=33061
[server3]
hostname=10.150.20.132
port=33061
Edit app1.cnf again, listing the new master first and adding the old master 10.150.20.90 back:
[root@dev05 ~]# cat /etc/mysql_mha/app1.cnf
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor
[server1]
hostname=10.150.20.97
port=33061
[server2]
hostname=10.150.20.90
port=33061
[server3]
hostname=10.150.20.132
port=33061
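Restart MySQL on qa05.010150020090.yz and configure it as a slave of the new master, using the binlog coordinates (mysql-bin.000010:2774) reported in the failover log: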
mysql> change master to
master_host='10.150.20.97',
master_user='replicator',
master_password='replicator',
master_port=33061,
master_log_file='mysql-bin.000010',
master_log_pos=2774;
mysql> start slave;
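To avoid copying the binlog coordinates by hand, the two statements above can be generated from the values MHA printed. A sketch under the assumption that the replication account is the repl_user from app1.cnf (replicator) and that its password is supplied in a REPL_PASSWORD environment variable; change_master_sql is a hypothetical helper:

```bash
# change_master_sql NEW_MASTER_HOST PORT BINLOG_FILE BINLOG_POS
# Print the CHANGE MASTER / START SLAVE statements that repoint a recovered
# host at the new master. Assumes the replication account is the repl_user
# from app1.cnf ("replicator") and reads its password from REPL_PASSWORD.
# (Hypothetical helper for illustration.)
change_master_sql() {
  if [ "$#" -ne 4 ]; then
    echo "usage: change_master_sql HOST PORT FILE POS" >&2
    return 1
  fi
  if [ -z "${REPL_PASSWORD:-}" ]; then
    echo "REPL_PASSWORD is not set" >&2
    return 1
  fi
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_USER='replicator', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
    "$1" "$2" "$REPL_PASSWORD" "$3" "$4"
  printf 'START SLAVE;\n'
}

# Example (adjust the mysql client options for the old master):
# export REPL_PASSWORD=replicator
# mysql -uroot -p -h127.0.0.1 -P33061 -e "$(change_master_sql 10.150.20.97 33061 mysql-bin.000010 2774)"
```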
Check the replication environment
# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

Wed Dec 12 15:06:56 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 15:06:56 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 15:06:56 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 15:06:56 2018 - [info] MHA::MasterMonitor version 0.58.
Wed Dec 12 15:06:57 2018 - [info] GTID failover mode = 0
Wed Dec 12 15:06:57 2018 - [info] Dead Servers:
Wed Dec 12 15:06:57 2018 - [info] Alive Servers:
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 15:06:57 2018 - [info] Alive Slaves:
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] Current Alive Master: 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] Checking slave configurations..
Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.90(10.150.20.90:33061).
Wed Dec 12 15:06:57 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.90(10.150.20.90:33061).
Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 15:06:57 2018 - [info] Checking replication filtering settings..
Wed Dec 12 15:06:57 2018 - [info] binlog_do_db= , binlog_ignore_db=
Wed Dec 12 15:06:57 2018 - [info] Replication filtering check ok.
Wed Dec 12 15:06:57 2018 - [info] GTID (with auto-pos) is not supported
Wed Dec 12 15:06:57 2018 - [info] Starting SSH connection tests..
Wed Dec 12 15:06:59 2018 - [info] All SSH connection tests passed successfully.
Wed Dec 12 15:06:59 2018 - [info] Checking MHA Node version..
Wed Dec 12 15:07:00 2018 - [info] Version check ok.
Wed Dec 12 15:07:00 2018 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 12 15:07:00 2018 - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
Wed Dec 12 15:07:00 2018 - [info] Master MHA Node version is 0.58.
Wed Dec 12 15:07:00 2018 - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 15:07:00 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000010
Wed Dec 12 15:07:00 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22)..
Creating /data/mysql_mha/app1 if not exists.. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /data/mysql_33061/logs, up to mysql-bin.000010
Wed Dec 12 15:07:01 2018 - [info] Binlog setting check done.
Wed Dec 12 15:07:01 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.90 --slave_ip=10.150.20.90 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22)..
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/logs, up to relaylog.000002
Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22)..
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 12 15:07:01 2018 - [info] Slaves settings check done.
Wed Dec 12 15:07:01 2018 - [info]
10.150.20.97(10.150.20.97:33061) (current master)
+--10.150.20.90(10.150.20.90:33061)
+--10.150.20.132(10.150.20.132:33061)
Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.90..
Wed Dec 12 15:07:01 2018 - [info] ok.
Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 15:07:01 2018 - [info] ok.
Wed Dec 12 15:07:01 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 15:07:01 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=33061
Wed Dec 12 15:07:02 2018 - [info] OK.
Wed Dec 12 15:07:02 2018 - [warning] shutdown_script is not defined.
Wed Dec 12 15:07:02 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
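Summary:
1: When setting up MHA, you have to bind the VIP to the master yourself; when the master fails over, the VIP is moved to the new master.
2: After a failover through master_ip_failover, the MHA monitoring process exits automatically and has to be restarted.
3: Bind the VIP:
# ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
# /usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200
Unbind the VIP:
# ip addr del 10.150.20.200/24 dev ens3 label ens3:1
4: To stop the MHA monitoring process:
# masterha_stop --conf=/etc/mysql_mha/app1.cnf
Stopped app1 successfully.
5: A failover basically goes through the following steps:
1). Configuration check phase: the configuration of the whole cluster is checked.
2). Dead master handling: the virtual IP is removed and the host is shut down (the shutdown is skipped here because shutdown_script is empty).
3). The relay log difference between the dead master and the latest slave is copied and saved to the working directory on the MHA Manager.
4). The slave with the latest updates is identified.
5). The binary log events (binlog events) saved from the master are applied.
6). One slave is promoted to be the new master.
7). The other slaves are pointed at the new master and resume replication.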
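To map these steps onto an actual run, the phase headings can be pulled out of the manager log: Phase 1 is the configuration check, Phase 2 the dead-master handling, Phase 3 the master recovery and Phase 4 the slave recovery. A minimal sketch; failover_phases is a hypothetical helper that relies on the heading format seen in the failover log above:

```bash
# failover_phases MANAGER_LOG
# List the "Phase N: ..." headings that MHA logs at the start of each
# failover phase, in order and without repeats. (Hypothetical helper.)
failover_phases() {
  # Phase start lines end with "..", e.g. "* Phase 3.1: Getting Latest Slaves Phase..";
  # the "... completed." lines end with a single "." and are not matched.
  grep -o '\* Phase [0-9][0-9.]*: [^.]*\.\.' "$1" |
    sed -e 's/^\* //' -e 's/\.\.$//' |
    awk '!seen[$0]++'
}

# Example:
# failover_phases /data/mysql_mha/app1-manager.log
```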