Setting Up MySQL MHA

Reposted article · 2017-06-06 09:34 · 1474 · MySQL / High Availability

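MHA is one of the more mature MySQL high-availability solutions in the industry. When a MySQL master fails, MHA can carry out the database failover automatically, and during the failover it preserves data consistency as far as possible, which is what makes it genuinely highly available. The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). During an automatic failover, MHA tries to save the binary logs from the crashed master to minimize data loss, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it only performs the failover, and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as one slave has received the latest binary log events, MHA can apply them to all the other slaves, so every node ends up consistent.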
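As a minimal sketch of pairing MHA with semi-synchronous replication, the shell function below (its name is hypothetical) prints the statements that load and enable the semi-sync plugins shipped with MySQL 5.5 through 5.7 (`rpl_semi_sync_master` and `rpl_semi_sync_slave`). Because MHA may promote a candidate master, such a host would get both plugins:

```bash
# semisync_sql ROLE: print SQL that loads and enables semi-sync replication.
# ROLE is "master", "slave" or "both" (a candidate master gets both so it can
# act as a semi-sync master after MHA promotes it).
semisync_sql() {
  local role="$1"
  case "$role" in
    master|both)
      echo "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
      echo "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
      ;;
  esac
  case "$role" in
    slave|both)
      echo "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
      echo "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
      ;;
  esac
  case "$role" in
    master|slave|both) return 0 ;;
    *) echo "usage: semisync_sql master|slave|both" >&2; return 1 ;;
  esac
}
```

For example, `semisync_sql both | mysql -h172.16.16.35 -P3306 -uroot -p`. On a slave that is already replicating, the IO thread has to be restarted (`STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;`) before the slave side takes effect, and the settings also need to go into my.cnf (with the plugins loaded at startup) to survive a restart.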

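MHA currently targets one-master, multi-slave topologies. An MHA replication group needs at least three database servers, one master and two slaves: one acts as the master, one as the standby (candidate) master, and one as an ordinary slave.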
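The three-instance minimum can be checked mechanically against the MHA application config (its format is shown in step 2). A minimal sketch, assuming each MySQL instance appears as a `[serverN]` section as in this article's app1.cnf; the function names are hypothetical. Note that it counts instances, not machines: this article's layout passes with three instances on two hosts, even though losing 172.16.16.35 would take out both slaves at once.

```bash
# count_mha_servers FILE: count the [serverN] sections in an MHA application config.
# "[server default]" is not counted because the pattern requires digits.
count_mha_servers() {
  grep -cE '^[[:space:]]*\[server[0-9]+\][[:space:]]*$' "$1"
}

# check_min_servers FILE: succeed only if at least 3 MySQL instances are configured.
check_min_servers() {
  local n
  n=$(count_mha_servers "$1")
  n=${n:-0}   # grep prints nothing if the file is unreadable
  if [ "$n" -lt 3 ]; then
    echo "only $n server(s) in $1; MHA needs at least 3 (1 master + 2 slaves)" >&2
    return 1
  fi
  echo "$n servers configured"
}
```

Usage: `check_min_servers /etc/mha/app1.cnf`.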

Now let's configure MHA. I only have two virtual machines, so I had to make do with two, and I hit a few pitfalls along the way. Here is the basic environment:

MySQL1 (master): 172.16.16.34:3306 + MHA Manager + MHA Node
MySQL2 (slave1): 172.16.16.35:3306 + MHA Node
MySQL3 (slave2): 172.16.16.35:3307 + MHA Node

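We assume the one-master, two-slave replication is already set up. With only two machines, the two slaves share 172.16.16.35 on different ports; it is a makeshift layout, but it will do.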

1: First we install the MHA packages. Before installing MHA, install its dependencies.

On the Node hosts:

yum install -y perl-DBD-MySQL

On the Manager host:

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

However, these packages are not in the system's default repositories, so we first need the third-party EPEL repository. Download the package from

https://fedoraproject.org/wiki/EPEL and install it:

[root@localhost yum.repos.d]# rpm -ivh epel-release-6-8.noarch.rpm

After installation, list the repositories:

[root@localhost yum.repos.d]# yum repolist
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
* epel: mirror.lzu.edu.cn
repo id repo name status
base CentOS-6 - Base - 163.com 6,706
*epel Extra Packages for Enterprise Linux 6 - x86_64 12,305
extras CentOS-6 - Extras - 163.com 45
updates CentOS-6 - Updates - 163.com 318
yum yum 6,367
repolist: 25,741

The EPEL repository is now available, so we can run the yum commands above to install MHA's dependencies.
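The two steps (confirm EPEL, then install the per-role packages listed above) can be combined in a small helper. A sketch, assuming the `yum repolist` layout shown above (a `repo id` header, one repo per row, a `repolist:` summary); the function names and the `DRY_RUN` switch are hypothetical:

```bash
# has_epel: read `yum repolist` output on stdin; succeed if an "epel" repo is listed.
has_epel() {
  awk '
    /^repo id/   { in_table = 1; next }   # repo rows follow this header line
    /^repolist:/ { in_table = 0 }         # the summary line closes the table
    in_table {
      id = $1
      sub(/^\*/, "", id)                  # "*epel" -> "epel"
      sub(/\/.*/, "", id)                 # "epel/x86_64" (EL7 style) -> "epel"
      if (id == "epel") found = 1
    }
    END { exit !found }
  '
}

# install_mha_deps ROLE: install the dependencies listed above for "node" or
# "manager". DRY_RUN=1 only prints the yum command.
install_mha_deps() {
  local pkgs
  case "$1" in
    node)    pkgs="perl-DBD-MySQL" ;;
    manager) pkgs="perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes" ;;
    *) echo "usage: install_mha_deps node|manager" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "yum install -y $pkgs"
    return 0
  fi
  if ! yum repolist 2>/dev/null | has_epel; then
    echo "EPEL repository not found; install epel-release first" >&2
    return 1
  fi
  # $pkgs is deliberately unquoted so that each package is a separate argument
  yum install -y $pkgs
}
```

In this article's layout, 172.16.16.34 would run `install_mha_deps manager` (the list already includes the Node dependency) and 172.16.16.35 would run `install_mha_deps node`.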

Once that is done, install the Node package on both machines and the Manager package on the master machine:

[root@localhost sa]# rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm
[root@localhost sa]# rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm

I had already downloaded the packages, so I installed them directly with rpm. That completes the installation.

A brief introduction to MHA's Manager and Node toolkits:

The Manager toolkit mainly includes the following tools:

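masterha_check_ssh: check MHA's SSH configuration
masterha_check_repl: check MySQL replication status
masterha_manager: start MHA
masterha_check_status: check the current MHA running status
masterha_master_monitor: detect whether the master is down
masterha_master_switch: control failover (automatic or manual)
masterha_conf_host: add or remove configured server entries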

The Node toolkit (these tools are normally triggered by MHA Manager's scripts and need no manual operation) mainly includes the following tools:

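save_binary_logs: save and copy the master's binary logs
apply_diff_relay_logs: identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog: remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs: purge relay logs (without blocking the SQL thread)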
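After installing the rpms, a quick sanity check is that every tool listed above (plus masterha_stop, which is used later) is on PATH. A sketch; the function and variable names are hypothetical:

```bash
# missing_tools CMD...: print each command that is not on PATH; fail if any is missing.
missing_tools() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Tool names taken from the lists above (masterha_stop is used in step 4).
MHA_MANAGER_TOOLS="masterha_check_ssh masterha_check_repl masterha_manager masterha_check_status masterha_master_monitor masterha_master_switch masterha_conf_host masterha_stop"
MHA_NODE_TOOLS="save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs"
```

On the manager host, `missing_tools $MHA_MANAGER_TOOLS $MHA_NODE_TOOLS` (variables intentionally unquoted so they split into words) prints nothing when everything is installed; on a Node-only host, check `$MHA_NODE_TOOLS` alone.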

2: Configure passwordless SSH login between the hosts

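My two test machines were requested from the ops team, and getting SSH configured on them cost me quite a bit of time. On top of that, I am using two servers in place of the four machines MHA would normally use (one master, two slaves and one manager), which caused some problems along the way.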

Generate a key pair on both machines: ssh-keygen -t rsa

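Taking 34 as an example, copy its public key to the other machine (note that this scp overwrites any existing authorized_keys on 172.16.16.35):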

scp ~/.ssh/id_rsa.pub root@172.16.16.35:/root/.ssh/authorized_keys

Then set the file permissions:

chmod 600 /root/.ssh/authorized_keys

That should be it. Let's verify:

[root@localhost .ssh]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat May 27 10:11:15 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 27 10:11:15 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat May 27 10:11:15 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat May 27 10:11:15 2017 - [info] Starting SSH connection tests..
Sat May 27 10:11:16 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Sat May 27 10:11:15 2017 - [debug] Connecting via SSH from root@172.16.16.34(172.16.16.34:22) to root@172.16.16.35(172.16.16.35:22)..
ssh: connect to host 172.16.16.34 port 22: Connection refused
Sat May 27 10:11:15 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.16.16.34(172.16.16.34:22) to root@172.16.16.35(172.16.16.35:22) failed!
Sat May 27 10:11:16 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Sat May 27 10:11:16 2017 - [debug] Connecting via SSH from root@172.16.16.35(172.16.16.35:22) to root@172.16.16.34(172.16.16.34:22)..
ssh: connect to host 172.16.16.35 port 22: Connection refused
Sat May 27 10:11:16 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.16.16.35(172.16.16.35:22) to root@172.16.16.34(172.16.16.34:22) failed!
Sat May 27 10:11:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Sat May 27 10:11:16 2017 - [debug] Connecting via SSH from root@172.16.16.35(172.16.16.35:22) to root@172.16.16.34(172.16.16.34:22)..
ssh: connect to host 172.16.16.35 port 22: Connection refused
Sat May 27 10:11:16 2017 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.16.16.35(172.16.16.35:22) to root@172.16.16.34(172.16.16.34:22) failed!
SSH Configuration Check Failed!
at /usr/bin/masterha_check_ssh line 44

The check fails. We also need to add each machine's own public key to its authorized_keys (run this on both machines):

[root@localhost .ssh]# cat id_rsa.pub >>authorized_keys

Run the check again and it passes:

[root@localhost .ssh]# masterha_check_ssh --conf=/etc/mha/app1.cnf
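Taken together, the SSH steps amount to: every host trusts every host's key, itself included. A sketch of doing that with `ssh-copy-id`, which appends to authorized_keys (creating ~/.ssh if needed) instead of overwriting it as the scp above does; the function name and the `DRY_RUN`/`SSH_PUBKEY` variables are assumptions of this sketch:

```bash
# distribute_keys HOST...: append this host's public key to authorized_keys on every
# listed host, including this host itself. DRY_RUN=1 only prints the commands.
distribute_keys() {
  local pub="${SSH_PUBKEY:-$HOME/.ssh/id_rsa.pub}" host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id -i $pub root@$host"
    else
      ssh-copy-id -i "$pub" "root@$host" || return 1
    fi
  done
}
```

Run `distribute_keys 172.16.16.34 172.16.16.35` on each of the two machines, then re-run `masterha_check_ssh --conf=/etc/mha/app1.cnf`.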

This check uses MHA's configuration file, which looks like this:

[root@localhost .ssh]# cat /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/home/mysql/db3306/log/
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=root
report_script=/usr/local/bin/send_report
shutdown_script=""
ssh_user=root
user=root
 
[server1]
hostname=172.16.16.34
port=3306
 
[server2]
hostname=172.16.16.35
port=3306
candidate_master=1
check_repl_delay=0
 
[server3]
hostname=172.16.16.35
port=3307
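This configuration sets report_script=/usr/local/bin/send_report, but the file is never created; the failover log at the end of this article fails with `sh: /usr/local/bin/send_report: No such file or directory`. A minimal placeholder, assuming MHA passes `--key=value` options such as `--orig_master_host`, `--new_master_host`, `--subject` and `--body` (anything else is ignored); the log path and the `MHA_REPORT_LOG` variable are hypothetical:

```bash
#!/bin/bash
# Hypothetical minimal /usr/local/bin/send_report for MHA's report_script.
# Assumption: MHA invokes it with --key=value options such as --orig_master_host,
# --new_master_host, --subject and --body; options not handled here are ignored.
REPORT_LOG="${MHA_REPORT_LOG:-/var/log/mha/app1/report.log}"   # hypothetical path

send_report() {
  local arg orig="" new="" subject="" body=""
  for arg in "$@"; do
    case "$arg" in
      --orig_master_host=*) orig="${arg#*=}" ;;
      --new_master_host=*)  new="${arg#*=}" ;;
      --subject=*)          subject="${arg#*=}" ;;
      --body=*)             body="${arg#*=}" ;;
    esac
  done
  # Append a plain-text report; replace this with a mail/IM command if available.
  {
    printf '=== %s ===\n' "$(date '+%F %T')"
    printf 'Subject: %s\n' "$subject"
    printf 'Old master: %s\n' "$orig"
    printf 'New master: %s\n' "$new"
    printf '%s\n' "$body"
  } >> "$REPORT_LOG"
}

# Run only when called with arguments, as MHA does; exit status 0 tells MHA it worked.
if [ "$#" -gt 0 ]; then
  send_report "$@"
fi
```

Save it as /usr/local/bin/send_report on the manager host and make it executable with `chmod +x`.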

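For MHA I created a root@'%' account with full privileges. Since we assume the one-master, two-slave replication already exists, I won't go over the grants in detail; if you are setting up MHA, these are probably routine for you.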
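Rather than a root@'%' superuser, a narrower alternative is a dedicated MHA account plus a replication account, both limited to the 172.16.16.x subnet. A sketch that only prints the SQL (MySQL 5.7 syntax); the account names `mha` and `repl` and the function name are hypothetical, and passwords containing a single quote would need escaping:

```bash
# mha_user_sql SUBNET MHA_PASS REPL_PASS: print SQL creating a dedicated MHA admin
# account and a replication account instead of a root@'%' superuser.
mha_user_sql() {
  local subnet="$1" mha_pass="$2" repl_pass="$3"
  cat <<EOF
CREATE USER 'mha'@'$subnet' IDENTIFIED BY '$mha_pass';
GRANT ALL PRIVILEGES ON *.* TO 'mha'@'$subnet';
CREATE USER 'repl'@'$subnet' IDENTIFIED BY '$repl_pass';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'$subnet';
EOF
}
```

Running it once on the master, e.g. `mha_user_sql '172.16.16.%' 'secret1' 'secret2' | mysql -h172.16.16.34 -P3306 -uroot -p`, lets the accounts replicate to the slaves (any of which may be promoted later); user/password and repl_user/repl_password in app1.cnf would then point at them.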

3: We can also check replication.

Before that, set read_only=1 on the slaves:

mysql -h172.16.16.35 -P3306 -uroot -p123456 -e'set global read_only=1'
mysql -h172.16.16.35 -P3307 -uroot -p123456 -e'set global read_only=1'
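The two commands above can be wrapped in a loop over the slave instances. A sketch that keeps the password off the command line by reading it from a client option file (a `[client]` section with `user` and `password`) passed via `--defaults-extra-file`; the default path `/root/.my_mha.cnf`, the function name and the `MYSQL_CNF`/`DRY_RUN` variables are hypothetical:

```bash
# set_read_only HOST:PORT...: set read_only=1 on each slave instance.
# Credentials come from a client option file; DRY_RUN=1 only prints the commands.
set_read_only() {
  local cnf="${MYSQL_CNF:-/root/.my_mha.cnf}" inst host port
  for inst in "$@"; do
    host="${inst%:*}"    # "172.16.16.35:3307" -> "172.16.16.35"
    port="${inst##*:}"   # "172.16.16.35:3307" -> "3307"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "mysql --defaults-extra-file=$cnf -h$host -P$port -e 'SET GLOBAL read_only = 1'"
    else
      # --defaults-extra-file must be the first option on the command line
      mysql --defaults-extra-file="$cnf" -h"$host" -P"$port" \
        -e 'SET GLOBAL read_only = 1' || return 1
    fi
  done
}
```

Usage: `set_read_only 172.16.16.35:3306 172.16.16.35:3307`. `SET GLOBAL` does not persist across a restart in MySQL 5.7, so `read_only=1` also belongs in each slave's my.cnf; MHA turns it off on whichever slave it promotes (see `Setting read_only=0` in the failover log below).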

Then run the check:

[root@localhost .ssh]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat May 27 15:01:57 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 27 15:01:57 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat May 27 15:01:57 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat May 27 15:01:57 2017 - [info] MHA::MasterMonitor version 0.57.
Sat May 27 15:01:57 2017 - [info] GTID failover mode = 1
Sat May 27 15:01:57 2017 - [info] Dead Servers:
Sat May 27 15:01:57 2017 - [info] Alive Servers:
Sat May 27 15:01:57 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:01:57 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Sat May 27 15:01:57 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Sat May 27 15:01:57 2017 - [info] Alive Slaves:
Sat May 27 15:01:57 2017 - [info] 172.16.16.35(172.16.16.35:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Sat May 27 15:01:57 2017 - [info] GTID ON
Sat May 27 15:01:57 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:01:57 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 27 15:01:57 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Sat May 27 15:01:57 2017 - [info] GTID ON
Sat May 27 15:01:57 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:01:57 2017 - [info] Current Alive Master: 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:01:57 2017 - [info] Checking slave configurations..
Sat May 27 15:01:57 2017 - [info] Checking replication filtering settings..
Sat May 27 15:01:57 2017 - [info] binlog_do_db= , binlog_ignore_db=
Sat May 27 15:01:57 2017 - [info] Replication filtering check ok.
Sat May 27 15:01:57 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat May 27 15:01:57 2017 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 27 15:01:57 2017 - [info] HealthCheck: SSH to 172.16.16.34 is reachable.
Sat May 27 15:01:57 2017 - [info]
172.16.16.34(172.16.16.34:3306) (current master)
+--172.16.16.35(172.16.16.35:3306)
+--172.16.16.35(172.16.16.35:3307)
 
Sat May 27 15:01:57 2017 - [info] Checking replication health on 172.16.16.35..
Sat May 27 15:01:57 2017 - [info] ok.
Sat May 27 15:01:57 2017 - [info] Checking replication health on 172.16.16.35..
Sat May 27 15:01:57 2017 - [info] ok.
Sat May 27 15:01:57 2017 - [warning] master_ip_failover_script is not defined.
Sat May 27 15:01:57 2017 - [warning] shutdown_script is not defined.
Sat May 27 15:01:57 2017 - [info] Got exit code 0 (Not master dead).
 
MySQL Replication Health is OK.

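Replication is OK. Note that for this check we had commented out master_ip_failover_script (#master_ip_failover_script). According to the blog of Da Shixiong (大師兄), MHA supports two failover approaches: a virtual IP address, or a global configuration file. MHA does not mandate either one and leaves the choice to the user. The virtual IP approach involves other software, such as keepalived, and also requires modifying the master_ip_failover script, so we commented that line out for now.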

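Although the check passed, there are two warnings because these two scripts are not defined yet. We will fill them in later and ignore the warnings for now.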
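Since the verdict is the last line of masterha_check_repl's output, it can gate the manager start in step 4. A sketch reusing the step 4 flags; the function names and the `CONF` variable are hypothetical, and it relies only on the exact `MySQL Replication Health is OK.` line seen above:

```bash
CONF="${CONF:-/etc/mha/app1.cnf}"

# repl_health_ok: read masterha_check_repl output on stdin; succeed only if it
# contains the exact "MySQL Replication Health is OK." line.
repl_health_ok() {
  grep -qxF 'MySQL Replication Health is OK.'
}

# start_manager_if_healthy: run the replication check, then launch masterha_manager
# in the background (same options as step 4) only if the check passed.
start_manager_if_healthy() {
  if masterha_check_repl --conf="$CONF" 2>&1 | repl_health_ok; then
    nohup masterha_manager --conf="$CONF" --remove_dead_master_conf \
      --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
    echo "masterha_manager started (pid $!)"
  else
    echo "replication check failed; not starting masterha_manager" >&2
    return 1
  fi
}
```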

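4: Start MHA (note that from this step on most commands use /etc/masterha/app1.cnf, whereas the steps above used /etc/mha/app1.cnf; whichever path you choose, use it consistently)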

[root@localhost .ssh]#nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 8195

Check MHA's running status:

[root@localhost .ssh]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:8469) is running(0:PING_OK), master:172.16.16.34

It is running, so the startup succeeded. Let's look at the log:

[root@localhost masterha]# cat /var/log/mha/app1/manager.log
Sat May 27 15:50:47 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 27 15:50:47 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat May 27 15:50:47 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat May 27 15:50:47 2017 - [info] MHA::MasterMonitor version 0.57.
Sat May 27 15:50:47 2017 - [warning] /var/log/mha/app1.log/app1.master_status.health already exists. You might have killed manager with SIGKILL(-9), may run two or more monitoring process for the same application, or use the same working directory. Check for details, and consider setting --workdir separately.
Sat May 27 15:50:48 2017 - [info] GTID failover mode = 1
Sat May 27 15:50:48 2017 - [info] Dead Servers:
Sat May 27 15:50:48 2017 - [info] Alive Servers:
Sat May 27 15:50:48 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:50:48 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Sat May 27 15:50:48 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Sat May 27 15:50:48 2017 - [info] Alive Slaves:
Sat May 27 15:50:48 2017 - [info] 172.16.16.35(172.16.16.35:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Sat May 27 15:50:48 2017 - [info] GTID ON
Sat May 27 15:50:48 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:50:48 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 27 15:50:48 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Sat May 27 15:50:48 2017 - [info] GTID ON
Sat May 27 15:50:48 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:50:48 2017 - [info] Current Alive Master: 172.16.16.34(172.16.16.34:3306)
Sat May 27 15:50:48 2017 - [info] Checking slave configurations..
Sat May 27 15:50:48 2017 - [info] Checking replication filtering settings..
Sat May 27 15:50:48 2017 - [info] binlog_do_db= , binlog_ignore_db=
Sat May 27 15:50:48 2017 - [info] Replication filtering check ok.
Sat May 27 15:50:48 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat May 27 15:50:48 2017 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 27 15:50:48 2017 - [info] HealthCheck: SSH to 172.16.16.34 is reachable.
Sat May 27 15:50:48 2017 - [info]
172.16.16.34(172.16.16.34:3306) (current master)
+--172.16.16.35(172.16.16.35:3306)
+--172.16.16.35(172.16.16.35:3307)
 
Sat May 27 15:50:48 2017 - [warning] master_ip_failover_script is not defined.
Sat May 27 15:50:48 2017 - [warning] shutdown_script is not defined.
Sat May 27 15:50:48 2017 - [info] Set master ping interval 1 seconds.
Sat May 27 15:50:48 2017 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s server03 -s server02
Sat May 27 15:50:48 2017 - [info] Starting ping health check on 172.16.16.34(172.16.16.34:3306)..
Sat May 27 15:50:48 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

No problems.

Stopping it is just as simple:

[root@localhost .ssh]# masterha_stop --conf=/etc/mha/app1.cnf
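For monitoring scripts (cron, alerting), the one-line output of masterha_check_status can be parsed. A sketch that extracts the status name inside the parentheses and the current master; the function name is hypothetical, and the parsing assumes the line format shown above:

```bash
# mha_status_field FIELD: parse one line of masterha_check_status output from stdin,
# e.g. "app1 (pid:8469) is running(0:PING_OK), master:172.16.16.34", and print
# FIELD: "state" (the name after "<code>:" in parentheses) or "master".
mha_status_field() {
  local line
  IFS= read -r line || [ -n "$line" ] || return 1
  case "$1" in
    state)  printf '%s\n' "$line" | sed -n 's/.*([0-9]*:\([A-Z_]*\)).*/\1/p' ;;
    master) printf '%s\n' "$line" | sed -n 's/.*master:\([^ ,]*\).*/\1/p' ;;
    *) echo "usage: mha_status_field state|master" >&2; return 1 ;;
  esac
}
```

For example, `masterha_check_status --conf=/etc/masterha/app1.cnf | mha_status_field master` prints the address of the master MHA is currently monitoring.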

5: Managing the VIP

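As mentioned above, there are two ways to manage the VIP: keepalived, or a script. The keepalived approach is fairly simple: run it on the master and the standby master and have it monitor the MySQL process. Configuration-wise it is not much different from a keepalived + MySQL dual-master setup, which is covered in my previous post: "Setting Up keepalived + MySQL Dual-Master".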

Below we manage the VIP with a script by defining master_ip_failover. Here we use the script from Da Shixiong's blog mentioned above directly:

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

# Parameters passed in by MHA (masterha_manager / masterha_master_switch)
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
    $new_master_user,  $new_master_password
);

# The VIP that floats between masters, and the interface alias that carries it
my $vip = '172.16.16.20/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'             => \$command,
    'ssh_user=s'            => \$ssh_user,
    'orig_master_host=s'    => \$orig_master_host,
    'orig_master_ip=s'      => \$orig_master_ip,
    'orig_master_port=i'    => \$orig_master_port,
    'new_master_host=s'     => \$new_master_host,
    'new_master_ip=s'       => \$new_master_ip,
    'new_master_port=i'     => \$new_master_port,
    # MHA 0.57 also passes these with --command=start; without them Getopt::Long
    # prints "Unknown option: new_master_user" (as seen in the failover log)
    'new_master_user=s'     => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    # With FATAL warnings, comparing an undefined $command would die
    if ( !defined $command ) {
        &usage();
        exit 1;
    }

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Remove the VIP from the old (failed) master
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # Bring the VIP up on the newly promoted master
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# Add the VIP alias on the new master over SSH
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# Remove the VIP alias from the old master over SSH (skipped without --ssh_user)
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Then manually add the virtual IP on server1:

/sbin/ifconfig eth0:1 172.16.16.20/24
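To confirm which host currently holds the VIP (now, and again after each failover test), the output of `ip -o -4 addr show` can be checked; an address added with ifconfig as eth0:1 is listed there too. A sketch, function name hypothetical:

```bash
# vip_present IP: read `ip -o -4 addr show` output on stdin and succeed if IP is
# configured on any interface (including aliases such as eth0:1).
vip_present() {
  awk -v vip="$1" '
    $3 == "inet" {
      split($4, addr, "/")          # "172.16.16.20/24" -> "172.16.16.20"
      if (addr[1] == vip) found = 1
    }
    END { exit !found }
  '
}
```

For example, `ssh root@172.16.16.35 'ip -o -4 addr show' | vip_present 172.16.16.20 && echo "VIP on .35"`. Exactly one host should report the VIP at any time.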

Restart MHA Manager:

[root@localhost masterha]# masterha_stop --conf=/etc/masterha/app1.cnf
[root@localhost masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[root@localhost masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:3953) is running(0:PING_OK), master:172.16.16.34

MHA is now fully set up. Next, let's test failover to see whether everything works:

(1) First, test a manual failover:

masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=172.16.16.34 --dead_master_port=3306 --new_master_host=172.16.16.35 --new_master_port=3306 --ignore_last_failover

When it finishes, the master address is 172.16.16.35:3306, and the VIP 172.16.16.20 has moved to 172.16.16.35 as well.
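Besides the VIP, the remaining slave should now replicate from the new master. A sketch that summarizes `SHOW SLAVE STATUS\G` output into source address and thread states (function name hypothetical):

```bash
# slave_source: read `SHOW SLAVE STATUS\G` output on stdin and print
# "Master_Host:Master_Port Slave_IO_Running/Slave_SQL_Running"; fail if no row.
slave_source() {
  awk -F': ' '
    { gsub(/^[ \t]+/, "", $1) }      # field names are right-aligned with spaces
    $1 == "Master_Host"       { host = $2 }
    $1 == "Master_Port"       { port = $2 }
    $1 == "Slave_IO_Running"  { io = $2 }
    $1 == "Slave_SQL_Running" { sql = $2 }
    END { if (host == "") exit 1; print host ":" port " " io "/" sql }
  '
}
```

For example, `mysql -h172.16.16.35 -P3307 -uroot -p -e 'SHOW SLAVE STATUS\G' | slave_source` should report 172.16.16.35:3306 with both threads running (Yes/Yes).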

(2) Next, test automatic failover

Now we rebuild the master-slave replication with the new roles:

The VIP 172.16.16.20 is on server2 (172.16.16.35), and the MySQL master is 172.16.16.35:3306.

Slaves: 172.16.16.34:3306 and 172.16.16.35:3307

Manually kill the master.
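While killing the master, the manager log can be followed until MHA reports the result. A sketch that reads log lines on stdin and prints the new master from the `Master failover to ... completed successfully.` message (as it appears in the log below); the function name is hypothetical:

```bash
# watch_failover: read MHA manager log lines on stdin; on the first
# "Master failover to X completed successfully" line, print X and succeed.
# Fails if the input ends without such a line.
watch_failover() {
  awk '
    /Master failover to .* completed successfully/ {
      for (i = 1; i < NF; i++)
        if ($i == "to") { print $(i + 1); found = 1; exit }
    }
    END { exit !found }
  '
}
```

For example, run `tail -n 0 -F /var/log/mha/app1/manager.log | watch_failover` in one terminal, then kill the master in another. MHA Manager stops running once a failover completes, so it has to be started again afterwards.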

It has failed over automatically. Let's look at the log:

Mon Jun 5 14:23:13 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 5 14:23:13 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:23:13 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:23:13 2017 - [info] MHA::MasterMonitor version 0.57.
Mon Jun 5 14:23:14 2017 - [info] GTID failover mode = 1
Mon Jun 5 14:23:14 2017 - [info] Dead Servers:
Mon Jun 5 14:23:14 2017 - [info] Alive Servers:
Mon Jun 5 14:23:14 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:14 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:23:14 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Mon Jun 5 14:23:14 2017 - [info] Alive Slaves:
Mon Jun 5 14:23:14 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:14 2017 - [info] GTID ON
Mon Jun 5 14:23:14 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:14 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:14 2017 - [info] GTID ON
Mon Jun 5 14:23:14 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:14 2017 - [info] Current Alive Master: 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:14 2017 - [info] Checking slave configurations..
Mon Jun 5 14:23:14 2017 - [info] Checking replication filtering settings..
Mon Jun 5 14:23:14 2017 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jun 5 14:23:14 2017 - [info] Replication filtering check ok.
Mon Jun 5 14:23:14 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jun 5 14:23:14 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 5 14:23:14 2017 - [info] HealthCheck: SSH to 172.16.16.35 is reachable.
Mon Jun 5 14:23:14 2017 - [info]
172.16.16.35(172.16.16.35:3306) (current master)
+--172.16.16.34(172.16.16.34:3306)
+--172.16.16.35(172.16.16.35:3307)
 
Mon Jun 5 14:23:14 2017 - [info] Checking master_ip_failover_script status:
Mon Jun 5 14:23:14 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.16.35 --orig_master_ip=172.16.16.35 --orig_master_port=3306
 
 
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.16.20/24===
 
Checking the Status of the script.. OK
Mon Jun 5 14:23:14 2017 - [info] OK.
Mon Jun 5 14:23:14 2017 - [warning] shutdown_script is not defined.
Mon Jun 5 14:23:14 2017 - [info] Set master ping interval 1 seconds.
Mon Jun 5 14:23:14 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Jun 5 14:23:14 2017 - [info] Starting ping health check on 172.16.16.35(172.16.16.35:3306)..
Mon Jun 5 14:23:14 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon Jun 5 14:23:46 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Jun 5 14:23:46 2017 - [info] Executing SSH check script: exit 0
Mon Jun 5 14:23:46 2017 - [info] HealthCheck: SSH to 172.16.16.35 is reachable.
Mon Jun 5 14:23:47 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jun 5 14:23:47 2017 - [warning] Connection failed 2 time(s)..
Mon Jun 5 14:23:48 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jun 5 14:23:48 2017 - [warning] Connection failed 3 time(s)..
Mon Jun 5 14:23:49 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jun 5 14:23:49 2017 - [warning] Connection failed 4 time(s)..
Mon Jun 5 14:23:49 2017 - [warning] Master is not reachable from health checker!
Mon Jun 5 14:23:49 2017 - [warning] Master 172.16.16.35(172.16.16.35:3306) is not reachable!
Mon Jun 5 14:23:49 2017 - [warning] SSH is reachable.
Mon Jun 5 14:23:49 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Jun 5 14:23:49 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 5 14:23:49 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:23:49 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:23:49 2017 - [info] GTID failover mode = 1
Mon Jun 5 14:23:49 2017 - [info] Dead Servers:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Alive Servers:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Mon Jun 5 14:23:49 2017 - [info] Alive Slaves:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Checking slave configurations..
Mon Jun 5 14:23:49 2017 - [info] Checking replication filtering settings..
Mon Jun 5 14:23:49 2017 - [info] Replication filtering check ok.
Mon Jun 5 14:23:49 2017 - [info] Master is down!
Mon Jun 5 14:23:49 2017 - [info] Terminating monitoring script.
Mon Jun 5 14:23:49 2017 - [info] Got exit code 20 (Master dead).
Mon Jun 5 14:23:49 2017 - [info] MHA::MasterFailover version 0.57.
Mon Jun 5 14:23:49 2017 - [info] Starting master failover.
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] GTID failover mode = 1
Mon Jun 5 14:23:49 2017 - [info] Dead Servers:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Checking master reachability via MySQL(double check)...
Mon Jun 5 14:23:49 2017 - [info] ok.
Mon Jun 5 14:23:49 2017 - [info] Alive Servers:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Mon Jun 5 14:23:49 2017 - [info] Alive Slaves:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Starting GTID based failover.
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jun 5 14:23:49 2017 - [info] Executing master IP deactivation script:
Mon Jun 5 14:23:49 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=172.16.16.35 --orig_master_ip=172.16.16.35 --orig_master_port=3306 --command=stopssh --ssh_user=root
 
 
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.16.20/24===
 
Disabling the VIP on old master: 172.16.16.35
Mon Jun 5 14:23:49 2017 - [info] done.
Mon Jun 5 14:23:49 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jun 5 14:23:49 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 3: Master Recovery Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:194
Mon Jun 5 14:23:49 2017 - [info] Retrieved Gtid Set: 806ede0c-357e-11e7-9719-00505693235d:1
Mon Jun 5 14:23:49 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:194
Mon Jun 5 14:23:49 2017 - [info] Retrieved Gtid Set: 806ede0c-357e-11e7-9719-00505693235d:1
Mon Jun 5 14:23:49 2017 - [info] Oldest slaves:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] Searching new master from slaves..
Mon Jun 5 14:23:49 2017 - [info] Candidate masters from the configuration file:
Mon Jun 5 14:23:49 2017 - [info] 172.16.16.34(172.16.16.34:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:23:49 2017 - [info] GTID ON
Mon Jun 5 14:23:49 2017 - [info] Replicating from 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:23:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:23:49 2017 - [info] Non-candidate masters:
Mon Jun 5 14:23:49 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Jun 5 14:23:49 2017 - [info] New master is 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:23:49 2017 - [info] Starting master failover..
Mon Jun 5 14:23:49 2017 - [info]
From:
172.16.16.35(172.16.16.35:3306) (current master)
+--172.16.16.34(172.16.16.34:3306)
+--172.16.16.35(172.16.16.35:3307)
 
To:
172.16.16.34(172.16.16.34:3306) (new master)
+--172.16.16.35(172.16.16.35:3307)
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Jun 5 14:23:49 2017 - [info]
Mon Jun 5 14:23:49 2017 - [info] Waiting all logs to be applied..
Mon Jun 5 14:23:49 2017 - [info] done.
Mon Jun 5 14:23:49 2017 - [info] Getting new master's binlog name and position..
Mon Jun 5 14:23:49 2017 - [info] mysql-bin.000001:427
Mon Jun 5 14:23:49 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.16.34', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='root', MASTER_PASSWORD='xxx';
Mon Jun 5 14:23:49 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 427, 806ede0c-357e-11e7-9719-00505693235d:1
Mon Jun 5 14:23:49 2017 - [info] Executing master IP activate script:
Mon Jun 5 14:23:49 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.16.16.35 --orig_master_ip=172.16.16.35 --orig_master_port=3306 --new_master_host=172.16.16.34 --new_master_ip=172.16.16.34 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
 
 
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.16.20/24===
 
Enabling the VIP - 172.16.16.20/24 on the new master - 172.16.16.34
Mon Jun 5 14:23:50 2017 - [info] OK.
Mon Jun 5 14:23:50 2017 - [info] Setting read_only=0 on 172.16.16.34(172.16.16.34:3306)..
Mon Jun 5 14:23:50 2017 - [info] ok.
Mon Jun 5 14:23:50 2017 - [info] ** Finished master recovery successfully.
Mon Jun 5 14:23:50 2017 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] -- Slave recovery on host 172.16.16.35(172.16.16.35:3307) started, pid: 636. Check tmp log /var/log/mha/app1.log/172.16.16.35_3307_20170605142349.log if it takes time..
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] Log messages from 172.16.16.35 ...
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] Resetting slave 172.16.16.35(172.16.16.35:3307) and starting replication from the new master 172.16.16.34(172.16.16.34:3306)..
Mon Jun 5 14:23:50 2017 - [info] Executed CHANGE MASTER.
Mon Jun 5 14:23:50 2017 - [info] Slave started.
Mon Jun 5 14:23:50 2017 - [info] gtid_wait(806ede0c-357e-11e7-9719-00505693235d:1) completed on 172.16.16.35(172.16.16.35:3307). Executed 0 events.
Mon Jun 5 14:23:50 2017 - [info] End of log messages from 172.16.16.35.
Mon Jun 5 14:23:50 2017 - [info] -- Slave on host 172.16.16.35(172.16.16.35:3307) started.
Mon Jun 5 14:23:50 2017 - [info] All new slave servers recovered successfully.
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] * Phase 5: New master cleanup phase..
Mon Jun 5 14:23:50 2017 - [info]
Mon Jun 5 14:23:50 2017 - [info] Resetting slave info on the new master..
Mon Jun 5 14:23:50 2017 - [info] 172.16.16.34: Resetting slave info succeeded.
Mon Jun 5 14:23:50 2017 - [info] Master failover to 172.16.16.34(172.16.16.34:3306) completed successfully.
Mon Jun 5 14:23:50 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Mon Jun 5 14:23:50 2017 - [info]
 
----- Failover Report -----
 
app1: MySQL Master failover 172.16.16.35(172.16.16.35:3306) to 172.16.16.34(172.16.16.34:3306) succeeded
 
Master 172.16.16.35(172.16.16.35:3306) is down!
 
Check MHA Manager logs at localhost.localdomain:/var/log/mha/app1/manager.log for details.
 
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.16.35(172.16.16.35:3306)
Selected 172.16.16.34(172.16.16.34:3306) as a new master.
172.16.16.34(172.16.16.34:3306): OK: Applying all logs succeeded.
172.16.16.34(172.16.16.34:3306): OK: Activated master IP address.
172.16.16.35(172.16.16.35:3307): OK: Slave started, replicating from 172.16.16.34(172.16.16.34:3306)
172.16.16.34(172.16.16.34:3306): Resetting slave info succeeded.
Master failover to 172.16.16.34(172.16.16.34:3306) completed successfully.
Mon Jun 5 14:23:50 2017 - [info] Sending mail..
sh: /usr/local/bin/send_report: No such file or directory
Mon Jun 5 14:23:50 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2066] Failed to send mail with return code 127:0
tail: /var/log/mha/app1/manager.log: file truncated
Mon Jun 5 14:48:26 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 5 14:48:26 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:48:26 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jun 5 14:48:26 2017 - [info] MHA::MasterMonitor version 0.57.
Mon Jun 5 14:48:26 2017 - [info] GTID failover mode = 1
Mon Jun 5 14:48:26 2017 - [info] Dead Servers:
Mon Jun 5 14:48:26 2017 - [info] Alive Servers:
Mon Jun 5 14:48:26 2017 - [info] 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:48:26 2017 - [info] 172.16.16.35(172.16.16.35:3306)
Mon Jun 5 14:48:26 2017 - [info] 172.16.16.35(172.16.16.35:3307)
Mon Jun 5 14:48:26 2017 - [info] Alive Slaves:
Mon Jun 5 14:48:26 2017 - [info] 172.16.16.35(172.16.16.35:3306) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:48:26 2017 - [info] GTID ON
Mon Jun 5 14:48:26 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:48:26 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 5 14:48:26 2017 - [info] 172.16.16.35(172.16.16.35:3307) Version=5.7.14-log (oldest major version between slaves) log-bin:enabled
Mon Jun 5 14:48:26 2017 - [info] GTID ON
Mon Jun 5 14:48:26 2017 - [info] Replicating from 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:48:26 2017 - [info] Current Alive Master: 172.16.16.34(172.16.16.34:3306)
Mon Jun 5 14:48:26 2017 - [info] Checking slave configurations..
Mon Jun 5 14:48:26 2017 - [info] Checking replication filtering settings..
Mon Jun 5 14:48:26 2017 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jun 5 14:48:26 2017 - [info] Replication filtering check ok.
Mon Jun 5 14:48:26 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jun 5 14:48:26 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 5 14:48:26 2017 - [info] HealthCheck: SSH to 172.16.16.34 is reachable.
Mon Jun 5 14:48:26 2017 - [info]
172.16.16.34(172.16.16.34:3306) (current master)
+--172.16.16.35(172.16.16.35:3306)
+--172.16.16.35(172.16.16.35:3307)
 
Mon Jun 5 14:48:26 2017 - [info] Checking master_ip_failover_script status:
Mon Jun 5 14:48:26 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.16.34 --orig_master_ip=172.16.16.34 --orig_master_port=3306
 
 
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.16.20/24===
 
Checking the Status of the script.. OK
Mon Jun 5 14:48:26 2017 - [info] OK.
Mon Jun 5 14:48:26 2017 - [warning] shutdown_script is not defined.
Mon Jun 5 14:48:26 2017 - [info] Set master ping interval 1 seconds.
Mon Jun 5 14:48:26 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Jun 5 14:48:26 2017 - [info] Starting ping health check on 172.16.16.34(172.16.16.34:3306)..
Mon Jun 5 14:48:26 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
^C

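After the switchover, replication between the new master and the slaves is working again. However, we forgot one important issue: the slaves' relay logs. By default, a slave automatically deletes each relay log once the SQL thread has finished applying it. In an MHA setup, though, recovering a lagging slave can depend on the relay logs of the other slaves, so the usual approach is to disable automatic purging and clean up the relay logs periodically instead. When purging many relay logs, or very large ones, watch out for the replication delay and resource overhead this can cause. So here we turn relay log auto-purging OFF on both slaves and purge the relay logs manually: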

mysql -uroot -h172.16.16.35 -P3306 -p123456 -e'set global relay_log_purge=OFF;'
mysql -uroot -h172.16.16.35 -P3307 -p123456 -e'set global relay_log_purge=OFF;'
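Since the two statements differ only in the port, a loop over the slave instances avoids the copy-and-paste slip of hitting the same port twice, and reading the value back confirms the change took effect. The following is a minimal sketch, assuming the host, ports and root/123456 credentials used in this article; `DRY_RUN`, `run` and `disable_relay_log_purge` are hypothetical names introduced only for this example. By default it just prints the commands; run it with `DRY_RUN=0` to execute them. Keep in mind that `SET GLOBAL` does not survive a mysqld restart, so each slave's my.cnf should also carry `relay_log_purge = 0` in its `[mysqld]` section.

```shell
#!/usr/bin/env bash
# Sketch: turn off relay log auto-purge on every slave instance, then read the value back.
# Values below are this article's lab setup; adjust them for your environment.
SLAVE_HOST="172.16.16.35"
SLAVE_PORTS="3306 3307"
MYSQL_USER="root"
MYSQL_PASS="123456"
# Hypothetical safety switch: print commands by default, execute only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # preview the command instead of running it
  else
    "$@"
  fi
}

disable_relay_log_purge() {
  local port="$1"
  # Disable auto-purge, then print the resulting global value (0 means OFF).
  run mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -h"$SLAVE_HOST" -P"$port" -N \
    -e "SET GLOBAL relay_log_purge=OFF; SELECT @@global.relay_log_purge;"
}

for port in $SLAVE_PORTS; do
  disable_relay_log_purge "$port"
done
```

With `DRY_RUN=0`, each instance should print `0`, meaning auto-purging is now off there.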

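With relay_log_purge set to OFF, relay logs are no longer deleted automatically after the SQL thread finishes with them, so we have to remove them by hand. The MHA Node package includes a purge_relay_logs tool that handles exactly this: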

[root@mxqmongodb2 data]# purge_relay_logs --user=root --password=123456 --host=172.16.16.35 --port=3306
2017-06-06 09:11:23: purge_relay_logs script started.
Opening /home/mysql/db3306/data/mxqmongodb2-relay-bin.000001 ..
Opening /home/mysql/db3306/data/mxqmongodb2-relay-bin.000002 ..
Opening /home/mysql/db3306/data/mxqmongodb2-relay-bin.000003 ..
Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2017-06-06 09:11:26: All relay log purging operations succeeded.
[root@mxqmongodb2 data]# purge_relay_logs --user=root --password=123456 --host=172.16.16.35 --port=3307
2017-06-06 09:11:41: purge_relay_logs script started.
Opening /home/mysql/db3307/data/mxqmongodb2-relay-bin.000001 ..
Opening /home/mysql/db3307/data/mxqmongodb2-relay-bin.000002 ..
Opening /home/mysql/db3307/data/mxqmongodb2-relay-bin.000003 ..
Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2017-06-06 09:11:44: All relay log purging operations succeeded.
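The output above also shows roughly what purge_relay_logs does: it opens the existing relay log files, briefly sets relay_log_purge=1 and runs FLUSH LOGS so the SQL thread can delete the older files, then sets relay_log_purge back to 0. Because it has to be run once per slave instance, a small wrapper can keep both runs together and append their output to a log. This is a sketch, not part of MHA: the log path `/var/log/mha/purge_relay_logs.log` and the names `LOG_FILE`, `DRY_RUN` and `purge_one` are assumptions of this example; only the `--user/--password/--host/--port` options come from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: run MHA's purge_relay_logs once per slave instance and append the output to a log.
# Values below come from this article's lab setup; LOG_FILE is a hypothetical path.
SLAVE_HOST="172.16.16.35"
SLAVE_PORTS="3306 3307"
MYSQL_USER="root"
MYSQL_PASS="123456"
LOG_FILE="${LOG_FILE:-/var/log/mha/purge_relay_logs.log}"
# Hypothetical safety switch: only print the commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

purge_one() {
  local port="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "/usr/bin/purge_relay_logs --user=$MYSQL_USER --password=$MYSQL_PASS --host=$SLAVE_HOST --port=$port >> $LOG_FILE 2>&1"
    return 0
  fi
  /usr/bin/purge_relay_logs --user="$MYSQL_USER" --password="$MYSQL_PASS" \
    --host="$SLAVE_HOST" --port="$port" >>"$LOG_FILE" 2>&1
}

# Run the instances one after another rather than in parallel.
for port in $SLAVE_PORTS; do
  purge_one "$port" || echo "purge_relay_logs failed for port $port, see $LOG_FILE" >&2
done
```

Running the instances sequentially is a deliberate choice here, so the two FLUSH LOGS calls on the same host do not overlap.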

This is how relay logs can be purged manually. The command can also be added to a cron job so that it runs on a schedule:

/usr/bin/purge_relay_logs --user=root --password=123456 --host=172.16.16.35 --port=3307
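For the scheduled variant, here is a hedged sketch that prints one crontab entry per slave instance. The daily 04:00 schedule, the 10-minute offset for the second instance (so the two purges on the same host do not flush at the same moment), the log path and the helper name `cron_line` are all assumptions of this example; adjust them to your environment.

```shell
#!/usr/bin/env bash
# Sketch: print crontab entries that run purge_relay_logs for each slave instance.
# Assumptions (not from MHA): daily at 04:00, the 3307 instance 10 minutes later,
# and a hypothetical log file under /var/log/mha.
SLAVE_HOST="172.16.16.35"
MYSQL_USER="root"
MYSQL_PASS="123456"
LOG_FILE="/var/log/mha/purge_relay_logs.log"

# cron_line PORT MINUTE HOUR -> one crontab line (fields: minute hour day month weekday)
cron_line() {
  printf '%s %s * * * /usr/bin/purge_relay_logs --user=%s --password=%s --host=%s --port=%s >> %s 2>&1\n' \
    "$2" "$3" "$MYSQL_USER" "$MYSQL_PASS" "$SLAVE_HOST" "$1" "$LOG_FILE"
}

cron_line 3306 0 4
cron_line 3307 10 4
```

Check the printed lines, then add them with `crontab -e`, or append them with `(crontab -l 2>/dev/null; ./cron_lines.sh) | crontab -`, where `cron_lines.sh` is whatever name you saved the sketch under.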

With that, the setup is complete.

Finally, this article draws heavily on 大師兄's MHA blog post: http://www.cnblogs.com/gomysql/p/3675429.html
