Starting the monitoring service on the manager host fails:
[root@manager ~]# managerStart
[1] 1472
[root@manager ~]# managerStatus
app1 is stopped(2:NOT_RUNNING).
[1]+  Exit 1    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
# Note: managerStart and managerStatus are aliases I set up for the commands that start the service and check it.
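The aliases themselves are not shown in this post. A minimal sketch of how they might be defined, assuming they live in ~/.bashrc on the manager host: the start command is copied from the job line above, while the status alias is a guess based on the output format, which matches masterha_check_status.

```shell
# Hypothetical ~/.bashrc entries on the manager host (assumed, not taken from the post).
# Start command copied from the job line above; the trailing & runs it in the background.
alias managerStart='nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &'

# Assumed status alias: masterha_check_status prints lines such as "app1 is stopped(2:NOT_RUNNING)."
alias managerStatus='masterha_check_status --conf=/etc/masterha/app1.cnf'
```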
# Checking the log turned up this message:
Sun Mar 11 14:18:58 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 10.0.0.50(10.0.0.50:3306)
Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306)
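Roughly, this says that two hosts are configured as masters and both are writable. As a rule, only one host may accept writes at any given time; otherwise the data can diverge, which is a disastrous failure!
On 10.0.0.60, set MySQL to read-only: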
mysql -e 'set global read_only=1'
That was not the end of it. With read-only set, the monitor still would not start, and the log still showed errors:
Sun Mar 11 14:44:29 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.0.0.50(10.0.0.50:3306)
Sun Mar 11 14:44:29 2018 - [info] Master configurations are as below:
Master 10.0.0.50(10.0.0.50:3306)
Master 10.0.0.60(10.0.0.60:3306), replicating from 10.0.0.50(10.0.0.50:3306), read-only
Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln726] Slave 10.0.0.70(10.0.0.70:3306) replicates from 10.0.0.60:3306, but real master is 10.0.0.50(10.0.0.50:3306)!
Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 326
Sun Mar 11 14:44:29 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Sun Mar 11 14:44:29 2018 - [info] Got exit code 1 (Not master dead).
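Thinking it over: why were there two masters? Earlier I had simulated a master crash. After that, the VIP floated over to 60 and host 60 was promoted to master. Host 70 had originally been a slave of host 50, but it was now a slave of host 60. That is how the cluster ended up with two masters.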
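One way to see which master each slave actually follows is to read the Master_Host field of SHOW SLAVE STATUS on every slave. The helper below is only a sketch: master_host_of is a made-up name, and the usage line assumes the mysql client can reach each host with credentials from the local option file.

```shell
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin
# and print only the value of the Master_Host field.
master_host_of() {
    awk -F': ' '/^ *Master_Host:/ { print $2 }'
}

# Assumed usage, run from a host that can reach both slaves:
#   for h in 10.0.0.60 10.0.0.70; do
#       echo "$h follows $(mysql -h "$h" -e 'SHOW SLAVE STATUS\G' | master_host_of)"
#   done
```

If the guess about the earlier failover is right, 10.0.0.70 should report 10.0.0.60 here, which is exactly what the ln726 error complains about.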
To test this guess, I forced 70 to follow 50 again: I ran CHANGE MASTER TO on 70 with host 50 as the master, using host 50's binlog file and position.
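The exact statement is not recorded in this post, so the following is only a sketch of what the repointing might look like. repoint_sql is a made-up helper that prints the SQL without running it. The binlog file, position, replication user and password are placeholders to be filled in from SHOW MASTER STATUS on 10.0.0.50 and from your own replication account.

```shell
# Hypothetical helper: print the statements that repoint a slave at a new master.
# Arguments: $1 master host, $2 binlog file, $3 binlog position,
#            $4 replication user, $5 replication password.
# Nothing is executed here; pipe the output into mysql on the slave to apply it.
repoint_sql() {
    cat <<EOF
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='$1',
  MASTER_PORT=3306,
  MASTER_USER='$4',
  MASTER_PASSWORD='$5',
  MASTER_LOG_FILE='$2',
  MASTER_LOG_POS=$3;
START SLAVE;
EOF
}

# Assumed usage on 10.0.0.70 (placeholder values in angle brackets):
#   repoint_sql 10.0.0.50 '<binlog file>' '<position>' '<repl user>' '<repl password>' | mysql
```

Taking the master's current coordinates is only safe when the slave's data already matches the master at that point, as in a test setup like this one. Otherwise the slave may skip or re-apply events.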
( ̄▽ ̄)" Haha, guessed right... that made me happy for a moment.
[root@manager ~]# managerStatus
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
[root@manager ~]# managerStatus
app1 (pid:1520) is running(0:PING_OK), master:10.0.0.50
Sun Mar 11 15:02:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Mar 11 15:02:01 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Mar 11 15:02:01 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Mar 11 15:02:01 2018 - [info] MHA::MasterMonitor version 0.56.
Sun Mar 11 15:02:01 2018 - [info] GTID failover mode = 0
Sun Mar 11 15:02:01 2018 - [info] Dead Servers:
Sun Mar 11 15:02:01 2018 - [info] Alive Servers:
Sun Mar 11 15:02:01 2018 - [info] 10.0.0.50(10.0.0.50:3306)
Sun Mar 11 15:02:01 2018 - [info] 10.0.0.60(10.0.0.60:3306)
Sun Mar 11 15:02:01 2018 - [info] 10.0.0.70(10.0.0.70:3306)
Sun Mar 11 15:02:01 2018 - [info] Alive Slaves:
Sun Mar 11 15:02:01 2018 - [info] 10.0.0.60(10.0.0.60:3306) Version=5.6.16-log (oldest major version between slaves) log-bin:enabled
Sun Mar 11 15:02:01 2018 - [info] Replicating from 10.0.0.50(10.0.0.50:3306)
Sun Mar 11 15:02:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Mar 11 15:02:01 2018 - [info] 10.0.0.70(10.0.0.70:3306) Version=5.6.16 (oldest major version between slaves) log-bin:disabled
Sun Mar 11 15:02:01 2018 - [info] Replicating from 10.0.0.50(10.0.0.50:3306)
Sun Mar 11 15:02:01 2018 - [info] Current Alive Master: 10.0.0.50(10.0.0.50:3306)
Sun Mar 11 15:02:01 2018 - [info] Checking slave configurations..
Sun Mar 11 15:02:01 2018 - [warning] relay_log_purge=0 is not set on slave 10.0.0.60(10.0.0.60:3306).
Sun Mar 11 15:02:01 2018 - [warning] relay_log_purge=0 is not set on slave 10.0.0.70(10.0.0.70:3306).
Sun Mar 11 15:02:01 2018 - [warning] log-bin is not set on slave 10.0.0.70(10.0.0.70:3306). This host cannot be a master.
Sun Mar 11 15:02:01 2018 - [info] Checking replication filtering settings..
Sun Mar 11 15:02:01 2018 - [info] binlog_do_db= , binlog_ignore_db=
Sun Mar 11 15:02:01 2018 - [info] Replication filtering check ok.
Sun Mar 11 15:02:01 2018 - [info] GTID (with auto-pos) is not supported
Sun Mar 11 15:02:01 2018 - [info] Starting SSH connection tests..
Sun Mar 11 15:02:02 2018 - [info] All SSH connection tests passed successfully.
Sun Mar 11 15:02:02 2018 - [info] Checking MHA Node version..
Sun Mar 11 15:02:03 2018 - [info] Version check ok.
Sun Mar 11 15:02:03 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sun Mar 11 15:02:04 2018 - [info] HealthCheck: SSH to 10.0.0.50 is reachable.
Sun Mar 11 15:02:04 2018 - [info] Master MHA Node version is 0.56.
Sun Mar 11 15:02:04 2018 - [info] Checking recovery script configurations on 10.0.0.50(10.0.0.50:3306)..
Sun Mar 11 15:02:04 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000002
Sun Mar 11 15:02:04 2018 - [info] Connecting to root@10.0.0.50(10.0.0.50:22)..
  Creating /tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /mysql/data, up to mysql-bin.000002
Sun Mar 11 15:02:04 2018 - [info] Binlog setting check done.
Sun Mar 11 15:02:04 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Mar 11 15:02:04 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.60 --slave_ip=10.0.0.60 --slave_port=3306 --workdir=/tmp --target_version=5.6.16-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Sun Mar 11 15:02:04 2018 - [info] Connecting to root@10.0.0.60(10.0.0.60:22)..
  Checking slave recovery environment settings..
  Opening /mysql/data/relay-log.info ... ok.
  Relay log found at /mysql/data, up to cadicate-master-relay-bin.000005
  Temporary relay log file is /mysql/data/cadicate-master-relay-bin.000005
  Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Sun Mar 11 15:02:05 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.0.0.70 --slave_ip=10.0.0.70 --slave_port=3306 --workdir=/tmp --target_version=5.6.16 --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Sun Mar 11 15:02:05 2018 - [info] Connecting to root@10.0.0.70(10.0.0.70:22)..
  Checking slave recovery environment settings..
  Opening /mysql/data/relay-log.info ... ok.
  Relay log found at /mysql/data, up to slave-relay-bin.000002
  Temporary relay log file is /mysql/data/slave-relay-bin.000002
  Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Sun Mar 11 15:02:05 2018 - [info] Slaves settings check done.
Sun Mar 11 15:02:05 2018 - [info]
10.0.0.50(10.0.0.50:3306) (current master)
 +--10.0.0.60(10.0.0.60:3306)
 +--10.0.0.70(10.0.0.70:3306)
Sun Mar 11 15:02:05 2018 - [info] Checking master_ip_failover_script status:
Sun Mar 11 15:02:05 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.50 --orig_master_ip=10.0.0.50 --orig_master_port=3306
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Checking the Status of the script.. OK
Sun Mar 11 15:02:05 2018 - [info] OK.
Sun Mar 11 15:02:05 2018 - [warning] shutdown_script is not defined.
Sun Mar 11 15:02:05 2018 - [info] Set master ping interval 1 seconds.
Sun Mar 11 15:02:05 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s server03 -s server02
Sun Mar 11 15:02:05 2018 - [info] Starting ping health check on 10.0.0.50(10.0.0.50:3306)..
Sun Mar 11 15:02:05 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Read the logs, read the logs, read the logs. Important things are worth saying three times!
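Since both problems above were found through the [error] lines, a small filter can save some scrolling. This is only a sketch: mha_errors is a made-up name, and the default path is the log file from the nohup command earlier in this post.

```shell
# Hypothetical helper: print only the [error] lines of an MHA manager log.
# The default path is the log file used by the nohup command in this post.
mha_errors() {
    grep -F '[error]' "${1:-/var/log/masterha/app1/manager.log}"
}

# Assumed usage on the manager host:
#   mha_errors
#   mha_errors /path/to/another/manager.log
```

Some messages continue on the following lines (the master list after the ln781 error, for instance), so adding -A 3 to the grep call can help when the first line alone is not enough.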