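I. Overview
This article describes the MySQL MM (master-master) + Keepalived scheme. It consists of two MySQL servers that replicate from each other. One master serves writes and the other serves reads. Keepalived manages a write VIP: when the MySQL instance serving writes fails, the write VIP floats to the read server, which provides high availability.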
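II. Nodes
This experiment uses two virtual machines running CentOS 6.10 with MySQL 5.7.25.
node1 10.40.16.61 master, serves writes
node2 10.40.16.62 master, serves reads
One VIP must also be reserved. It does not need to be configured yet; it is mentioned here because the later installation steps use it.
10.40.16.71 write VIP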
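III. Installation
1. Configure the dual-master architecture
Here is a one-click MySQL installation script I wrote: https://www.cnblogs.com/ddzj01/p/10678296.html
Once MySQL is installed on both nodes, configure them to replicate from each other.
node1 and node2 are then each other's master and standby.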
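The replication commands for this step are not shown in the article. As a minimal sketch, assuming a replication account named repl with password 123456 (the values that appear in the manual switchover script later on), each node could be pointed at its peer as follows. The helper name change_master_sql and the binlog coordinates are hypothetical; the real coordinates come from running show master status on the peer.

```shell
#!/bin/sh
# Sketch: print the statements that point one node at its peer.
# Assumptions: the account, password and binlog coordinates are placeholders.
change_master_sql() {
    peer_host=$1; log_file=$2; log_pos=$3
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='$peer_host',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_LOG_FILE='$log_file',
  MASTER_LOG_POS=$log_pos;
START SLAVE;
EOF
}

# On node1, replicate from node2 (coordinates are hypothetical):
change_master_sql 10.40.16.62 mysql-bin.000001 154
# On node2, replicate from node1:
change_master_sql 10.40.16.61 mysql-bin.000001 154
```

The printed statements can be piped into the mysql client on the matching node. Both nodes also need binary logging enabled, distinct server-id values, and the replication account created beforehand.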
On node2, set the database to read-only mode:
(root@localhost)[(none)]> set global read_only = 1;
2. Install Keepalived
node1 & node2:
yum install -y keepalived
IV. Edit the configuration files
1. node1
Edit the configuration file /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_script chk_mysql {
    script "/etc/keepalived/check_mysql.sh"    # custom check script
    interval 30                                # check interval in seconds; adjust as needed
}
vrrp_instance VI_1 {
    state BACKUP            # BACKUP state; explained later
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    nopreempt               # after a failover, do not take the VIP back automatically when this node recovers
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_mysql
    }
    virtual_ipaddress {
        10.40.16.71/24      # vip
    }
}
Edit the master check script /etc/keepalived/check_mysql.sh

#!/bin/bash
source /root/.bash_profile

### Database settings ###
DB_USER='root'
DB_PASSWD='root'
U_EMAIL='xxxx@163.com'
######################

### Exit if the previous run of this script has not finished yet
if [ `ps -ef | grep -w "$0" | grep -v "grep" | wc -l` -gt 2 ]; then
    exit 0
fi

mysql_con="mysql -u$DB_USER -p$DB_PASSWD"
error_log="/etc/keepalived/logs/check_mysql.err"

### Create the log directory if it does not exist (mkdir -p does nothing when it does)
mkdir -p /etc/keepalived/logs

### Simple test of whether mysql is usable
function execute_query {
    $mysql_con -e "select 1;" 2>> $error_log
}

### Handler: the query fails and the mysql service is down
function service_error {
    echo -e "`date "+%F %H:%M:%S"` ----mysql service error, now stop keepalived----" >> $error_log
    service keepalived stop >> $error_log 2>&1
    echo "master1 keepalived stopped" | mail -s "master1 keepalived stopped, please take notice!" $U_EMAIL 2>> $error_log
    echo -e "\n---------------------------------------------------------\n" >> $error_log
}

### Handler: the query fails but the mysql service is running
function query_error {
    echo -e "`date "+%F %H:%M:%S"` ----query error, but mysql service ok, retry after 30s----" >> $error_log
    sleep 30
    execute_query
    if [ $? -ne 0 ]; then
        echo -e "`date "+%F %H:%M:%S"` ----still can't execute query----" >> $error_log
        ### Stop the local mysql
        echo -e "`date "+%F %H:%M:%S"` ----stop mysql service----" >> $error_log
        service mysql stop &>> $error_log
        sleep 2    ### allow time for the stop to complete
        ### Stop the local keepalived
        echo -e "`date "+%F %H:%M:%S"` ----stop keepalived----" >> $error_log
        service keepalived stop &>> $error_log
        echo "master1 keepalived stopped" | mail -s "master1 keepalived stopped, please take notice!" $U_EMAIL 2>> $error_log
        echo -e "\n---------------------------------------------------------\n" >> $error_log
    else
        echo -e "`date "+%F %H:%M:%S"` ----query ok after 30s----" >> $error_log
        echo -e "\n---------------------------------------------------------\n" >> $error_log
    fi
}

### Start of the check: run the query
execute_query
if [ $? -ne 0 ]; then
    service mysql status &> /dev/null
    if [ $? -ne 0 ]; then
        service_error
    else
        query_error
    fi
fi
chmod +x /etc/keepalived/check_mysql.sh
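The script handles two cases:
a. The mysql process is down: stop the keepalived service immediately.
b. The mysql process is running but queries cannot be executed (for example, the process is hung): retry after 30 seconds, and if the query still fails, stop both mysql and keepalived.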
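The two cases reduce to a small decision table. The sketch below models only that decision so the branches can be read and tested without a running server; the function name decide_action and the returned labels are hypothetical, not part of check_mysql.sh.

```shell
#!/bin/sh
# Sketch of the decision made by the check script.
# $1: exit status of the test query
# $2: exit status of "service mysql status"
# $3: exit status of the query retried after 30s (only relevant when the service is up)
decide_action() {
    query_rc=$1; service_rc=$2; retry_rc=$3
    if [ "$query_rc" -eq 0 ]; then
        echo "healthy"
    elif [ "$service_rc" -ne 0 ]; then
        echo "stop keepalived"
    elif [ "$retry_rc" -ne 0 ]; then
        echo "stop mysql and keepalived"
    else
        echo "recovered"
    fi
}

decide_action 0 0 0    # query succeeds
decide_action 1 3 0    # case a: service is down
decide_action 1 0 1    # case b: service is up, retry also fails
decide_action 1 0 0    # service is up, retry succeeds
```

In both failure cases keepalived ends up stopped on node1, which is what releases the VIP to node2.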
2. node2
Edit the configuration file /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_instance VI_1 {
    state BACKUP            # BACKUP state; explained later
    interface eth0
    virtual_router_id 51
    priority 90             # priority 90, lower than node1
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify_master /etc/keepalived/notify_master_mysql.sh
    virtual_ipaddress {
        10.40.16.71/24      # vip
    }
}
Edit the script /etc/keepalived/notify_master_mysql.sh, which checks the standby's replication state when node2 becomes MASTER

#!/bin/bash
source /root/.bash_profile

### keepalived runs this script when this node transitions to the MASTER state

### Database settings ###
DB_USER='root'
DB_PASSWD='root'
U_EMAIL='xxxx@163.com'
######################

change_log='/etc/keepalived/logs/state_change.log'
mysql_con="mysql -u$DB_USER -p$DB_PASSWD"

### Create the log directory before the first log write
mkdir -p /etc/keepalived/logs

echo -e "`date "+%F %H:%M:%S"` ----master2 keepalived change to MASTER----" >> $change_log

slave_info() {
    ### Read the replication status; "$slave_stat" is quoted so each field stays on its own line
    slave_stat=`$mysql_con -e "show slave status\G"`
    Slave_IO_Running=`echo "$slave_stat" | egrep -w "Slave_IO_Running" | awk '{print $2}'`
    Slave_SQL_Running=`echo "$slave_stat" | egrep -w "Slave_SQL_Running" | awk '{print $2}'`
    Master_Log_File=`echo "$slave_stat" | egrep -w "Master_Log_File" | awk '{print $2}'`
    Read_Master_Log_Pos=`echo "$slave_stat" | egrep -w "Read_Master_Log_Pos" | awk '{print $2}'`
    Relay_Master_Log_File=`echo "$slave_stat" | egrep -w "Relay_Master_Log_File" | awk '{print $2}'`
    Exec_Master_Log_Pos=`echo "$slave_stat" | egrep -w "Exec_Master_Log_Pos" | awk '{print $2}'`
}

action() {
    ### Clear the read_only attribute
    echo -e "`date "+%F %H:%M:%S"` ----set read_only = 0 on master2----" >> $change_log
    $mysql_con -e "set global read_only = 0;" 2>> $change_log
    echo "master2 keepalived change to MASTER, production database switched to master2" | mail -s "master2 keepalived change to MASTER" $U_EMAIL 2>> $change_log
    echo -e "---------------------------------------------------------\n" >> $change_log
}

slave_info
if [ "$Slave_SQL_Running" == 'Yes' ]; then
    i=0    # counter
    ### Check whether every binlog event received from the master has been applied locally.
    ### This still cannot prove the standby has fully caught up, because the io_thread itself
    ### may be behind, although lag caused by network transfer is unlikely.
    until [ "$Master_Log_File" == "$Relay_Master_Log_File" -a "$Read_Master_Log_Pos" == "$Exec_Master_Log_Pos" ]
    do
        if [ $i -lt 10 ]; then    # wait at most 20s for exec_pos to catch up with read_pos
            ### Log the positions and wait until exec_pos = read_pos
            echo -e "`date "+%F %H:%M:%S"` ----Relay_Master_Log_File=$Relay_Master_Log_File, Exec_Master_Log_Pos=$Exec_Master_Log_Pos is behind Master_Log_File=$Master_Log_File, Read_Master_Log_Pos=$Read_Master_Log_Pos, wait......" >> $change_log
            i=$(($i+1))
            sleep 2
            slave_info
        else
            echo -e "The wait time is more than 20s, now force change. Master_Log_File=$Master_Log_File Read_Master_Log_Pos=$Read_Master_Log_Pos Relay_Master_Log_File=$Relay_Master_Log_File Exec_Master_Log_Pos=$Exec_Master_Log_Pos" >> $change_log
            action
            exit 0
        fi
    done
    action
else
    echo -e "master2's slave status is not running, now force change. Master_Log_File=$Master_Log_File Read_Master_Log_Pos=$Read_Master_Log_Pos Relay_Master_Log_File=$Relay_Master_Log_File Exec_Master_Log_Pos=$Exec_Master_Log_Pos" >> $change_log
    action
fi
chmod +x /etc/keepalived/notify_master_mysql.sh
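The script handles three cases:
a. The slave SQL thread is not running: turn off read-only on the standby immediately.
b. The slave SQL thread is running and the standby has no lag: turn off read-only on the standby immediately.
c. The slave SQL thread is running and the standby is lagging: wait for a while (up to 20 seconds in this script; adjust as needed), then turn off read-only on the standby.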
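The lag test in cases b and c compares the binlog file and position read from the master with the file and position already executed. A minimal sketch of that comparison, run against sample text instead of a live server; the helper names slave_field and caught_up and the sample values are hypothetical.

```shell
#!/bin/sh
# Extract one field from "show slave status\G" output passed in as text.
slave_field() {
    # $1: full status text, $2: field name
    printf '%s\n' "$1" | awk -v key="$2:" '$1 == key { print $2 }'
}

# Succeeds when everything received from the master has been applied locally.
caught_up() {
    status=$1
    [ "$(slave_field "$status" Master_Log_File)" = "$(slave_field "$status" Relay_Master_Log_File)" ] &&
    [ "$(slave_field "$status" Read_Master_Log_Pos)" = "$(slave_field "$status" Exec_Master_Log_Pos)" ]
}

# Sample status text (values are made up for illustration)
sample='*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 1200
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 980'

slave_field "$sample" Slave_SQL_Running
if caught_up "$sample"; then echo "caught up"; else echo "lagging"; fi
```

Matching on the exact field name followed by a colon keeps Master_Log_File from also matching Relay_Master_Log_File, and passing the status text in quotes preserves the one-field-per-line layout that the parsing relies on.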
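Now for the meaning of "state BACKUP" in keepalived.conf. Keepalived can be run in two modes, master->backup and backup->backup. What is the difference?
In master->backup mode, when the master goes down the virtual IP floats to the standby automatically. Once the master is repaired and keepalived starts again, it takes the virtual IP back. This happens even if nopreempt is set, because nopreempt only takes effect when the initial state is BACKUP.
In backup->backup mode, when the master goes down the virtual IP also floats to the standby automatically. When the original master recovers and its keepalived service is restarted, it does not take the virtual IP back from the new master, even though its priority is higher than the standby's.
So, to reduce the number of times the virtual IP moves, in production the repaired master is usually made the standby of the new master, which is why backup->backup mode is the more common choice.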
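V. Start Keepalived
node1 & node2:
service keepalived start
Note that because we use backup->backup mode, keepalived must be started on node1 first and then on node2 so that the VIP ends up on node1. If node2 is started first, node1 will not take the virtual IP over when it starts.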
VI. Testing
1. Simulate a master outage
Stop the mysql database on node1:
service mysql stop
Check the check-script log on node1, /etc/keepalived/logs/check_mysql.err
Check the VIP on node1 and node2; the VIP has moved to node2.
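One way to check which node holds the VIP is to look for it in the output of ip -o -4 addr show dev eth0. The sketch below runs the check against sample text so it works without the interface; the helper name has_vip and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a given VIP appears in "ip -o -4 addr show" output.
has_vip() {
    # $1: VIP to look for; the ip output is read from stdin
    grep -qF "inet $1/"
}

# Sample text standing in for: ip -o -4 addr show dev eth0
sample='2: eth0    inet 10.40.16.62/24 brd 10.40.16.255 scope global eth0
2: eth0    inet 10.40.16.71/24 scope global secondary eth0'

if printf '%s\n' "$sample" | has_vip 10.40.16.71; then
    echo "VIP is on this node"
else
    echo "VIP is not on this node"
fi
```

On a real node the sample would be replaced by the live command, for example ip -o -4 addr show dev eth0 | has_vip 10.40.16.71.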
The switchover log on node2 is in /etc/keepalived/logs/state_change.log
Check whether the notification email arrived :-)
Failover works!
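2. Simulate a master that is running but cannot serve queries
Start the mysql database and the keepalived service on node1 again
service mysql start
service keepalived start
Restart the keepalived service on node2
service keepalived restart
On node2, set the database to read-only mode
(root@localhost)[(none)]> set global read_only = 1;
This restores the initial state: node1 is the master, node2 is the standby, and the VIP is on node1.
On node1, set max_connections to a very small value
(root@localhost)[(none)]> set global max_connections = 2;
Then open several connections to node1 until new connections are refused. This simulates a master that cannot serve queries.
Check the check-script log on node1, /etc/keepalived/logs/check_mysql.err
On node1, both the keepalived and mysql services have been stopped
[root@mysqla ~]# service mysql status
MySQL is not running
[root@mysqla ~]# service keepalived status
keepalived is stopped
The VIP has also floated over to node2
[root@mysqlb ~]# ip a
Failover works!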
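VII. Manual switchover
Another scenario is switching over by hand. For example, node2 is currently the master, node1 has been repaired, and I want node1 to be the master again.
Restart the mysql service on node1
service mysql restart
Now node2 is the master, node1 is the standby, and the VIP is on node2.
On node2, create the manual switchover script: vi /etc/keepalived/manual_switch_to_master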

#!/bin/bash
source /root/.bash_profile

### Run manually on master2 to switch the master role back to master1

### Database settings ###
DB_USER='root'
DB_PASSWD='root'
U_EMAIL='xxxx@163.com'
MASTER1='10.40.16.61'
MASTER2='10.40.16.62'
REPL_USER='repl'
REPL_PASSWD='123456'
MASTER1_MYSQL_PATH='/usr/local/mysql/bin'
######################

### Create the log directory if it does not exist
mkdir -p /etc/keepalived/logs

mysql_con="mysql -u$DB_USER -p$DB_PASSWD"

echo -e "`date "+%F %H:%M:%S"` ----change to BACKUP manually----" >> /etc/keepalived/logs/state_change.log
echo -e "`date "+%F %H:%M:%S"` ----set read_only = 1 on master2----" >> /etc/keepalived/logs/state_change.log
$mysql_con -e "set global read_only = 1;" >> /etc/keepalived/logs/state_change.log

### Kill the current client connections
echo -e "`date "+%F %H:%M:%S"` ----kill current client thread----" >> /etc/keepalived/logs/state_change.log
if [ -e /tmp/kill.sql ]; then
    rm -f /tmp/kill.sql &> /dev/null
fi
### A small trick for killing threads in bulk
$mysql_con -e 'select concat("kill ",id,";") from information_schema.processlist where command="Query" or command="Execute" into outfile "/tmp/kill.sql";'
$mysql_con -e "source /tmp/kill.sql" 2>> /etc/keepalived/logs/state_change.log
sleep 2    ### give the kills time to take effect

slave_info() {
    ### Check the replication status on master1; "$slave_stat" is quoted so each field stays on its own line
    slave_stat=`mysql -u$REPL_USER -p$REPL_PASSWD -h$MASTER1 -e "show slave status\G"`
    Master_Log_File=`echo "$slave_stat" | egrep -w "Master_Log_File" | awk '{print $2}'`
    Read_Master_Log_Pos=`echo "$slave_stat" | egrep -w "Read_Master_Log_Pos" | awk '{print $2}'`
    Relay_Master_Log_File=`echo "$slave_stat" | egrep -w "Relay_Master_Log_File" | awk '{print $2}'`
    Exec_Master_Log_Pos=`echo "$slave_stat" | egrep -w "Exec_Master_Log_Pos" | awk '{print $2}'`
}

slave_info
### Keep waiting while the status is empty (query failed) or master1 has not applied everything it received
until [ -n "$Master_Log_File" ] && [ "$Read_Master_Log_Pos" = "$Exec_Master_Log_Pos" ] && [ "$Master_Log_File" = "$Relay_Master_Log_File" ]
do
    echo -e "`date "+%F %H:%M:%S"` ----Relay_Master_Log_File=$Relay_Master_Log_File, Exec_Master_Log_Pos=$Exec_Master_Log_Pos is behind Master_Log_File=$Master_Log_File, Read_Master_Log_Pos=$Read_Master_Log_Pos, wait......" >> /etc/keepalived/logs/state_change.log
    sleep 2
    slave_info
done

### Then clear read_only on master1 and start its keepalived service
echo -e "`date "+%F %H:%M:%S"` ----set read_only = 0 on master1----" >> /etc/keepalived/logs/state_change.log
ssh $MASTER1 "$MASTER1_MYSQL_PATH/mysql -u$DB_USER -p$DB_PASSWD -e 'set global read_only = 0;' && /etc/init.d/keepalived start" 2>> /etc/keepalived/logs/state_change.log

### Restart keepalived on master2 so that the VIP moves to master1
echo -e "`date "+%F %H:%M:%S"` ----make VIP move to master1----" >> /etc/keepalived/logs/state_change.log
/etc/init.d/keepalived restart &>> /etc/keepalived/logs/state_change.log
echo "master2 keepalived restart,vip change to master1" | mail -s "master2 keepalived change to BACKUP" $U_EMAIL 2>> /etc/keepalived/logs/state_change.log
echo -e "\n--------------------------------------------------\n" >> /etc/keepalived/logs/state_change.log
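The bulk-kill step writes its statements with INTO OUTFILE, which only works when the server's secure_file_priv setting allows writing to /tmp. A possible alternative, shown here only as a sketch, is to build the KILL statements on the client side and pipe them back into mysql. The helper name kill_statements is hypothetical, and the sample ids stand in for a real processlist query.

```shell
#!/bin/sh
# Sketch: turn a list of connection ids (one per line on stdin) into KILL statements.
# Lines that are not plain numbers are ignored.
kill_statements() {
    sed -n 's/^\([0-9][0-9]*\)$/kill \1;/p'
}

# Sample ids standing in for the output of:
#   mysql -N -e "select id from information_schema.processlist where command in ('Query','Execute')"
printf '12\n15\n' | kill_statements
```

In the switchover script the generated statements would then be piped into the mysql client in place of sourcing /tmp/kill.sql.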
Run the script on node2
chmod +x /etc/keepalived/manual_switch_to_master
[root@mysqlb keepalived]# sh /etc/keepalived/manual_switch_to_master
Manual switchover complete!
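VIII. Summary
MM + Keepalived is simple to configure, and compared with a traditional master-slave architecture it gives a fairly simple way to fail over the write server.
Reference: https://www.cnblogs.com/ivictor/p/5522383.html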