MySQL高可用架構之Keepalived+主從架構部署


針對目前公司的數據庫架構要做統計規划和調整,所以針對不同的業務環境要選擇合適的架構就比較重要,同時作為運維人員又要考慮到維護的便捷性(不能做過多架構維護),最終停留在mha+vip/主從+vip/mycat集群/域名+主從這幾種架構或組合,這里就分布對不同架構部署做下記錄並對比起優劣,針對不同場景做出合適的選擇。

本文主要針對主備自動切換,即VIP+主從方式,其中VIP通過keepalived控制。

一、環境描述

服務器IP

操作系統

數據庫

硬件配置

備注

192.168.1.23

RHEL6.6_x86_64

MySQL 5.7.21

內存:16GB

CPU:4cores

硬盤:100GB

VIP:192.168.1.27

Keepalived:1.4.4

192.168.1.24

RHEL6.6_x86_64

MySQL 5.7.21

內存:16GB

CPU:4cores

硬盤:100GB

二、業務場景

1、優點

1)部署簡單,管理方便。

2)可實現主備庫VIP切換,對業務無感知。

2、缺點

1)存在腦裂風險

2)存在切換主備,狀態無檢測風險

3)跨VLAN不支持

4)看似部署簡單,實則需要考慮的點非常多,如果考慮不完善建議慎用

3、業務場景

總而言之,此架構不推薦使用,使用時需謹慎。

1)雙節點數據庫服務器,僅僅為了實現主備VIP切換業務無感知

2)多個從節點,通過LVS,做負載均衡(現在不推薦這種架構實現負載均衡)

3)兩節點的雙向復制可以降低部分風險(來回切換VIP、腦裂等),單無法避免

三、數據庫安裝和主從部署

四、keepalived安裝部署

1.keepalived下載

下載地址:http://www.keepalived.org/software/keepalived-1.4.4.tar.gz

2.keepalived安裝

主庫:
[root@node1 tools]# tar -xzvf keepalived-1.4.4.tar.gz
[root@node1 tools]# cd keepalived-1.4.4
[root@node1 keepalived-1.4.4]# ./configure  --prefix=/usr/local/keepalived
[root@node1 keepalived-1.4.4]# make -j 4 
[root@node1 keepalived-1.4.4]# make install
[root@node1 keepalived-1.4.4]# cp -rp ./keepalived/etc/init.d/keepalived /etc/init.d/
[root@node1 keepalived-1.4.4]# chkconfig  --add keepalived
[root@node1 keepalived-1.4.4]# mkdir /etc/keepalived
[root@node2 keepalived-1.4.4]# cp -rp ./bin/keepalived  /usr/bin/
[root@node1 keepalived-1.4.4]# cp ./keepalived/etc/keepalived/keepalived.conf /etc/keepalived [root@node1 keepalived-1.4.4]# cp -rp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/ [root@node1 keepalived-1.4.4]# service keepalived status keepalived is stopped 備庫: 安裝keepalived同上 ###備注 *** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS. ###

3.keepalived配置

主庫配置:

[root@node1 keepalived-1.4.4]# cp -rp  /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak
[root@node1 keepalived-1.4.4]# >/etc/keepalived/keepalived.conf
[root@node1 keepalived-1.4.4]# vi /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {
   notification_email {
     rangle@163.com
   }
   notification_email_from rangle@163.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id Node_Master } vrrp_instance VI_1 { state BACKUP ##可配置master和backup模式,為了防止腦裂現象,主備均需要設置為backup模式,master模式會搶占VIP interface eth1 virtual_router_id
43 ##VRRP組名,兩個節點的設置必須一樣,以指明各個節點屬於同一VRRP組 priority 100 nopreempt ##配合backup,防止切換后,主庫服務恢復正常后,IP漂移過來 advert_int 1 ###組播信息發送間隔,兩個節點設置必須一樣 authentication { ###設置驗證信息,兩個節點必須一致 auth_type PASS auth_pass 1111 } virtual_ipaddress { ###指定虛擬IP, 兩個節點設置必須一樣 192.168.1.27 label eth1:1 } } virtual_server 192.168.1.27 3306 { delay_loop 6 lb_algo wrr #lvs調度算法rr|wrr|lc|wlc|lblc|sh|dh lb_kind DR #負載均衡轉發規則NAT|DR|RUN persistence_timeout 50 protocol TCP real_server 192.168.1.23 3306 { weight 1 notify_down /etc/keepalived/mysql.sh TCP_CHECK { connect_timeout 10 nb_get_retry 3 delay_before_retry 3 connect_port 3306 } } }

備庫配置:

[root@node2 keepalived-1.4.4]# cp -rp  /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak
[root@node2 keepalived-1.4.4]# >/etc/keepalived/keepalived.conf
[root@node2 keepalived-1.4.4]# vi /etc/keepalived/keepalived.conf

! Configuration File for keepalived
  global_defs {
      notification_email {
      rangle@163.com
      }
      notification_email_from rangle@163.com
      smtp_server 127.0.0.1
      smtp_connect_timeout 30
      router_id Node_backup
     }
 vrrp_instance VI_1 {
      state BACKUP
      interface eth1
      virtual_router_id 43
      priority 90                   ##主優先級高於備,主設置100,備可設置90,(1-254之間)
      advert_int 1
      authentication {
      auth_type PASS
      auth_pass 1111
      }
      virtual_ipaddress {
      192.168.1.27 label eth1:1
      }
      }
 virtual_server 192.168.1.27 3306 {
      delay_loop 2                 ##每隔2秒檢查真實服務器狀態
      lb_algo wrr                  ##LVS算法,如果僅做高可用可不設
      lb_kind DR                   ##LVS模式,如果僅做高可用可不設
      persistence_timeout 60 
      protocol TCP 

      real_server 192.168.1.24 3306{
      weight 3
      notify_down /etc/keepalived/mysql.sh     ##如果mysql服務宕掉,執行的腳本
      echo '3' >  /etc/keepalived/t.log
      TCP_CHECK {
      connect_timeout 10
      nb_get_retry 3     
      delay_before_retry 3
      connect_port 3306 
      }
      }
}

mysql.sh配置內容:

#!/bin/bash

run_status=`service keepalived status|grep running|wc -l`
pro_status=`ps -ef |grep keepalived|grep -v grep |wc -l`

service keepalived stop

if [ ${run_status} != 0 ] || [ ${pro_status} != 0 ]
then
   pkill keepalived
fi

、keepalived優化

1、keepalived+mysql風險點

1)腦裂風險

現象:

如果由於主備內部網絡通信(vrrp組播)故障(網絡波動、防火牆等)導致VIP切換,但實際主備服務均正常,此時VIP會存在兩個節點上,這個就會導致主備數據同時寫入導致數據不一致。

優化方案:

****主庫節點增加腦裂檢查腳本,通過本機增加網關鏈路的檢查,增加仲裁節點,判斷是否本機對外的網絡出現問題,此時在配合VRRP組播,如果網絡存在問題則直接關閉keepalived和mysql服務,然后發送告警郵件、短信。

2)主從同步漂移風險

現象:

VIP漂移不會檢查從庫狀態,即無論從庫數據是否同步,VIP都將完成漂移,數據不一致風險較大。

優化方案:

****在從節點增加主從狀態檢查腳本,判斷主從同步進程是否正常,同時判斷是否有數據延遲,如果有延遲或同步錯誤則不能完成切換,直接關閉keepalived服務

2、優化后配置如下

具體優化配置和腳本如下:

主庫配置:

keepalived.conf

! Configuration File for keepalived

global_defs {
   notification_email {
     abc@163.com
   }
   notification_email_from root@localhost
   smtp_server root@localhost
   smtp_connect_timeout 30
   router_id node1
}


vrrp_script check_local {
    script "/etc/keepalived/check_gateway.sh"
    interval 5
    fall     4
    rise     4
    weight 3
}


vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 43
    priority  100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.1.27 label eth1:1
    }
    track_script {
        check_local
    }

}
virtual_server 192.168.1.27 3306 {
    delay_loop 6
    lb_algo wrr          #lvs調度算法rr|wrr|lc|wlc|lblc|sh|dh
    lb_kind DR           #負載均衡轉發規則NAT|DR|RUN
    persistence_timeout 50
    protocol TCP

    real_server 192.168.1.23 3306 {
      weight 3
      notify_down /etc/keepalived/mysql.sh 
      TCP_CHECK {
      connect_timeout 10
      nb_get_retry 3 
      delay_before_retry 3
      connect_port 3306 
        }
    }
}

備庫配置:

keepalived.conf

! Configuration File for keepalived
global_defs {
      notification_email {
      abc@163.com
      }
      notification_email_from root@localhost
      smtp_server root@localhost
      smtp_connect_timeout 30
      router_id node2
     }

vrrp_script check_local {
    script "/etc/keepalived/check_gateway.sh "
    interval 5
    fall     4
    rise     4
    weight 3
}


vrrp_instance VI_1 {
      state BACKUP
      interface eth1
      virtual_router_id 43
      priority 90
      advert_int 1
      authentication {
      auth_type PASS
      auth_pass 1111
      }

      virtual_ipaddress {
      192.168.1.27 label eth1:1
      }
    track_script {
        check_local
    }

      }


 virtual_server 192.168.1.27 3306 {
      delay_loop 2 
      lb_algo wrr 
      lb_kind DR
      persistence_timeout 60 
      protocol TCP 

      real_server 192.168.1.24 3306{
      weight 3
      notify_down /etc/keepalived/mysql.sh 
      echo '3' >  /etc/keepalived/t.log
      TCP_CHECK {
      connect_timeout 10
      nb_get_retry 3     
      delay_before_retry 3
      connect_port 3306 
      }
      }
}

腦裂檢查腳本:

/etc/keepalived/check_gateway.sh

#!/bin/sh
VIP=192.168.1.27
GATEWAY=192.168.1.254

/sbin/arping -I eth1 -c 5 -s $VIP $GATEWAY  &>/dev/null

gateway_status=$?
keepalived_status=`ps -ef |grep keepalived|grep -v grep |wc -l`


if [ ${gateway_status} != 0 ] && [ ${keepalived_status} = 0 ]
then
   service keepalived start
elif [ ${gateway_status} != 0 ] && [ ${keepalived_status} != 0 ]
   service keepalived stop
fi

 數據庫服務宕機轉移腳本

/etc/keepalived/mysql.sh

#!/bin/bash

run_status=`service keepalived status|grep running|wc -l`
pro_status=`ps -ef |grep keepalived|grep -v grep |wc -l`

service keepalived stop

if [ ${run_status} != 0 ] || [ ${pro_status} != 0 ]
then
   pkill keepalived
fi

 從庫狀態檢查腳本(在備庫做crontab定時任務*/1 * * * * sh /etc/keepalived/check_slave.sh >/dev/null 2>&1)

#!/bin/bash

VIP="192.168.1.27"

vip_status=`ip add |egrep "${VIP}"|wc -l`
keepalived_status=`ps -ef |grep keepalived|grep -v grep|wc -l`

slave_status=`mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"|awk -F ':' '{printf("%s",$NF)}'`
io_status=`echo "${slave_status}"|awk '{print $1}'|grep "Yes"|wc -l`
sql_status=`echo "${slave_status}"|awk '{print $2}'|grep "Yes"|wc -l`
sync_status=`echo "${slave_status}"|awk '{print $3}'`
sync_time=`echo "${slave_status}"|awk '{print $4}'`

if [ ${io_status} -ne 1 ] || [ ${sql_status} -ne 1 ] || [ ${sync_status} -ne 0 ]
then
    if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -ne 0 ];then
        service keepalived stop
    fi
elif [ ${io_status} -eq 1 ] && [ ${sql_status} -eq 1 ] && [ ${sync_status} -eq 0 ]
then
    if [ ${sync_time} -ge 120 ];then
        if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -ne 0 ];then
           service keepalived stop
        fi
    elif [ ${sync_time} -eq 0 ];then
        if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -eq 0 ];then
           service keepalived start
        fi
    fi
fi

3、防火牆端口開放

VRRP是用IP多播的方式(默認多播地址224.0.0.18)實現高可用對之間通信的,如果開啟防火牆記得開放這個vrrp協議對應的策略

iptables -A INPUT -p vrrp -d 224.0.0.18   -j ACCEPT

iptables -A INPUT -p vrrp   -j ACCEPT

、keepalived故障轉移測試

1、主數據庫實例宕機

當前主數據庫主機狀態:

[root@node1 keepalived]# service mysqld status
MySQL (Percona Server) running (22983)                     [  OK  ]
[root@node1 keepalived]# service keepalived status
keepalived (pid  14092) is running...
[root@node1 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 

當前從數據庫主機狀態:

[root@node2 keepalived]# service mysqld status
MySQL (Percona Server) running (14146)                     [  OK  ]
[root@node2 keepalived]# service keepalived status
keepalived (pid  12207) is running...
[root@node2 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 

主庫執行關閉mysql實例命令,數據庫關閉后發現VIP已漂移

[root@node1 keepalived]# service mysqld stop
Shutting down MySQL (Percona Server)...............        [  OK  ]
[root@node1 keepalived]# service mysqld status    
MySQL (Percona Server) is not running                      [FAILED]
[root@node1 keepalived]# service keepalived status
keepalived is stopped
[root@node1 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 

此時檢查從庫相關狀態

[root@node2 keepalived]# service mysqld status
MySQL (Percona Server) running (14146)                     [  OK  ]
[root@node2 keepalived]# service keepalived status
keepalived (pid  12207) is running...
[root@node2 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 
[root@node2 keepalived]# mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
                   Last_Errno: 0
        Seconds_Behind_Master: NULL

2、主數據庫服務器宕機

同上,實現檢查主備狀態后,在主庫執行shutdown -h now的命令后,檢查從庫狀態

3、主keepalived服務停止

同上

4、主服務器網卡故障

此時keepalived服務正常,mysqld服務正常,VRRP通信無法完成的情況下,主庫會執行關閉keepalived服務並完成切換

 

5、模擬VRRP通信故障(腦裂)

在備庫開啟防火牆,並不開通vrrp允許,此時VIP會存在於兩邊的服務器上

[root@node2 keepalived]# service iptables start
iptables: Applying firewall rules:                         [  OK  ]

主庫服務器狀態檢查

[root@node1 keepalived]# ip add|grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 

備庫服務器狀態檢查:

[root@node2 keepalived]# ip add|grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 

如果關掉從庫的防火牆,那么VIP會選擇高優先級的服務器上(priority)

此現象如何避免?我們可以在check_gateway.sh腳本里添加上組播狀態檢查的命令,我這里就不做了,僅做了網關檢查的腦裂避免(網絡問題導致網關暫時不可達而產生的腦裂)

6、模擬主切換到備后,主服務啟動后是否會回切(優先級搶占測試)

(本次測試接着測試1進行,首先恢復主庫的實例進程和keepalived進程)

主庫執行命令:

[root@node1 keepalived]# service mysqld start
Starting MySQL (Percona Server)......                      [  OK  ]
[root@node1 keepalived]# service keepalived start
Starting keepalived:                                       [  OK  ]
[root@node1 keepalived]# service mysqld status    
MySQL (Percona Server) running (1141)                      [  OK  ]
[root@node1 keepalived]# service keepalived status
keepalived (pid  1230) is running...
[root@node1 keepalived]# ip add |grep inet        
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 

備庫執行命令:

[root@node2 keepalived]# service keepalived status
keepalived (pid  12207) is running...
[root@node2 keepalived]# service mysqld status    
MySQL (Percona Server) running (14146)                     [  OK  ]
[root@node2 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 

此時發現即使主庫的服務恢復正常也不會主動搶占從庫的VIP,因為兩邊設置的都是backup模式,但此時如果備庫的服務、實例等原因導致需要切換VIP而此時主庫由具備切換的條件,那么依舊可以完成切換(此時切記一旦完成主備切換后,需要手動恢復新的主備狀態保障數據一致性后才能進行二次切換)

7、模擬備庫數據延遲120S以上切換

從庫先停止io_thread

(root:localhost:Sun May 20 10:24:45 2018)[(none)]>stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

主庫做大量的數據變更操作

然后監控從庫keepalived狀態即可

8、模擬備庫數據同步錯誤切換

從庫狀態:

[root@node2 keepalived]# mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1050
        Seconds_Behind_Master: NULL
[root@node2 keepalived]# service keepalived status
keepalived is stopped
[root@node2 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 

主庫狀態:

[root@node1 keepalived]# service mysqld status
MySQL (Percona Server) running (1141)                      [  OK  ]
[root@node1 keepalived]# service keepalived  status       
keepalived (pid  1230) is running...
[root@node1 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.27/32 scope global eth1:1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 

此時如果主庫相關服務宕機,是無法完成VIP切換,通過關閉mysqld服務測試

主庫已剔除VIP
[root@node1 keepalived]# service mysqld stop
Shutting down MySQL (Percona Server)..............         [  OK  ]
[root@node1 keepalived]# ip addr |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:496/64 scope link 
從庫沒有附加上VIP
[root@node2 keepalived]# ip add |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::250:56ff:fe8d:3d1e/64 scope link 

附錄:
導致裂腦發生的原因

高可用服務器之間心跳線鏈路故障,導致無法正常通信。
心跳線壞了(包括斷了,老化)。
網卡即相關驅動壞了,IP配置及沖突問題(網卡直連)
心跳線間連接的設備故障(網卡及交換機)
仲裁的機器出問題(采用仲裁的方案)。
高可用服務器對上開啟了iptables防火牆阻擋了心跳信息傳輸。
高可用服務器對上心跳網卡地址等信息配置不正確,導致發送心跳失敗。
其他服務器配置不當等原因,如心跳方式不同,心跳廣播沖突,軟件BUG

在實際生產環境中,我們可以從以下幾個方面來防止裂腦問題的發生。
(1)同時使用串行電纜和以太網電纜連接,同時用兩條心跳線路,這樣一條線路壞了,另一個還是好的,依然能傳送心跳消息。
(2)當檢測到裂腦時強行關閉一個心跳節點(這個功能需特殊設備支持,如fence,stonith)。相當於備節點接收不到心跳信息,發送關機命令通過單獨的線路關閉主節點電源。
(3)做好對裂腦的監控報警(如郵件及手機短信等),在問題發生時人為的第一時間介入仲裁,降低損失。例如:百度的監控報警短信就有上行和下行的區別。報警信息到管理員手機上,就可以通過回復對應的字符串等操作就可以返回給服務器,讓服務器根據指令自動執行處理相關故


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM