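One of my customer's production environments runs an Oracle ADG + Keepalived architecture. A switchover drill is coming up and I have been asked to support it. The ADG switchover itself is routine, but with keepalived in the picture I needed to study the architecture in advance. After looking at the environment's configuration, the overall idea turns out to be simple: keepalived provides a VIP, and the application only needs to connect to that VIP.
I built a test environment of my own modeled on the current production architecture.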
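1. Keepalived configuration
For configuring, compiling, and installing Keepalived, see the Keepalived installation chapter of my earlier article 《MySQL主主+Keepalived架構安裝部署》 (MySQL master-master + Keepalived installation and deployment). Besides using keepalived to provide a VIP, there are a few more settings and a script. Sanitized versions follow:
--------------------------------------------------------
-- Node 1 (192.168.1.124) keepalived.conf contents: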
--------------------------------------------------------
[root@test04 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_script chk_dg_stats {
script "/etc/keepalived/check_dataguard.sh"
interval 2
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
interface eth0
mcast_src_ip 192.168.1.124
virtual_router_id 131
priority 101
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 888888
}
virtual_ipaddress {
192.168.1.131
}
track_script {
chk_dg_stats
}
}
--------------------------------------------------------
-- Node 2 (192.168.1.125) keepalived.conf contents:
--------------------------------------------------------
[root@test05 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
vrrp_script chk_dg_stats {
script "/etc/keepalived/check_dataguard.sh"
interval 2
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
mcast_src_ip 192.168.1.125
virtual_router_id 131
priority 99
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 888888
}
virtual_ipaddress {
192.168.1.131
}
track_script {
chk_dg_stats
}
}
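To make the interplay of `priority` and `weight` in the two configurations concrete, here is a minimal sketch of the arithmetic as I understand keepalived's documented behavior: a negative weight is subtracted from the priority while the tracked script is failing, and a positive weight is added while it is succeeding. The function name `effective_priority` and the `ok`/`fail` labels are made up for this illustration. Note that the check script in this article stops keepalived outright, so this arithmetic only comes into play if the script exits non-zero for some other reason.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT SCRIPT_STATE
# SCRIPT_STATE is "ok" or "fail" (hypothetical labels for this sketch).
effective_priority() {
    local base=$1 weight=$2 state=$3
    if [ "$weight" -lt 0 ] && [ "$state" = "fail" ]; then
        # Negative weight: priority drops while the check is failing.
        echo $(( base + weight ))
    elif [ "$weight" -gt 0 ] && [ "$state" = "ok" ]; then
        # Positive weight: priority rises while the check is passing.
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# Node 1 (priority 101, weight -5) with a failing check: 101 - 5 = 96
effective_priority 101 -5 fail
# Node 2 (priority 99, weight -5) with a passing check: stays at 99
effective_priority 99 -5 ok
```

With these numbers, a failing check on node 1 would leave it at 96, below node 2's 99, which is why a weight of -5 is enough to hand the VIP over.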
--------------------------------------------------------
-- On all nodes, create the script check_dataguard.sh and make sure it has execute (x) permission:
--------------------------------------------------------
# cat /etc/keepalived/check_dataguard.sh
#!/bin/bash
dbstats=`ps -ef | grep ora_smon | grep -v grep | wc -l`
dgstats=`ps -ef | grep ora_mrp | grep -v grep | wc -l`
if [ "${dbstats}" -eq 0 ]; then
/etc/init.d/keepalived stop
elif [[ "${dbstats}" -gt 0 ]] && [[ "${dgstats}" -gt 0 ]]; then
/etc/init.d/keepalived stop
fi
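Note: check_dataguard.sh monitors the ora_smon and ora_mrp processes to decide in which scenarios the keepalived service should be stopped:
Scenario 1: no ora_smon process exists (the database instance has crashed);
Scenario 2: both an ora_smon process and an ora_mrp process exist (a standby database with the MRP process running).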
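The two scenarios above can be written as a small truth table. The sketch below pulls the decision out into a function that takes the two process counts as arguments, so the logic can be exercised without a running database. The function name `should_stop_keepalived` is hypothetical and not part of the customer's script.

```shell
#!/bin/bash
# should_stop_keepalived SMON_COUNT MRP_COUNT
# Returns 0 (true) when keepalived should be stopped on this node.
should_stop_keepalived() {
    local smon=$1 mrp=$2
    # Scenario 1: no instance is running at all.
    if [ "$smon" -eq 0 ]; then
        return 0
    fi
    # Scenario 2: instance is up and MRP is running, i.e. this is a standby.
    if [ "$smon" -gt 0 ] && [ "$mrp" -gt 0 ]; then
        return 0
    fi
    # Otherwise (instance up, no MRP): treated as the primary, keep the VIP.
    return 1
}

# Walk through the three interesting combinations of (smon, mrp).
for pair in "0 0" "1 1" "1 0"; do
    read -r smon mrp <<< "$pair"
    if should_stop_keepalived "$smon" "$mrp"; then
        echo "smon=$smon mrp=$mrp -> stop keepalived"
    else
        echo "smon=$smon mrp=$mrp -> keep keepalived running"
    fi
done
```

Only the combination "instance up, no MRP" keeps keepalived running, so the VIP can only live on a node that looks like a primary.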
-- Add execute (x) permission:
chmod u+x /etc/keepalived/check_dataguard.sh
[root@test04 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 282 Jul 14 22:35 /etc/keepalived/check_dataguard.sh
[root@test05 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 281 Jul 14 22:36 /etc/keepalived/check_dataguard.sh
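2. Manual ADG switchover steps
1) Before the actual switchover, manually switch logs a few times on the primary and confirm that the DG standby is synchronizing normally:
-- On the PRIMARY (192.168.1.124), switch logs a few times: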
SQL>
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
-- On the standby (192.168.1.125), confirm that synchronization is normal with no lag:
SQL>
select * from v$dataguard_stats;
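When checking "no lag" in a script rather than by eye, the lag values from v$dataguard_stats need to be turned into numbers. As far as I know, the VALUE column for `transport lag` and `apply lag` is a day-to-second interval string such as `+00 00:00:00`. Assuming that format, here is a minimal sketch that converts such a string to seconds. The function name `lag_to_seconds` is made up for illustration.

```shell
#!/bin/bash
# lag_to_seconds "+DD HH:MM:SS"
# Converts an interval string (assumed format) to a number of seconds.
lag_to_seconds() {
    local value=${1#\+}       # drop the leading '+'
    local days=${value%% *}   # part before the first space
    local hms=${value#* }     # part after the first space
    local h m s
    IFS=: read -r h m s <<< "$hms"
    # 10# forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

lag_to_seconds "+00 00:00:00"
lag_to_seconds "+00 00:01:30"
```

A drill script could then refuse to continue when the result is above some threshold of your choosing.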
2) Switch the primary to standby
-- On the PRIMARY (192.168.1.124), query to confirm it can be switched to standby:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database;
-- On the PRIMARY (192.168.1.124), switch it to standby:
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
3) Switch the standby to primary
-- On the standby (192.168.1.125), query to confirm it can be switched to primary:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database;
-- On the standby (192.168.1.125), switch it to primary (choose one of the two commands below according to the SWITCHOVER_STATUS value):
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
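The choice between the two commands depends on SWITCHOVER_STATUS. The sketch below spells out the mapping I normally rely on: `TO PRIMARY` or `TO STANDBY` means the plain command is enough, `SESSIONS ACTIVE` means the `WITH SESSION SHUTDOWN` clause is needed, and anything else means the database is not ready. The function name `switchover_sql` and its `primary`/`standby` target labels are hypothetical.

```shell
#!/bin/bash
# switchover_sql TARGET_ROLE SWITCHOVER_STATUS
# Prints the statement to run, or returns 1 if the status does not allow it.
switchover_sql() {
    local target=$1 status=$2
    case "$target:$status" in
        "standby:TO STANDBY")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY;" ;;
        "standby:SESSIONS ACTIVE")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;" ;;
        "primary:TO PRIMARY")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;" ;;
        "primary:SESSIONS ACTIVE")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;" ;;
        *)
            echo "-- not ready: SWITCHOVER_STATUS=$status"
            return 1 ;;
    esac
}

switchover_sql primary "TO PRIMARY"
switchover_sql primary "SESSIONS ACTIVE"
switchover_sql primary "NOT ALLOWED" || echo "stop and investigate first"
```

Using `WITH SESSION SHUTDOWN` unconditionally, as in step 2 above, also works. The function only makes the decision explicit.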
4) Open the new primary, then start the new standby and enable MRP
-- On the NEW PRIMARY (192.168.1.125), bring the database from mount to open:
ALTER DATABASE OPEN;
-- On the NEW STANDBY (192.168.1.124), start the database with startup and enable DG log apply:
STARTUP
RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
-- Confirm that the NEW STANDBY (192.168.1.124) is synchronizing normally with no lag:
SQL>
select * from v$dataguard_stats;
5) Start the keepalived service on the new primary
-- On the NEW PRIMARY (192.168.1.125), start the keepalived service as root at the OS level:
# /etc/init.d/keepalived start
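After keepalived starts, it is worth confirming that the VIP really landed on the new primary. Below is a minimal sketch that searches the output of `ip -4 addr show eth0` for the VIP. The function name `has_vip` is hypothetical, and the sample output is illustrative rather than captured from the test hosts.

```shell
#!/bin/bash
# has_vip ADDR_OUTPUT VIP
# Returns 0 if the given `ip addr` output contains the VIP as an inet address.
has_vip() {
    local addr_output=$1 vip=$2
    # Fixed-string match on "inet <vip>/" so 192.168.1.13 does not match .131.
    grep -Fq "inet ${vip}/" <<< "$addr_output"
}

# Illustrative sample of what `ip -4 addr show eth0` might print on test05.
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.1.125/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.131/32 scope global eth0'

if has_vip "$sample" 192.168.1.131; then
    echo "VIP is on this node"
else
    echo "VIP is NOT on this node"
fi

# On a real host (as root): has_vip "$(ip -4 addr show eth0)" 192.168.1.131
```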
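Note: once the drill is over, if the primary and standby need to be switched back, simply repeat the standard steps above (keeping in mind that the primary and standby roles are now reversed).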
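3. The relationship between the VIP and the listener
This goes back to one of my earliest interviews: in a two-node RAC, node 1's host crashes. Can the application still connect to the database through node 1's VIP, and why? We all know that when node 1's host crashes, its VIP automatically fails over to node 2 and the IP still answers ping, but connecting to the database through it does not work! It fails with a no-listener error (ORA-12541: TNS:no listener). For details, see: RAC 某節點不可用時,對應VIP是否可用 (whether a node's VIP is usable when that RAC node is unavailable).
So why is it that in this environment, which also uses a VIP, we can connect through the VIP (192.168.1.131)?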
[oracle@test03 ~]$ sqlplus sys/oracle@192.168.1.131/demo as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Jul 14 23:45:23 2020
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
Testing showed that this is because the listeners on both the primary and the standby are configured with the host name:
[oracle@test04 admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = test04)(PORT = 1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
[oracle@test05 admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = test05)(PORT = 1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = jingyus)
(ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME = jingyu)
)
)
If the host name is changed to a specific IP address, the same test fails with ORA-12541: TNS:no listener.
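One plausible explanation, which I have not confirmed on these hosts and which should be treated as an assumption, is that a listener configured with a host name ends up bound to all local addresses on port 1521, while a listener configured with a specific IP is bound to that IP only and therefore never sees connections arriving on the VIP. A quick way to check is to look at the listening sockets. The sketch below parses `ss -ltn` style output. The function name `listens_on_all_addresses` is hypothetical and the two sample outputs are illustrative.

```shell
#!/bin/bash
# listens_on_all_addresses SS_OUTPUT PORT
# Returns 0 if some LISTEN socket on PORT is bound to a wildcard address.
listens_on_all_addresses() {
    local sockets=$1 port=$2
    awk -v port="$port" '
        $1 == "LISTEN" {
            addr = $4   # "Local Address:Port" column of `ss -ltn`
            if (addr == ("*:" port) || addr == ("0.0.0.0:" port) ||
                addr == (":::" port) || addr == ("[::]:" port)) {
                found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' <<< "$sockets"
}

# Illustrative outputs, not captured from the test environment.
wildcard_sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    :::1521            :::*'
single_ip_sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    192.168.1.125:1521 *:*'

listens_on_all_addresses "$wildcard_sample" 1521 && echo "wildcard bind: VIP connections can reach the listener"
listens_on_all_addresses "$single_ip_sample" 1521 || echo "single-IP bind: VIP connections get no listener"

# On a real host: listens_on_all_addresses "$(ss -ltn)" 1521
```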