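Background
- Applicable environment:
Operating system: Red Hat 6.5
Database: Oracle 11g R2
Problem description: after a failover, the original primary cannot be recovered or started, or the primary/standby relationship has been lost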
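- Advantages
- No downtime is required on the primary database
- Simple to carry out
- Preparation before implementation
1. Test duplicate
2. In the test environment, rebuild the standby database with duplicate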
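Implementation steps
- Back up the new primary
Make sure the backup script writes to the server's local disk rather than to the tape library.
rman_backup.sh, the local backup script: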
#!/bin/sh
# Oracle environment
export ORACLE_BASE=/data/oracle/app
export ORACLE_HOME=$ORACLE_BASE/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=orcl_stby
export PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
day=$(date -u +%Y%m%d)
cd /data/bak/rman_backup || exit 1
rman target / nocatalog log=/data/bak/rman_backup/rman_backup$day.log <<EOF
crosscheck archivelog all;
crosscheck backup;
delete noprompt expired archivelog all;
delete noprompt expired backup;
run{ allocate channel c1 type disk;
allocate channel c2 type disk;
backup database format '/data/bak/rman_backup/%d_full_%T%s%p.bck';
sql "alter system archive log current";
backup archivelog all format '/data/bak/rman_backup/%d_arc_%T%s%p.bck';
backup current controlfile format = '/data/bak/rman_backup/controlfile%T%s%p.bck';
release channel c1;
release channel c2;
}
exit;
EOF
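Since the backup goes to local disk, it may be worth checking free space before starting RMAN. The following is a minimal sketch, not part of the original script: has_free_kb is a hypothetical helper name, and the threshold passed to it is for the operator to choose.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): succeed only if the
# filesystem holding directory $1 has at least $2 KB available.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -Pk prints one header line and one data line; field 4 is available KB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

It could be called near the top of rman_backup.sh, for example `has_free_kb /data/bak/rman_backup 52428800 || exit 1`, where 52428800 KB (50 GB) is only an illustrative figure.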
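- Drop the original primary
From this step on, the original primary is called the "standby" and the new primary is called the "primary".
1. Shut down the database: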
SQL>shutdown immediate;
2. Restart the database in restricted mode, up to the mount state:
sqlplus / as sysdba
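SQL>startup restrict mount; --> # Only users with SYSDBA privileges can log in to the database; ordinary users cannot (this keeps other users from accessing it)
3. Confirm the database name once more to avoid dropping the wrong database; the one being dropped here is orcl: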
SQL>select name from v$database;
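4. Issue the drop database statement:
SQL>drop database; --> # (applies to 10g and later)
# This removes only the database files (control files, data files, log files, spfile). It does not remove the files under $ORACLE_BASE/admin/$ORACLE_SID, nor the initialization parameter file or the password file, and archived logs are not deleted either.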
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@uatecsdb ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:52:03 2017
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup restrict mount;
ORACLE instance started.
Total System Global Area 6747725824 bytes
Fixed Size 2213976 bytes
Variable Size 5100275624 bytes
Database Buffers 1610612736 bytes
Redo Buffers 34623488 bytes
Database mounted.
SQL> select name from v$database;
NAME
---------
ORCL
SQL> drop database;
Database dropped.
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> exit
[oracle@uatecsdb ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:56:20 2017
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL>
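The name check in step 3 is the only thing standing between a typo and dropping the wrong database, so if the procedure is ever scripted it may help to make that check explicit. A minimal sketch follows. is_expected_db is a hypothetical helper, and it assumes the query output contains only the heading, the dashed line and the name (for example from `sqlplus -s`).

```shell
#!/bin/sh
# Hypothetical guard: $1 is the expected database name, $2 is the raw
# SQL*Plus output of the "select name from v$database;" query.
is_expected_db() {
    expected=$1
    # Keep the last non-blank line that is neither the heading nor the dashes.
    actual=$(printf '%s\n' "$2" |
        awk 'NF && $1 != "NAME" && $1 !~ /^-+$/ { n = $1 } END { print n }')
    [ "$actual" = "$expected" ]
}
```

A wrapper script would run drop database only when `is_expected_db ORCL "$output"` succeeds.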
- Prepare the standby: startup nomount
Prepare a pfile, preferably the one created when Data Guard was originally built.
Rename the pfile to the init$ORACLE_SID.ora form (initorcl.ora) and place it in /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/:
SQL>startup nomount;
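The naming rule above can be sketched as two small shell helpers. Both function names are hypothetical, and the sketch assumes the usual Linux layout in which the default pfile lives at dbs/init<SID>.ora under the Oracle home.

```shell
#!/bin/sh
# Hypothetical helper: print the default pfile path for Oracle home $1
# and instance name (ORACLE_SID) $2.
default_pfile_path() {
    printf '%s/dbs/init%s.ora\n' "$1" "$2"
}

# Hypothetical helper: copy an existing pfile $1 into place for
# Oracle home $2 and instance name $3, preserving mode and timestamps.
install_pfile() {
    cp -p "$1" "$(default_pfile_path "$2" "$3")"
}
```

For this environment the call would be `install_pfile /path/to/saved_pfile.ora "$ORACLE_HOME" orcl`, where the source path is a placeholder.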
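- Connect RMAN to the primary and the standby
Before connecting with RMAN, check the following:
1. The firewall is off
2. tnsnames.ora: each server must be able to reach the other's listener
3. The sys password should preferably be the same on both sides
4. db_file_name_convert and log_file_name_convert: if the directory layouts differ, these two parameters must be specified in the pfile
Because Data Guard had already been built on these servers, none of these items were a problem in production.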
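Item 2 of the checklist can be checked from the shell on each server. This is a rough sketch under two assumptions: that Oracle's tnsping utility is on the PATH, and that it exits non-zero when an alias cannot be resolved or reached (if your version does not, inspect its output instead). check_aliases and the TNSPING override variable are hypothetical names. tnsping only confirms that the listener answers, not that the service is registered.

```shell
#!/bin/sh
# Hypothetical helper: try each TNS alias given as an argument and report
# OK or FAIL. TNSPING is an assumed override hook; it defaults to tnsping.
check_aliases() {
    rc=0
    for name in "$@"; do
        if "${TNSPING:-tnsping}" "$name" >/dev/null 2>&1; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            rc=1
        fi
    done
    return $rc
}
```

Running `check_aliases orcl orcl_stby` on both servers before starting RMAN covers both directions.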
rman target sys/yourpassword@orcl_stby auxiliary sys/yourpassword@orcl
Use the duplicate command to rebuild the standby database
Because the primary and standby use the same paths, use the following command:
RMAN>duplicate target database for standby from active database nofilenamecheck;
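- Verify the databases
Open the standby:
SQL>alter database open; # This step may raise an error; ignore it for now and test at the end whether the database can be opened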
SQL>CREATE SPFILE FROM PFILE='/data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initorcl.ora';
SQL>select status from v$instance;
SQL>select open_mode from v$database;
Check the primary:
SQL>select status from v$instance;
SQL>select open_mode from v$database;
Check GAP_STATUS:
SQL>SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;
If the destination is deferred (DEFER):
SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2='ENABLE' SCOPE=BOTH;
Start real-time apply:
SQL>alter database recover managed standby database using current logfile disconnect from session;
SQL>select process,thread#,status from v$managed_standby;
SQL>SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG;
SQL>SELECT SWITCHOVER_STATUS FROM V$DATABASE;
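The SEQUENCE#/APPLIED listing can get long, so a small filter that counts applied logs and lists the rest may make the check easier to read. This is a sketch only: summarize_applied is a hypothetical helper, and it assumes the query was run on the standby and that its output is piped in as two whitespace-separated columns.

```shell
#!/bin/sh
# Hypothetical helper: read "SEQUENCE# APPLIED" rows on stdin, count the
# rows marked YES, and list the sequence numbers with any other value.
summarize_applied() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 == "YES" { applied++ }
        $1 ~ /^[0-9]+$/ && $2 != "YES" {
            pending = pending (pending == "" ? "" : ",") $1
        }
        END {
            if (pending == "") pending = "none"
            printf "applied=%d pending=%s\n", applied, pending
        }
    '
}
```

Heading and separator lines are ignored because their first field is not a number.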
- Restore the DGMGRL relationship
DGMGRL>show database verbose orcl;
The database status is still reported as Database Status: SHUTDOWN
Log in to the standby and start dg_broker:
SQL> show parameter dg_broker_start;
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
dg_broker_start boolean FALSE
SQL> alter system set dg_broker_start = true scope=both;
System altered.
SQL>!ps -ef|grep dmon
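A plain grep for dmon can also match the grep process itself. As an alternative sketch, the process list can be filtered on the exact background process name, assuming the usual ora_dmon_<SID> naming on Linux. has_dmon is a hypothetical helper that reads `ps -ef` output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ps output on stdin contains a process
# whose last field is exactly ora_dmon_<SID>, where <SID> is $1.
has_dmon() {
    awk -v p="ora_dmon_$1" '$NF == p { found = 1 } END { exit !found }'
}
```

Usage on the standby would be `ps -ef | has_dmon orcl && echo "broker running"`.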
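- Open questions
This test ran for only a little over 3 hours, during which 15 new archived logs were generated. After the duplicate finished and LOG_ARCHIVE_DEST_STATE_2 was enabled, only 6 of them were recovered. All the log checks looked fine and the database could be opened, but could the data still be inconsistent?
In production one archived log is generated per hour and the whole operation can be completed in about 3 hours, so missing logs are not a concern there.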
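- Problems found and resolved during the actual production rollout
1. During the production rollout, log_archive_dest_2 on the primary showed a status of INACTIVE. The previous failover probably did not complete fully, which left the primary without its log_archive_dest_2 setting.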
SQL> SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;
STATUS GAP_STATUS
--------- ------------------------
INACTIVE
Running the following SQL restores the log_archive_dest_2 parameter:
alter system set log_archive_dest_2='SERVICE=orcl LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orcl' scope=both;
The gap status changes to RESOLVABLE GAP and, after a log switch, to NO GAP.
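Because the roles swap after every failover, the same statement is needed with different names each time. A small generator is one way to avoid retyping it. dest2_statement is a hypothetical helper; it reproduces the attribute list used here (LGWR SYNC, ONLINE_LOGFILES, PRIMARY_ROLE) and nothing else.

```shell
#!/bin/sh
# Hypothetical helper: print the ALTER SYSTEM statement used in this note
# for standby service name $1 and standby db_unique_name $2.
dest2_statement() {
    printf "alter system set log_archive_dest_2='SERVICE=%s LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=%s' scope=both;\n" "$1" "$2"
}
```

`dest2_statement orcl orcl` prints the statement shown earlier; the output is meant to be reviewed and pasted into SQL*Plus on the primary.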
2. The broker's state for both the primary and the standby was wrong, so the broker configuration had to be rebuilt
a. Remove the old configuration
DISABLE FAST_START FAILOVER FORCE;
(1) On the observer
disable configuration;
remove database orcl;
remove database orcl_stby;
remove configuration;
(2) On both databases
alter system set dg_broker_start = false scope=both;
show parameter broker;
Rename the dr1orcl_stby.dat and dr2orcl_stby.dat files under
/data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/
(3) On both databases
alter system set dg_broker_start = true scope=both;
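The rename in (2) has to be done on both servers while dg_broker_start is false, and the file names differ by db_unique_name. A minimal sketch of that rename follows. rename_broker_files is a hypothetical helper, and it assumes the default broker file names dr1<db_unique_name>.dat and dr2<db_unique_name>.dat.

```shell
#!/bin/sh
# Hypothetical helper: in directory $1, rename dr1<name>.dat and
# dr2<name>.dat for db_unique_name $2 by appending suffix $3.
rename_broker_files() {
    for n in 1 2; do
        f="${1}/dr${n}${2}.dat"
        # Skip files that do not exist so the function can be re-run safely.
        if [ -f "$f" ]; then
            mv "$f" "${f}${3}" || return 1
        fi
    done
}
```

For example, `rename_broker_files "$ORACLE_HOME/dbs" orcl_stby .bak` on one server and the same call with orcl on the other.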
b. Recreate the configuration
DGMGRL> create configuration DG_orcl as primary database is orcl_stby connect identifier is orcl_stby;
DGMGRL> add database orcl as connect identifier is orcl maintained as physical;
DGMGRL> show database orcl_stby;
DGMGRL> show database orcl;
DGMGRL> show database verbose orcl_stby;
DGMGRL> edit database 'orcl' set property 'ArchiveLagTarget'='0';
DGMGRL> edit database 'orcl' set property 'LogArchiveMinSucceedDest'='1';
DGMGRL> edit database 'orcl_stby' set property 'DelayMins'='0';
DGMGRL> edit database 'orcl' set property 'DelayMins'='0';
DGMGRL> enable configuration;
DGMGRL> show configuration;
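The core of step b can be generated for whichever database is currently the primary. The sketch below prints only the create, add, enable and show commands. broker_recreate_script is a hypothetical helper, and it assumes each connect identifier equals the database name, as it does in this environment.

```shell
#!/bin/sh
# Hypothetical generator: $1 = configuration name, $2 = current primary,
# $3 = current standby. Connect identifiers are assumed equal to the names.
broker_recreate_script() {
    cat <<EOF
create configuration $1 as primary database is $2 connect identifier is $2;
add database $3 as connect identifier is $3 maintained as physical;
enable configuration;
show configuration;
EOF
}
```

The output of `broker_recreate_script DG_orcl orcl_stby orcl` can be reviewed and then pasted into a DGMGRL session connected to the primary. The property edits shown above are left out on purpose and would still be applied by hand.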
c. Enable FAST_START FAILOVER
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=1800;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 15;
DGMGRL> EDIT DATABASE orcl_stby SET PROPERTY FastStartFailoverTarget='orcl';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE orcl SET PROPERTY FastStartFailoverTarget='orcl_stby';
Property "faststartfailovertarget" updated
SHOW DATABASE ORCL LOGXPTMODE;
SHOW DATABASE ORCL_STBY LOGXPTMODE;
EDIT DATABASE ORCL SET PROPERTY LOGXPTMODE='SYNC';
EDIT DATABASE ORCL_STBY SET PROPERTY LOGXPTMODE='SYNC';
EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
ENABLE FAST_START FAILOVER;
SHOW FAST_START FAILOVER;
SHOW CONFIGURATION VERBOSE;