Environment: Oracle 11.2.0.4 Data Guard (DG)
Symptom:
The customer found the following gap sequence messages in the standby database's alert log:
Mon Nov 21 09:53:29 2016
Media Recovery Waiting for thread 1 sequence 12034
Fetching gap sequence in thread 1, gap sequence 12034-12078
Mon Nov 21 09:55:20 2016
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 12034-12078
DBID 3493955325 branch 881855745
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
------------------------------------------------------------
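The gap range is easy to misread in a long alert log. The sketch below pulls the "GAP - thread ... sequence ..." lines out of alert-log text and counts the missing logs. The helper name gap_ranges is hypothetical (it is not an Oracle utility), and the line format is assumed to match the excerpt above.

```shell
# Sketch: summarise FAL gap reports from alert-log text read on stdin.
# gap_ranges is a hypothetical helper name, not an Oracle utility.
gap_ranges() {
  awk '/^[ \t]*GAP - thread [0-9]+ sequence [0-9]+-[0-9]+/ {
    # Field 6 holds the range, e.g. 12034-12078
    split($6, range, "-")
    printf "thread=%s low=%s high=%s missing=%d\n", $4, range[1], range[2], range[2] - range[1] + 1
  }'
}

# Usage (the alert log path is site-specific):
#   gap_ranges < /path/to/alert_SID.log
```

For the excerpt above this reports thread=1 low=12034 high=12078 missing=45, that is, 45 archived logs the standby could not fetch.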
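Repair procedure:
- 1. Query the standby SCN
- 2. Check whether data files were added on the primary
- 3. Stop log apply on the standby
- 4. Take an incremental backup on the primary and transfer it to the standby
- 5. Recover the standby
- 6. Create a standby control file on the primary and transfer it to the standby
- 7. Restore the control file on the standby
- 8. Clear the standby's log groups
- 9. Reset flashback on the standby
- 10. Resume receiving and applying logs on the standby
- 11. Reopen the standby in read only mode
- 12. Verify that the repair succeeded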
- Reference
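1. Query the standby SCN
Query the current SCN on the standby. If manual intervention has left the SCNs in the control file, the data files, and the data file headers inconsistent, you instead need to find the SCN that corresponds to the starting sequence# of the gap reported in the log; see the comments on Xifenfei's (惜分飛) blog post listed under Reference at the end of this article.
SQL> col CURRENT_SCN for 999999999999999999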
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-------------------
11906842766974
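2. Check whether data files were added on the primary (none in this case)
SQL> select FILE#,name from v$datafile where CREATION_CHANGE# >= 11906842766974;
Run this on the primary to see whether any data files were created at or after the standby's SCN. If there were, they must be added to the standby manually. That was not the case here.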
3. Stop log apply on the standby
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
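4. Take an incremental backup on the primary and transfer it to the standby
On the primary, take an incremental backup starting from the SCN obtained in step 1, then copy the backup pieces to the standby: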
RMAN> backup as compressed backupset INCREMENTAL from scn 11906842766974 database format '/backup/dumpfile/%u.bak';
$ scp *.bak 192.168.56.158:/oradata/rman/
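A damaged backup piece would only surface later, during the recover step. As an optional safeguard, the sketch below records checksums on the primary and checks them on the standby. It assumes GNU coreutils (sha256sum) on both hosts; the function names and the manifest file name backup.sha256 are hypothetical and not part of the original procedure.

```shell
# Sketch: checksum manifest for the backup pieces (assumes GNU sha256sum).
# write_manifest, verify_manifest and backup.sha256 are hypothetical names.

# Run on the primary: record a checksum for every *.bak piece in directory $1.
write_manifest() {
  ( cd "$1" && sha256sum -- *.bak > backup.sha256 )
}

# Run on the standby after copying the pieces and backup.sha256 into $1.
# Returns non-zero if any piece is missing or differs from the manifest.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c backup.sha256 )
}

# Usage, with the directories from this article:
#   write_manifest /backup/dumpfile      # primary; then scp backup.sha256 as well
#   verify_manifest /oradata/rman        # standby
```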
5. Recover the standby
RMAN> CATALOG START WITH '/oradata/rman/';
-- Note: if the standby is currently open read only, switch it to the mount state before running the recover.
RMAN> RECOVER DATABASE NOREDO;
6. Create a standby control file on the primary and transfer it to the standby
Back up a standby control file on the primary and copy it to the standby:
RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/home/oracle/std_ctl.bak';
[oracle@localhost ~]$ scp std_ctl.bak 192.168.56.158:/home/oracle/
7. Restore the control file on the standby
Shut down the standby, start it in nomount, restore the control file, and finally mount the database:
RMAN> shutdown;
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/std_ctl.bak';
RMAN> alter database mount;
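8. Clear the standby's log groups (not needed here)
This DG configuration uses standby redo logs, so this step was skipped. Without them, the statement looks like this:
SQL> ALTER DATABASE CLEAR LOGFILE GROUP 1;
If standby redo logs are configured on the physical standby, this step is unnecessary.
If they are not, clear every log group: one statement per group.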
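When several groups have to be cleared, the statements can be generated rather than typed. This is a minimal sketch: the function name is hypothetical, and the group numbers are assumed to come from a query such as SELECT GROUP# FROM V$LOG on the standby.

```shell
# Sketch: print one CLEAR LOGFILE statement per log group number given.
# clear_logfile_statements is a hypothetical helper; it only prints SQL,
# it does not connect to the database.
clear_logfile_statements() {
  for group in "$@"; do
    case $group in
      ''|*[!0-9]*)
        echo "not a group number: $group" >&2
        return 1
        ;;
    esac
    printf 'ALTER DATABASE CLEAR LOGFILE GROUP %s;\n' "$group"
  done
}

# Usage: clear_logfile_statements 1 2 3
```

Review the printed statements before pasting them into SQL*Plus on the standby.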
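9. Reset flashback on the standby (optional, depending on your setup; it was not enabled here)
Flashback was never enabled on the standby in this DG environment, so nothing was done. Where it is enabled, turn it off and back on:
SQL> ALTER DATABASE FLASHBACK OFF;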
SQL> ALTER DATABASE FLASHBACK ON;
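10. Resume receiving and applying logs on the standby
Restart managed recovery on the standby with real-time apply:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE using current logfile DISCONNECT FROM SESSION;
The tail of the standby alert log during recovery (it should end with a "Media Recovery Waiting for" message):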
Mon Nov 21 17:17:05 2016
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 32 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE using current logfile DISCONNECT FROM SESSION
Media Recovery Log /oradata/arch/1_12131_881855745.dbf
Mon Nov 21 17:18:59 2016
Media Recovery Log /oradata/arch/1_12132_881855745.dbf
Mon Nov 21 17:20:44 2016
Media Recovery Log /oradata/arch/1_12133_881855745.dbf
Mon Nov 21 17:21:02 2016
Media Recovery Log /oradata/arch/1_12134_881855745.dbf
Mon Nov 21 17:22:22 2016
Media Recovery Waiting for thread 1 sequence 12135 (in transit)
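The end state shown above can be checked mechanically. The sketch below looks at the most recent "Media Recovery" line in alert-log text and succeeds only if it is a "Waiting for thread ... sequence ... (in transit)" message, as in the excerpt. The function names are hypothetical, and the message format is assumed to match this 11.2.0.4 log.

```shell
# Sketch: inspect alert-log text on stdin for the recovery end state.
# last_recovery_line and waiting_in_transit are hypothetical helper names.

# Print the most recent line that starts with "Media Recovery".
last_recovery_line() {
  grep '^[[:space:]]*Media Recovery ' | tail -n 1
}

# Succeed only if that line says recovery is waiting for a log in transit.
waiting_in_transit() {
  last_recovery_line |
    grep -q 'Media Recovery Waiting for thread [0-9][0-9]* sequence [0-9][0-9]* (in transit)'
}

# Usage (the alert log path is site-specific):
#   tail -n 200 /path/to/alert_SID.log | waiting_in_transit && echo "apply is caught up"
```

A bare "Media Recovery Waiting for thread 1 sequence 12034" line without "(in transit)", like the one in the symptom section, does not pass this check.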
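11. Reopen the standby in read only mode
Reopen the standby read only if your setup calls for it. Here the requirement was for the standby to apply logs while open read only (the 11g Active Data Guard feature):
SQL> alter database RECOVER MANAGED STANDBY DATABASE CANCEL;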
SQL> alter database open;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE using current logfile DISCONNECT FROM SESSION;
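12. Verify that the repair succeeded
12.1 Compare the maximum sequence#
This check is not always reliable: if a gap occurred in the middle but later archived logs were shipped normally, the two results match even though the gap is still there.
On the primary, run: alter system switch logfile;
Then run on both the primary and the standby: select max(sequence#) from v$archived_log;
12.2 Follow the alert logs
Primary alert log: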
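Because matching maximums can hide a hole in the middle, it may help to look at the whole list of sequence numbers. The sketch below assumes the standby's sequence# values for one thread have been spooled to text, one per line (for example from a query on v$archived_log). It prints every missing range. The function name is hypothetical.

```shell
# Sketch: report holes in a list of sequence numbers read on stdin.
# find_sequence_holes is a hypothetical helper; input is assumed to hold
# one sequence# per line for a single thread. Non-numeric lines, such as
# column headings and blank lines, are ignored.
find_sequence_holes() {
  grep -E '^[[:space:]]*[0-9]+[[:space:]]*$' |
    sort -n -u |
    awk '
      NR > 1 && $1 > prev + 1 { printf "%d-%d\n", prev + 1, $1 - 1 }
      { prev = $1 }
    '
}

# Usage: find_sequence_holes < standby_sequences.txt
```

No output means the list is contiguous. A line such as 12034-12078 means those logs never reached the standby, even if the highest sequence# matches the primary.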
tail -200f /oracle/diag/rdbms/shoucall/shoucall/trace/alert_shoucall.log
Standby alert log:
tail -200f /u01/app/oracle/diag/rdbms/shoucall_dg/shoucall/trace/alert_shoucall.log