Oracle19C ADG主備切換問題處理


--原文:

https://www.modb.pro/db/40938?xzs=

大家好!最近遇到一起ADG主備切換后實時應用開啟失敗的故障,導致故障的原因很簡單,關鍵的還是分析思路,具體情況請聽筆者一一道來。

 

環境:

操作系統:LINUX

數據庫版本:19.7

是否RAC:是

當前主庫:一中心,切換后一中心變成備庫

當前備庫:二中心,切換后一中心變成主庫

 

我們同時做了十多套ADG的切換測試,只有這一套庫切換出現異常。主庫第一次從一中心切換到二中心后,一中心切換到STANDBY角色並startup后,打開實時應用顯示成功執行:

 

SQL>startup

ORACLEinstance started.

 

TotalSystem Global Area 3.8762E+11 bytes

FixedSize                 30951712 bytes

VariableSize            1.1254E+11 bytes

DatabaseBuffers         2.7488E+11 bytes

RedoBuffers              170373120 bytes

Databasemounted.

Databaseopened.

SQL>

SQL>alter database recover managed standby database using current logfiledisconnect from session;

 

Databasealtered.

 

SQL>

SQL>

SQL>set line 300 pagesize 5000

SQL>selectname,db_unique_name,open_mode,database_role,SWITCHOVER_STATUS,protection_mode,flashback_onfrom gv$database;

 

NAME     DB_UNIQUE_NAME                 OPEN_MODE           DATABASE_ROLE    SWITCHOVER_STATUS    PROTECTION_MODE     FLASHBACK_ON

--------------------------------------- -------------------- ------------------------------------ -------------------- ------------------

RACDB RACDB                       READ ONLY WITH APPLY PHYSICAL STANDBYNOT ALLOWED          MAXIMUM PERFORMANCE  NO

RACDB RACDB                       READ ONLY WITH APPLY PHYSICAL STANDBYNOT ALLOWED          MAXIMUM PERFORMANCE  NO

 

此時顯示備庫正常。

 

1、查看新備庫dbalert日志確認切換成功:

 

2020-09-01T15:43:16.093670+08:00

NET (PID:25084): Switchover complete. Database shutdown required

USER(ospid: 25084): terminating the instance

2020-09-01T15:43:16.764327+08:00

ORA-1092: opitsk aborting process

2020-09-01T15:43:17.112672+08:00

Instanceterminated by USER, pid = 25084

2020-09-01T15:43:17.208529+08:00

ORA-1092: opitsk aborting process

TMI:dbsdrv switchover to target END 2020-09-01 15:43:17.239806

Completed:alter database switchover to racdbstd

Shuttingdown ORACLE instance (abort) (OS id: 25084)

Shutdownis initiated by sqlplus@racdb1 (TNS V1-V3).

Licensehigh water mark = 42

2020-09-01T15:43:17.272253+08:00

Warning:2 processes are still attacheded to shmid 1802268:

(size:1019904 bytes, creator pid: 143048, last attach/detach pid: 25084)

Instanceshutdown complete (OS id: 25084)

 

2、然后同事在新主庫創建數據文件測試是否同步至備庫時,查看alert日志發現實時應用失敗,於是決定使用重啟大法,重新打開新備庫時報錯:

racdb1:/home/oracle(racdb1)$sqlplus/ as sysdba

 

SQL*Plus:Release 19.0.0.0.0 - Production on Tue Sep 1 17:35:35 2020

Version19.7.0.0.0

 

Copyright(c) 1982, 2020, Oracle.  All rights reserved.

 

Connectedto an idle instance.

 

SQL>startup

ORACLEinstance started.

 

TotalSystem Global Area 3.8762E+11 bytes

FixedSize                 30951712 bytes

VariableSize            1.1254E+11 bytes

DatabaseBuffers         2.7488E+11 bytes

RedoBuffers              170373120 bytes

Databasemounted.

ORA-10458:standby database requires recovery

ORA-01196:file 1 is inconsistent due to a failed media recovery session

ORA-01110:data file 1: '+DATADG1/RACDB/DATAFILE/system.260.1048091377'

 

3、此時筆者介入,通過了解情況發現是在新主庫添加數據文件進行測試之后出現的異常。於是查看添加數據文件時間點的dbalert日志發現控制文件自動快照備份失敗,數據文件同步失敗導致實時應用終止。

 

2020-09-01T16:17:26.492303+08:00

Controlfile backup creation failed:

failureto open backup target file/oracle/app/oracle/product/19.0.0/db/dbs/snapcf_racdb2.f.

2020-09-01T16:17:26.493708+08:00

Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_92022.trc:

ORA-27037:unable to obtain file status

Linux-x86_64Error: 2: No such file or directory

Additionalinformation: 7

2020-09-01T16:22:05.961404+08:00

Controlautobackup written to DISK device

 

handle'/oracle/app/oracle/product/19.0.0/db/dbs/c-2886113901-20200901-00'

 

2020-09-01T16:24:57.733483+08:00

PR00(PID:92022): MRP0: Background Media Recovery terminated with error1193

2020-09-01T16:24:57.733624+08:00

Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_92022.trc:

ORA-01193:file 26 is not the same file seen at start of recovery

ORA-01110:data file 26:'+DATADG1/RACDB/AC86ED673411EB76E0534C36E60A63A8/DATAFILE/tbs_daoshu_data.334.1048267329'

2020-09-01T16:24:57.735016+08:00

....(PID:113683): Managed Standby Recovery not using Real Time Apply

2020-09-01T16:24:57.920384+08:00

Recoveryinterrupted!

 

4、繼續查看再次重啟時的dbalert日志,發現報ORA-600錯誤:

2020-09-01T16:49:10.804524+08:00

Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_43860.trc (incident=1153647) (PDBNAME=CDB$ROOT):

ORA-00600:internal error code, arguments: [krdrsb_broadcast_influx_scn_1],[15254371943880], [15254373230828], [], [], [], [], [], [], [], [],[]

(3):Incidentdetails in:/oraclelog/diag/rdbms/racdb/racdb1/incident/incdir_1153647/racdb1_pr00_43860_i1153647.trc

(3):UseADRCI or Support Workbench to package the incident.

SeeNote 411.1 at My Oracle Support for error and packaging details.

2020-09-01T16:49:12.527738+08:00

Errorswith log+ARCHIVEDG/RACDB/ARCHIVELOG/2020_09_01/thread_2_seq_55.414.1049990539

PR00(PID:43860): MRP0: Background Media Recovery terminated with error600

2020-09-01T16:49:12.527955+08:00

Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_43860.trc:

ORA-00600:internal error code, arguments: [krdrsb_broadcast_influx_scn_1],[15254371943880], [15254373230828], [], [], [], [], [], [], [], [],[]

2020-09-01T16:49:12.529640+08:00

....(PID:113683): Managed Standby Recovery not using Real Time Apply

2020-09-01T16:49:12.659315+08:00

Recoveryinterrupted!

 

5、查看RMAN備份策略發現控制文件自動快照備份是本地目錄RMAN>show all;using target database control file instead ofrecovery catalogRMAN configuration parameters for database withdb_unique_name RACDBSTD are:CONFIGURE RETENTION POLICY TOREDUNDANCY 1; # defaultCONFIGURE BACKUP OPTIMIZATION OFF; #defaultCONFIGURE DEFAULT DEVICE TYPE TO DISK; #defaultCONFIGURE CONTROLFILE AUTOBACKUP ON; # defaultCONFIGURECONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; #defaultCONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TOBACKUPSET; # defaultCONFIGURE DATAFILE BACKUP COPIES FOR DEVICETYPE DISK TO 1; # defaultCONFIGURE ARCHIVELOG BACKUP COPIES FORDEVICE TYPE DISK TO 1; # defaultCONFIGURE MAXSETSIZE TOUNLIMITED; # defaultCONFIGURE ENCRYPTION FOR DATABASE OFF; #defaultCONFIGURE ENCRYPTION ALGORITHM 'AES128'; #defaultCONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # defaultCONFIGURE RMANOUTPUT TO KEEP FOR 7 DAYS; # defaultCONFIGURE ARCHIVELOGDELETION POLICY TO APPLIED ON ALL STANDBY;CONFIGURESNAPSHOT CONTROLFILE NAME TO'/oracle/app/oracle/product/19.0.0/db/dbs/snapcf_racdb1.f'; #default

 

原因分析:

是由於切換過程中控制文件有更新,恰好此時oracle觸發了一次控制文件自動快照備份,由於快照備份路徑設置在本地目錄,自動備份失敗,導致MRP0進程馬上終止,介質恢復失敗,從而主備不同步。

 

解決方法:

修改RMAN配置參數CONFIGURESNAPSHOT CONTROLFILENAME,將控制文件SNAPSHOT放在共享存儲上。目前為了防止其他ADG發生類似的問題,已將所有ADG的控制文件SNAPSHOT存放到共享存儲。

RMAN>CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DATADG1/snapcf_racdb1.f';

 

疑問:

其他數據庫的CONFIGURESNAPSHOT CONTROLFILE NAME也沒有配置到共享存儲,建表空間測試步驟都是相同的,如果說是因為CONFIGURESNAPSHOT CONTROLFILE NAME沒有配置到共享存儲引起,為什么只有這套數據庫出現不同步的問題,其他數據庫都沒有出現?目前從日志來看只有這套庫當時觸發了控制文件自動快照備份,而其他庫沒有觸發,所以正常。

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM