1.1 Symptom
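A customer once hit a problem on an OGG replication link: one day the network failed, and the OGG data pump could no longer deliver because it was unable to write to its trail file on the target.
The suspected cause is that packet loss or a similar network fault corrupted the file, so that it could no longer be opened, read, or written. The fix was to run etrollover on the pump.
This document tests etrollover with that case in mind. [The test does not faithfully reproduce the network problem; it only exercises etrollover.]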
1.2 How does restarting an OGG process relate to the trail seqno?
1) Check the replicated table and the process parameters

Source (t1):

SQL> select * from dd;

        ID CC_NAME                        WITTIME
---------- ------------------------------ ------------------------------
         2 2                              03-JUN-20 02.34.37.000000 PM

GGSCI (t1) 4> view param exta
extract exta
USERID ogg,PASSWORD ogg
EXTTRAIL /u01/ogg/base/dirdat/ea
table YZ.DD;

GGSCI (t1) 5> view param dpea
extract dpea
rmthost 10.0.0.32,mgrport 7809, compress
rmttrail /u01/ogg/base/dirdat/t1
table YZ.B;
table YZ.DD;

GGSCI (t1) 7> info exta
EXTRACT    EXTA      Last Started 2020-11-10 11:05   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
Process ID           10744
Log Read Checkpoint  Oracle Redo Logs
                     2020-11-10 11:25:54  Seqno 353, RBA 3917824
                     SCN 0.3276594 (3276594)

GGSCI (t1) 8> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:05   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
Process ID           10776
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000067
                     2020-11-10 11:05:01.669087  RBA 1469

Target (t2):

SQL> select * from dd;

        ID CC_NAME                        WITTIME
---------- ------------------------------ ------------------------------
         2 2                              03-JUN-20 02.34.37.000000 PM

GGSCI (t2) 26> view param repa
replicat repa
userid ogg,password ogg
assumetargetdefs
HANDLECOLLISIONS
discardfile /u01/ogg/base/dirrpt/repa.dsc
MAP YZ.DD ,TARGET BAK_YZ.DD;

GGSCI (t2) 27> info repa
REPLICAT   REPA      Last Started 2020-11-10 11:20   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
Process ID           11023
Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000051
                     2020-11-10 11:05:01.313791  RBA 1563

2) Restart the replicat on the target: the seqno of the trail file it reads does not change

GGSCI (t2) 28> stop repa
GGSCI (t2) 29> start repa

3) Restart the pump on the source: the seqno of the trail file it reads does not change

GGSCI (t1) 9> stop dpea
GGSCI (t1) 10> start dpea
GGSCI (t1) 13> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:04 ago)
Process ID           11117
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000067
                     First Record  RBA 1469

4) Restart the extract on the source: the seqno of the trail file it writes goes up by 1

GGSCI (t1) 15> info exta,detail
EXTRACT    EXTA      Last Started 2020-11-10 11:05   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
Process ID           10744
Log Read Checkpoint  Oracle Redo Logs
                     2020-11-10 11:30:15  Seqno 353, RBA 3919360
                     SCN 0.3276690 (3276690)
Target Extract Trails:
Trail Name                    Seqno    RBA   Max MB  Trail Type
/u01/ogg/base/dirdat/ea          67   1469       20  EXTTRAIL

GGSCI (t1) 16> stop exta
GGSCI (t1) 17> start exta
Target Extract Trails:
Trail Name                    Seqno    RBA   Max MB  Trail Type
/u01/ogg/base/dirdat/ea          68   1469       20  EXTTRAIL

5) Once the extract's seqno has gone up by 1, the file the pump reads on the source, the file the pump writes on the target, and the file the replicat reads on the target each move up by 1 as well

GGSCI (t1) 19> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
Process ID           11117
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                     2020-11-10 11:31:58.380185  RBA 1469

GGSCI (t2) 45> info repa
REPLICAT   REPA      Last Started 2020-11-10 11:28   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Process ID           11132
Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                     2020-11-10 11:31:58.035041  RBA 1563

6) Confirm that the OGG link is in sync

Source:

SQL> insert into dd values(3,'cc',sysdate);
SQL> commit;

GGSCI (t1) 22> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
Process ID           11117
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                     2020-11-10 11:34:52.000000  RBA 2284

Target:

SQL> select * from dd;

        ID CC_NAME                        WITTIME
---------- ------------------------------ ------------------------------
         3 cc                             10-NOV-20 11.34.50.000000 AM
         2 2                              03-JUN-20 02.34.37.000000 PM

REPLICAT   REPA      Last Started 2020-11-10 11:28   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:04 ago)
Process ID           11132
Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                     2020-11-10 11:34:51.656002  RBA 2378
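The before/after comparisons in this section all come down to reading the trail file seqno and RBA out of the "Log Read Checkpoint" lines. The sketch below does that for pump and replicat `info` output. It is only an illustration: the function name `read_checkpoint` is made up here, and the pattern assumes the layout shown in the listings above, with a 9-digit seqno appended to the trail name.

```python
import re

# "Log Read Checkpoint  File <trail name><9-digit seqno>" followed, somewhere
# later in the text, by "RBA <n>". Layout assumed from the listings above.
_CHECKPOINT = re.compile(
    r"Log Read Checkpoint\s+File\s+(\S+?)(\d{9})\b.*?RBA\s+(\d+)", re.S
)

def read_checkpoint(info_text):
    """Return (trail_name, seqno, rba) from pump or replicat `info` output."""
    match = _CHECKPOINT.search(info_text)
    if match is None:
        raise ValueError("no trail-file checkpoint found in the text")
    trail_name, seqno, rba = match.groups()
    return trail_name, int(seqno), int(rba)

# Pump checkpoint before and after the extract restart in step 4.
before = read_checkpoint(
    "Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000067\n"
    "                     2020-11-10 11:05:01.669087  RBA 1469"
)
after = read_checkpoint(
    "Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068\n"
    "                     2020-11-10 11:31:58.380185  RBA 1469"
)
print("seqno moved by", after[1] - before[1])
```

An extract's own `info` output reports a redo log position instead of a trail file, so the function raises `ValueError` for it rather than guessing.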
1.3 Simulating a corrupted trail (dump) file on the target, and how to recover
1) Edit the trail file by hand

[ogg@t2 ~]$ vi /u01/ogg/base/dirdat/t1000000052
(corrupt the file)

2) Insert one test row on the source

SQL> insert into dd values(4,'cc',sysdate);
SQL> commit;

3) The OGG replicat abends

2020-11-10 11:36:59 ERROR OGG-02171 Error reading LCR from data source. Status 509, data source type TrailDataSource.
2020-11-10 11:36:59 ERROR OGG-02191 Incompatible record 101 in /u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.
2020-11-10 11:36:59 ERROR OGG-01668 PROCESS ABENDING.

4) Insert another test row on the source

SQL> insert into dd values(5,'cc',sysdate);
1 row created.
SQL> commit;

GGSCI (t1) 38> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
Process ID           11117
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                     2020-11-10 13:25:29.000000  RBA 2604

At this point, from the source pump's point of view the trail file ea000000068 holds the two new insert records. On the target, the replicat repa failed on the first of them while reading t1000000052.
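The pump would have to redeliver the contents of ea000000068, because the remote trail file it feeds is the one we corrupted by hand. [In production, network jitter or damaged packets can leave the pump unable to write the remote file, which breaks the OGG link.]
That was the scenario I meant to simulate, but in this test delivery kept working and only the apply failed.

GGSCI (t1) 40> info dpea
EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
Process ID           11117
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                     2020-11-10 13:25:29.000000  RBA 2604

GGSCI (t1) 47> view param dpea
extract dpea
rmthost 10.0.0.32,mgrport 7809, compress
rmttrail /u01/ogg/base/dirdat/t1
table YZ.DD;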
5) How to fix it? Since the trail file is damaged, why not have the source pump deliver this seqno again? Use etrollover to roll the pump forward.
GGSCI (t1) 55> alter EXTRACT dpea etrollover
2020-11-10 13:39:25 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format,
after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's
scan to the new trail file; it will not happen automatically.
EXTRACT altered.

GGSCI (t1) 48> info dpea,detail
EXTRACT    DPEA      Initialized 2020-11-10 11:30   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:01:07 ago)
Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                     2020-11-10 13:25:29.000000  RBA 2604
Target Extract Trails:
Trail Name                    Seqno    RBA   Max MB  Trail Type
/u01/ogg/base/dirdat/t1          53      0       20  RMTTRAIL

Extract Source                      Begin              End
/u01/ogg/base/dirdat/ea000000068    * Initialized *    2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000068    2020-11-10 11:05   2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000067    2020-10-13 13:24   2020-11-10 11:05
/u01/ogg/base/dirdat/ea000000066    2020-10-13 13:24   2020-10-13 13:24

[ogg@t2 ~]$ ls -lrt /u01/ogg/base/dirdat/t1*
GGSCI (t1) 49> start dpea

What stands out? The Extract Source history lists ea000000068 twice, where normally each file appears only once, and both entries have the same end time. In effect, this seqno file is being delivered again.
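The observation above (the same source trail file listed twice in the Extract Source history) can be checked mechanically. The following is a minimal sketch, assuming the `info ... detail` text has been saved to a string and that history rows start with an absolute path, as in the listing above. The function name `repeated_sources` is hypothetical and not part of OGG.

```python
from collections import Counter

def repeated_sources(detail_text):
    """Return source trail files listed more than once under 'Extract Source'."""
    counts = Counter()
    in_history = False
    for line in detail_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Extract", "Source"]:
            in_history = True          # history rows start after this header
            continue
        if in_history and fields and fields[0].startswith("/"):
            counts[fields[0]] += 1
    return [path for path, seen in counts.items() if seen > 1]

detail = """\
Target Extract Trails:
Trail Name                    Seqno    RBA   Max MB  Trail Type
/u01/ogg/base/dirdat/t1          53      0       20  RMTTRAIL

Extract Source                      Begin              End
/u01/ogg/base/dirdat/ea000000068    * Initialized *    2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000068    2020-11-10 11:05   2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000067    2020-10-13 13:24   2020-11-10 11:05
/u01/ogg/base/dirdat/ea000000066    2020-10-13 13:24   2020-10-13 13:24
"""
print(repeated_sources(detail))
```

A repeated entry only flags the pattern seen in this test; it says nothing by itself about whether the target trail is intact.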
6) Start the replicat again on the target

GGSCI (t2) 52> info repa
REPLICAT   REPA      Last Started 2020-11-10 11:28   Status ABENDED
Checkpoint Lag       00:00:00 (updated 01:58:17 ago)
Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                     2020-11-10 11:34:51.656002  RBA 2378
GGSCI (t2) 58> start repa
GGSCI (t2) 59> info repa
REPLICAT REPA Last Started 2020-11-10 13:35 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:10 ago)
Process ID 12727
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 13:25:28.699520 RBA 2698
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM
4 cc 10-NOV-20 11.37.19.000000 AM
5 cc 10-NOV-20 01.25.27.000000 PM
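Note: after a rollover, the target replicat can get stuck on its current trail file and fail to move on to the next seqno file. In this test the empty seqno 53 file was skipped by hand.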
-rw-r-----. 1 ogg oinstall 2698 Nov 10 13:25 t1000000052
-rw-r-----. 1 ogg oinstall 0 Nov 10 13:33 t1000000053
-rw-r-----. 1 ogg oinstall 2420 Nov 11 03:09 t1000000056
GGSCI (t2) 7> alter REPLICAT repa extseqno 56 extrba 1445
(To find the RBA, open the trail file for that seqno in logdump and step through the records with n.)
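Choosing which seqno to reposition the replicat to can be sketched as "the first non-empty trail file after the one it is stuck on". The helper below illustrates that selection on the directory listing shown above. The name `next_nonempty_seqno` and the idea of passing file sizes as a dict are assumptions made for this example. It does not determine the RBA, which still has to be looked up with logdump.

```python
import re

def next_nonempty_seqno(sizes, trail_name, current_seqno):
    """Return the lowest seqno above current_seqno whose trail file is non-empty.

    sizes maps file name -> size in bytes (for example collected with
    os.scandir on the dirdat directory); trail_name is the two-character
    trail name such as "t1". Returns None if there is no such file.
    """
    pattern = re.compile(re.escape(trail_name) + r"(\d{9})$")
    candidates = []
    for name, size in sizes.items():
        match = pattern.match(name)
        if match and size > 0 and int(match.group(1)) > current_seqno:
            candidates.append(int(match.group(1)))
    return min(candidates) if candidates else None

# Sizes taken from the ls output above.
listing = {"t1000000052": 2698, "t1000000053": 0, "t1000000056": 2420}
print(next_nonempty_seqno(listing, "t1", 52))
```

With the listing from this test the helper skips the empty seqno 53 file and picks 56, which matches the seqno used in the `alter REPLICAT` command above.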