OGG Troubleshooting and Error Handling Summary





 

 

http://blog.itpub.net/26655292/viewspace-2142867/ 

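Chapter 1  GoldenGate Error Analysis and Handling

While maintaining GoldenGate, unexpected conditions inevitably lead to problems of one kind or another. It is well worth mastering the common fault-diagnosis and error-analysis methods, and working with these tools also deepens your understanding of the GoldenGate product and how it works.

1.1   Common GoldenGate Exception Handling

Once GoldenGate is up and running, various problems may appear over time. This section covers the common symptoms and how to handle them.

1.1.1  General Steps for Handling Exceptions

First determine which kind of GoldenGate process has failed (Extract, Data Pump, or Replicat). The general approach is as follows.

1. Use GGSCI> view report to search for the string ERROR, determine the cause, and resolve it based on the message.

2. Use GGSCI> view ggsevt to check the alert log.

3. Check that the databases on both ends are running normally and that the network is connected.

4. Use the logdump tool to analyze the trail files.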
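Searching a report or ggserr.log for ERROR lines can also be scripted. The following is a minimal Python sketch, not part of GoldenGate: the function name find_problems and the regular expression are assumptions based on the "timestamp, severity, OGG-nnnnn, message" layout of the log excerpts quoted in this article. Older "GGS ERROR 182" style lines would need a different pattern.

```python
import re

# Layout assumed from the excerpts in this article, e.g.
# "2011-03-02 12:03:37  ERROR   OGG-01028  Incompatible record in ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<sev>ERROR|WARNING)\s+"
    r"(?P<code>OGG-\d{5})\s+(?P<msg>.*)$"
)


def find_problems(report_text, severities=("ERROR",)):
    """Return (timestamp, severity, code, message) for each matching line."""
    hits = []
    for line in report_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("sev") in severities:
            hits.append((m.group("ts"), m.group("sev"),
                         m.group("code"), m.group("msg")))
    return hits


# Sample lines copied from excerpts elsewhere in this article.
sample = """\
2011-03-02 12:03:37  ERROR   OGG-01028  Incompatible record in ./dirdat/ yb018262, rba 72955479 (getting header).
2015-06-12 10:36:33  INFO    OGG-01226  Socket buffer size set to 27985 (flush size 27985).
2015-06-09 22:31:16  ERROR   OGG-01668  PROCESS ABENDING.
"""

for hit in find_problems(sample):
    print(hit)
```

In practice the text would come from a file under dirrpt or from ggserr.log; reading it with open(...).read() is left out to keep the sketch self-contained.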

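1.1.2  Single-Node Failure in RAC

In a RAC environment, GoldenGate is installed in a shared directory, so any node can attach to the shared directory and start the GoldenGate command interface. If one node fails and the GoldenGate processes abort, you can switch directly to another node and continue running.

The steps are as follows.

1. Log in to the source system as the Oracle user (on another, healthy node).

2. Confirm that the file system holding the GoldenGate installation is mounted at the same directory on the other node.

3. Confirm that the GoldenGate installation directory belongs to the Oracle user and its group.

4. Confirm that the Oracle user and its group have read and write permission on the GoldenGate installation directory.

5. Change to the GoldenGate installation directory.

6. Run ./ggsci to enter the command-line interface.

7. Run start mgr to start MGR.

8. Run start er * to start all processes.

Check that each process has started normally; replication then resumes.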

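1.1.3  Common Extract Exceptions

The following common error messages are listed for reference.

Extract processes include both the capture process and the Data Pump process. Most Data Pump errors are caused by network failures. On the source database, if a capture process ext** changes to ABENDED, use view report in GGSCI to read its report; searching for ERROR locates the problem quickly.

In most cases the capture process abends because it cannot find the archived log it needs. You can check from the command line in the archive log directory:

Example 9-1:

ls -lt arch_x_xxxx.arc

to see whether the log exists. If it does not, the possible causes are as follows.

- The log has been compressed.

- GoldenGate cannot decompress logs automatically; they must be decompressed manually before they can be read.

- The log has been deleted.

If the log has been deleted, it must be restored before replication can continue.

Archived logs generally need to be backed up regularly and old ones purged. Make sure each archived log stays in the archive directory long enough before it is backed up and removed; that is, periodically back up and purge only the archives older than a certain number of hours, not all of them. The retention time is calculated as follows:

retention time of an archive file ≥ time the capture process needs to finish processing all the log records in that file

From the command line or the GoldenGate Director web interface, run info extxx showch to see which log sequence number the capture process ext has reached. All archives before that sequence number can be safely purged.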
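The purge rule above can be sketched in Python. This is an illustration only: the function name purgeable is hypothetical, the thread_sequence_resetlogs.arc naming is an assumption taken from the file name 1_53586_776148274.arc that appears in an error message later in this article, and checkpoint_seq is assumed to be the oldest sequence the capture process still needs, read by hand from the showch output.

```python
import re

# Assumed archive naming: <thread>_<sequence>_<resetlogs id>.arc
ARCH_RE = re.compile(r"^(?P<thread>\d+)_(?P<seq>\d+)_(?P<resetlogs>\d+)\.arc$")


def purgeable(file_names, checkpoint_seq, thread=1):
    """Return archive files of `thread` whose sequence is below checkpoint_seq."""
    selected = []
    for name in file_names:
        m = ARCH_RE.match(name)
        if (m and int(m.group("thread")) == thread
                and int(m.group("seq")) < checkpoint_seq):
            selected.append((int(m.group("seq")), name))
    # Sort by sequence number so the oldest file comes first.
    return [name for _, name in sorted(selected)]


files = [
    "1_53586_776148274.arc",
    "1_53584_776148274.arc",
    "2_100_776148274.arc",
    "1_53585_776148274.arc",
]
print(purgeable(files, checkpoint_seq=53586))
```

On RAC each thread has its own sequence numbers, so the function would be called once per thread with that thread's checkpoint sequence.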

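Extract also abends when it captures unsupported data objects. The report file contains a detailed error message; use it to pinpoint the error and then troubleshoot.

Several more faults are listed individually below.

1. Extract: Application failed to initialize (Windows).

Error message: running a GGSCI command brings up an alert window reporting "Application failed to initialize (0xc000026e)".

On Windows, GoldenGate requires the Microsoft Visual C++ 2005 SP1 Redistributable Package. On Microsoft Itanium platforms, install vcredist_IA64.exe.

Windows 2008 needs these extra steps: right-click 'cmd' (DOS), choose 'Run as administrator', and start MGR from that command window; only then can Extract read the database logs.

When OGG is installed as a service (that is, by running "install ADDSERVICE"), administrator privileges are required so that the service can access the logs once it starts.

To grant the user that runs MGR and Extract permission to read the log files, right-click the file -> Properties -> Security -> Edit -> Add.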

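2. Extract: Cannot load program ./ggsci…

Analysis: first check that the OGG build matches the operating system and the database. On AIX, also check that the xLC version is 10.0 or later.

In addition, check that the dynamic library path in the environment includes the database library directory, for example:

Example 9-2:

export LD_LIBRARY_PATH=$ORACLE_HOME/lib

The environment variable differs by platform:

- AIX: LIBPATH

- Solaris, Linux: LD_LIBRARY_PATH

- HP-UX: SHLIB_PATH

After resetting the environment variable, restart the Mgr and Ext/Rep processes.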

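3. Extract: Block size mismatch (8192/512)…

The raw device offset defaults to 0 on most operating systems, but on AIX it defaults to 4096. When a raw device is created with the -TO option, Oracle does not skip the first 4096 bytes and reads and writes from offset 0 instead. So if this error appears with raw devices on AIX, tell OGG to read from offset 0:

Example 9-3

tranlogoptions rawdeviceoffset 0

This parameter is needed very often in practice. In older versions Extract terminated immediately if it was missing; newer versions keep retrying without terminating, so check the report file.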

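4. Extract: ORA-15000 ASM connection error

This is an OCI error: Extract failed while connecting to the database, and the message points to a privilege problem.

First check the ASM-related setting in the Extract parameter file, tranlogoptions asmuser sys@+ASM1, asmpassword oracle. Then check tnsnames.ora and listener.ora to verify the ASM instance configuration, and confirm that the ASM user has the SYSDBA privilege. If you use SYS, the REMOTE_LOGIN_PASSWORDFILE parameter in the ASM instance's init.ora must be set to SHARED (several databases can share one password file, and only SYS can log in remotely).

Verify with sqlplus:

Example 9-4

sqlplus sys/oracle@asm1 as sysdba //logs in successfully

sqlplus sys/oracle@asm1;           //reports error 15000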

 

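5. Extract: Encountered SCN That Is Not Greater Than The Highest SCN Already Processed…

Cause analysis: in an Oracle RAC environment, Extract starts a coordinator thread that orders the operations from all nodes by SCN. After a transaction commits, it waits for the time defined by THREADOPTIONS MAXCOMMITPROPAGATIONDELAY to confirm that the idle nodes have no transactions, and then collects the transaction data. If, after that transaction has been written, an idle node later yields a transaction with a smaller SCN, this error is reported.

Possible causes:

- Clocks are not synchronized between the nodes.

- One node is slower than the other (most likely an I/O problem).

Solution:

Adjust the Extract parameter:

Example 9-5

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY <msec> IOLATENCY <msec>

The valid range of MAXCOMMITPROPAGATIONDELAY is 0-90000 ms; the default is 3 s (3000 ms).

GGS V9.x adds an IOLATENCY parameter, which can be used together with the parameter above to lengthen the wait. IOLATENCY defaults to 1.5 s, with a maximum of 180000 ms.

When this error occurs, set both parameters to large values first and then lower them step by step to find the best setting.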
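When stepping the two values down, it helps to generate the parameter line programmatically and reject out-of-range values. The following Python sketch assumes only the limits stated above (0 to 90000 ms and a 180000 ms maximum); the function name threadoptions_line and the lower bound of 0 for IOLATENCY are assumptions made for illustration.

```python
# Limits as stated in the text; the IOLATENCY lower bound of 0 is an assumption.
MAX_COMMIT_DELAY_RANGE_MS = (0, 90000)
IOLATENCY_RANGE_MS = (0, 180000)


def threadoptions_line(max_commit_delay_ms=3000, iolatency_ms=1500):
    """Build a THREADOPTIONS line after checking both values against the limits."""
    lo, hi = MAX_COMMIT_DELAY_RANGE_MS
    if not lo <= max_commit_delay_ms <= hi:
        raise ValueError(f"MAXCOMMITPROPAGATIONDELAY must be {lo}-{hi} ms")
    lo, hi = IOLATENCY_RANGE_MS
    if not lo <= iolatency_ms <= hi:
        raise ValueError(f"IOLATENCY must be {lo}-{hi} ms")
    return (f"THREADOPTIONS MAXCOMMITPROPAGATIONDELAY {max_commit_delay_ms} "
            f"IOLATENCY {iolatency_ms}")


# Start high, then lower step by step while watching for the error.
for delay, latency in [(90000, 180000), (45000, 90000), (20000, 6000)]:
    print(threadoptions_line(delay, latency))
```

The defaults of 3000 and 1500 mirror the default values quoted above.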

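One further note: after this error, because later transactions may already have been written to the trail, restarting Extract may succeed but can cause the following problem. Extract rewrites the current trail file and overwrites earlier transaction data, and the downstream Data Pump may then abend with an "incompatible record" error (seen in older versions).

The recovery steps are as follows.

① Stop all Data Pumps and Replicats, and for every Extract record the trail Seqno of its Write Checkpoint.

② Roll each Extract over to the next trail file:

Example 9-6

ALTER EXTRACT [name], ETROLLOVER

Start Extract and check that it has rolled over to the next trail file. Record the new trail seqno, which should be the old seqno + 1.

③ Alter the Data Pump to start transferring from the new trail file:

Example 9-7

ALTER EXTRACT [pump_name], EXTSEQNO ##### EXTRBA 0

Restart the Data Pump and check that it starts successfully and transfers from the new trail file.

④ Edit the Replicat parameter file to add or enable HANDLECOLLISIONS, and comment out GROUPTRANSOPS and MAXTRANSOPS if they are present. Start Replicat and watch whether it can read the newly transferred trail. If Replicat does not roll over to the next trail file automatically, roll it manually with:

Example 9-8

alter replicat [replicat_name], EXTSEQNO ##### EXTRBA 0

Once Replicat has processed to the end with no lag, turn off HANDLECOLLISIONS and restore the original GROUPTRANSOPS and MAXTRANSOPS settings.

⑤ Restart Replicat to resume normal replication.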

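1.1.4  Network Failures

If autorestart is set in the MGR parameter file, GoldenGate can restart processes automatically without manual intervention.

When the network is unstable or interrupted, the Pump process that produces the remote trail stops automatically. MGR then periodically restarts the Pump process according to the autorestart setting in mgr.prm to probe whether the network has recovered. Once it has, the Pump process is restarted, and GoldenGate's checkpoint mechanism ensures that it resumes from the log position where replication stopped.

Note that the source capture process (Capture) keeps reading the logs and writing to the local trail files, but the Pump process cannot move the local trail to the remote side in time, so the local trail files cannot be purged automatically and pile up. Make sure there is enough storage to hold them. The formula is:

storage capacity ≥ trail volume generated per unit time × time to recover from the network failure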
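As a worked example of the capacity formula, the Python sketch below multiplies the trail generation rate by the expected outage. The function name and the optional safety_factor are assumptions added for illustration; the figures in the example are made up and not measurements.

```python
def required_trail_space_gb(trail_gb_per_hour, outage_hours, safety_factor=1.0):
    """Minimum local trail storage: generation rate x expected outage duration.

    safety_factor is a hypothetical headroom multiplier, not a GoldenGate setting.
    """
    if trail_gb_per_hour < 0 or outage_hours < 0 or safety_factor <= 0:
        raise ValueError("rates and durations must be non-negative")
    return trail_gb_per_hour * outage_hours * safety_factor


# Illustrative numbers: 2 GB of trail per hour and a 12-hour network outage.
print(required_trail_space_gb(2, 12))
# The same outage with 50% headroom.
print(required_trail_space_gb(2, 12, safety_factor=1.5))
```

The generation rate can be estimated from the size and timestamps of recent files under dirdat during peak hours.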

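Reference MGR configuration for periodically restarting capture and replication processes:

Example 9-9

GGSCI > edit param mgr

port 7809

autorestart er *,waitminutes 3,retries 5,RESETMINUTES 60

MGR retries every 3 minutes. After 5 failed retries it stops retrying until the 60-minute reset period has passed, at which point the retry count is reset and it tries up to 5 times again.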

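1.1.5  Common Replicat Exceptions

On the target database, if a Replicat process repXX changes to ABENDED, use view report in GGSCI to read its report; searching for ERROR locates the problem quickly.

Replicat errors are usually target database errors, for example:

- The database is temporarily down.

- The target tablespace is out of space.

- The target table has become inconsistent.

Check the report for the cause, fix it, and restart the rep process.

One point is easy to overlook: the UNDO tablespace. If the DML contains a large number of UPDATE and DELETE operations, UNDO is generated quickly on the target and may fill the UNDO tablespace.

A typical data replication error looks like this:

Example 9-10

2010-02-25 13:20:08  GGS WARNING     218  Oracle GoldenGate Delivery for Oracle, rep_stnd.prm:  SQL error 1403 mapping HR.MY_EMPLOYEE to HR.MY_EMPLOYEE.

Possible causes include the following.

- The structures on the two ends differ (heterogeneous environment, different columns or primary keys).

- The two ends contain inconsistent rows.

- Supplemental logging is incomplete.

Look in the discard file for the specific error. If an UPDATE or DELETE cannot find the corresponding row and some of the columns are empty, supplemental logging is missing.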
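To see which tables produce most of the mapping errors, the "SQL error <code> mapping <source> to <target>" messages can be tallied. The Python sketch below is an assumption-laden illustration: the function name count_mapping_errors is hypothetical, and the pattern is derived only from the message layout in the excerpts quoted in this article.

```python
import re
from collections import Counter

# Pattern derived from messages such as
# "SQL error 1403 mapping HR.MY_EMPLOYEE to HR.MY_EMPLOYEE."
MAP_ERR_RE = re.compile(
    r"SQL error (?P<code>\d+) mapping "
    r"(?P<src>[\w$#]+\.[\w$#]+) to (?P<tgt>[\w$#]+\.[\w$#]+)"
)


def count_mapping_errors(log_text):
    """Count occurrences per (Oracle error code, target table)."""
    counts = Counter()
    for m in MAP_ERR_RE.finditer(log_text):
        counts[(int(m.group("code")), m.group("tgt"))] += 1
    return counts


# Sample text assembled from messages quoted in this article.
sample = """\
2010-02-25 13:20:08  GGS WARNING     218  Oracle GoldenGate Delivery for Oracle, rep_stnd.prm:  SQL error 1403 mapping HR.MY_EMPLOYEE to HR.MY_EMPLOYEE.
2010-02-25 13:21:10  GGS WARNING     218  Oracle GoldenGate Delivery for Oracle, rep_stnd.prm:  SQL error 1403 mapping HR.MY_EMPLOYEE to HR.MY_EMPLOYEE.
OGG-01154  Oracle GoldenGate Delivery for Oracle, repn.prm:  SQL error 1691 mapping DATA_USER.DMH_WJXXB to DATA_USER.DMH_WJXXB OCI Error ORA-01691: unable to extend lob segment
"""

for (code, table), n in count_mapping_errors(sample).most_common():
    print(code, table, n)
```

A table that dominates the 1403 count is a natural candidate for a supplemental logging check or a re-initialization.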

 

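1.2   Handling Errors with reperror

When Replicat hits an error while applying DML, GoldenGate provides a parameter that controls how the error is handled: reperror, the subject of this section. It governs most of GoldenGate's error-handling behavior.

The Replicat parameters from one case are shown in Figure 9-1.

Figure 9-1

1.2.1    reperror Response Types and Their Meanings

In GoldenGate 11, reperror provides seven ways of handling errors:

1. abend: when Replicat meets a record it cannot process, it rolls back the transaction and stops processing; the Replicat status changes to ABENDED.

2. discard: writes the error information for the unprocessable record to the discard file, and Replicat continues with the following records.

3. exception: handles the error in a predefined way.

4. ignore: ignores the unprocessable record and continues with the following records.

5. retryop [maxretries <n>]: retries the operation n times when a record cannot be processed.

6. transabort [, maxretries <n>] [, delaysecs <n> | delaycsecs <n>]: aborts the transaction and repositions the RBA to the beginning of that transaction; a retry count can also be specified.

7. reset: clears all reperror rules and resets the default rule to abend.

In the Replicat parameter file, any response type can be made the default, for example reperror (default, abend).

Usually, to preserve data consistency, the default reperror rule is set to abend.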

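1.2.2    Common Database Error Types in Replicat and How to Handle Them

In real GoldenGate systems, a large share of Replicat errors are database errors beginning with ORA (taking Oracle as the example). Although an ORA error usually has to be traced manually in the database, reperror can handle some anticipated error types, so that you can find and fix the cause at the database level afterward without the process abending and halting work on other, healthy tables.

For example, ignore inserts of duplicate rows but abend on any other type of error:

Example 9-11

Reperror (default, abend)

Reperror (-1, ignore)

You can also ignore duplicate inserts for a single table only and abend on all other errors.

Example 9-12

REPERROR (-1, IGNORE)

MAP sales.product, TARGET sales.product;

REPERROR RESET

MAP sales.account, TARGET sales.account;

The most common error is ORA-1403.

Error 1403 means that a record could not be applied to the target database. It is purely a data error. Review the error message and the discard file, find the corresponding rows in both databases, and analyze the actual data in the trail with logdump to determine the cause. Possible causes: the table structures differ between the two ends; supplemental logging is wrong; an incorrect initial-load method left the data inconsistent; cascade deletes or triggers were not disabled on the target; Oracle jobs or operating-system tasks modify data on the target.

Handling:

- Reinitialize the table.

- Repair the row manually.

- Set reperror to discard or ignore mode to skip the error (be very sure you know what you are doing before using this, because it leaves the two ends inconsistent).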

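1.3  Handling DDL Replication Errors with Ddlerror

When DDL replication is enabled in GoldenGate and a DDL operation fails, the ddlerror parameter described here can pre-handle some common errors. Ddlerror is valid for both Extract and Replicat, and its default is abend.

The syntax of Ddlerror is:

Example 9-13

DDLERROR

{<error> | DEFAULT} {<response>}

[RETRYOP MAXRETRIES <n> [RETRYDELAY <delay>]]

{INCLUDE <inclusion clause> | EXCLUDE <exclusion clause>}

[IGNOREMISSINGTABLES | ABENDONMISSINGTABLES]

For example, if DDL replication reports ORA-1430 because a duplicate alter statement was sent, ddlerror (1430, discard) writes the error information to the discard file.

Ddlerror error handling is otherwise similar to reperror.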

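1.4  Recording Process Errors with Discardfile

The discardfile parameter generates a discard file in which GoldenGate records the information it cannot process. This is very helpful for GoldenGate troubleshooting.

For example, if the source table structure changes, Replicat by default reports an error when applying the incoming data. The discard file then shows which table was altered and how; make the same structural change on the target and the error is resolved.

By default, discard files are in the dirrpt subdirectory of the GoldenGate installation directory, as shown in Figure 9-2.

Figure 9-2

The errors recorded in a discard file are shown in Figure 9-3.

Figure 9-3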

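1.5  Analysis of Common GoldenGate Errors

1. A key step in resolving a GoldenGate error is to use the error analysis tools (report files, ggserr.log, discard files, the logdump tool, and the GGSCI command line) to determine which component is the root cause:

- The system or the network?

- A database error or an application error?

- A GoldenGate installation error?

- A particular GoldenGate process?

- A GoldenGate parameter file?

- A SQL statement or a stored procedure?

Then determine the cause and check the candidates one by one.

2. When GoldenGate hits an error, use the logs and report files to find the cause and work through it step by step. For most errors, the GoldenGate message suggests a corresponding fix.

The following is an example case.

Running the command:

Example 9-14

GGSCI>view ggsevt

shows the error information in Figure 9-4.

Figure 9-4

view report dpeyb shows similar information.

Next, look at the error reported by the Replicat process on the disaster-recovery side:

Example 9-15

2011-03-02 12:03:37  ERROR   OGG-01028  Incompatible record in ./dirdat/ yb018262, rba 72955479 (getting header).

Open the trail file with logdump, as shown in Figure 9-5.

Figure 9-5

Analysis confirmed that one record in the trail file was corrupt. The Pump process could not recognize it and could not roll over to the next trail file automatically, and the Replicat process could not move on to the next trail file either. Manually running etrollover on the Pump process and using alter to point the Replicat process at the next trail file cleared the fault.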

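1.5.1  GGSCI Fails to Run on AIX

Error message:

Example 9-16

Cannot load ICU resource bundle 'ggMessage', error code 2 - No such file or directory

Cannot load ICU resource bundle 'ggMessage', error code 2 - No such file or directory

IOT/Abort trap (core dumped)

Alternatively, GGSCI starts but every command reports the errors above.

Cause and fix: this typically happens when GoldenGate is installed on an existing mount point that was mounted with the concurrent I/O (CIO) option. Create a new file system, mount it again, and use it as the GoldenGate installation directory.

Error message:

Example 9-17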

 

$ ./ggsci

exec(): 0509-036 Cannot load program GGSCI because of the following errors:

         0509-130 Symbol resolution failed for GGSCI because:

         0509-136   Symbol _GetCatName__FiPCc (number 158) is not exported          from dependent module /usr/lib/libC.a[ansi_64.o].

         0509-136   Symbol _Getnumpunct__FPCc (number 162) is not exported          from dependent module /usr/lib/libC.a[ansi_64.o].

         0509-136   Symbol __ct__Q2_3std8_LocinfoFPCci (number 183) is not          exported from dependent module /usr/lib/libC.a[ansi_64.o].

         0509-192 Examine .loader section symbols with the 'dump -Tv' command.

 

The cause is xlC version 6.0. Upgrading xlC to 10.1 or later resolves the problem.

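1.5.2  GGSCI Fails to Run on HP-UX

Error message: core dumped

This problem has only been seen on HP-UX 11.31.

Cause and fix: the environment variables are not set correctly; correct them.

1.5.3  Trail File Retention in Days

Add the following to mgr.prm:

Example 9-29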

 

PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3

 

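After the change, restart Manager. The space used by trail files is then released according to the rule above.

If storage is short, change minkeepdays to MINKEEPHOURS.

If the source side is short of storage, it is best to reduce the minimum retention time.

9.5.12 Splitting a Replicat Process and Specifying the Trail File and RBA

Before splitting, use INFO XXX to get the trail file and RBA. Sample output:

Example 9-30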

 

GGSCI> INFO REPYXA

REPLICAT   REPYXA    Last Started 2011-01-08 19:48   Status RUNNING

Checkpoint Lag       00:00:00 (updated 00:01:42 ago)

Log Read Checkpoint  File ./dirdat/p1000556 First Record  RBA 59193235

 

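After splitting the Replicat process, start replication from the trail file and RBA recorded before the split:

Example 9-31

ALTER REPLICAT xxx, EXTSEQNO nnn, EXTRBA mmm

Using the output above as an example:

Example 9-32

ALTER REPLICAT REPYXA, EXTSEQNO 556, EXTRBA 59193235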
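Deriving the sequence number and RBA from the INFO output by hand is error-prone, so the step can be sketched in Python. This is an illustration under stated assumptions: the function name alter_command is hypothetical, the parsing relies on the "Log Read Checkpoint  File ... RBA ..." layout of the sample above, and the trail file name is assumed to end in a six-digit sequence number (as in p1000556).

```python
import re

CHKPT_RE = re.compile(r"Log Read Checkpoint\s+File\s+(?P<path>\S+)")
RBA_RE = re.compile(r"\bRBA\s+(?P<rba>\d+)")


def alter_command(info_output, group):
    """Build an ALTER REPLICAT command from GGSCI INFO output."""
    chkpt = CHKPT_RE.search(info_output)
    rba = RBA_RE.search(info_output)
    if chkpt is None or rba is None:
        raise ValueError("checkpoint file or RBA not found in INFO output")
    # Assumption: the last six characters of the trail file name are the
    # sequence number, e.g. "p1000556" -> 556.
    seqno = int(chkpt.group("path")[-6:])
    return f"ALTER REPLICAT {group}, EXTSEQNO {seqno}, EXTRBA {int(rba.group('rba'))}"


# INFO output copied from the sample in this section.
sample = """\
REPLICAT   REPYXA    Last Started 2011-01-08 19:48   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:01:42 ago)
Log Read Checkpoint  File ./dirdat/p1000556 First Record  RBA 59193235
"""

print(alter_command(sample, "REPYXA"))
```

The same command would be issued for each Replicat created by the split, with its own group name.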

1.5.4  BOUNDED RECOVERY

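Error message:

Example 9-33

BOUNDED RECOVERY: reset to initial or altered checkpoint.

This is a database-side problem: the archivelog files of the second node cannot be read.

1.5.5  Excluding Tables from Replication

Add to the parameter file:

Example 9-34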

 

TABLEEXCLUDE schema.table_name

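1.5.6  Recapturing from a Specified Time

Prerequisite for recapturing data: the archived logs have not been deleted.

Example 9-35

ALTER EXTRACT xxx, TRANLOG, BEGIN 2010-12-31 08:00

Time format: yyyy-mm-dd [hh:mi:[ss[.cccccc]]]

To create a new Extract instead:

Example 9-36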

 

ADD EXTRACT xxx, TRANLOG, BEGIN 2010-12-31 08:00

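1.5.7  Process Cannot Be Stopped

This usually means a large transaction is being processed, especially one that has been running for more than two hours. It is best to wait for the process to finish.

If the process must be stopped, it can be killed forcibly:

Example 9-37

send xxx forcestop

1.5.8  CLOB Handling

If CLOB columns are involved, the Extract parameter file must include:

Example 9-38

TRANLOGOPTIONS CONVERTUCS2CLOBS

1.5.9  DB2 Cannot Use a Checkpoint Table

Fix: use the nodbcheckpoint parameter when adding the Replicat process.

Example 9-39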

 

add replicat xxx, exttrail /GoldenGate/dirdat/rb, nodbcheckpoint

1.6  OGG- Errors

1.6.1  OGG-00446

1.6.1.1  OGG-00446 Could not find archived log for sequence 53586 thread 1 under alternative destinations.

Error message:

OGG-00446  Could not find archived log for sequence 53586 thread 1 under alternative destinations. SQL <SELECT MAX(sequence#)  FROM v$log WHERE thread# = :ora_thread>. Last alternative log tried /arch_cx/1_53586_776148274.arc., error retrieving redo file name for sequence 53586, archived = 1, use_alternate = 0Not able to establish initial position for sequence 53586, rba 44286992.

Fix:

Restore the missing archived logs from backup. If the required archived logs still cannot be found, the only option is to re-run the initial data load.

 

Starting an extract today produced the following errors:

 

2011-10-16 22:41:02  ERROR   OGG-00446  Oracle GoldenGate Capture for Oracle, e430rks2.prm:  Could not find archived log for sequence 10770 thread 1 under default destinations SQL <SELECT  name    FROM v$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO>, error retrieving redo file name for sequence 10770, archived = 1, use_alternate = 0Not able to establish initial position for sequence 10770, rba 78960656.

2011-10-16 22:41:02  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, e430rks2.prm:  PROCESS ABENDING.

 

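The cause is that the archived logs Extract needs have been cleared away and are no longer in the directory specified by log_archive_dest. The fix is simple: copy the archived logs from sequence 10770 up to the current one back into the log_archive_dest directory.

Option 1 (this leaves the data inconsistent, because changes made before the new start point are skipped): change the Extract start time as follows:

GGSCI (HP-HP) 8> alter extract extl,begin now

Option 2: reinitialize.

The reinitialization procedure is as follows: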

---- source

SQL> col current_scn format 999999999999999

SQL>  Select current_scn from v$database;

 

     CURRENT_SCN

----------------

  12242466771468

   

expdp XPADB/XPADB directory=DMP dumpfile=xpadb_20160125_01.dmp LOGFILE=xpadb_20160125.log  TABLES=BASE_ACTIONPOWER,BASE_BANK  FLASHBACK_SCN=12242466771468

 

 

--- target

impdp XPADRPT/xpadrpt DIRECTORY=OGGD DUMPFILE=xpadb_20160125_01.dmp LOGFILE=impdp.xpadb_20160125_01.log REMAP_SCHEMA=xpadb:xpadrpt REMAP_TABLESPACE=xpaddat:xpaddata  table_exists_action=replace

 

start replicat ggsrep , aftercsn 12242466771468

 

1.6.1.2  OGG-00446 No valid log files for current redo sequence

During incremental synchronization with GoldenGate on Oracle ASM, the following error appeared.

ERROR   OGG-00446  No valid log files for current redo sequence 367, thread 1, error retrieving redo file name for sequence 367, archived = 0, use_alternate = 0Not able to establish initial position for begin time 2013-03-27 15:32:46.

ERROR   OGG-01668  PROCESS ABENDING.

Adding TRANLOGOPTIONS DBLOGREADER to the Extract parameter file resolves it.

Reference: Extract fail due to an ASM connection configuration issue [ID 1061093.1]

Applies to:

Oracle GoldenGate - Version 11.1.1.0.0 and later

Information in this document applies to any platform.

Goal

To show how to recover from an extract failure when your Archive or Redo files are stored under ASM

and you see one of the following messages


ERROR 118 No Valid Log File For Current Redo Sequence Xxxx, Thread Y


ERROR 500 No valid log files for current redo sequence X, thread Y, error retrieving redo file name for sequence X, archived = 0, use_alternate = 0 Not able to establish initial position for begin time YYYY-MM-DD HH:MI:SS

ERROR OGG-00446  error 2 (No such file or directory) opening redo log <log file name>.dbf for sequence ####

Not able to establish initial position for begin time YYYY-MM-DD HH:MI:SS

Fix

If you are running Oracle ASM, the problem may be that the ASM connection is either not defined or is incorrectly defined, or that TRANLOGOPTIONS DBLOGREADER needs to be added. If your archive files are ONLY under ASM and extract receives an error 500, extract may have run successfully until the process needed to read from the ARCHIVES instead of the REDO. Once it needs to read from archive, the extract will fail.

Please add the following line, or correct it in your Extract parameter file, if you are on Oracle 11.2.0.2 or later, or 10.2.0.5 or later, and using OGG 11.x:

TRANLOGOPTIONS DBLOGREADER

If the above versions of Oracle or OGG do not apply to you, specify a user that can connect to the ASM instance and restart your Extract:


TRANLOGOPTIONS ASMUSER <user>@<ASM_instance_name>,

ASMPASSWORD <password>

 

 

 

1.6.1.3  OGG-00446 Missing filename opening checkpoint file.

 

The RSJQZ011 process abended with:

ERROR   OGG-00446  Missing filename opening checkpoint file.

Check the RSJQZ011 configuration:

GGSCI (oraserver.localdomain) 19> view param RSJQZ011

Sourcedefs  /goldengate/dirdef/DESJQZ001.def

---handlecollisions

batchsql

SETENV ( NLS_LANG = ".ZHS16GBK")

OBEY /goldengate/dirprm/pwd.obey

Discardfile /goldengate/dirrpt/RSJZX001.dsc, append, megabytes 100

map DB_DJGL.A, target DB_NBGY.A;

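The Replicat RSJQZ011 line turned out to have been deleted from the parameter file, which caused the error.

After adding Replicat RSJQZ011 back, the process started normally.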

 

 

1.6.2  OGG-01154  Oracle GoldenGate Delivery for Oracle, repn.prm

Error message:

OGG-01154  Oracle GoldenGate Delivery for Oracle, repn.prm:  SQL error 1691 mapping DATA_USER.DMH_WJXXB to DATA_USER.DMH_WJXXB OCI Error ORA-01691: unable to extend lob segment DATA_USER.SYS_LOB0000083691C00014$$ by 16384 in tablespace DATA_USER_LOB_U128M_1 (status = 1691), SQL <INSERT INTO "DATA_USER"."DMH_WJXXB" ("DMH_WJXXB_ID","DMH_ZLXXB_ID","DMH_GPXXB_ID","DMH_PCXXB_ID","PICIH","SHENQINGH","FID","WENJIANZL","WENJIANLXDM","WENJIANMC","DTDBBH","FAMINGMC","FUTUGS","WENJIANST>.

Fix:

The tablespace in the database is full and needs to be extended.

 

1.6.2.1  OGG-01154

Error message: 2011-03-29 15:53:57  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, repya.prm:  SQL error 14402 mapping EPMA.D_METER to EPMA.D_METER OCI Error ORA-14402: updating partition key column would cause a partition change (status = 14402), SQL <UPDATE "EPMA"."D_METER" SET "PR_ORG" = :a1,"BELONG_DEPT" = :a2 WHERE "METER_ID" = :b0>.

Cause: the source updated a partition key column, but row movement is not enabled on the target, so the update fails.

Fix: SQLPLUS>alter table SCHEMA.TABLENAME enable row movement;

 

1.6.3  OGG-00664

1.6.3.1  OGG-00664  OCI Error during OCIServerAttach (status = 12541-ORA-12541: TNS:no listener).

Error message:

OGG-00664  OCI Error during OCIServerAttach (status = 12541-ORA-12541: TNS:no listener).

Fix:

Start the database listener.

 

1.6.3.2  OGG-00664  OCI Error during OCIServerAttach (status = 12545-Error while trying to retrieve text for error ORA-12545).

 

2015-06-09 22:31:11  ERROR   OGG-00664  OCI Error during OCIServerAttach (status = 12545-Error while trying to retrieve text for error ORA-12545).

 

2015-06-09 22:31:16  ERROR   OGG-01668  PROCESS ABENDING.

 

ORACLE_HOME is set incorrectly.

Fix: setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)

 

1.6.4  OGG-00665

1.6.4.1  OGG-00665 OCI Error describe for query (status = 3135-ORA-03135: connection lost contact

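Error message:

OGG-00665  OCI Error describe for query (status = 3135-ORA-03135: connection lost contact

Process ID: 8859

Session ID: 131 Serial number: 31), SQL<SELECT DECODE(archived, 'YES', 1, 0),       status  FROM v$log WHERE thread# = :ora_thread AND       sequence# = :ora_seq_no>.

Cause and fix:

The database was shut down before the OGG processes were stopped, which left the OGG processes in an abnormal state. If you see this error, stop the OGG processes immediately, keep an eye on the archived logs to make sure none go missing, and start the OGG processes as soon as the database is back up.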

 

1.6.4.2  OGG-00665 OCI Error describe for query

Applies to:

Oracle GoldenGate - Version: 11.1.1.0.7 and later   [Release: 11.1.1 and later ]

Information in this document applies to any platform.

Symptoms

When attempting to start an Extract, we get error

2010-12-09 18:59:25 GGS ERROR 182 OCI Error describe for query (bad syntax) (status = 942-ORA-00942: table or view does not exist), SQL< select value$ from sys.props$ where name = 'NLS_LANGUAGE'>.


2010-12-09 18:59:25 GGS ERROR 190 PROCESS ABENDING.

 

Cause

The database user does not have the necessary privilege.

Solution

Grant the necessary privilege to the Golden Gate user.


SQL> grant select on sys.props$ to ggsuser;

or

SQL> grant select any dictionary to ggsuser;

 

1.6.4.3  OGG-00665  OCI Error describe for query (status = 942-ORA-00942: table or view does not exist), SQL<SELECT 1 FROM DUAL  WHERE EXISTS (          SELECT 'x' FROM ggusr.GGS_DDL_HIST WHERE OPTIME < '2015-05-25 11:12:43')>.

 

2015-06-08 12:12:43  ERROR   OGG-00665  OCI Error describe for query (status = 942-ORA-00942: table or view does not exist), SQL<SELECT 1 FROM DUAL  WHERE EXISTS (          SELECT 'x' FROM ggusr.GGS_DDL_HIST WHERE OPTIME < '2015-05-25 11:12:43')>.

 

2015-06-08 12:12:43  ERROR   OGG-01668  PROCESS ABENDING.

 

To use the DDL feature, the DDL support scripts must be run beforehand:

1. @marker_setup.sql

2. @ddl_setup.sql     mode of installation: initialsetup

3. @role_setup.sql

4. GRANT GGS_GGSUSER_ROLE TO gguser

5. @ddl_enable.sql

 

 

 

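1.6.5  OGG-01161  Bad column index (4) specified for table QQQ.TIANSHI, max columns = 4.

Error message:

OGG-01161  Bad column index (4) specified for table QQQ.TIANSHI, max columns = 4.

Fix:

Compare the structure of this table on the production side and on the disaster-recovery side. If the table on the disaster-recovery side is missing a column, log in to that database, add the column, and then start the Replicat process.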

 

1.6.6  OGG-00199  Table QQQ.T0417 does not exist in target database.

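Error message:

ERROR   OGG-00199  Table QQQ.T0417 does not exist in target database.

Fix:

Check whether the DDL replication parameters are configured in the source Extract parameter file, then re-run the initial load for this table.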

 

 

 

 

1.6.7  OGG-01738 BOUNDED RECOVERY  

 database version:11.2.0.3 RAC

goldengate version :11.1.1.1.2

In the morning, data synchronization was found to be abnormal. The source side looked like this:

GGSCI (ulecardrac1) 3> info all

 

Program     Status      Group       Lag           Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EXT232      00:00:00      06:32:33    

EXTRACT     RUNNING     PUMP232     00:00:00      00:00:03    

The status is still RUNNING, but the checkpoint has not been updated for six and a half hours; the process is in fact hung.
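A process that shows RUNNING with a very old checkpoint is easy to miss by eye. The Python sketch below flags such rows in info all output. It is an illustration only: the function names and the one-hour threshold are assumptions, the pattern follows the column layout of the sample above, and a long gap can also be legitimate (for example during a long-running transaction), so a hit is a prompt to investigate rather than proof of a hang.

```python
import re

# Column layout assumed from the "info all" sample above.
ROW_RE = re.compile(
    r"^(?P<program>EXTRACT|REPLICAT)\s+(?P<status>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<lag>\d+:\d{2}:\d{2})\s+(?P<since>\d+:\d{2}:\d{2})\s*$"
)


def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def stalled(info_all_output, threshold_seconds=3600):
    """Return (group, time since checkpoint) for RUNNING rows over the threshold."""
    suspects = []
    for line in info_all_output.splitlines():
        m = ROW_RE.match(line.strip())
        if (m and m.group("status") == "RUNNING"
                and to_seconds(m.group("since")) > threshold_seconds):
            suspects.append((m.group("group"), m.group("since")))
    return suspects


sample = """\
Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT232      00:00:00      06:32:33
EXTRACT     RUNNING     PUMP232     00:00:00      00:00:03
"""

print(stalled(sample))
```

The threshold should sit comfortably above the normal checkpoint interval of the environment being monitored.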

Check the alert log ggserr.log.

It contains OGG-01738 messages:

2013-03-07 02:42:28  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, ext232.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p5905_Redo Thread 1: start=SeqNo: 679, RBA: 83280912, SCN: 1.913813052 (5208780348), Timestamp: 2013-03-06 22:00:20.000000, end=SeqNo: 679, RBA: 129051136, SCN: 1.938808049 (5233775345), Timestamp: 2013-03-07 02:42:03.000000.

2013-03-07 02:42:28  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, ext232.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p5905_Redo Thread 2: start=SeqNo: 692, RBA: 103611920, SCN: 1.913812238 (5208779534), Timestamp: 2013-03-06 22:00:16.000000, end=SeqNo: 693, RBA: 93604864, SCN: 1.938808100 (5233775396), Timestamp: 2013-03-07 02:42:15.000000.

 

MOS has an article on this error, note 1293772.1.

Liu Xiangbing, a well-known Oracle expert in China, also has a blog post explaining it:

http://www.askmaclean.com/archives/ogg-01738-bounded-recovery.html

The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:

GGSCI> start <extract_name> BRRESET

 

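Because the extract process ext232 was hung, it could not be stopped; even 'send ext232 forcestop' and 'stop mgr' failed to stop it.

In the end the process had to be killed from the shell and then restarted with:

GGSCI> start ext232 BRRESET

After the restart the status was normal and synchronization had almost no lag.

The bug only occurs on RAC, or on a single instance configured with multiple threads, and it is fixed in later versions. For a permanent fix, consider upgrading OGG to 11.2.1.0.1.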

2012-10-20 10:28:02  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 79286800, SCN: 0.3712874 (3712874), Timestamp: 2012-10-19 22:27:45.000000, Thread: 1, end=SeqNo: 343, RBA: 79287296, SCN: 0.3712874 (3712874), Timestamp: 2012-10-19 22:27:45.000000, Thread: 1.

2012-10-20 14:28:05  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 107000336, SCN: 0.3725744 (3725744), Timestamp: 2012-10-20 02:27:14.000000, Thread: 1, end=SeqNo: 343, RBA: 107000832, SCN: 0.3725744 (3725744), Timestamp: 2012-10-20 02:27:14.000000, Thread: 1.

2012-10-20 18:28:06  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 128054288, SCN: 0.3739371 (3739371), Timestamp: 2012-10-20 06:28:02.000000, Thread: 1, end=SeqNo: 343, RBA: 128054784, SCN: 0.3739371 (3739371), Timestamp: 2012-10-20 06:28:02.000000, Thread: 1.

2012-10-20 22:28:06  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 153368080, SCN: 0.3752583 (3752583), Timestamp: 2012-10-20 10:27:46.000000, Thread: 1, end=SeqNo: 343, RBA: 153368576, SCN: 0.3752583 (3752583), Timestamp: 2012-10-20 10:27:46.000000, Thread: 1.

2012-10-21 02:28:08  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 165712912, SCN: 0.3763760 (3763760), Timestamp: 2012-10-20 14:28:00.000000, Thread: 1, end=SeqNo: 343, RBA: 165713408, SCN: 0.3763760 (3763760), Timestamp: 2012-10-20 14:28:00.000000, Thread: 1.

2012-10-21 06:28:15  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 179789328, SCN: 0.3774866 (3774866), Timest

...skipping one line

2012-10-21 10:28:16  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 201859088, SCN: 0.3788193 (3788193), Timestamp: 2012-10-20 22:26:32.000000, Thread: 1, end=SeqNo: 343, RBA: 201859584, SCN: 0.3788193 (3788193), Timestamp: 2012-10-20 22:26:32.000000, Thread: 1.

2012-10-21 14:28:26  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 246480912, SCN: 0.3803284 (3803284), Timestamp: 2012-10-21 02:27:31.000000, Thread: 1, end=SeqNo: 343, RBA: 246481408, SCN: 0.3803284 (3803284), Timestamp: 2012-10-21 02:27:31.000000, Thread: 1.

2012-10-21 18:28:33  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 291493392, SCN: 0.3821051 (3821051), Timestamp: 2012-10-21 06:28:22.000000, Thread: 1, end=SeqNo: 343, RBA: 291493888, SCN: 0.3821051 (3821051), Timestamp: 2012-10-21 06:28:22.000000, Thread: 1.

 

 

 

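Oracle GoldenGate 11.x introduced the concept of Bounded Recovery (BR), which lets extract write long-running transactions (those running longer than the value of BRINTERVAL) to the local BR directory. When extract restarts, it first reads the BR files instead of reading the archived logs indicated by the recovery checkpoint. This improves performance and reduces the dependence on old archived logs.

However, when the Bounded Recovery (BR) feature is used in a RAC environment to recover an extract that abended abnormally, there is a small chance that the extract hangs or loses specific transactions. The bug only occurs in RAC environments, or on a single instance configured with multiple threads.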

1. bug 10368242: transaction loss with BR

When a transaction is committed, it will be flushed to trail file. But when BR writing started (after the transaction commit) and extract abends abnormally, the extract may not have chance to flush the committed transaction to trail. When extract restarted, it will read from BR, and leave that committed transaction as persist committed transaction in memory and never be written to trail. So this committed transaction may be lost.

The problem will not happen when the extract stops in normal mode.

2.bug 12532428 (base bug 10408077 ): extract hung when using BR and new objects are added to extract

With BR setup, when new objects (table, sequence, DDL, et al) are including in the extract, restarted extract will pick up more data that causes the producer queue limit (a fixed number) used by BR be reached. Because the extract is still in BR recovery, the consumer thread is stopped and not processing data from the producer queues. This caused a deadlock, and the extract will appear hung.

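Solutions

1. For the transaction loss caused by bug 10368242: the bug is fixed in 11.1.1.1 and will be backported to 11.1.1.0.

2. For the extract hang caused by bug 10408077: the bug is fixed in 11.1.1.1 and 11.1.1.0.30, and it can also be bypassed with the following workaround: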

A workaround with earlier 11.1.1.0 version is to start extract with BRRESET, when new object is added to an extract. All the archived logs since recovery checkpoint need to be available.

 

ggsci> start extract, BRRESET

 

When running Oracle Golden Gate 11.1.1.0.6 or higher, extract is “abending” every 4 hours on the hour. This approximates the same time or interval that Bounded Recovery is set to by default.

Extract can be restarted and continues to work but then fails again after 4 hours with the same errors as shown below.

ERROR

———

2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: ggcrc64valid.

2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: br_validate_bcp.

2011-02-06 05:15:38 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p7186_Redo Thread 1.

2011-02-06 05:15:38 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 0.0.0.

2011-02-06 09:15:46 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p7186_Redo Thread 1: start=SeqNo: 21659, RBA: 117520912, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000, end=SeqNo: 21659, RBA: 117602816, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000.

Cause

Under these conditions, this may be a problem with the Bounded Recovery Checkpoint file. It is likely corrupted.

Solution

The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:

GGSCI> start <extract> BRRESET

 

 

BOUNDED RECOVERY

Error message: BOUNDED RECOVERY: reset to initial or altered checkpoint.

This is a database-side problem: the archivelog files of the second node cannot be read.

 

 

1.6.8  OGG-00268 and OGG-01668: Parameter File Format Problems

Symptom:

start ext1 fails with:

2012-04-23 04:17:21  ERROR   OGG-00268  Parameter unterminated.

2012-04-23 04:17:21  ERROR   OGG-01668  PROCESS ABENDING.

Cause:

GoldenGate is very strict about syntax, including commas, semicolons, and spaces.

Fix:

Add a semicolon ";" at the end of the parameter file.

 

1.6.9  WARNING OGG-00959 (MINKEEPFILES option not used.). 

The following warning appears in mgr.rpt: WARNING OGG-00959 (MINKEEPFILES option not used.).

-- Purge old trail-files

PURGEOLDEXTRACTS /ggs/tdmInput/m1/g3*, USECHECKPOINTS, MINKEEPHOURS 12

2012-10-30 15:15:09 WARNING OGG-00959 PURGEOLDEXTRACTS /ggs/tdmInput/m1/g3*, USECHECKPOINTS, MINKEEPHOURS 12 (MINKEEPFILES option not used.).

The description for this warning is:

// *Cause:  The PURGEOLDEXTRACTS parameter contains the option MINKEEPHOURS or

// MINKEEPDAYS with the option MINKEEPFILES. These are mutually

// exclusive. If either MINKEEPHOURS or MINKEEPDAYS is used with

// MINKEEPFILES, then MINKEEPHOURS or MINKEEPDAYS is accepted, and

// MINKEEPFILES is ignored.

// *Action: Remove MINKEEPFILES (or MINKEEPHOURS depending on your

// requirements.

In short:

MINKEEPHOURS or MINKEEPDAYS cannot be combined with MINKEEPFILES in PURGEOLDEXTRACTS, because they conflict.

If they are used together, the time-based option (MINKEEPHOURS or MINKEEPDAYS) is accepted and MINKEEPFILES is ignored.

1.6.10  OGG-00303  Did not recognize parameter argument.

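A parameter argument is configured incorrectly.

Problem description:

ERROR   OGG-00303  Did not recognize parameter argument.

Analysis:

The process parameter file is configured incorrectly.

Resolution:

Check the parameter file: the process name may not match the parameter file, or a parameter may be wrong. Then restart the process.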

1.6.11  OGG-01044

2015-06-08 17:54:45  ERROR   OGG-01044  The trail './dirdat/aa' is not assigned to extract 'EORA_T1'. Assign the trail to the extract with the command "ADD EXTTRAIL/RMTTRAIL ./dirdat/aa, EXTRACT EORA_T1".

Fix: add the trail file to the Extract.

GGSCI (orcltest) 11> add exttrail ./dirdat/aa,extract eora_t1,megabytes 100

EXTTRAIL added.

 

1.6.12  OGG-00396

 

2015-01-07 11:39:38  ERROR   OGG-00396  Command not terminated by semi-colon.

 

 

2015-01-07 11:39:38  ERROR   OGG-01668  PROCESS ABENDING.

 

 

Cause: a statement in the parameter file does not end with a semicolon.

Fix: correct the parameter file.

1.6.13  OGG-01031  Reported After the GoldenGate Source Crashed Unexpectedly

Example 9-21

 

ERROR   OGG-01031  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Expected 4 bytes, but got 0 bytes, in trail ./dirdat/t1000026, seqno 26, reading record trailer token at RBA 103637218).

2011-01-06 11:04:16  ERROR   OGG-01668  PROCESS ABENDING.

 

Fix:

The trail file on the target may be damaged. Roll over to generate a new one with SEND EXTRACT xxx ROLLOVER, or with "alter extract xxx, etrollover".

 

 

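The server crashed while the dpump process was still running. After startup the process was in ABENDED state, and ggserr.log showed the following error: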

 

2011-04-01 11:13:19  ERROR   OGG-01031  Oracle GoldenGate Capture for Oracle, dpump.prm:  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/vistor/media/GG/dirdat/rt000003" (error 11, Resource temporarily unavailable)).

 

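The cause is that the OGG code on the target was being updated while the dpump process had not been stopped, so dpump kept looking for the old manager port and the source-side trail file.

Fix: restart the Extract, dpump, and Manager processes. If the error persists, the settings need to be changed.

Apply ETROLLOVER to dpump to start a new trail file: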


alter extract ext1 etrollover


start extract dpump


info extract dpump


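Note the sequence number of the source-side trail file, then point Replicat at the newly generated rt file:

send replicat rep1, logend

alter replicat rep1,extseqno 4, extrba 0

start replicat rep1

The processes start and return to normal.

Source side: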

GGSCI (orcltest) 31> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_HR     00:00:00      00:00:07    

EXTRACT     ABENDED     PORA_HR     00:00:00      40:04:19    

REPLICAT    RUNNING     RORA_HR2    00:00:00      00:00:00    

REPLICAT    STOPPED     TESTRPT     00:00:00      00:05:48    

 

 

GGSCI (orcltest) 32> view report    PORA_HR

 

 

***********************************************************************

                 Oracle GoldenGate Capture for Oracle

    Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO

   Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:42:16

 

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

 

 

                    Starting at 2015-06-12 10:36:28

***********************************************************************

 

Operating System Version:

Linux

Version #1 SMP Sun Nov 10 22:19:54 EST 2013, Release 2.6.32-431.el6.x86_64

Node: orcltest

Machine: x86_64

                         soft limit   hard limit

Address Space Size   :    unlimited    unlimited

Heap Size            :    unlimited    unlimited

File Size            :    unlimited    unlimited

CPU Time             :    unlimited    unlimited

 

Process id: 14523

 

Description:

 

***********************************************************************

**            Running with the following parameters                  **

***********************************************************************

 

2015-06-12 10:36:28  INFO    OGG-03035  Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.

extract pora_hr

setenv (ORACLE_SID=ogg1)

Set environment variable (ORACLE_SID=ogg1)

setenv (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)

Set environment variable (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)

setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

passthru

rmthost 192.168.59.130,mgrport 7809

rmttrail ./dirdat/pa

table hr.*;

 

2015-06-12 10:36:28  INFO    OGG-01815  Virtual Memory Facilities for: COM

    anon alloc: mmap(MAP_ANON)  anon free: munmap

    file alloc: mmap(MAP_SHARED)  file free: munmap

    target directories:

    /u01/gg11/dirtmp.

 

CACHEMGR virtual memory values (may have been adjusted)

CACHESIZE:                               64G

CACHEPAGEOUTSIZE (normal):                8M

PROCESS VM AVAIL FROM OS (min):         128G

CACHESIZEMAX (strict force to disk):     96G

 

2015-06-12 10:36:33  INFO    OGG-01226  Socket buffer size set to 27985 (flush size 27985).

 

Source Context :

  SourceModule            : [er.extrout]

  SourceID                : [/scratch/aime1/adestore/views/aime1_adc4150256/oggcore/OpenSys/src/app/er/extrout.c]

  SourceFunction          : [complete_tcp_msg]

  SourceLine              : [1522]

  ThreadBacktrace         : [9] elements

                          : [/u01/gg11/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7f5c2f9bd06e]]

                          : [/u01/gg11/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2cc) [0x7f5c2f9b944c]]

                          : [/u01/gg11/libgglog.so(_MSG_ERR_ER_REMOTE_COMM_PROBLEM(CSourceContext*, char const*, CMessageFactory::MessageDisposition)+0x31) [0x7f5c2f9a11e9]]

                          : [/u01/gg11/extract(complete_tcp_msg(extract_def*)+0x424) [0x51313c]]

                          : [/u01/gg11/extract(flush_tcp(extract_def*, int)+0x20d) [0x5139f1]]

                          : [/u01/gg11/extract(RECOVERY_initialize()+0x371) [0x524f91]]

                          : [/u01/gg11/extract(main+0x4a5) [0x56ca65]]

                          : [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3a5221ed1d]]

                          : [/u01/gg11/extract(__gxx_personality_v0+0x38a) [0x4e8b7a]]

 

2015-06-12 10:36:43  ERROR   OGG-01031  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "./dirdat/pa000002" (error 11, Resource temporarily unavailable)).

 

2015-06-12 10:36:43  ERROR   OGG-01668  PROCESS ABENDING.

 

 

 

GGSCI (orcltest) 34> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_HR     00:00:00      00:00:05    

EXTRACT     ABENDED     PORA_HR     00:00:00      40:05:10    

REPLICAT    RUNNING     RORA_HR2    00:00:00      00:00:10    

REPLICAT    STOPPED     TESTRPT     00:00:00      00:06:39    

 

 

GGSCI (orcltest) 35>  alter extract pora_hr etrollover

 

2015-06-12 10:38:15  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file;  it will not happen automatically.

EXTRACT altered.

 

 

GGSCI (orcltest) 36> view params PORA_HR

 

extract pora_hr  

setenv (ORACLE_SID=ogg1)

setenv (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)

setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)   

passthru  

rmthost 192.168.59.130,mgrport 7809  

rmttrail ./dirdat/pa  

table hr.*;

 

 

GGSCI (orcltest) 37> start extract PORA_HR

 

Sending START request to MANAGER ...

EXTRACT PORA_HR starting

 

 

GGSCI (orcltest) 38> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_HR     00:00:00      00:00:06    

EXTRACT     RUNNING     PORA_HR     00:00:00      00:00:49    

REPLICAT    RUNNING     RORA_HR2    00:00:00      00:00:01    

REPLICAT    STOPPED     TESTRPT     00:00:00      00:07:42    

 

 

On the target:

GGSCI (rhel6_lhr) 30> view report RORA_HR

 

 

***********************************************************************

                 Oracle GoldenGate Delivery for Oracle

    Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO

   Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:48:07

 

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

 

 

                    Starting at 2015-06-10 04:48:15

***********************************************************************

 

Operating System Version:

Linux

Version #1 SMP Tue Apr 21 08:37:59 PDT 2015, Release 2.6.32-504.16.2.el6.x86_64

Node: rhel6_lhr

Machine: x86_64

                         soft limit   hard limit

Address Space Size   :    unlimited    unlimited

Heap Size            :    unlimited    unlimited

File Size            :    unlimited    unlimited

CPU Time             :    unlimited    unlimited

 

Process id: 40019

 

Description:

 

***********************************************************************

**            Running with the following parameters                  **

***********************************************************************

 

2015-06-10 04:48:15  INFO    OGG-03035  Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.

replicat rora_hr

setenv (ORACLE_SID=ogg2)

Set environment variable (ORACLE_SID=ogg2)

setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)

Set environment variable (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)

setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

ddl include all

ddlerror default ignore retryop maxretries 3 retrydelay 5

userid ggusr,password ***

handlecollisions

assumetargetdefs

discardfile ./dirrpt/rora_hr.dsc,purge

map hr.* ,target hr.*;

 

2015-06-10 04:48:15  INFO    OGG-01815  Virtual Memory Facilities for: COM

    anon alloc: mmap(MAP_ANON)  anon free: munmap

    file alloc: mmap(MAP_SHARED)  file free: munmap

    target directories:

    /u01/gg11/dirtmp.

 

CACHEMGR virtual memory values (may have been adjusted)

CACHESIZE:                                2G

CACHEPAGEOUTSIZE (normal):                8M

PROCESS VM AVAIL FROM OS (min):           4G

CACHESIZEMAX (strict force to disk):   3.41G

 

Database Version:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

PL/SQL Release 11.2.0.3.0 - Production

CORE    11.2.0.3.0      Production

TNS for Linux: Version 11.2.0.3.0 - Production

NLSRTL Version 11.2.0.3.0 - Production

 

Database Language and Character Set:

NLS_LANG         = "AMERICAN_AMERICA.ZHS16GBK"

NLS_LANGUAGE     = "AMERICAN"

NLS_TERRITORY    = "AMERICA"

NLS_CHARACTERSET = "ZHS16GBK"

 

***********************************************************************

**                     Run Time Messages                             **

***********************************************************************

 

Opened trail file ./dirdat/pa000002 at 2015-06-10 04:48:15

 

2015-06-10 04:48:19  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:48:29  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:48:50  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:49:30  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:50:50  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:53:30  WARNING OGG-01519  Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.

 

2015-06-10 04:54:21  INFO    OGG-01021  Command received from GGSCI: STATS.

 

 

GGSCI (rhel6_lhr) 31> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_HR2    00:00:00      00:00:04    

EXTRACT     RUNNING     PORA_HR2    00:00:00      00:00:04    

REPLICAT    RUNNING     RORA_HR     00:00:00      00:00:10    

 

 

GGSCI (rhel6_lhr) 32> send replicat RORA_HR,logend

ERROR: No Command for SEND.

 

GGSCI (rhel6_lhr) 33> alter replicat RORA_HR,extseqno 3, extrba 0

 

ERROR: REPLICAT RORA_HR is running and cannot be altered (1,2,No such file or directory).

 

 

GGSCI (rhel6_lhr) 34>

GGSCI (rhel6_lhr) 34> stop RORA_HR

 

Sending STOP request to REPLICAT RORA_HR ...

Request processed.

 

 

GGSCI (rhel6_lhr) 35> alter replicat RORA_HR,extseqno 3, extrba 0

REPLICAT altered.

 

 

GGSCI (rhel6_lhr) 36> start  RORA_HR

 

Sending START request to MANAGER ...

REPLICAT RORA_HR starting

 

 

GGSCI (rhel6_lhr) 37> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_HR2    00:00:00      00:00:08    

EXTRACT     RUNNING     PORA_HR2    00:00:00      00:00:05    

REPLICAT    RUNNING     RORA_HR     00:05:33      00:00:03    

 

 

GGSCI (rhel6_lhr) 38> view report  RORA_HR

 

 

***********************************************************************

                 Oracle GoldenGate Delivery for Oracle

    Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO

   Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:48:07

 

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

 

 

                    Starting at 2015-06-10 05:01:13

***********************************************************************

 

Operating System Version:

Linux

Version #1 SMP Tue Apr 21 08:37:59 PDT 2015, Release 2.6.32-504.16.2.el6.x86_64

Node: rhel6_lhr

Machine: x86_64

                         soft limit   hard limit

Address Space Size   :    unlimited    unlimited

Heap Size            :    unlimited    unlimited

File Size            :    unlimited    unlimited

CPU Time             :    unlimited    unlimited

 

Process id: 40703

 

Description:

 

***********************************************************************

**            Running with the following parameters                  **

***********************************************************************

 

2015-06-10 05:01:13  INFO    OGG-03035  Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.

replicat rora_hr

setenv (ORACLE_SID=ogg2)

Set environment variable (ORACLE_SID=ogg2)

setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)

Set environment variable (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)

setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

ddl include all

ddlerror default ignore retryop maxretries 3 retrydelay 5

userid ggusr,password ***

handlecollisions

assumetargetdefs

discardfile ./dirrpt/rora_hr.dsc,purge

map hr.* ,target hr.*;

 

2015-06-10 05:01:13  INFO    OGG-01815  Virtual Memory Facilities for: COM

    anon alloc: mmap(MAP_ANON)  anon free: munmap

    file alloc: mmap(MAP_SHARED)  file free: munmap

    target directories:

    /u01/gg11/dirtmp.

 

CACHEMGR virtual memory values (may have been adjusted)

CACHESIZE:                                2G

CACHEPAGEOUTSIZE (normal):                8M

PROCESS VM AVAIL FROM OS (min):           4G

CACHESIZEMAX (strict force to disk):   3.41G

 

Database Version:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

PL/SQL Release 11.2.0.3.0 - Production

CORE    11.2.0.3.0      Production

TNS for Linux: Version 11.2.0.3.0 - Production

NLSRTL Version 11.2.0.3.0 - Production

 

Database Language and Character Set:

NLS_LANG         = "AMERICAN_AMERICA.ZHS16GBK"

NLS_LANGUAGE     = "AMERICAN"

NLS_TERRITORY    = "AMERICA"

NLS_CHARACTERSET = "ZHS16GBK"

 

***********************************************************************

**                     Run Time Messages                             **

***********************************************************************

 

Opened trail file ./dirdat/pa000003 at 2015-06-10 05:01:13

 

2015-06-10 05:01:13  INFO    OGG-01020  Processed extract process RESTART_ABEND record at seq 3, rba 1046 (aborted 0 records).

 

Switching to next trail file ./dirdat/pa000004 at 2015-06-10 05:01:13 due to EOF, with current RBA 1108

Opened trail file ./dirdat/pa000004 at 2015-06-10 05:01:13

 

Processed extract process graceful restart record at seq 4, rba 1074.

Processed extract process graceful restart record at seq 4, rba 1136.

 

2015-06-10 05:01:13  INFO    OGG-01407  Setting current schema for DDL operation to [hr].

 

2015-06-10 05:01:13  INFO    OGG-01408  Restoring current schema for DDL operation to [ggusr].

 

 

GGSCI (rhel6_lhr) 39>
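
The OGG-01519 warning in the Replicat report names the succeeding trail file, and the sequence number for ALTER ... EXTSEQNO can be read from that file name. The following is a minimal sketch of that step, not an official tool. It assumes the six-digit sequence suffix seen in the trail names above (pa000002, pa000003) and that the warning has not been wrapped in the middle of the file path; the function name extseqno_command is made up for illustration.

```python
import re

# Matches the OGG-01519 warning and captures the succeeding trail file path.
EOF_WARNING = re.compile(
    r"OGG-01519\b.*?succeeding trail file (\S+) exists", re.DOTALL
)


def extseqno_command(report_text, group, seq_digits=6):
    """Build the ALTER REPLICAT command suggested by an OGG-01519 warning.

    seq_digits is an assumption: the trail names in this article end in a
    six-digit sequence number. Returns None if no warning is found.
    """
    match = EOF_WARNING.search(report_text)
    if match is None:
        return None
    trail_path = match.group(1)
    seqno = int(trail_path[-seq_digits:])
    return f"alter replicat {group}, extseqno {seqno}, extrba 0"


if __name__ == "__main__":
    sample = (
        "2015-06-10 04:48:19  WARNING OGG-01519  Waiting at EOF on input trail "
        "file ./dirdat/pa000002, which is not marked as complete; but "
        "succeeding trail file ./dirdat/pa000003 exists."
    )
    print(extseqno_command(sample, "RORA_HR"))
```

As in the transcript above, the Replicat has to be stopped before the generated command is run in GGSCI.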

 

 

1.6.13.1  OGG-01031

When the source data pump DPEND is started, ggserr.log on the source shows the following errors:

2012-08-28 15:09:39  ERROR   OGG-01031  Oracle GoldenGate Capture for Oracle, dpend.prm:  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory)).

2012-08-28 15:09:41  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, dpend.prm:  PROCESS ABENDING.

ggserr.log on the target shows the following:

2012-08-28 15:06:30  WARNING OGG-01223  Oracle GoldenGate Collector for Oracle:  Unable to lock file "/uo1/app/ogg/dirdat/nd000004" (error 11, Resource temporarily unavailable).  Lock currently held by process id (PID) 13854.

2012-08-28 15:06:30  WARNING OGG-01223  Oracle GoldenGate Collector for Oracle:  Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory).

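Cause: the network probably failed at some point, so the Data Pump process on the source lost contact with the target while the server process that the target mgr had started for it kept running. When the data pump is restarted, the target mgr tries to spawn another server process, and the two processes then contend for the same trail file.

Resolution:

1. Stop all data pumps on the source, then run ps -ef | grep server (or grep for the OGG installation directory) on the target to see whether an OGG server process is still running. If there is one, kill it (make sure that every source data pump is stopped and that you kill only the server process, not extract, replicat, mgr, or anything else), then restart the source data pump.

2. The trail file on the target may be damaged. Roll over to a new trail file:

SEND EXTRACT xxx, ROLLOVER

or: alter extract xxx, etrollover

where xxx is the name of the data pump.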
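
Step 1 depends on picking only the leftover server process out of the ps -ef listing. The sketch below shows one way to filter that listing. It is an illustration under assumptions: the function name find_stale_collectors is hypothetical, the eight-column ps -ef layout of Linux is assumed, and a process is treated as an OGG server only if its executable is named server and the OGG home directory appears somewhere on its command line.

```python
import os


def find_stale_collectors(ps_output, ogg_home):
    """Return PIDs of 'server' processes whose command line mentions ogg_home.

    ps_output is the text printed by `ps -ef`
    (columns: UID PID PPID C STIME TTY TIME CMD).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header row and anything that is not a full process line.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command = fields[7]
        executable = command.split()[0]
        if os.path.basename(executable) == "server" and ogg_home in command:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    sample = (
        "UID        PID  PPID  C STIME TTY          TIME CMD\n"
        "oracle   13854     1  0 Jun10 ?        00:00:01 ./server -w 300 -p 7840 -k -l /u01/gg11/ggserr.log\n"
        "oracle   14523 14000  0 10:36 ?        00:00:00 /u01/gg11/extract PARAMFILE /u01/gg11/dirprm/pora_hr.prm\n"
    )
    print(find_stale_collectors(sample, "/u01/gg11"))
```

The function only reports candidate PIDs. Confirm that all source data pumps are stopped and review the list by hand before killing anything.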

 

1.6.14  OGG-01296

Example 9-18

 

ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, yx_rep3.prm:   Error mapping from SGPM.A_PAY_FLOW to SGPM.A_PAY_FLOW.

 

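This error occurs because the table structure was changed on the source without notifying the target.

Resolution: run the corresponding statements on the target so that the table structure matches the source.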

1.6.15  OGG-01088

Error message:

Example 9-19

 

ERROR   OGG-01088  Oracle GoldenGate Delivery for Oracle, pms_rep1.prm:   malloc 2097152 bytes failed.

ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, pms_rep1.prm:   PROCESS ABENDING.

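Resolution:

1. Run "ulimit -a" to verify that the operating system places no resource limits on the user.

2. Split the process into several processes.

3. Download the latest patch set from support.oracle.com and upgrade GoldenGate.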
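
The ulimit check in step 1 can also be done programmatically for the user that runs GoldenGate. The sketch below uses Python's standard resource module (Unix only) to list the limits that are not unlimited. The selection of limits and the name limited_resources are illustrative choices, not something GoldenGate prescribes.

```python
import resource

# Limits that correspond to rows in the "soft limit / hard limit" table
# printed at the top of an OGG report file.
LIMITS = {
    "address space": resource.RLIMIT_AS,
    "data segment": resource.RLIMIT_DATA,
    "file size": resource.RLIMIT_FSIZE,
    "cpu time": resource.RLIMIT_CPU,
}


def limited_resources():
    """Return {name: soft_limit} for every limit that is not unlimited."""
    limited = {}
    for name, res in LIMITS.items():
        soft, _hard = resource.getrlimit(res)
        if soft != resource.RLIM_INFINITY:
            limited[name] = soft
    return limited


if __name__ == "__main__":
    found = limited_resources()
    if found:
        for name, soft in found.items():
            print(f"{name}: soft limit {soft}")
    else:
        print("all checked limits are unlimited")
```

Run it as the operating system user that owns the GoldenGate processes, since limits are per user and per session.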

 

1.6.16  OGG-01223

When the source data pump DPEND is started, ggserr.log shows the following:

2012-08-17 11:43:50  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, dpend.prm:  TCP/IP error 79 (Connection refused).

2012-08-17 11:45:01  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, dpend.prm:  TCP/IP error 79 (Connection refused).

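Cause: the MGR process on the target (host 110) was not running.

Resolution:

Run start mgr on the target, then start the source data pump DPEND again. The error disappears and the trail files are transferred successfully.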
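
Before restarting the data pump it can help to confirm from the source host that the target manager port accepts TCP connections at all. The sketch below does that with the standard socket module. The function name mgr_port_reachable is hypothetical, and the host and port in the usage example are the ones from the pora_hr parameter file shown earlier in this article.

```python
import socket


def mgr_port_reachable(host, port, timeout=3.0):
    """Try a plain TCP connection to the target manager port.

    Returns (True, "connected") on success, otherwise (False, reason).
    A refused connection suggests that MGR is not running; a timeout or
    "No route to host" points at the network or a firewall instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    ok, reason = mgr_port_reachable("192.168.59.130", 7809)
    print("reachable" if ok else f"not reachable ({reason})")
```

A successful connection only shows that something is listening on the port. It does not verify that the listener is the GoldenGate manager or that the dynamic ports are open as well.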

 

A normal log looks like this:

2012-08-17 14:31:51  INFO    OGG-00993  Oracle GoldenGate Capture for Oracle, dpend.prm:  EXTRACT DPEND started.

2012-08-17 14:33:13  INFO    OGG-01226  Oracle GoldenGate Capture for Oracle, dpend.prm:  Socket buffer size set to 27985 (flush size 27985).

2012-08-17 14:33:26  INFO    OGG-01052  Oracle GoldenGate Capture for Oracle, dpend.prm:  No recovery is required for target file F:\ogg\dirdat\nd000000, at RBA 0 (file not opened).

2012-08-17 14:33:26  INFO    OGG-01478  Oracle GoldenGate Capture for Oracle, dpend.prm:  Output file F:\ogg\dirdat\nd is using format RELEASE 11.2.

 

 

1.6.17  OGG-01224

Example 9-20

 

ERROR OGG-01224 Oracle GoldenGate Manager for Oracle, mgr.prm: No buffer space available

 

Resolution:

Edit mgr.prm and widen the dynamic port range, for example dynamicportlist 7840-7914.

1.6.17.1  OGG-01224

When the source data pump DPEND is started, ggserr.log shows the following:

2012-08-22 05:33:10  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, dpend.prm:  TCP/IP error 113 (No route to host).

2012-08-22 05:33:10  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, dpend.prm:  PROCESS ABENDING.

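Cause: the firewall on the target (host 235) was not turned off.

Resolution:

Turn off the firewall on the target machine, then start the source data pump DPEND again. The error disappears and the trail files are transferred successfully.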

 

1.6.18  OGG-01476

 

ERROR   OGG-01476  The previous run abended due to an out of order transaction.  Issue ALTER ETROLLOVER to advance the output trail sequence past the current trail sequence number, then restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.

 

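During initialization the disaster-recovery side was not ready, and the production side repeated its operations many times, which left the production-side extract in an inconsistent state. In this case, before running RMAN, restart the extract and skip the earlier inconsistent information.

Resolution: "alter extract xxx, etrollover"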

1.6.19  OGG-00850

 

ERROR   OGG-00850  Oracle GoldenGate Capture for DB2, extxa.prm:  Database instance XP1 has both USEREXIT and LOGRETAIN set to off.

ERROR   OGG-01668  Oracle GoldenGate Capture for DB2, extxa.prm:  PROCESS ABENDING.

 

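Resolution:

1. On DB2 8.1/8.2, USEREXIT and LOGRETAIN must both be set to ON.

2. On DB2 9.5, LOGARCHMETH1 and LOGARCHMETH2 replace those two parameters. Typically LOGARCHMETH1 is DISK and LOGARCHMETH2 is TSM, and setting them enables archive logging. In DB2 9.5, USEREXIT can be set to OFF, but LOGRETAIN must still be set to ON.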

 

1.6.20  OGG-01027 (long-running transaction)

Example 9-25

 

WARNING OGG-01027  Long Running Transaction: XID 82.4.242063, Items 0, Extract YX_EXT1, Redo Thread 1, SCN 2379.2132775890 (10219859973074), Redo Seq #5688, Redo RBA 195997712.

 

More detail is available from the following command:

Example 9-26

 

GGSCI> send extract xxx, showtrans [thread n] [count n]

 

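Here thread n is optional and restricts the output to uncommitted transactions on one node; count n is also optional and limits the output to n records.

For example, to list the 10 longest-running transactions on node 1 for the extsz process:

Example 9-27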

 

GGSCI> send extract extsz , showtrans thread 1 count 10

 

Record the XID and ask the DBA to find out what the long-running transaction is doing.

Example 9-28

 

GGSCI> SEND EXTRACT xxx, SKIPTRANS <82.4.242063> THREAD <2> // skip the transaction

GGSCI> SEND EXTRACT xxx, FORCETRANS <82.4.242063> THREAD <1> // force the transaction to be treated as committed

 

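These commands only make the GoldenGate process skip the transaction or treat it as committed. They do not change the transaction in the database, where it still exists. It is therefore strongly recommended to commit or roll back the transaction in the database rather than handle it in GoldenGate.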
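
When many OGG-01027 warnings accumulate in ggserr.log, extracting the XID, Extract name, and redo thread from each one makes it easier to hand the list to the DBA. The sketch below parses the warning format shown in Example 9-25 and builds the matching showtrans command. The function names are made up for illustration, and the regular expression is derived only from that one sample line.

```python
import re

# Pattern derived from the sample OGG-01027 warning shown in this section.
LONG_TXN = re.compile(
    r"OGG-01027\s+Long Running Transaction: XID (?P<xid>[\d.]+), "
    r"Items (?P<items>\d+), Extract (?P<extract>\w+), "
    r"Redo Thread (?P<thread>\d+)"
)


def parse_long_txn(line):
    """Return the fields of an OGG-01027 warning, or None if it does not match."""
    match = LONG_TXN.search(line)
    if match is None:
        return None
    return {
        "xid": match["xid"],
        "items": int(match["items"]),
        "extract": match["extract"],
        "thread": int(match["thread"]),
    }


def showtrans_command(info):
    """Build the GGSCI command that lists open transactions on that thread."""
    return f"send extract {info['extract']}, showtrans thread {info['thread']}"


if __name__ == "__main__":
    sample = (
        "WARNING OGG-01027  Long Running Transaction: XID 82.4.242063, Items 0, "
        "Extract YX_EXT1, Redo Thread 1, SCN 2379.2132775890 (10219859973074), "
        "Redo Seq #5688, Redo RBA 195997712."
    )
    info = parse_long_txn(sample)
    print(info["xid"], showtrans_command(info))
```

The sketch deliberately stops at showtrans. Whether a transaction should be skipped or forced is a decision for the DBA, for the reasons given above.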

 

1.6.21  OGG-01072

Example 9-22

 

ERROR OGG-01072 LOBROW_get_next_chunk(LOBROW_row_t *, BOOL, BOOL, BOOL, LOBROW_chunk_header_t *, char *, size_t, BOOL, *) Buffer overflow, needed: 132, alloc 2.

 

Resolution:

1. If the version is 11.1.1.0.1 Build 078, upgrade to the latest patch set.

2. Run "ulimit -a" to check the resource limits and set them to unlimited.

3. Extract: DBOPTIONS LOBBUFSIZE <bytes>.

4. Replicat: DBOPTIONS LOBWRITESIZE 1MB.

 

1.7  User does not exist

Problem description:

2010-05-02 10:45:20  GGS ERROR      2001  Oracle GoldenGate Delivery for Oracle, rcrmheal.prm:  Fatal error executing DDL replication: error [Error code [1918], ORA-01918: user 'KINGSTAR' does not exist, SQL  /* GOLDENGATE_DDL_REPLICATION */ alter user kingstar account unlock  ], no error handler present.

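Analysis:

The log shows that the failure occurs because the user does not exist on the target.

Resolution:

Method 1: If the user does not need to be replicated, remove its mapping on the target and restart the process.

        For example, remove: MAP KINGSTAR.*, TARGET CRMKINGSTAR.*;

Method 2: Create the user manually on the target, then restart the process.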

1.8  Table does not exist

Problem description:

2010-05-10 15:02:12  GGS ERROR       101  Oracle GoldenGate Delivery for Oracle, rcrmheal.prm:  Table CRMOLAP.TB_FT_OFSTK_CLIENT_BY_DAY does not exist in target database.

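Analysis:

The log shows that the failure occurs because the table does not exist on the target.

Resolution:

Method 1: If the table does not need to be replicated, exclude it on the target and restart the process.

       For example, add: MAPEXCLUDE OLAP.TB_FT_OFSTK_CLIENT_BY_DAY

Method 2: Create the table manually on the target (for heterogeneous databases, also regenerate the table definitions file), then restart the process.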

1.9  Unusable database index

Problem description:

2010-07-05 14:48:32  GGS WARNING     218  Oracle GoldenGate Delivery for Oracle, rapcaxht.prm:  SQL error 1502 mapping AXHT.DOCONTRACT to APCAXHT.DOCONTRACT OCI Error ORA-01502: index 'APCAXHT.PK_SID' or partition of such index is in unusable state (status = 1502), SQL <INSERT INTO "APCAXHT"."DOCONTRACT" ("SID","RIQI","JGID","HT_ID","KH_XM","KH_ID","KH_NUM","CREATEDDATE","MODIFIEDDATE","USERNAME","REALNAME","BS","MEMO1","MEMO2","KH_IDLX","DXJGID","KH_IDTY","CPID") VA>.

Analysis:

The failure is caused by an index in the unusable state.

Resolution:

Rebuild the affected index, then restart the process.

1.10  Table structures differ

Problem description:

2010-05-08 14:50:44  GGS ERROR       218  Oracle GoldenGate Delivery for Oracle, rcrmheal.prm:  Error mapping from OLAP.TB_FT_OFSTK_BAL_HIS to CRMOLAP.TB_FT_OFSTK_BAL_HIS.

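Analysis:

This problem usually occurs because the source and target tables being replicated have different structures, including columns and indexes.

Resolution:

1. If the columns differ, alter the table columns (for heterogeneous databases, also regenerate the table definitions file), then restart the process.

2. If the indexes differ, rebuild the indexes (for heterogeneous databases, also regenerate the table definitions file), then restart the process.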

1.11  Insufficient disk space

Problem description:

2010-05-07 04:05:31  GGS ERROR       103  Oracle GoldenGate Collector:  Unable to write to file "./dirdat/crm/fl003629" (error 28, No space left on device).

2010-05-07 04:05:31  GGS ERROR   190 PROCESS ABENDING.

Analysis:

The log shows that the failure is caused by insufficient disk space.

Resolution:

Allocate enough disk space, then restart the process.
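
A periodic free-space check on the trail directory can flag this condition before the Collector abends. The sketch below uses shutil.disk_usage from the standard library. The 20% threshold and the function name trail_dir_has_space are arbitrary illustrative choices, and the path in the usage example is simply the current directory.

```python
import shutil


def trail_dir_has_space(path, min_free_fraction=0.2):
    """Return True if the file system holding `path` has enough free space.

    min_free_fraction is the required share of free space (0.2 = 20%);
    the default is an assumption, not a GoldenGate recommendation.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return False
    return usage.free / usage.total >= min_free_fraction


if __name__ == "__main__":
    # Replace "." with the trail directory, for example the dirdat
    # directory under the GoldenGate home.
    if not trail_dir_has_space("."):
        print("WARNING: trail file system is running low on space")
```

Such a check complements, rather than replaces, purging trail files that have already been applied.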

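1.12  TCP/IP failure

Problem description:

2010-06-25 21:06:04  GGS WARNING     150  Oracle GoldenGate Capture for Oracle, BSAIAXEC.prm:  TCP/IP error 10060 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.).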

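Analysis:

The log shows that the remote host cannot be reached at the configured IP address or port.

Resolution:

Restore connectivity to the remote host's IP address and port, then restart the process.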

1.13  Cannot connect to the database

Problem description:

2010-05-20 18:25:13  GGS ERROR       182  Oracle GoldenGate Delivery for Oracle, rtasaxta.prm:  OCI Error during OCIServerAttach (status = 12154-ORA-12154: TNS:could not resolve the connect identifier specified).

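Analysis:

The GoldenGate process abends because it cannot connect to the database.

Resolution:

Fix the database connection problem first, then restart the process.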

1.14  Insufficient tablespace

Problem description:

2010-02-01 17:19:18  GGS ERROR    103  Discard file (./dirrpt/rep1.dsc)      exceeded max bytes (10000000).

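Analysis:

The error shows that the direct cause of the GoldenGate process stopping is that the discard file filled up. Why did it fill up? The discard file shows ORA-01653: unable to extend errors, so the fix is to enlarge the tablespace that holds the affected object (aaa.TB_LVY_TEMPINVOIC in this case).

Resolution:

1. Find the tablespace in which the object is stored.

For example: select owner,table_name,tablespace_name from dba_tables

2. Extend the tablespace.

For example: ALTER TABLESPACE tbs_03 ADD DATAFILE 'tbs_f04.dbf' SIZE 100K AUTOEXTEND ON NEXT 10K MAXSIZE 100K;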

1.15  Network transfer problem

Problem description:

2010-06-29 16:22:28  GGS ERROR       112  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Remote file used is /oradataA/ggtrail/b1000008, reply received is Unable to lock file "/oradataA/ggtrail/b1000008" (error 13, Permission denied). Lock currently held by process id  (PID) 3674350).

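Analysis:

The log shows that the remote trail file is locked by another process on the target (PID 3674350 here), so the file cannot be locked for writing.

Resolution:

Method 1: Kill the process holding the lock manually, then restart the process.

Method 2: Do nothing; the lock is released automatically after about 2 hours.

Method 3: GoldenGate 10.4.0.76 resolves this locking problem.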

 

 

1.16  Capture process cannot add supplemental logging for a table

Problem description:

2010-07-19 16:20:03  GGS ERROR      2100  Oracle GoldenGate Capture for Oracle, ecrmheal.prm:  Could not add TRAN DATA for table, error [ORA-32588: supplemental logging attribute all column exists, SQL ALTER TABLE "AXTECH"."TB_FUND_MATCHING" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS  /* GOLDENGATE_DDL_REPLICATION */], error code [32588], operation [ALTER TABLE "AXTECH"."TB_FUND_MATCHING" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS  /* GOLDENGATE_DDL_REPLICATION */ (size 113)].

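Analysis:

Supplemental logging is already enabled on the table, and when DDL is run against the table the parameter "DDLOPTIONS ADDTRANDATA" enables supplemental logging on it again. If the table has more than 32 columns and no unique index, the error above occurs.

Resolution:

Method 1: Remove the parameter "DDLOPTIONS ADDTRANDATA".

Method 2: DELETE TRANDATA schema.table

Method 3: Log in to the database and run: ALTER TABLE AXHT.BMBM2002 DROP SUPPLEMENTAL LOG DATA (ALL) COLUMNS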

1.17  Database supplemental logging is not enabled

Problem description:

2010-10-14 09:25:50  GGS ERROR       190  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  No minimum supplemental logging is enabled. This may cause extract process to handle key update incorrectly if key column is not in first row piece.

2010-10-14 09:25:50  GGS ERROR       190  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  PROCESS ABENDING.

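Analysis:

The log shows that the failure occurs because supplemental logging is not enabled on the source Oracle database. If a primary key or unique index is composite, supplemental logging must be configured for the table; otherwise it is not needed. In other words, if every table has a single-column primary key, this can be ignored. When only part of a composite key is updated, supplemental logging sends the remaining key columns of the row to the target as well; without it the target cannot identify the row reliably.

Resolution:

Log in to the database and run ALTER DATABASE ADD SUPPLEMENTAL LOG DATA to enable supplemental logging. Then re-add the capture process and the local trail.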

1.18  Table supplemental logging is not enabled

Problem description:

2010-10-14 09:30:49  GGS WARNING  Z1-078  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  No valid default archive log destination directory found for thread 1.

2010-10-14 09:30:50  GGS ERROR       500  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  Found unsupported in-memory undo record in sequence 2, at RBA 39675920, with SCN 0.554993 (554993) ... Minimum supplemental logging must be enabled to prevent data loss.

2010-10-14 09:30:51  GGS ERROR       190  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  PROCESS ABENDING.

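Analysis:

The log shows that the failure occurs because supplemental logging is not enabled on the source Oracle database.

Resolution:

Log in to the database and run ALTER DATABASE ADD SUPPLEMENTAL LOG DATA to enable supplemental logging.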

1.19  DDL replication table not found

Problem description:

2010-10-14 13:32:10  GGS ERROR      2008  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  DDL Replication is enabled but table GGS.GGS_DDL_HIST is not found. Please check DDL installation in the database.

2010-10-14 13:32:10  GGS ERROR       190  Oracle GoldenGate Capture for Oracle, ECRMGGS.prm:  PROCESS ABENDING.

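Analysis:

The log shows that DDL replication is enabled, but the table GGS.GGS_DDL_HIST, which the DDL replication installation scripts create, cannot be found.

Resolution:

DDL replication was installed as user GGDDL, and the installation scripts create the GoldenGate tracking tables in that user's schema. To support DDL operations, the parameter file must therefore log in to the database as GGDDL with the corresponding password. For example: USERID GGDDL@CRMDB,PASSWORD GGDDL.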

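1.20  GoldenGate: UPDATE operations not synchronized between nodes

Symptom: after an UPDATE on node 1 or node 2, the change is not replicated.

Resolution process:

1. Routine checks:

Process status check: normal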

GGSCI (gc1) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           

EXTRACT     RUNNING     EORA_1      00:00:00      00:00:04    

EXTRACT     RUNNING     PORA_1      00:00:00      00:00:08    

REPLICAT    RUNNING     RORA_1      00:00:00      00:00:05    

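Re-grant the privileges:

SQL> grant INSERT, UPDATE, DELETE on scott.tcustmer to ogg; 

-- grant DML privileges on the replicated table to ogg

SQL> grant INSERT, UPDATE, DELETE on scott.tcustord to ogg; 

-- grant DML privileges on the replicated table to ogg

The problem persists.

2. Run the following:

GGSCI (gc1) 8> ADD TRANDATA scott.*  

Note: stop the rora_1 process first, add the trandata, then start the process again.

-- On both nodes, run ADD TRANDATA for the new table in scott; afterwards OGG captures the log records for the new table.

The problem is resolved; UPDATE operations are replicated between the two nodes.

3. Summary:

UPDATEs on a newly created replicated table may fail to apply. Run the following so that OGG captures the log records of the new table:

Command: ADD TRANDATA scott.new_tab

Remember to stop the rora_1 process first, add the trandata, then start the process again.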

1.21   ERROR: Could not delete DB checkpoint for REPLICAT  

GGSCI (mail) 9> info all

 

 Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

 MANAGER     RUNNING                                           

 REPLICAT    ABENDED     REP1        00:00:00      00:58:10    

 

 

GGSCI (mail) 10>  delete REPLICAT REP1

 ERROR: Could not delete DB checkpoint for REPLICAT REP1 (Database login required to delete database checkpoint).

 

 GGSCI (mail) 11> dblogin userid ogg,password oracle

 Successfully logged into database.

 

 GGSCI (mail) 12> delete REP1

 Deleted REPLICAT REP1.

 

 

 GGSCI (mail) 13> info all

 

 Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

 MANAGER     RUNNING                                           

 

If the group still cannot be deleted, proceed as follows:

GGSCI (rhel6_lhr) 23> delete REPLICAT   RORA_HR

ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (Database login required to delete database checkpoint).

 

 

GGSCI (rhel6_lhr) 24> dblogin userid ggusr@ogg2, password lhr

Successfully logged into database.

 

GGSCI (rhel6_lhr) 25>  delete RORA_HR

ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (OCI Error ORA-00942: table or view does not exist (status = 942). Deleting from checkpoint table ggusr.ggschkpt, group 'RORA_HR', key 293545198 (0x117f24ee), SQL <DELETE FROM ggusr.ggschkpt  WHERE group_name = 'RORA_HR' AND        group_key  = 293545198>).

 

 

GGSCI (rhel6_lhr) 26> info all

 

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

 

MANAGER     RUNNING                                           

REPLICAT    STOPPED     RORA_HR     00:00:00      00:13:32    

 

 

GGSCI (rhel6_lhr) 27> delete REPLICAT   RORA_HR

ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (OCI Error ORA-00942: table or view does not exist (status = 942). Deleting from checkpoint table ggusr.ggschkpt, group 'RORA_HR', key 293545198 (0x117f24ee), SQL <DELETE FROM ggusr.ggschkpt  WHERE group_name = 'RORA_HR' AND        group_key  = 293545198>).

 

 

GGSCI (rhel6_lhr) 28> add checkpointtable ggusr.ggschkpt

 

Successfully created checkpoint table ggusr.ggschkpt.

 

GGSCI (rhel6_lhr) 29> delete REPLICAT   RORA_HR

Deleted REPLICAT RORA_HR.

1.22  GoldenGate OGG-00717 unsupported in-memory undo record

 

 2013-07-08 16:31:48  INFO    OGG-01515  Oracle GoldenGate Capture for Oracle, EXT1.prm:  Positioning to begin time 2013-7-8 04:10:22 PM.

2013-07-08 16:31:48  INFO    OGG-01516  Oracle GoldenGate Capture for Oracle, EXT1.prm:  Positioned to Sequence 18, RBA 9212432, SCN 0.0, 2013-7-8 04:10:22 PM.

 

2013-07-08 16:31:48  ERROR   OGG-00717  Oracle GoldenGate Capture for Oracle, EXT1.prm:  Found unsupported in-memory undo record in sequence 18, at RBA 9212432, with SCN 0.1347542 (1347542) ... Minimum supplemental logging must be enabled to prevent data loss.

 

2013-07-08 16:31:48  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, EXT1.prm:  PROCESS ABENDING.

 

 

 

 

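While setting up a GoldenGate environment I ran into a strange problem: "Found unsupported in-memory undo record in sequence 18", even though minimal supplemental logging was already enabled in Oracle.

Restarting the mgr and extract processes resolved the problem. If it persists, delete the process and re-create it.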


1.23   Handling Chinese table names and column names

For example, take the following table with a Chinese name:

Example 9-40

 

create table 測試表(

ID NUMBER,

姓名 VARCHAR2(30),

FLAG CHAR(1),

CONSTRAINT PK_TESTD PRIMARY KEY (ID) USING INDEX);

 

-- On the source, create the MV log and the MV

drop materialized view log on "測試表";

create materialized view log on "測試表" with primary key;

drop materialized view mv_cn_table;

create materialized view mv_cn_table refresh fast on commit as select id, 姓名 as en_name,flag from "測試表";

 

Create the table and the view on the target:

Example 9-41

create or replace view v_cn_table as select id,姓名 as en_name,flag  from "測試表";

 

Here, in GoldenGate, NLS_LANG for Extract and Replicat must be set to match the target character set:

Example 9-42

 

SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")

 

Extract parameters:

Example 9-43

 

extract ODISC

SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")

userid custom_src, password custom_src

exttrail D:/GoldenGate/dirdat/ODISoc/oc

TABLE CUSTOM_SRC.MV_CN_TABLE;

 

Pump parameters:

Example 9-44

 

extract ODIT1P

SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")

PASSTHRU

rmthost localhost, mgrport 7909

rmttrail D:/gg_stg/dirdat/ODIT1op/op

TABLE CUSTOM_SRC.MV_CN_TABLE;

 

Replicat parameters:

Example 9-45

 

replicat ODIT1A1

SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")

userid odi_staging,  password odi_staging

discardfile D:/gg_stg/dirrpt/ODIT1.dsc, purge

ASSUMETARGETDEFS

 

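The APPLYNOOPUPDATES parameter must be specified here, otherwise UPDATEs are not applied correctly. KEYCOLS must also be specified, otherwise DELETEs and UPDATEs are not applied correctly:

Example 9-46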

 

map CUSTOM_SRC.MV_CN_TABLE, TARGET ODI_STAGING.V_CN_TABLE, KEYCOLS (ID);

 





