Summary of OGG (Oracle GoldenGate) Fault and Error Handling
http://blog.itpub.net/26655292/viewspace-2142867/
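Chapter 1 GoldenGate Error Analysis and Handling
Unexpected situations will cause problems of all kinds while you maintain GoldenGate. You need a working knowledge of the common methods for diagnosing GoldenGate faults and analyzing errors. Learning these error-analysis tools also deepens your understanding of the GoldenGate product and of how it works.
1.1 Handling Common GoldenGate Exceptions
After GoldenGate has been running for a while, various problems can appear. This section describes the common abnormal symptoms and how to handle them.
1.1.1 General Troubleshooting Steps
First determine which type of GoldenGate process is failing: Extract (capture), data pump, or Replicat. The general approach is as follows.
(1) Run GGSCI> view report and search for the word ERROR. This identifies the cause, and the surrounding information tells you how to resolve it.
(2) Run GGSCI> view ggsevt to view the event (alert) log.
(3) Check that the databases on both ends are running normally and that the network between them is reachable.
(4) Analyze the trail files with the logdump utility.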
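1.1.2 Failure of a Single RAC Node
In a RAC environment, GoldenGate is installed in a shared directory. Any node with access to that directory can start the GoldenGate command interface. If one node fails and the GoldenGate processes stop, you can switch directly to another node and continue running.
The steps are as follows.
(1) Log in to the source system as the oracle user, using a surviving node.
(2) Make sure the file system that holds the GoldenGate installation is mounted on that node at the same directory.
(3) Confirm that the GoldenGate installation directory is owned by the oracle user and its group.
(4) Confirm that the oracle user and its group have read/write permission on the GoldenGate installation directory.
(5) Change to the GoldenGate installation directory.
(6) Run ./ggsci to enter the command-line interface.
(7) Run start mgr to start the Manager (MGR).
(8) Run start er * to start all Extract and Replicat processes.
Check that every process has started correctly. Replication then continues normally.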
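1.1.3 Common Extract Exceptions
The following common error messages are listed for reference.
In GoldenGate the term Extract covers both capture processes and data pump processes. Most data pump errors come from network failures. On the source database, if a capture process (ext**) becomes ABENDED, run view report in GGSCI to read its report. Searching for ERROR locates the problem quickly.
Capture usually fails because it cannot find the archived log it needs. Check for the log in the archive log directory with:
Example 9-1:
ls -lt arch_x_xxxx.arc
If the log does not exist, the possible causes are:
- The log has been compressed. GoldenGate cannot decompress it automatically, so you must decompress it manually before GoldenGate can read it.
- The log has been deleted.
If the log has been deleted, you must restore it before replication can continue.
Archived logs should be backed up and old ones purged on a regular schedule. However, each archived log must stay in the archive directory long enough before it is backed up and removed. In other words, periodically back up and purge only the archives older than a set number of hours, not all of them. Determine the retention time as follows.
Retention time of an archived log ≥ time Extract needs to finish processing all the redo in that file.
From the command line or the GoldenGate Director web interface, run info extxx showch to see which log sequence number the capture process has reached. Archived logs earlier than that sequence number can be removed safely.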
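The retention rule above can be sketched in a few lines of Python. This is a minimal sketch with some assumptions. You read the checkpoint sequence number for each redo thread from `info <extract> showch` by hand; parsing that output is not shown. If `showch` reports a Recovery Checkpoint with a lower sequence than the Current Checkpoint, pass the lower one. The archive files are also assumed to follow a hypothetical `<thread>_<sequence>_<resetlogs_id>.arc` pattern, similar to the `1_53586_776148274.arc` file in section 1.6.1.1. The real pattern depends on your LOG_ARCHIVE_FORMAT.

```python
import re

# Hypothetical naming scheme <thread>_<sequence>_<resetlogs_id>.arc;
# adjust the pattern to match your own LOG_ARCHIVE_FORMAT.
ARCHIVE_NAME = re.compile(r"^(\d+)_(\d+)_(\d+)\.arc$")


def purgeable_archives(file_names, checkpoint_seq_by_thread):
    """Return the archived logs that Extract no longer needs.

    checkpoint_seq_by_thread maps a redo thread number to the log sequence
    number taken from `info <extract> showch`. Only files on the same thread
    with a strictly lower sequence number are reported as safe to remove.
    """
    safe = []
    for name in file_names:
        match = ARCHIVE_NAME.match(name)
        if match is None:
            continue  # unknown name pattern: never report it as purgeable
        thread, seq = int(match.group(1)), int(match.group(2))
        checkpoint = checkpoint_seq_by_thread.get(thread)
        if checkpoint is not None and seq < checkpoint:
            safe.append(name)
    return sorted(safe)


if __name__ == "__main__":
    files = ["1_100_776148274.arc", "1_101_776148274.arc", "2_57_776148274.arc"]
    print(purgeable_archives(files, {1: 101, 2: 57}))
    # ['1_100_776148274.arc']
```

The function keeps files from any thread that has no known checkpoint. When in doubt, it chooses to keep an archive rather than delete it.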
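Extract also abends when it captures an unsupported data object. The report file contains detailed error information; use it to locate the problem and then fix it.
Several more specific failures are listed below.
(1) Extract: Application failed to initialize (Windows)
Error message: when you run GGSCI, an alert window reports "Application failed to initialize (0xc000026e)".
On Windows, GoldenGate requires the Microsoft Visual C++ 2005 SP1 Redistributable Package. On Itanium platforms, install vcredist_IA64.exe instead.
Windows Server 2008 needs an extra step. Right-click cmd (the DOS prompt), choose "Run as administrator", and start MGR and Extract from that command window. Only then can they read the database logs.
When you install OGG as a service (by running "install ADDSERVICE"), use administrator privileges. The service can then access the logs after it starts.
To grant the user that runs MGR and Extract read permission on the log files, right-click the file -> Properties -> Security -> Edit -> Add.
(2) Extract: Cannot load program ./ggsci…
Analysis: first check that the OGG build matches the operating system and the database. Then, on AIX, check that the xlC version is 10.0 or later.
Also check that the dynamic library path in the environment includes the database library directory, for example:
Example 9-2:
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
The variable name differs by platform:
- AIX: LIBPATH
- Solaris, Linux, etc.: LD_LIBRARY_PATH
- HP-UX: SHLIB_PATH
After you reset the environment variables, restart MGR and the Extract/Replicat processes.
(3) Extract: Block size mismatch (8192/512)…
Most operating systems use a default raw device offset of 0, but AIX uses 4096. If the raw device was created with the -TO option, Oracle does not skip the first 4096 bytes; it reads and writes from offset 0. If this error appears with raw devices on AIX, tell OGG to read from offset 0:
Example 9-3:
tranlogoptions rawdeviceoffset 0
This parameter is needed very often in practice. In earlier versions, Extract terminated immediately when the parameter was missing. Newer versions keep retrying and do not terminate, so check the report file.
(4) Extract: ORA-15000 ASM connection error
This is an OCI error: Extract had a problem connecting to the database. The error message points to a privilege problem.
First check the ASM-related settings in the Extract parameter file (tranlogoptions asmuser sys@+ASM1, asmpassword oracle). Then check tnsnames.ora and listener.ora to verify that the ASM instance is configured correctly. Also confirm that the ASM user has the SYSDBA privilege. If you connect as SYS, set REMOTE_LOGIN_PASSWORDFILE to SHARED in the ASM instance's init.ora. With SHARED, several databases can share one password file, and only SYS can log in remotely.
Verify with SQL*Plus:
Example 9-4:
sqlplus sys/oracle@asm1 as sysdba    # connects successfully
sqlplus sys/oracle@asm1              # fails with ORA-15000
(5) Extract: Encountered SCN That Is Not Greater Than The Highest SCN Already Processed…
Analysis: in Oracle RAC, Extract starts a coordinator thread that orders the operations from all nodes by SCN. After a transaction commits, the coordinator waits for the time set by THREADOPTIONS MAXCOMMITPROPAGATIONDELAY, to make sure the idle nodes have no transactions, and then collects the transaction data. If an idle node later reads a transaction with a lower SCN after that transaction has been written, Extract reports this error.
Possible causes:
- Clock synchronization is not configured between the nodes.
- One node is slower than another (an I/O problem is the most likely reason).
Solution:
Adjust the Extract parameters:
Example 9-5:
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY <msec> IOLATENCY <msec>
MAXCOMMITPROPAGATIONDELAY accepts values from 0 to 90000 ms. The default is 3 s (3000 ms).
GGS V9.x adds the IOLATENCY parameter. Raise it together with the parameter above to lengthen the wait. IOLATENCY defaults to 1.5 s, with a maximum of 180000 ms.
After this error occurs, set both parameters to large values. Then lower them step by step to find the best setting.
Note one more consequence of this error. Later transactions may already have been written to the trail, so Extract restarts successfully, but problems may follow. Extract rewrites the current trail and overwrites the earlier transaction data. The downstream Data Pump may then terminate with "abend with incompatible record errors" (this can happen in older versions).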
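Recover from this problem as follows.
① Stop all Data Pumps and Replicats. For every Extract, record the trail sequence number (seqno) of its Write Checkpoint.
② Roll each Extract over to a new trail file:
Example 9-6:
ALTER EXTRACT [name], ETROLLOVER
Start the Extract and check that it has rolled over to the next trail. Record the new seqno; it should be the old seqno + 1.
③ Reposition the Data Pump to start from the new trail:
Example 9-7:
ALTER EXTRACT [pump_name], EXTSEQNO #####, EXTRBA 0
Restart the Data Pump. Check that it restarts successfully and transfers data from the new trail.
④ Edit the Replicat parameter file. Add or enable HANDLECOLLISIONS, and comment out GROUPTRANSOPS and MAXTRANSOPS if they are present. Start the Replicat and check whether it reads the newly transferred trail. If the Replicat cannot roll over to the next trail by itself, roll it over manually:
Example 9-8:
ALTER REPLICAT [replicat_name], EXTSEQNO #####, EXTRBA 0
When the Replicat has reached the end of the trail with no lag, disable HANDLECOLLISIONS and restore the original GROUPTRANSOPS and MAXTRANSOPS settings.
⑤ Restart the Replicat. Normal replication resumes.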
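Steps ② and ③ amount to computing "old seqno + 1" for each trail and issuing the matching ALTER commands. The Python sketch below only prints the GGSCI commands for you to review and does not talk to GGSCI. It assumes hypothetical group names (`ext1`, `pump1`) and assumes you recorded each Extract's write-checkpoint seqno by hand in step ①.

```python
def rollover_commands(extracts, pumps):
    """Build the GGSCI commands for steps 2 and 3 of the recovery procedure.

    extracts: {extract_name: write-checkpoint trail seqno recorded in step 1}
    pumps:    {pump_name: name of the Extract whose local trail it reads}
    After ETROLLOVER the Extract is assumed to write to the next trail file,
    so each pump is repositioned to old seqno + 1 at RBA 0.
    """
    commands = [f"ALTER EXTRACT {name}, ETROLLOVER" for name in extracts]
    for pump, source_extract in pumps.items():
        next_seq = extracts[source_extract] + 1
        commands.append(f"ALTER EXTRACT {pump}, EXTSEQNO {next_seq}, EXTRBA 0")
    return commands


if __name__ == "__main__":
    # Hypothetical groups: Extract ext1 was writing trail seqno 26.
    for cmd in rollover_commands({"ext1": 26}, {"pump1": "ext1"}):
        print(cmd)
```

Step ② asks you to check the new seqno after starting the Extract. If the observed value is not old + 1, use the observed value instead of the computed one.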
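1.1.4 Network Failures
If the AUTORESTART parameter is set in the MGR parameter file, GoldenGate restarts failed processes automatically, with no manual intervention.
When the network is unstable or interrupted, the GoldenGate pump process that writes the remote trail stops automatically. MGR then periodically restarts the pump, according to the AUTORESTART setting in mgr.prm, to test whether the network has recovered. Once the network is back, the pump restarts. GoldenGate's checkpoint mechanism ensures that replication continues from the log position where it stopped.
Note that the source capture (Extract) process keeps capturing redo and writing it to the local trail. Meanwhile, the pump cannot move the local trail to the remote side. The local trail files therefore cannot be purged automatically and they accumulate, so there must be enough storage to hold them. The required capacity is:
Storage capacity ≥ trail volume generated per unit of time × time to recover from the network failure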
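Here is a worked example of the formula above. Assume a hypothetical trail generation rate of 500 MB per hour and a 12-hour outage. The local trail directory then needs at least 6,000 MB. The helper below computes this; the optional safety margin is our own addition and is not part of the original formula.

```python
def required_trail_storage_mb(trail_mb_per_hour, outage_hours, safety_factor=1.0):
    """Lower bound on local trail storage needed to ride out a network outage.

    Implements: capacity >= trail volume per unit time x outage duration.
    safety_factor is a hypothetical extra margin (1.0 means no margin).
    """
    if trail_mb_per_hour < 0 or outage_hours < 0 or safety_factor < 1.0:
        raise ValueError("rates and durations must be non-negative, margin >= 1")
    return trail_mb_per_hour * outage_hours * safety_factor


if __name__ == "__main__":
    # 500 MB of trail per hour and a 12-hour outage
    print(required_trail_storage_mb(500, 12))  # 6000.0
```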
Reference MGR configuration for periodically restarting Extract and Replicat processes:
Example 9-9:
GGSCI > edit param mgr
port 7809
autorestart er *,waitminutes 3,retries 5,RESETMINUTES 60
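With this setting, MGR waits 3 minutes before each restart attempt and counts at most 5 attempts within a 60-minute window (RESETMINUTES). When the window expires, the retry count is reset and MGR can try again.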
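1.1.5 Common Replicat Exceptions
On the target database, if a Replicat process (repXX) becomes ABENDED, run view report in GGSCI to read its report. Searching for ERROR locates the error quickly.
Replicat errors are usually target database errors, such as:
- The database is temporarily down.
- The target tablespace is out of space.
- The target tables have become inconsistent.
Find the cause in the report, fix it, and restart the Replicat.
One thing is easy to overlook: the UNDO tablespace. If the DML contains many UPDATE and DELETE operations, the target generates undo quickly, and the UNDO tablespace may fill up.
A typical replication error:
Example 9-10:
- SQL error 1403 mapping 2010-02-25 13:20:08 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, rep_stnd.prm: SQL error 1403 mapping HR.MY_EMPLOYEE to HR.MY_EMPLOYEE.
Possible causes include:
- The table structures differ between the two ends (a heterogeneous environment, with different columns or primary keys).
- The two ends contain inconsistent rows.
- Supplemental logging is incomplete.
Check the discard file for details. If an UPDATE or DELETE cannot find the matching row and some columns are empty, supplemental logging is missing.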
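1.2 Handling Errors with REPERROR
When Replicat hits an error while applying DML, GoldenGate lets you control how the error is handled with the REPERROR parameter, the subject of this section. REPERROR covers most of GoldenGate's error-handling options.
Figure 9-1 shows the Replicat parameter file from one case.
Figure 9-1
1.2.1 REPERROR Response Types and Their Meanings
In GoldenGate 11, REPERROR provides seven ways to handle errors:
(1) ABEND: when Replicat meets a record it cannot process, it rolls back the transaction and stops. The Replicat status becomes ABENDED.
(2) DISCARD: writes the error information for the record to the discard file, and Replicat continues with the following records.
(3) EXCEPTION: handles the error in a predefined way.
(4) IGNORE: ignores the record and continues with the following records.
(5) RETRYOP [MAXRETRIES <n>]: retries the failed operation n times.
(6) TRANSABORT [, MAXRETRIES <n>] [, DELAY[C]SECS <n>]: aborts the transaction and repositions the RBA to the start of that transaction. You can also specify the number of retries.
(7) RESET: clears all REPERROR rules and restores ABEND as the default.
In the Replicat parameter file, you can set any of these responses as the default, for example REPERROR (DEFAULT, ABEND).
To keep the data consistent, set the default REPERROR rule to ABEND.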
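1.2.2 Common Database Error Types in Replicat and How to Handle Them
In real GoldenGate systems, a large share of Replicat errors are database errors that begin with ORA (Oracle is used as the example here). ORA errors usually require you to track down the cause in the database by hand. REPERROR can still handle anticipated error types, so you can find and fix the cause at the database level. The process then does not abend and stop processing the other, healthy tables.
For example, to ignore duplicate-row inserts but abend on every other error:
Example 9-11: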
Reperror (default, abend)
Reperror (-1, ignore)
To ignore duplicate inserts for one table only and abend on other errors:
Example 9-12:
REPERROR (-1, IGNORE)
MAP sales.product, TARGET sales.product;
REPERROR RESET
MAP sales.account, TARGET sales.account;
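The toy Python model below makes the scoping in Example 9-12 concrete; it is not GoldenGate code. It assumes each REPERROR rule applies to the MAP statements that follow it until a REPERROR RESET. It also assumes an unmatched error falls back to the DEFAULT response, which is ABEND when no DEFAULT is set. Only DEFAULT and explicit error numbers are modeled.

```python
def resolve_rules(param_lines):
    """Map each target table to the REPERROR rules in force at its MAP.

    param_lines is a simplified list of statements:
      ("REPERROR", error_code or "DEFAULT", response)
      ("RESET",)
      ("MAP", target_table)
    """
    current = {"DEFAULT": "ABEND"}
    per_table = {}
    for stmt in param_lines:
        kind = stmt[0]
        if kind == "REPERROR":
            current[stmt[1]] = stmt[2]
        elif kind == "RESET":
            current = {"DEFAULT": "ABEND"}  # RESET restores the ABEND default
        elif kind == "MAP":
            per_table[stmt[1]] = dict(current)  # snapshot for this MAP
    return per_table


def response_for(rules, error_code):
    """Return the response for an error, falling back to DEFAULT."""
    return rules.get(error_code, rules["DEFAULT"])


if __name__ == "__main__":
    example_9_12 = [
        ("REPERROR", -1, "IGNORE"),
        ("MAP", "sales.product"),
        ("RESET",),
        ("MAP", "sales.account"),
    ]
    tables = resolve_rules(example_9_12)
    print(response_for(tables["sales.product"], -1))  # IGNORE
    print(response_for(tables["sales.account"], -1))  # ABEND
```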
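The most common error is ORA-1403 (no data found).
Error 1403 means a record could not be applied to the target database. It is purely a data problem. Use the error message and the discard file to find the corresponding rows in both databases. Then analyze the actual data in the trail with logdump to work out the cause. Possible causes:
- The table structures differ between the two ends.
- Supplemental logging is incorrect.
- The initial-load method left the data inconsistent.
- Cascading deletes or triggers were not disabled on the target.
- Oracle jobs or operating-system tasks modify data on the target.
Solutions:
- Re-run the initial load for the table.
- Repair the row manually.
- Change REPERROR to DISCARD or IGNORE to skip the error. Be very sure of what you are doing before you use this option, because it leaves the two ends inconsistent.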
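1.3 Handling DDL Replication Errors with DDLERROR
When DDL replication is enabled and a DDL statement fails to replicate, use the DDLERROR parameter to handle common errors in advance. DDLERROR is valid for both Extract and Replicat; the default response is ABEND.
The syntax of DDLERROR is:
Example 9-13: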
DDLERROR
{<error> | DEFAULT} {<response>}
[RETRYOP MAXRETRIES <n> [RETRYDELAY <delay>]]
{INCLUDE <inclusion clause> | EXCLUDE <exclusion clause>}
[IGNOREMISSINGTABLES | ABENDONMISSINGTABLES]
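For example, if DDL replication reports ORA-1430 because a duplicate ALTER statement was propagated, use DDLERROR 1430 DISCARD to send the error to the discard file.
Other errors are handled the same way as with REPERROR.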
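1.4 Recording Process Errors in the Discard File
The DISCARDFILE parameter creates a discard file, where GoldenGate records the information it could not process. This is very helpful for GoldenGate troubleshooting.
For example, if a source table's structure changes, Replicat by default reports errors when it applies the incoming data. The discard file shows which table was altered and how. Make the same structural change on the target to clear the error.
By default, the discard file is in the dirrpt subdirectory of the GoldenGate installation directory, as shown in Figure 9-2.
Figure 9-2
Figure 9-3 shows the errors recorded in the discard file.
Figure 9-3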
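1.5 Analysis of Common GoldenGate Errors
(1) A key to resolving GoldenGate errors is to find which component is the root cause. Use the error-analysis tools for this: report files, ggserr.log, discard files, the logdump utility, and the GGSCI command line. Candidates include:
- The system or network?
- A database error or an application error?
- A GoldenGate installation error?
- An error in a particular GoldenGate process?
- An error in a GoldenGate parameter file?
- An error in a SQL statement or stored procedure?
Then determine the cause and check each possibility in turn.
(2) When GoldenGate hits an error, use the logs and report files to find the cause and work through it step by step. For most errors, the GoldenGate message suggests a solution.
Here is an example case.
Running the command:
Example 9-14:
GGSCI>view ggsevt
shows the error in Figure 9-4.
Figure 9-4
view report dpeyb shows similar information.
The Replicat on the disaster-recovery side reports:
Example 9-15: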
2011-03-02 12:03:37 ERROR OGG-01028 Incompatible record in ./dirdat/ yb018262, rba 72955479 (getting header).
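Figure 9-5 shows this trail file inspected with logdump.
Figure 9-5
Analysis showed that one record in the trail file was corrupted. The pump could not recognize it and could not roll over to the next trail file automatically, and the Replicat could not move on to the next trail file either. The fix was to roll the pump over manually with ETROLLOVER and to point the Replicat at the next trail file with ALTER.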
1.5.1 GGSCI Fails to Run on AIX
Error message:
Example 9-16:
Cannot load ICU resource bundle 'ggMessage', error code 2 - No such file or directory
Cannot load ICU resource bundle 'ggMessage', error code 2 - No such file or directory
IOT/Abort trap (core dumped)
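Alternatively, GGSCI starts but every command reports the error above.
Solution: this typically happens when GoldenGate is installed on an existing mount point that was mounted with the concurrent I/O (CIO) option. Create a new file system, mount it without CIO, and use it as the GoldenGate installation directory.
Error message:
Example 9-17: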
$ ./ggsci
exec(): 0509-036 Cannot load program GGSCI because of the following errors:
0509-130 Symbol resolution failed for GGSCI because:
0509-136 Symbol _GetCatName__FiPCc (number 158) is not exported from dependent module /usr/lib/libC.a[ansi_64.o].
0509-136 Symbol _Getnumpunct__FPCc (number 162) is not exported from dependent module /usr/lib/libC.a[ansi_64.o].
0509-136 Symbol __ct__Q2_3std8_LocinfoFPCci (number 183) is not exported from dependent module /usr/lib/libC.a[ansi_64.o].
0509-192 Examine .loader section symbols with the 'dump -Tv' command.
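The cause was that the xlC runtime was version 6.0. Upgrading xlC to 10.1 or later solved the problem.
1.5.2 GGSCI Fails to Run on HP-UX
Error message: core dumped
This problem has been seen only on HP-UX 11.31.
Solution: the environment variables were set incorrectly; correct them.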
1.5.3 Trail File Retention
Add the following to mgr.prm:
Example 9-29:
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3
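After the change, restart Manager. Manager then releases the space used by trail files according to the rule above.
If storage is tight, use MINKEEPHOURS instead of MINKEEPDAYS.
If the source side is short of storage, reduce the minimum retention time.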
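The Python model below shows how USECHECKPOINTS and the MINKEEP window combine in the purge rule above. It is our own simplification of what Manager does, not its actual algorithm. It assumes a trail file is purgeable only when every reader's checkpoint has moved past it and the file is older than the MINKEEPDAYS/MINKEEPHOURS window. The file names, sequence numbers, and timestamps are hypothetical inputs.

```python
from datetime import datetime, timedelta


def purgeable_trails(trails, min_reader_seqno, now, min_keep):
    """Return the trail files that this simplified rule would purge.

    trails:           list of (file_name, seqno, last_modified datetime)
    min_reader_seqno: lowest trail seqno still needed by any pump/Replicat
    min_keep:         timedelta derived from MINKEEPDAYS or MINKEEPHOURS
    """
    result = []
    for name, seqno, modified in trails:
        fully_consumed = seqno < min_reader_seqno  # USECHECKPOINTS condition
        old_enough = now - modified >= min_keep    # MINKEEP* condition
        if fully_consumed and old_enough:
            result.append(name)
    return result


if __name__ == "__main__":
    now = datetime(2011, 1, 10, 12, 0)
    trails = [
        ("./dirdat/aa000001", 1, now - timedelta(days=5)),
        ("./dirdat/aa000002", 2, now - timedelta(days=1)),
        ("./dirdat/aa000003", 3, now - timedelta(hours=1)),
    ]
    print(purgeable_trails(trails, 3, now, timedelta(days=3)))
    # ['./dirdat/aa000001']
```

The model also shows why switching to MINKEEPHOURS frees space sooner. With `timedelta(hours=12)` the second file qualifies as well, because readers have already consumed it.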
9.5.12 Splitting a Replicat and Specifying the Trail File and RBA
Before splitting, run INFO XXX to get the trail file and RBA. Sample output:
Example 9-30:
GGSCI> INFO REPYXA
REPLICAT REPYXA Last Started 2011-01-08 19:48 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:01:42 ago)
Log Read Checkpoint File ./dirdat/p1000556 First Record RBA 59193235
After splitting the Replicat, start replication from the trail file and RBA recorded before the split:
Example 9-31:
ALTER REPLICAT xxx, EXTSEQNO nnn, EXTRBA mmm
Using the output above as an example:
Example 9-32:
ALTER REPLICAT REPYXA, EXTSEQNO 556, EXTRBA 59193235
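You can derive EXTSEQNO and EXTRBA from the INFO output mechanically. The sequence number is the trailing six digits of the trail file name: p1000556 is the prefix p1 plus 000556, which is 556. The Python sketch below assumes the `Log Read Checkpoint ... First Record RBA` wording shown in Example 9-30 and six-digit sequence numbers. Adjust both if your release prints them differently.

```python
import re

# Matches the checkpoint line of Example 9-30; \s+ also spans line breaks.
CHECKPOINT = re.compile(
    r"Log Read Checkpoint\s+File\s+(\S+)\s+First Record\s+RBA\s+(\d+)"
)
SEQ_SUFFIX = re.compile(r"(\d{6})$")  # assumed 6-digit trail seqno suffix


def alter_replicat_command(group, info_output):
    """Build ALTER REPLICAT ... EXTSEQNO ..., EXTRBA ... from INFO output."""
    match = CHECKPOINT.search(info_output)
    if match is None:
        raise ValueError("no 'Log Read Checkpoint' line found")
    trail_file, rba = match.group(1), int(match.group(2))
    seq_match = SEQ_SUFFIX.search(trail_file)
    if seq_match is None:
        raise ValueError(f"cannot read a sequence number from {trail_file}")
    seqno = int(seq_match.group(1))
    return f"ALTER REPLICAT {group}, EXTSEQNO {seqno}, EXTRBA {rba}"


if __name__ == "__main__":
    sample = (
        "REPLICAT REPYXA Last Started 2011-01-08 19:48 Status RUNNING\n"
        "Checkpoint Lag 00:00:00 (updated 00:01:42 ago)\n"
        "Log Read Checkpoint File ./dirdat/p1000556 First Record RBA 59193235\n"
    )
    print(alter_replicat_command("REPYXA", sample))
    # ALTER REPLICAT REPYXA, EXTSEQNO 556, EXTRBA 59193235
```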
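1.5.4 BOUNDED RECOVERY
Error message:
Example 9-33:
BOUNDED RECOVERY: reset to initial or altered checkpoint.
Cause: a database problem. The archivelog files of the second node could not be read.
1.5.5 Excluding Tables from Replication
Add the following to the parameter file:
Example 9-34: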
TABLEEXCLUDE schema.table_name
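1.5.6 Re-capturing from a Specified Time
Prerequisite: the archived logs have not been deleted.
Example 9-35:
ALTER EXTRACT xxx, TRANLOG, BEGIN 2010-12-31 08:00
Time format: yyyy-mm-dd [hh:mi[:ss[.cccccc]]]
For a newly created Extract:
Example 9-36: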
ADD EXTRACT xxx, TRANLOG, BEGIN 2010-12-31 08:00
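The BEGIN time is easy to mistype when you script the repositioning. The Python sketch below renders a `datetime` in the yyyy-mm-dd hh:mi[:ss[.cccccc]] layout given above and builds the command from Example 9-35. The group name `xxx` is the placeholder used in the examples.

```python
from datetime import datetime


def begin_time(ts, seconds=False, micros=False):
    """Render ts as yyyy-mm-dd hh:mi[:ss[.cccccc]] for a BEGIN clause."""
    text = ts.strftime("%Y-%m-%d %H:%M")
    if seconds or micros:
        text += ts.strftime(":%S")
    if micros:
        text += ts.strftime(".%f")  # %f is always six digits
    return text


def alter_extract_begin(group, ts):
    """Build the command from Example 9-35 for the given group."""
    return f"ALTER EXTRACT {group}, TRANLOG, BEGIN {begin_time(ts)}"


if __name__ == "__main__":
    print(alter_extract_begin("xxx", datetime(2010, 12, 31, 8, 0)))
    # ALTER EXTRACT xxx, TRANLOG, BEGIN 2010-12-31 08:00
```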
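1.5.7 A Process Cannot Be Stopped
This usually happens while the process is handling a large transaction, especially one that runs for more than 2 hours. Wait until the process finishes it.
Solution: if the process must be stopped, kill it forcibly:
Example 9-37: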
send xxx forcestop
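1.5.8 CLOB Handling
If the tables contain CLOB columns, add the following to the Extract parameter file:
Example 9-38:
TRANLOGOPTIONS CONVERTUCS2CLOBS
1.5.9 DB2 Cannot Use a Checkpoint Table
Solution: specify the NODBCHECKPOINT option when adding the Replicat.
Example 9-39: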
add replicat xxx, exttrail /GoldenGate/dirdat/rb, nodbcheckpoint
1.6 OGG- Errors
1.6.1 OGG-00446
1.6.1.1 OGG-00446 Could not find archived log for sequence 53586 thread 1 under alternative destinations.
Error message:
OGG-00446 Could not find archived log for sequence 53586 thread 1 under alternative destinations. SQL <SELECT MAX(sequence#) FROM v$log WHERE thread# = :ora_thread>. Last alternative log tried /arch_cx/1_53586_776148274.arc., error retrieving redo file name for sequence 53586, archived = 1, use_alternate = 0Not able to establish initial position for sequence 53586, rba 44286992.
Solution:
Restore the missing archived logs from backup. If you still cannot find the required archived logs, the only option is to redo the initial data load.
In another case, starting an Extract produced the following error:
2011-10-16 22:41:02 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, e430rks2.prm: Could not find archived log for sequence 10770 thread 1 under default destinations SQL <SELECT name FROM v$archived_log WHERE sequence# = :ora_seq_no AND thread# = :ora_thread AND resetlogs_id = :ora_resetlog_id AND archived = 'YES' AND deleted = 'NO>, error retrieving redo file name for sequence 10770, archived = 1, use_alternate = 0Not able to establish initial position for sequence 10770, rba 78960656.
2011-10-16 22:41:02 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, e430rks2.prm: PROCESS ABENDING.
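The cause was that the archived logs Extract needed had been removed from the directory specified by log_archive_dest. The fix is simple: copy the archived logs from sequence 10770 up to the current one back into the log_archive_dest directory.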
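The Python sketch below lists the file names to copy back. It assumes a LOG_ARCHIVE_FORMAT of `%t_%s_%r.arc` (thread, sequence, resetlogs ID), which is a hypothetical choice; check your own setting. The resetlogs ID must come from the database, for example from v$archived_log. The value used below is a placeholder.

```python
def archives_to_restore(thread, first_seq, last_seq, resetlogs_id,
                        fmt="{t}_{s}_{r}.arc"):
    """List archived log names for sequences first_seq..last_seq inclusive.

    fmt mimics LOG_ARCHIVE_FORMAT=%t_%s_%r.arc (an assumption, adjust it).
    """
    if first_seq > last_seq:
        raise ValueError("first_seq must not exceed last_seq")
    return [fmt.format(t=thread, s=seq, r=resetlogs_id)
            for seq in range(first_seq, last_seq + 1)]


if __name__ == "__main__":
    # Thread 1, from the missing sequence 10770 up to a hypothetical current 10772.
    for name in archives_to_restore(1, 10770, 10772, 123456789):
        print(name)
```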
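If the archived logs cannot be copied back, there are two other options. Option 1, which leaves the data inconsistent, is to move the Extract's start time:
GGSCI (HP-HP) 8> alter extract extl,begin now
Option 2 is to redo the initial load.
The initial load proceeds as follows:
-- source database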
SQL> col current_scn format 999999999999999
SQL> Select current_scn from v$database;
CURRENT_SCN
----------------
12242466771468
expdp XPADB/XPADB directory=DMP dumpfile=xpadb_20160125_01.dmp LOGFILE=xpadb_20160125.log TABLES=BASE_ACTIONPOWER,BASE_BANK FLASHBACK_SCN=12242466771468
-- target database
impdp XPADRPT/xpadrpt DIRECTORY=OGGD DUMPFILE=xpadb_20160125_01.dmp LOGFILE=impdp.xpadb_20160125_01.log REMAP_SCHEMA=xpadb:xpadrpt REMAP_TABLESPACE=xpaddat:xpaddata table_exists_action=replace
start replicat ggsrep , aftercsn 12242466771468
1.6.1.2 OGG-00446 No valid log files for current redo sequence
While GoldenGate was synchronizing data incrementally from Oracle ASM, the following error occurred:
ERROR OGG-00446 No valid log files for current redo sequence 367, thread 1, error retrieving redo file name for sequence 367, archived = 0, use_alternate = 0Not able to establish initial position for begin time 2013-03-27 15:32:46.
ERROR OGG-01668 PROCESS ABENDING.
Adding TRANLOGOPTIONS DBLOGREADER to the Extract parameter file fixes the problem.
Reference: Extract fail due to an ASM connection configuration issue [ID 1061093.1]
Oracle GoldenGate - Version 11.1.1.0.0 and later
Information in this document applies to any platform.
Purpose: to show how to recover from an Extract failure when your archive or redo files are stored under ASM and you see one of the following messages:
ERROR 118 No Valid Log File For Current Redo Sequence Xxxx, Thread Y
ERROR 500 No valid log files for current redo sequence X, thread Y, error retrieving redo file name for sequence X, archived = 0, use_alternate = 0 Not able to establish initial position for begin time YYYY-MM-DD HH:MI:SS
ERROR OGG-00446 error 2 (No such file or directory) opening redo log <log file name>.dbf for sequence ####
Not able to establish initial position for begin time YYYY-MM-DD HH:MI:SS
If you are running Oracle ASM, the ASM connection may be undefined or incorrectly defined, or TRANLOGOPTIONS DBLOGREADER may need to be added. If your archive files are ONLY under ASM and Extract receives error 500, Extract may have run successfully until it needed to read from the archives instead of the redo logs. Once it needs to read from the archives, Extract fails.
If you are on Oracle 11.2.0.2 or later, or 10.2.0.5 or later, and using OGG 11.x, add the following line to your Extract parameter file, or correct it:
TRANLOGOPTIONS DBLOGREADER
If those Oracle or OGG versions do not apply to you, specify a user that can connect to the ASM instance and restart your Extract:
TRANLOGOPTIONS ASMUSER <user>@<ASM_instance_name>,
ASMPASSWORD <password>
1.6.1.3 OGG-00446 Missing filename opening checkpoint file.
ERROR OGG-00446 Missing filename opening checkpoint file.
Replicat RSJQZ011 abended with:
ERROR OGG-00446 Missing filename opening checkpoint file.
Check the RSJQZ011 parameter file:
GGSCI (oraserver.localdomain) 19> view param RSJQZ011
Sourcedefs /goldengate/dirdef/DESJQZ001.def
---handlecollisions
batchsql
SETENV ( NLS_LANG = ".ZHS16GBK")
OBEY /goldengate/dirprm/pwd.obey
Discardfile /goldengate/dirrpt/RSJZX001.dsc, append, megabytes 100
map DB_DJGL.A, target DB_NBGY.A;
The line "Replicat RSJQZ011" had been deleted from the parameter file, which caused the error.
After the line was added back, the process started normally.
1.6.2 OGG-01154 Oracle GoldenGate Delivery for Oracle, repn.prm
Error message:
OGG-01154 Oracle GoldenGate Delivery for Oracle, repn.prm: SQL error 1691 mapping DATA_USER.DMH_WJXXB to DATA_USER.DMH_WJXXB OCI Error ORA-01691: unable to extend lob segment DATA_USER.SYS_LOB0000083691C00014$$ by 16384 in tablespace DATA_USER_LOB_U128M_1 (status = 1691), SQL <INSERT INTO "DATA_USER"."DMH_WJXXB" ("DMH_WJXXB_ID","DMH_ZLXXB_ID","DMH_GPXXB_ID","DMH_PCXXB_ID","PICIH","SHENQINGH","FID","WENJIANZL","WENJIANLXDM","WENJIANMC","DTDBBH","FAMINGMC","FUTUGS","WENJIANST>.
Solution:
The tablespace in the database is full and must be extended.
1.6.2.1 OGG-01154
Error message: 2011-03-29 15:53:57 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, repya.prm: SQL error 14402 mapping EPMA.D_METER to EPMA.D_METER OCI Error ORA-14402: updating partition key column would cause a partition change (status = 14402), SQL <UPDATE "EPMA"."D_METER" SET "PR_ORG" = :a1,"BELONG_DEPT" = :a2 WHERE "METER_ID" = :b0>.
Cause: the source updated a partition key column, but row movement is not enabled on the target table, so the update fails.
Solution: SQL> alter table SCHEMA.TABLENAME enable row movement;
1.6.3 OGG-00664
1.6.3.1 OGG-00664 OCI Error during OCIServerAttach (status = 12541-ORA-12541: TNS:no listener).
Error message:
OGG-00664 OCI Error during OCIServerAttach (status = 12541-ORA-12541: TNS:no listener).
Solution:
Start the database listener.
1.6.3.2 OGG-00664 OCI Error during OCIServerAttach (status = 12545-Error while trying to retrieve text for error ORA-12545).
2015-06-09 22:31:11 ERROR OGG-00664 OCI Error during OCIServerAttach (status = 12545-Error while trying to retrieve text for error ORA-12545).
2015-06-09 22:31:16 ERROR OGG-01668 PROCESS ABENDING.
Cause: ORACLE_HOME is set incorrectly.
Solution: set it in the parameter file with setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
1.6.4 OGG-00665
1.6.4.1 OGG-00665 OCI Error describe for query (status = 3135-ORA-03135: connection lost contact
Error message:
OGG-00665 OCI Error describe for query (status = 3135-ORA-03135: connection lost contact
Process ID: 8859
Session ID: 131 Serial number: 31), SQL<SELECT DECODE(archived, 'YES', 1, 0), status FROM v$log WHERE thread# = :ora_thread AND sequence# = :ora_seq_no>.
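Solution:
The database was shut down before the OGG processes were stopped, so the OGG processes failed. If you see this error, stop the OGG processes immediately. Watch the database's archived logs to make sure none go missing, and start the OGG processes as soon as the database is back up.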
1.6.4.2 OGG-00665 OCI Error describe for query
Applies to:
Oracle GoldenGate - Version: 11.1.1.0.7 and later [Release: 11.1.1 and later ]
Information in this document applies to any platform.
When attempting to start an Extract, we get the following error:
2010-12-09 18:59:25 GGS ERROR 182 OCI Error describe for query (bad syntax) (status = 942-ORA-00942: table or view does not exist), SQL< select value$ from sys.props$ where name = 'NLS_LANGUAGE'>.
2010-12-09 18:59:25 GGS ERROR 190 PROCESS ABENDING.
The database user does not have the necessary privilege.
Grant the necessary privilege to the GoldenGate user:
SQL> grant select on sys.props$ to ggsuser;
or
SQL> grant select any dictionary to ggsuser;
1.6.4.3 OGG-00665 OCI Error describe for query (status = 942-ORA-00942: table or view does not exist), SQL<SELECT 1 FROM DUAL WHERE EXISTS ( SELECT 'x' FROM ggusr.GGS_DDL_HIST WHERE OPTIME < '2015-05-25 11:12:43')>.
2015-06-08 12:12:43 ERROR OGG-00665 OCI Error describe for query (status = 942-ORA-00942: table or view does not exist), SQL<SELECT 1 FROM DUAL WHERE EXISTS ( SELECT 'x' FROM ggusr.GGS_DDL_HIST WHERE OPTIME < '2015-05-25 11:12:43')>.
2015-06-08 12:12:43 ERROR OGG-01668 PROCESS ABENDING.
To use the DDL feature, first run the DDL support scripts:
1. @marker_setup.sql
2. @ddl_setup.sql (enter INITIALSETUP when prompted for the mode of installation)
3. @role_setup.sql
4. GRANT GGS_GGSUSER_ROLE TO gguser
5. @ddl_enable.sql
1.6.5 OGG-01161 Bad column index (4) specified for table QQQ.TIANSHI, max columns = 4.
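Error message:
OGG-01161 Bad column index (4) specified for table QQQ.TIANSHI, max columns = 4.
Solution:
Compare the structure of this table on the production side and the DR side. If the DR table is missing a column, log in to the DR database, add the column, and then start the Replicat.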
1.6.6 OGG-00199 Table QQQ.T0417 does not exist in target database.
Error message:
ERROR OGG-00199 Table QQQ.T0417 does not exist in target database.
Solution:
Check whether DDL replication is configured in the source Extract parameters, and redo the initial load for this table.
1.6.7 OGG-01738 BOUNDED RECOVERY
database version:11.2.0.3 RAC
goldengate version :11.1.1.1.2
One morning, data synchronization was found to be abnormal. The status on the source side:
GGSCI (ulecardrac1) 3> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT232 00:00:00 06:32:33
EXTRACT RUNNING PUMP232 00:00:00 00:00:03
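The status is still RUNNING, but the checkpoint has not been updated for six and a half hours. The process is actually hung.
Check the alert log ggserr.log.
It contains OGG-01738 messages: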
2013-03-07 02:42:28 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, ext232.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p5905_Redo Thread 1: start=SeqNo: 679, RBA: 83280912, SCN: 1.913813052 (5208780348), Timestamp: 2013-03-06 22:00:20.000000, end=SeqNo: 679, RBA: 129051136, SCN: 1.938808049 (5233775345), Timestamp: 2013-03-07 02:42:03.000000.
2013-03-07 02:42:28 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, ext232.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p5905_Redo Thread 2: start=SeqNo: 692, RBA: 103611920, SCN: 1.913812238 (5208779534), Timestamp: 2013-03-06 22:00:16.000000, end=SeqNo: 693, RBA: 93604864, SCN: 1.938808100 (5233775396), Timestamp: 2013-03-07 02:42:15.000000.
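These messages print SCNs as `wrap.base (decimal)`. Oracle splits an SCN into a wrap part and a 32-bit base, so SCN = wrap × 2^32 + base. You can recompute the decimal in parentheses from the wrap.base form, which helps when you compare these checkpoints with SCNs queried from the database. A small Python check, which reproduces the values printed above:

```python
def scn_from_wrap_base(text):
    """Convert an OGG 'wrap.base' SCN string to a decimal SCN.

    Assumes SCN = wrap * 2**32 + base; this matches the OGG-01738 lines
    above, e.g. 1.913813052 -> 5208780348.
    """
    wrap_text, base_text = text.split(".")
    wrap, base = int(wrap_text), int(base_text)
    if not 0 <= base < 2**32:
        raise ValueError("base part must fit in 32 bits")
    return wrap * 2**32 + base


if __name__ == "__main__":
    for scn in ("1.913813052", "1.938808049", "0.3712874"):
        print(scn, scn_from_wrap_base(scn))
```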
MOS note 1293772.1 covers this error.
Liu Xiangbing (Maclean Liu), a well-known Chinese Oracle expert, also explains it on his blog:
http://www.askmaclean.com/archives/ogg-01738-bounded-recovery.html
The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:
GGSCI> start <extract_name> BRRESET
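Because Extract ext232 was hung, it could not be stopped. Even 'send ext232 forcestop' and 'stop mgr' failed to stop it.
In the end, the process had to be killed from the shell, and then restarted with:
GGSCI> start ext232 BRRESET
After the restart, the status was normal and synchronization had almost no lag.
This bug appears only in RAC, or in a single instance configured with multiple threads, and later versions fix it. For a permanent fix, consider upgrading OGG to 11.2.1.0.1.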
2012-10-20 10:28:02 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 79286800, SCN: 0.3712874 (3712874), Timestamp: 2012-10-19 22:27:45.000000, Thread: 1, end=SeqNo: 343, RBA: 79287296, SCN: 0.3712874 (3712874), Timestamp: 2012-10-19 22:27:45.000000, Thread: 1.
2012-10-20 14:28:05 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 107000336, SCN: 0.3725744 (3725744), Timestamp: 2012-10-20 02:27:14.000000, Thread: 1, end=SeqNo: 343, RBA: 107000832, SCN: 0.3725744 (3725744), Timestamp: 2012-10-20 02:27:14.000000, Thread: 1.
2012-10-20 18:28:06 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 128054288, SCN: 0.3739371 (3739371), Timestamp: 2012-10-20 06:28:02.000000, Thread: 1, end=SeqNo: 343, RBA: 128054784, SCN: 0.3739371 (3739371), Timestamp: 2012-10-20 06:28:02.000000, Thread: 1.
2012-10-20 22:28:06 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 153368080, SCN: 0.3752583 (3752583), Timestamp: 2012-10-20 10:27:46.000000, Thread: 1, end=SeqNo: 343, RBA: 153368576, SCN: 0.3752583 (3752583), Timestamp: 2012-10-20 10:27:46.000000, Thread: 1.
2012-10-21 02:28:08 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 165712912, SCN: 0.3763760 (3763760), Timestamp: 2012-10-20 14:28:00.000000, Thread: 1, end=SeqNo: 343, RBA: 165713408, SCN: 0.3763760 (3763760), Timestamp: 2012-10-20 14:28:00.000000, Thread: 1.
2012-10-21 06:28:15 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 179789328, SCN: 0.3774866 (3774866), Timest
...
2012-10-21 10:28:16 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 201859088, SCN: 0.3788193 (3788193), Timestamp: 2012-10-20 22:26:32.000000, Thread: 1, end=SeqNo: 343, RBA: 201859584, SCN: 0.3788193 (3788193), Timestamp: 2012-10-20 22:26:32.000000, Thread: 1.
2012-10-21 14:28:26 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 246480912, SCN: 0.3803284 (3803284), Timestamp: 2012-10-21 02:27:31.000000, Thread: 1, end=SeqNo: 343, RBA: 246481408, SCN: 0.3803284 (3803284), Timestamp: 2012-10-21 02:27:31.000000, Thread: 1.
2012-10-21 18:28:33 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p17448_extr: start=SeqNo: 343, RBA: 291493392, SCN: 0.3821051 (3821051), Timestamp: 2012-10-21 06:28:22.000000, Thread: 1, end=SeqNo: 343, RBA: 291493888, SCN: 0.3821051 (3821051), Timestamp: 2012-10-21 06:28:22.000000, Thread: 1.
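Oracle GoldenGate 11.x introduced Bounded Recovery (BR). BR lets Extract write long-running transactions, meaning those that run longer than the BRINTERVAL value, to a local BR directory. When Extract restarts, it first reads the BR files instead of the archived logs from the recovery checkpoint. This improves performance and reduces the dependence on old archived logs.
However, when BR is used to recover an Extract that abended in a RAC environment, there is a small chance that Extract hangs or loses specific transactions. The bug occurs only in RAC, or in a single instance configured with multiple threads.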
1. bug 10368242: transaction loss with BR
When a transaction is committed, it is flushed to the trail file. But if BR writing starts after the transaction commits and Extract then abends abnormally, Extract may not get the chance to flush the committed transaction to the trail. When Extract restarts, it reads from BR and keeps that committed transaction in memory as a persisted committed transaction, which is never written to the trail. So this committed transaction may be lost.
The problem will not happen when the extract stops in normal mode.
2. bug 12532428 (base bug 10408077): extract hung when using BR and new objects are added to extract
With BR set up, when new objects (tables, sequences, DDL, etc.) are included in the extract, the restarted extract picks up more data, so the producer queue limit (a fixed number) used by BR is reached. Because the extract is still in BR recovery, the consumer thread is stopped and does not process data from the producer queues. This causes a deadlock, and the extract appears hung.
Solution
1. The transaction loss caused by bug 10368242 is fixed in 11.1.1.1 and will be backported to 11.1.1.0.
2. The Extract hang caused by bug 10408077 is fixed in 11.1.1.1 and 11.1.1.0.30. It can also be worked around as follows:
A workaround with earlier 11.1.1.0 version is to start extract with BRRESET, when new object is added to an extract. All the archived logs since recovery checkpoint need to be available.
ggsci> start extract <extract_name>, BRRESET
When running Oracle Golden Gate 11.1.1.0.6 or higher, extract is “abending” every 4 hours on the hour. This approximates the same time or interval that Bounded Recovery is set to by default.
Extract can be restarted and continues to work but then fails again after 4 hours with the same errors as shown below.
ERROR
———
2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: ggcrc64valid.
2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: br_validate_bcp.
2011-02-06 05:15:38 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p7186_Redo Thread 1.
2011-02-06 05:15:38 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 0.0.0.
…
2011-02-06 09:15:46 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p7186_Redo Thread 1: start=SeqNo: 21659, RBA: 117520912, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000, end=SeqNo: 21659, RBA: 117602816, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000.
Cause
Under these conditions, this may be a problem with the Bounded Recovery Checkpoint file. It is likely corrupted.
Solution
The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:
GGSCI> start <extract> BRRESET
1.6.8 OGG-00268 / OGG-01668: Parameter File Format Problem
Symptom:
Starting ext1 fails with:
2012-04-23 04:17:21 ERROR OGG-00268 Parameter unterminated.
2012-04-23 04:17:21 ERROR OGG-01668 PROCESS ABENDING.
Cause:
GoldenGate is very strict about syntax, including commas, semicolons, and spaces.
Solution:
Add a semicolon (;) at the end of the parameter file.
1.6.9 WARNING OGG-00959 (MINKEEPFILES option not used.).
The following warning appears in mgr.rpt: WARNING OGG-00959 (MINKEEPFILES option not used.).
-- Purge old trail-files
PURGEOLDEXTRACTS /ggs/tdmInput/m1/g3*, USECHECKPOINTS, MINKEEPHOURS 12
2012-10-30 15:15:09 WARNING OGG-00959 PURGEOLDEXTRACTS /ggs/tdmInput/m1/g3*, USECHECKPOINTS, MINKEEPHOURS 12 (MINKEEPFILES option not used.).
The description for this warning is:
// *Cause: The PURGEOLDEXTRACTS parameter contains the option MINKEEPHOURS or
// MINKEEPDAYS with the option MINKEEPFILES. These are mutually
// exclusive. If either MINKEEPHOURS or MINKEEPDAYS is used with
// MINKEEPFILES, then MINKEEPHOURS or MINKEEPDAYS is accepted, and
// MINKEEPFILES is ignored.
// *Action: Remove MINKEEPFILES (or MINKEEPHOURS depending on your
// requirements.
In short:
Cause: the PURGEOLDEXTRACTS parameter contains MINKEEPHOURS or MINKEEPDAYS together with MINKEEPFILES, and these options conflict.
If they are used together, MINKEEPHOURS or MINKEEPDAYS takes effect and MINKEEPFILES is ignored.
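You can catch this conflict before restarting Manager with a small lint check. The sketch below is a hypothetical helper, not an OGG tool. It uses a simplified comma-separated parser and encodes only the rule quoted above. In the log above, the PURGEOLDEXTRACTS line contains no MINKEEPFILES at all, so in that case the warning only reports that MINKEEPFILES is not in effect, and this check returns no warnings for that line.

```python
def check_purge_line(line):
    """Return warnings for a PURGEOLDEXTRACTS line (simplified parser).

    Flags MINKEEPFILES combined with MINKEEPHOURS or MINKEEPDAYS, which
    the OGG-00959 description says are mutually exclusive.
    """
    parts = [p.strip().upper() for p in line.split(",")]
    if not parts[0].startswith("PURGEOLDEXTRACTS"):
        raise ValueError("not a PURGEOLDEXTRACTS line")
    # Option keyword is the first word of each comma-separated part.
    options = {p.split()[0] for p in parts[1:] if p}
    warnings = []
    time_opts = options & {"MINKEEPHOURS", "MINKEEPDAYS"}
    if time_opts and "MINKEEPFILES" in options:
        warnings.append(
            f"MINKEEPFILES is ignored because {', '.join(sorted(time_opts))} is set"
        )
    return warnings


if __name__ == "__main__":
    print(check_purge_line(
        "PURGEOLDEXTRACTS /ggs/tdmInput/m1/g3*, USECHECKPOINTS, MINKEEPHOURS 12"
    ))  # []
```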
1.6.10 OGG-00303 Did not recognize parameter argument.
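A parameter argument is configured incorrectly.
Problem description:
ERROR OGG-00303 Did not recognize parameter argument.
Analysis:
The process parameter file is configured incorrectly.
Solution:
Check the parameter file. The process name may not match the parameter file, or a parameter may be incorrect. Then restart the process.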
1.6.11 OGG-01044
2015-06-08 17:54:45 ERROR OGG-01044 The trail './dirdat/aa' is not assigned to extract 'EORA_T1'. Assign the trail to the extract with the command "ADD EXTTRAIL/RMTTRAIL ./dirdat/aa, EXTRACT EORA_T1".
Solution: add the trail to the Extract:
GGSCI (orcltest) 11> add exttrail ./dirdat/aa,extract eora_t1,megabytes 100
EXTTRAIL added.
1.6.12 OGG-00396
2015-01-07 11:39:38 ERROR OGG-00396 Command not terminated by semi-colon.
2015-01-07 11:39:38 ERROR OGG-01668 PROCESS ABENDING.
Cause: the parameter file is missing a terminating semicolon.
Solution: fix the parameter file.
1.6.13 OGG-01031: Error After an Unexpected Crash of the GoldenGate Source
Example 9-21:
ERROR OGG-01031 There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Expected 4 bytes, but got 0 bytes, in trail ./dirdat/t1000026, seqno 26, reading record trailer token at RBA 103637218).
2011-01-06 11:04:16 ERROR OGG-01668 PROCESS ABENDING.
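Solution:
The trail file on the target may be damaged. Roll over to a new trail file with SEND EXTRACT xxx, ROLLOVER or with "alter extract xxx, etrollover".
In another case, the server crashed while the dpump process was still running. After the restart, dpump was ABENDED, and ggserr.log reported the following error: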
2011-04-01 11:13:19 ERROR OGG-01031 Oracle GoldenGate Capture for Oracle, dpump.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/vistor/media/GG/dirdat/rt000003" (error 11, Resource temporarily unavailable)).
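The cause was that the OGG software on the target was being updated while dpump was still running, so dpump kept looking for the old Manager port and the old source trail file.
Solution: restart the exp, dpump, ext, and manager processes. If the error persists, change the positioning as follows.
Roll the Extract over with ETROLLOVER to start a new trail file: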
alter extract ext1 etrollover
start extract dpump
info extract dpump
Point the Replicat at the sequence number of the newly generated rt trail file:
send replicat rep1,logend
alter replicat rep1,extseqno 4, extrba 0
start replicat rep1
The processes then started and recovered normally.
Source side:
GGSCI (orcltest) 31> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_HR 00:00:00 00:00:07
EXTRACT ABENDED PORA_HR 00:00:00 40:04:19
REPLICAT RUNNING RORA_HR2 00:00:00 00:00:00
REPLICAT STOPPED TESTRPT 00:00:00 00:05:48
GGSCI (orcltest) 32> view report PORA_HR
***********************************************************************
Oracle GoldenGate Capture for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:42:16
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Starting at 2015-06-12 10:36:28
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Sun Nov 10 22:19:54 EST 2013, Release 2.6.32-431.el6.x86_64
Node: orcltest
Machine: x86_64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : unlimited unlimited
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 14523
Description:
***********************************************************************
** Running with the following parameters **
***********************************************************************
2015-06-12 10:36:28 INFO OGG-03035 Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.
extract pora_hr
setenv (ORACLE_SID=ogg1)
Set environment variable (ORACLE_SID=ogg1)
setenv (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)
Set environment variable (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
passthru
rmthost 192.168.59.130,mgrport 7809
rmttrail ./dirdat/pa
table hr.*;
2015-06-12 10:36:28 INFO OGG-01815 Virtual Memory Facilities for: COM
anon alloc: mmap(MAP_ANON) anon free: munmap
file alloc: mmap(MAP_SHARED) file free: munmap
target directories:
/u01/gg11/dirtmp.
CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE: 64G
CACHEPAGEOUTSIZE (normal): 8M
PROCESS VM AVAIL FROM OS (min): 128G
CACHESIZEMAX (strict force to disk): 96G
2015-06-12 10:36:33 INFO OGG-01226 Socket buffer size set to 27985 (flush size 27985).
Source Context :
SourceModule : [er.extrout]
SourceID : [/scratch/aime1/adestore/views/aime1_adc4150256/oggcore/OpenSys/src/app/er/extrout.c]
SourceFunction : [complete_tcp_msg]
SourceLine : [1522]
ThreadBacktrace : [9] elements
: [/u01/gg11/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7f5c2f9bd06e]]
: [/u01/gg11/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2cc) [0x7f5c2f9b944c]]
: [/u01/gg11/libgglog.so(_MSG_ERR_ER_REMOTE_COMM_PROBLEM(CSourceContext*, char const*, CMessageFactory::MessageDisposition)+0x31) [0x7f5c2f9a11e9]]
: [/u01/gg11/extract(complete_tcp_msg(extract_def*)+0x424) [0x51313c]]
: [/u01/gg11/extract(flush_tcp(extract_def*, int)+0x20d) [0x5139f1]]
: [/u01/gg11/extract(RECOVERY_initialize()+0x371) [0x524f91]]
: [/u01/gg11/extract(main+0x4a5) [0x56ca65]]
: [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3a5221ed1d]]
: [/u01/gg11/extract(__gxx_personality_v0+0x38a) [0x4e8b7a]]
2015-06-12 10:36:43 ERROR OGG-01031 There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "./dirdat/pa000002" (error 11, Resource temporarily unavailable)).
2015-06-12 10:36:43 ERROR OGG-01668 PROCESS ABENDING.
GGSCI (orcltest) 34> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_HR 00:00:00 00:00:05
EXTRACT ABENDED PORA_HR 00:00:00 40:05:10
REPLICAT RUNNING RORA_HR2 00:00:00 00:00:10
REPLICAT STOPPED TESTRPT 00:00:00 00:06:39
GGSCI (orcltest) 35> alter extract pora_hr etrollover
2015-06-12 10:38:15 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.
GGSCI (orcltest) 36> view params PORA_HR
extract pora_hr
setenv (ORACLE_SID=ogg1)
setenv (ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1)
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
passthru
rmthost 192.168.59.130,mgrport 7809
rmttrail ./dirdat/pa
table hr.*;
GGSCI (orcltest) 37> start extract PORA_HR
Sending START request to MANAGER ...
EXTRACT PORA_HR starting
GGSCI (orcltest) 38> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_HR 00:00:00 00:00:06
EXTRACT RUNNING PORA_HR 00:00:00 00:00:49
REPLICAT RUNNING RORA_HR2 00:00:00 00:00:01
REPLICAT STOPPED TESTRPT 00:00:00 00:07:42
Target side:
GGSCI (rhel6_lhr) 30> view report RORA_HR
***********************************************************************
Oracle GoldenGate Delivery for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:48:07
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Starting at 2015-06-10 04:48:15
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Tue Apr 21 08:37:59 PDT 2015, Release 2.6.32-504.16.2.el6.x86_64
Node: rhel6_lhr
Machine: x86_64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : unlimited unlimited
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 40019
Description:
***********************************************************************
** Running with the following parameters **
***********************************************************************
2015-06-10 04:48:15 INFO OGG-03035 Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.
replicat rora_hr
setenv (ORACLE_SID=ogg2)
Set environment variable (ORACLE_SID=ogg2)
setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
Set environment variable (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
ddl include all
ddlerror default ignore retryop maxretries 3 retrydelay 5
userid ggusr,password ***
handlecollisions
assumetargetdefs
discardfile ./dirrpt/rora_hr.dsc,purge
map hr.* ,target hr.*;
2015-06-10 04:48:15 INFO OGG-01815 Virtual Memory Facilities for: COM
anon alloc: mmap(MAP_ANON) anon free: munmap
file alloc: mmap(MAP_SHARED) file free: munmap
target directories:
/u01/gg11/dirtmp.
CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE: 2G
CACHEPAGEOUTSIZE (normal): 8M
PROCESS VM AVAIL FROM OS (min): 4G
CACHESIZEMAX (strict force to disk): 3.41G
Database Version:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
Database Language and Character Set:
NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK"
NLS_LANGUAGE = "AMERICAN"
NLS_TERRITORY = "AMERICA"
NLS_CHARACTERSET = "ZHS16GBK"
***********************************************************************
** Run Time Messages **
***********************************************************************
Opened trail file ./dirdat/pa000002 at 2015-06-10 04:48:15
2015-06-10 04:48:19 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:48:29 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:48:50 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:49:30 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:50:50 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:53:30 WARNING OGG-01519 Waiting at EOF on input trail file ./dirdat/pa000002, which is not marked as complete; but succeeding trail file ./dirdat/pa000003 exists. If ALTER ETROLLOVER has been performed on source extract, ALTER EXTSEQNO must be performed on each corresponding downstream reader.
2015-06-10 04:54:21 INFO OGG-01021 Command received from GGSCI: STATS.
GGSCI (rhel6_lhr) 31> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_HR2 00:00:00 00:00:04
EXTRACT RUNNING PORA_HR2 00:00:00 00:00:04
REPLICAT RUNNING RORA_HR 00:00:00 00:00:10
GGSCI (rhel6_lhr) 32> send replicat RORA_HR,logend
ERROR: No Command for SEND.
GGSCI (rhel6_lhr) 33> alter replicat RORA_HR,extseqno 3, extrba 0
ERROR: REPLICAT RORA_HR is running and cannot be altered (1,2,No such file or directory).
GGSCI (rhel6_lhr) 34>
GGSCI (rhel6_lhr) 34> stop RORA_HR
Sending STOP request to REPLICAT RORA_HR ...
Request processed.
GGSCI (rhel6_lhr) 35> alter replicat RORA_HR,extseqno 3, extrba 0
REPLICAT altered.
GGSCI (rhel6_lhr) 36> start RORA_HR
Sending START request to MANAGER ...
REPLICAT RORA_HR starting
GGSCI (rhel6_lhr) 37> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_HR2 00:00:00 00:00:08
EXTRACT RUNNING PORA_HR2 00:00:00 00:00:05
REPLICAT RUNNING RORA_HR 00:05:33 00:00:03
GGSCI (rhel6_lhr) 38> view report RORA_HR
***********************************************************************
Oracle GoldenGate Delivery for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:48:07
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Starting at 2015-06-10 05:01:13
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Tue Apr 21 08:37:59 PDT 2015, Release 2.6.32-504.16.2.el6.x86_64
Node: rhel6_lhr
Machine: x86_64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : unlimited unlimited
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 40703
Description:
***********************************************************************
** Running with the following parameters **
***********************************************************************
2015-06-10 05:01:13 INFO OGG-03035 Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.
replicat rora_hr
setenv (ORACLE_SID=ogg2)
Set environment variable (ORACLE_SID=ogg2)
setenv (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
Set environment variable (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
ddl include all
ddlerror default ignore retryop maxretries 3 retrydelay 5
userid ggusr,password ***
handlecollisions
assumetargetdefs
discardfile ./dirrpt/rora_hr.dsc,purge
map hr.* ,target hr.*;
2015-06-10 05:01:13 INFO OGG-01815 Virtual Memory Facilities for: COM
anon alloc: mmap(MAP_ANON) anon free: munmap
file alloc: mmap(MAP_SHARED) file free: munmap
target directories:
/u01/gg11/dirtmp.
CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE: 2G
CACHEPAGEOUTSIZE (normal): 8M
PROCESS VM AVAIL FROM OS (min): 4G
CACHESIZEMAX (strict force to disk): 3.41G
Database Version:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
Database Language and Character Set:
NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK"
NLS_LANGUAGE = "AMERICAN"
NLS_TERRITORY = "AMERICA"
NLS_CHARACTERSET = "ZHS16GBK"
***********************************************************************
** Run Time Messages **
***********************************************************************
Opened trail file ./dirdat/pa000003 at 2015-06-10 05:01:13
2015-06-10 05:01:13 INFO OGG-01020 Processed extract process RESTART_ABEND record at seq 3, rba 1046 (aborted 0 records).
Switching to next trail file ./dirdat/pa000004 at 2015-06-10 05:01:13 due to EOF, with current RBA 1108
Opened trail file ./dirdat/pa000004 at 2015-06-10 05:01:13
Processed extract process graceful restart record at seq 4, rba 1074.
Processed extract process graceful restart record at seq 4, rba 1136.
2015-06-10 05:01:13 INFO OGG-01407 Setting current schema for DDL operation to [hr].
2015-06-10 05:01:13 INFO OGG-01408 Restoring current schema for DDL operation to [ggusr].
GGSCI (rhel6_lhr) 39>
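The recovery in this transcript follows a fixed pattern: after ALTER EXTRACT ... ETROLLOVER on the source, the Replicat keeps waiting at EOF on the old file (OGG-01519) until it is stopped, repositioned with ALTER REPLICAT ..., EXTSEQNO n, EXTRBA 0, and restarted (it cannot be altered while running, as the ERROR at prompt 33 shows). Below is a minimal Python sketch that derives those three GGSCI commands from an OGG-01519 line. It assumes trail file names end in a six-digit sequence number, as in ./dirdat/pa000003; the helper names are hypothetical and not part of GoldenGate, and the generated commands should be reviewed before being typed into GGSCI.

```python
import re

# Matches the OGG-01519 warning text seen in the RORA_HR report.
OGG_01519 = re.compile(
    r"OGG-01519 Waiting at EOF on input trail file (?P<cur>\S+?), "
    r"which is not marked as complete; but succeeding trail file (?P<nxt>\S+) exists"
)

def trail_seqno(path: str) -> int:
    """Return the sequence number of a trail file such as ./dirdat/pa000003."""
    m = re.search(r"(\d{6})$", path)  # assumption: six-digit sequence suffix
    if m is None:
        raise ValueError(f"not a trail file name: {path}")
    return int(m.group(1))

def extseqno_commands(group: str, report_line: str) -> list[str]:
    """Hypothetical helper: GGSCI commands that move a stuck reader to the next trail."""
    m = OGG_01519.search(report_line)
    if m is None:
        return []
    seq = trail_seqno(m.group("nxt"))
    return [
        f"stop {group}",  # the group must be stopped before ALTER is accepted
        f"alter replicat {group}, extseqno {seq}, extrba 0",
        f"start {group}",
    ]

if __name__ == "__main__":
    line = ("2015-06-10 04:48:19 WARNING OGG-01519 Waiting at EOF on input trail file "
            "./dirdat/pa000002, which is not marked as complete; but succeeding trail "
            "file ./dirdat/pa000003 exists.")
    for cmd in extseqno_commands("RORA_HR", line):
        print(cmd)
```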
1.6.13.1 OGG-01031
When the source data pump DPEND was started, ggserr.log showed the following errors:
2012-08-28 15:09:39 ERROR OGG-01031 Oracle GoldenGate Capture for Oracle, dpend.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory)).
2012-08-28 15:09:41 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
On the target, ggserr.log showed:
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to lock file "/uo1/app/ogg/dirdat/nd000004" (error 11, Resource temporarily unavailable). Lock currently held by process id (PID) 13854.
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory).
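Cause: probably a network failure. The source Data Pump lost contact with the target, but the server (collector) process that the target MGR had started for it kept running. When the data pump is restarted, the target MGR tries to start another server process, so the two processes compete for the same trail file.
Solution:
1. Stop all data pumps on the source. Then, on the target, run ps -ef|grep server (or grep for the OGG installation directory) to see whether an OGG server process is still running, and if so, kill it. Make sure every source data pump is stopped first, and kill only the server process, never extract, replicat, mgr or any other process. Then restart the source data pump.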
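2. The trail file on the target may be damaged. Roll over to a new trail file:
SEND EXTRACT xxx, ROLLOVER
or: alter extract xxx, etrollover
where xxx is the data pump group name. As the OGG-01520 message above notes, after ALTER ... ETROLLOVER each downstream reader must be repositioned to the new trail file with ALTER EXTSEQNO; it does not switch automatically.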
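Before killing anything, it helps to confirm that the PID named in the OGG-01223 lock message really is an OGG server (collector) process. A minimal Python sketch, to be fed output collected on the target host, assuming the collector shows up in `ps -ef` with a command whose file name is `server` (the same assumption the ps -ef|grep server check relies on). The helper names are hypothetical, and the sample ps output in the test is made up.

```python
import re

# From the OGG-01223 warning: "... Lock currently held by process id (PID) 13854."
LOCK_PID = re.compile(r"Lock currently held by process id \(PID\) (\d+)")

def lock_holder_pid(log_line: str):
    """Return the PID holding the trail-file lock, or None if the line names none."""
    m = LOCK_PID.search(log_line)
    return int(m.group(1)) if m else None

def server_pids(ps_output: str) -> list[int]:
    """Return PIDs from `ps -ef` output whose command file name is `server`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the UID PID PPID ... header
        fields = line.split(None, 7)         # CMD is the 8th column and may contain spaces
        if len(fields) < 8:
            continue
        command = fields[7].split()[0]
        if command.rsplit("/", 1)[-1] == "server":
            pids.append(int(fields[1]))
    return pids

def safe_to_kill(log_line: str, ps_output: str) -> bool:
    """True only if the lock holder is listed as a server process (hypothetical check)."""
    pid = lock_holder_pid(log_line)
    return pid is not None and pid in server_pids(ps_output)
```

The check is intentionally conservative: if the PID from the log does not appear as a `server` process, nothing should be killed until it has been identified by hand.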
1.6.14 OGG-01296
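Example 9-18:
ERROR OGG-01296 Oracle GoldenGate Delivery for Oracle, yx_rep3.prm: Error mapping from SGPM.A_PAY_FLOW to SGPM.A_PAY_FLOW.
The table structure was changed on the source without the change being made on the target, which caused this error.
Solution: run the corresponding statements on the target so that its table structure matches the source.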
1.6.15 OGG-01088
Error message:
Example 9-19:
ERROR OGG-01088 Oracle GoldenGate Delivery for Oracle, pms_rep1.prm: malloc 2097152 bytes failed.
ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, pms_rep1.prm: PROCESS ABENDING.
Solution:
(1) Run "ulimit -a" and verify that the operating system places no limit on any resource for the user.
(2) Split the process into several processes.
(3) Download the latest patch set from support.oracle.com and upgrade GoldenGate.
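The ulimit -a check in step (1) can also be scripted. A minimal Python sketch using the standard resource module (Unix only): it checks the four limits that the OGG report header prints (Address Space Size, Heap Size, File Size, CPU Time) and lists any that are not unlimited. Mapping those report labels to RLIMIT_AS, RLIMIT_DATA, RLIMIT_FSIZE and RLIMIT_CPU is an assumption, and because limits are per-process and inherited, the script must run as the GoldenGate OS user from the same kind of shell that starts MGR.

```python
import resource

# Assumed mapping from the OGG report's "soft limit / hard limit" labels to rlimits.
REPORT_LIMITS = {
    "Address Space Size": resource.RLIMIT_AS,
    "Heap Size": resource.RLIMIT_DATA,
    "File Size": resource.RLIMIT_FSIZE,
    "CPU Time": resource.RLIMIT_CPU,
}

def limited_resources(limits=REPORT_LIMITS):
    """Return {name: (soft, hard)} for every checked limit that is not unlimited."""
    found = {}
    for name, res in limits.items():
        soft, hard = resource.getrlimit(res)
        if soft != resource.RLIM_INFINITY or hard != resource.RLIM_INFINITY:
            found[name] = (soft, hard)
    return found

if __name__ == "__main__":
    problems = limited_resources()
    if not problems:
        print("all checked limits are unlimited")
    for name, (soft, hard) in problems.items():
        print(f"{name}: soft={soft} hard={hard}")
```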
1.6.16 OGG-01223
When the source data pump DPEND was started, ggserr.log showed the following errors:
2012-08-17 11:43:50 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
2012-08-17 11:45:01 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
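Cause: the MGR process on the target host (110) was not running.
Solution:
Run start mgr on the target to start the Manager, then start the source data pump DPEND again. The error disappeared and the files were transferred normally.
The log then looked like this: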
2012-08-17 14:31:51 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, dpend.prm: EXTRACT DPEND started.
2012-08-17 14:33:13 INFO OGG-01226 Oracle GoldenGate Capture for Oracle, dpend.prm: Socket buffer size set to 27985 (flush size 27985).
2012-08-17 14:33:26 INFO OGG-01052 Oracle GoldenGate Capture for Oracle, dpend.prm: No recovery is required for target file F:\ogg\dirdat\nd000000, at RBA 0 (file not opened).
2012-08-17 14:33:26 INFO OGG-01478 Oracle GoldenGate Capture for Oracle, dpend.prm: Output file F:\ogg\dirdat\nd is using format RELEASE 11.2.
1.6.17 OGG-01224
Example 9-20:
ERROR OGG-01224 Oracle GoldenGate Manager for Oracle, mgr.prm: No buffer space available
Solution:
Edit mgr.prm and widen the dynamic port range, for example: dynamicportlist 7840-7914.
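One plausible reading of this fix is that MGR ran short of ports in DYNAMICPORTLIST, since MGR assigns a port from that list to each Collector (server) process it starts; this interpretation is an assumption, not something the error text states. A minimal Python sketch for sizing the list: it expands a DYNAMICPORTLIST value written as ranges and/or comma-separated ports and compares the count with the number of collectors expected. The helper names and the collector count are hypothetical.

```python
def expand_port_list(spec: str) -> list[int]:
    """Expand a spec such as "7840-7914" or "7840-7850, 7860" into sorted ports.

    Hypothetical helper; it only models the range and comma forms.
    """
    ports = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError(f"bad range: {item}")
            ports.update(range(low, high + 1))
        else:
            ports.add(int(item))
    return sorted(ports)

def spare_ports(spec: str, collectors_needed: int) -> int:
    """Ports left over if each collector (server) process takes one port."""
    return len(expand_port_list(spec)) - collectors_needed

print(len(expand_port_list("7840-7914")))  # prints 75
```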
1.6.17.1 OGG-01224
When the source data pump DPEND was started, ggserr.log showed the following errors:
2012-08-22 05:33:10 ERROR OGG-01224 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 113 (No route to host).
2012-08-22 05:33:10 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
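Cause: the firewall on the target host (235) had not been turned off.
Solution:
After the firewall on the target was turned off, the source data pump DPEND was started again. The error disappeared and the files were transferred normally.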
1.6.18 OGG-01476
ERROR OGG-01476 The previous run abended due to an out of order transaction. Issue ALTER ETROLLOVER to advance the output trail sequence past the current trail sequence number, then restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
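During initialization, the disaster-recovery side was not ready, so operations were repeated many times on the production side and the production extract became inconsistent. In this situation, before running the RMAN step, restart the extract and skip the earlier inconsistent output.
Solution: "alter extract xxx, etrollover". Then, as the error text says, use ALTER EXTSEQNO on the downstream pump or Replicat so that it reads the new trail file.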
1.6.19 OGG-00850
ERROR OGG-00850 Oracle GoldenGate Capture for DB2, extxa.prm: Database instance XP1 has both USEREXIT and LOGRETAIN set to off.
ERROR OGG-01668 Oracle GoldenGate Capture for DB2, extxa.prm: PROCESS ABENDING.
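Solution:
(1) For DB2 8.1/8.2, USEREXIT and LOGRETAIN must both be set to ON.
(2) In DB2 9.5, these two parameters have been superseded by LOGARCHMETH1 and LOGARCHMETH2, which are used to enable archive logging; typically LOGARCHMETH1 is DISK and LOGARCHMETH2 is TSM. In DB2 9.5, USEREXIT can be set to OFF, but LOGRETAIN must still be set to ON.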
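1.6.20 OGG-01027 (long-running transaction)
Example 9-25:
WARNING OGG-01027 Long Running Transaction: XID 82.4.242063, Items 0, Extract YX_EXT1, Redo Thread 1, SCN 2379.2132775890 (10219859973074), Redo Seq #5688, Redo RBA 195997712.
More detail can be obtained with the following command:
Example 9-26:
GGSCI> send extract xxx, showtrans [thread n] [count n]
Here thread n is optional and limits the output to uncommitted transactions on one node (redo thread); count n is also optional and limits the output to n records.
For example, to list the 10 longest transactions on node 1 for extract extsz:
Example 9-27:
GGSCI> send extract extsz, showtrans thread 1 count 10
Record the XID and have the DBA find out what the long transaction is executing. If necessary, GoldenGate can be told to skip the transaction or to treat it as committed:
Example 9-28:
GGSCI> SEND EXTRACT xxx, SKIPTRANS 82.4.242063 THREAD 1    // skip the transaction
GGSCI> SEND EXTRACT xxx, FORCETRANS 82.4.242063 THREAD 1   // force the transaction to be treated as committed
These commands only make the GoldenGate process skip the transaction or treat it as committed; they do not change the transaction in the database, which still exists there. It is therefore strongly recommended to commit or roll back the transaction in the database rather than handle it in GoldenGate.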
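The OGG-01027 warning already carries the fields that SHOWTRANS needs. A minimal Python sketch that extracts them from the warning line and builds the showtrans command for the redo thread that reported the transaction; the helper names are hypothetical. It deliberately stops at SHOWTRANS: as recommended above, the transaction itself should be committed or rolled back in the database rather than skipped with SKIPTRANS or FORCETRANS.

```python
import re

# Fields of the OGG-01027 warning: XID, item count, Extract group, redo thread.
OGG_01027 = re.compile(
    r"Long Running Transaction: XID (?P<xid>[\d.]+), Items (?P<items>\d+), "
    r"Extract (?P<extract>\w+), Redo Thread (?P<thread>\d+)"
)

def parse_long_txn(line: str):
    """Return the transaction fields as a dict, or None if the line is not OGG-01027."""
    m = OGG_01027.search(line)
    if m is None:
        return None
    return {
        "xid": m.group("xid"),
        "items": int(m.group("items")),
        "extract": m.group("extract"),
        "thread": int(m.group("thread")),
    }

def showtrans_command(txn: dict, count: int = 10) -> str:
    """Build the read-only SHOWTRANS command for the reporting thread."""
    return f"send extract {txn['extract']}, showtrans thread {txn['thread']} count {count}"
```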
1.6.21 OGG-01072
Example 9-22:
ERROR OGG-01072 LOBROW_get_next_chunk(LOBROW_row_t *, BOOL, BOOL, BOOL, LOBROW_chunk_header_t *, char *, size_t, BOOL, *) Buffer overflow, needed: 132, alloc 2.
Solution:
(1) If the version is 11.1.1.0.1 Build 078, upgrade to the latest patch set.
(2) Check the resource limits with "ulimit -a" and set them to unlimited.
(3) Extract: DBOPTIONS LOBBUFSIZE <bytes>.
(4) Replicat: DBOPTIONS LOBWRITESIZE 1MB.
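1.7 User does not exist
Problem description:
2010-05-02 10:45:20 GGS ERROR 2001 Oracle GoldenGate Delivery for Oracle, rcrmheal.prm: Fatal error executing DDL replication: error [Error code [1918], ORA-01918: user 'KINGSTAR' does not exist, SQL /* GOLDENGATE_DDL_REPLICATION */ alter user kingstar account unlock ], no error handler present.
Analysis:
The log shows that the failure is caused by the user not existing on the target.
Resolution:
Option 1: If this user does not need to be replicated, remove its mapping on the target and restart the process.
For example, remove: MAP KINGSTAR.*, TARGET CRMKINGSTAR.*;
Option 2: Create the user manually on the target, then restart the process.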
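1.8 Table does not exist
Problem description:
2010-05-10 15:02:12 GGS ERROR 101 Oracle GoldenGate Delivery for Oracle, rcrmheal.prm: Table CRMOLAP.TB_FT_OFSTK_CLIENT_BY_DAY does not exist in target database.
Analysis:
The log shows that the failure is caused by the table not existing on the target.
Resolution:
Option 1: If this table does not need to be replicated, exclude it on the target and restart the process.
For example, add: MAPEXCLUDE OLAP.TB_FT_OFSTK_CLIENT_BY_DAY
Option 2: Create the table manually on the target (for heterogeneous databases, also regenerate the table definitions file), then restart the process.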
1.9 Unusable database index
Problem description:
2010-07-05 14:48:32 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, rapcaxht.prm: SQL error 1502 mapping AXHT.DOCONTRACT to APCAXHT.DOCONTRACT OCI Error ORA-01502: index 'APCAXHT.PK_SID' or partition of such index is in unusable state (status = 1502), SQL <INSERT INTO "APCAXHT"."DOCONTRACT" ("SID","RIQI","JGID","HT_ID","KH_XM","KH_ID","KH_NUM","CREATEDDATE","MODIFIEDDATE","USERNAME","REALNAME","BS","MEMO1","MEMO2","KH_IDLX","DXJGID","KH_IDTY","CPID") VA>.
Analysis:
The failure is caused by an index in unusable state (ORA-01502).
Resolution:
Rebuild the affected index (for example, ALTER INDEX APCAXHT.PK_SID REBUILD) and restart the process; the problem is resolved.
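1.10 Inconsistent table structure
Problem description:
2010-05-08 14:50:44 GGS ERROR 218 Oracle GoldenGate Delivery for Oracle, rcrmheal.prm: Error mapping from OLAP.TB_FT_OFSTK_BAL_HIS to CRMOLAP.TB_FT_OFSTK_BAL_HIS.
Analysis:
This problem is usually caused by the source and target tables having different structures, including columns and indexes.
Resolution:
1. If the columns differ, modify the table columns; for heterogeneous databases, also regenerate the table definitions file, then restart the process.
2. If the indexes differ, rebuild the indexes; for heterogeneous databases, also regenerate the table definitions file, then restart the process.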
1.11 Insufficient disk space
Problem description:
2010-05-07 04:05:31 GGS ERROR 103 Oracle GoldenGate Collector: Unable to write to file "./dirdat/crm/fl003629" (error 28, No space left on device).
2010-05-07 04:05:31 GGS ERROR 190 PROCESS ABENDING.
Analysis:
The log shows that the failure is caused by insufficient disk space.
Resolution:
Allocate enough disk space, then restart the process.
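1.12 TCP/IP failure
Problem description:
2010-06-25 21:06:04 GGS WARNING 150 Oracle GoldenGate Capture for Oracle, BSAIAXEC.prm: TCP/IP error 10060 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.).
Analysis:
The log shows that the remote host could not be reached, either at its IP address or at the port.
Resolution:
Make sure the remote host's IP address and port are reachable, then restart the process.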
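The network cases in this chapter (OGG-01223 connection refused with MGR down, OGG-01224 no route to host behind a firewall, and Windows error 10060 here) can be told apart with a plain TCP connect to the remote MGR port before restarting the pump. A minimal Python sketch using only the standard socket module; the function name is hypothetical, and the host and port would be the RMTHOST and MGRPORT values from the pump parameters (for example 192.168.59.130 and 7809 in the PORA_HR parameters earlier). A successful connect only shows that something is listening on that port, not that it is a healthy GoldenGate Manager.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt to a remote MGR port (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"   # host reachable but nothing listening, e.g. MGR not started
    except socket.timeout:
        return "timeout"   # no answer at all, comparable to Windows error 10060
    except OSError as exc:
        # e.g. "No route to host" when a firewall or routing problem blocks the path
        return f"error: {exc.strerror or exc}"
```

Usage: probe("192.168.59.130", 7809) returns "ok", "refused", "timeout", or an "error: ..." string.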
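1.13 Cannot connect to the database
Problem description:
2010-05-20 18:25:13 GGS ERROR 182 Oracle GoldenGate Delivery for Oracle, rtasaxta.prm: OCI Error during OCIServerAttach (status = 12154-ORA-12154: TNS:could not resolve the connect identifier specified).
Analysis:
The GoldenGate process failed because it could not connect to the database; ORA-12154 means the connect identifier could not be resolved.
Resolution:
Fix the database connection problem first, then restart the process.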
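1.14 Insufficient tablespace
Problem description:
2010-02-01 17:19:18 GGS ERROR 103 Discard file (./dirrpt/rep1.dsc) exceeded max bytes (10000000).
Analysis:
The error shows that the GoldenGate process stopped directly because the discard file was full. Why did the discard file fill up? The discard file shows ORA-01653: unable to extend errors, so the fix is to enlarge the tablespace that holds the object aaa.TB_LVY_TEMPINVOIC.
Resolution:
1. Find the tablespace that stores the object, for example:
select owner, table_name, tablespace_name from dba_tables where owner = 'AAA' and table_name = 'TB_LVY_TEMPINVOIC';
2. Extend the tablespace, for example:
ALTER TABLESPACE tbs_03 ADD DATAFILE 'tbs_f04.dbf' SIZE 100M AUTOEXTEND ON NEXT 10M MAXSIZE 2G;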
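Because the root cause is recorded in the discard file rather than in the report, scanning the discard file for ORA-01653 shows which tablespaces need to grow. A minimal Python sketch, assuming the discard file contains the standard ORA-01653 text ("unable to extend table OWNER.TABLE by N in tablespace NAME"); the helper name is hypothetical, and the sample line in the test is made up, since the full discard text is not shown here.

```python
import re

# ORA-01653 format: "unable to extend table OWNER.TABLE by N in tablespace TS"
ORA_01653 = re.compile(
    r"ORA-01653: unable to extend table (?P<owner>[\w$#]+)\.(?P<table>[\w$#]+) "
    r"by (?P<blocks>\d+) in tablespace (?P<tablespace>[\w$#]+)"
)

def full_tablespaces(discard_text: str) -> dict[str, set[str]]:
    """Map each tablespace named in ORA-01653 errors to the tables that hit it."""
    result: dict[str, set[str]] = {}
    for m in ORA_01653.finditer(discard_text):
        table = f"{m.group('owner')}.{m.group('table')}"
        result.setdefault(m.group("tablespace"), set()).add(table)
    return result
```

Usage: full_tablespaces(open("./dirrpt/rep1.dsc", encoding="utf-8", errors="replace").read()).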
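1.15 Network transfer problem
Problem description:
2010-06-29 16:22:28 GGS ERROR 112 There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Remote file used is /oradataA/ggtrail/b1000008, reply received is Unable to lock file "/oradataA/ggtrail/b1000008" (error 13, Permission denied). Lock currently held by process id (PID) 3674350).
Analysis:
The target trail file is locked by another process (PID 3674350), typically a leftover server (collector) process from an earlier connection, as in the OGG-01031 case above.
Resolution:
Option 1: Manually kill the process holding the lock, then restart the process.
Option 2: Leave it alone; the lock is released automatically after about 2 hours.
Option 3: GoldenGate 10.4.0.76 fixes this locking problem.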
1.16 Capture process cannot add supplemental logging for a table
Problem description:
2010-07-19 16:20:03 GGS ERROR 2100 Oracle GoldenGate Capture for Oracle, ecrmheal.prm: Could not add TRAN DATA for table, error [ORA-32588: supplemental logging attribute all column exists, SQL ALTER TABLE "AXTECH"."TB_FUND_MATCHING" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS /* GOLDENGATE_DDL_REPLICATION */], error code [32588], operation [ALTER TABLE "AXTECH"."TB_FUND_MATCHING" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS /* GOLDENGATE_DDL_REPLICATION */ (size 113)].
Analysis:
The table already has supplemental logging enabled. When a DDL operation is performed on the table, the parameter DDLOPTIONS ADDTRANDATA enables supplemental logging on it again; if the table has more than 32 columns and no unique index, the exception above occurs.
Resolution:
Option 1: Remove the parameter DDLOPTIONS ADDTRANDATA.
Option 2: DELETE TRANDATA owner.table
Option 3: Log in to the database and run, for example: ALTER TABLE AXHT.BMBM2002 DROP SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
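1.17 Database supplemental logging not enabled
Problem description:
2010-10-14 09:25:50 GGS ERROR 190 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: No minimum supplemental logging is enabled. This may cause extract process to handle key update incorrectly if key column is not in first row piece.
2010-10-14 09:25:50 GGS ERROR 190 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: PROCESS ABENDING.
Analysis:
The log shows that the failure is caused by supplemental logging not being enabled in the source Oracle database. The database-level minimum supplemental logging must be enabled in any case, as this error shows. Table-level supplemental logging matters when a primary key or unique index is composite: if every table has a single-column primary key, it is much less of a concern. If an update changes only some columns of a composite key, supplemental logging makes sure the remaining key columns of that row are also written to the log and shipped to the target; without them, the target cannot reliably identify the row.
Resolution:
Log in to the database and enable supplemental logging with ALTER DATABASE ADD SUPPLEMENTAL LOG DATA, then re-add the capture (Extract) process and its local trail.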
1.18 Table supplemental logging not enabled
Problem description:
2010-10-14 09:30:49 GGS WARNING Z1-078 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: No valid default archive log destination directory found for thread 1.
2010-10-14 09:30:50 GGS ERROR 500 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: Found unsupported in-memory undo record in sequence 2, at RBA 39675920, with SCN 0.554993 (554993) ... Minimum supplemental logging must be enabled to prevent data loss.
2010-10-14 09:30:51 GGS ERROR 190 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: PROCESS ABENDING.
Analysis:
The log shows that the failure is caused by supplemental logging not being enabled in the source Oracle database.
Resolution:
Log in to the database and enable supplemental logging with ALTER DATABASE ADD SUPPLEMENTAL LOG DATA.
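1.19 DDL replication table not found
Problem description:
2010-10-14 13:32:10 GGS ERROR 2008 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: DDL Replication is enabled but table GGS.GGS_DDL_HIST is not found. Please check DDL installation in the database.
2010-10-14 13:32:10 GGS ERROR 190 Oracle GoldenGate Capture for Oracle, ECRMGGS.prm: PROCESS ABENDING.
Analysis:
The log shows that DDL replication is enabled, but the table GGS.GGS_DDL_HIST, which is created by the DDL replication installation scripts, cannot be found.
Resolution:
DDL replication support was installed as the user GGDDL, and the installation scripts create the tables that track GoldenGate activity in that user's schema. To support DDL operations, the parameter file must therefore log in to the database as GGDDL with its password, for example: USERID GGDDL@CRMDB, PASSWORD GGDDL.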
1.20 GoldenGate update operations not synchronized between nodes
Symptom: after UPDATE operations on node 1 and node 2, the changes were not replicated.
Troubleshooting:
1. Routine check:
Process status: normal
GGSCI (gc1) 7> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EORA_1 00:00:00 00:00:04
EXTRACT RUNNING PORA_1 00:00:00 00:00:08
REPLICAT RUNNING RORA_1 00:00:00 00:00:05
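Privileges were granted again:
SQL> grant INSERT, UPDATE, DELETE on scott.tcustmer to ogg;
-- grant DML on the replicated table to ogg
SQL> grant INSERT, UPDATE, DELETE on scott.tcustord to ogg;
-- grant DML on the replicated table to ogg
The problem persisted.
2. The following was then run:
GGSCI (gc1) 8> ADD TRANDATA scott.*
Note: stop the rora_1 process first, add the trandata, then restart it.
-- On both nodes: ADD TRANDATA scott.<new table>; after this, OGG captured the log information for the new table
The problem was resolved: UPDATE operations were replicated between the two nodes.
3. Summary:
When a newly created table is replicated, UPDATEs may not be applied. Run the following so that OGG captures log information (table-level supplemental logging) for the new table:
Command: ADD TRANDATA scott.new_tab
Note: stop the rora_1 process first, add the trandata, then restart it.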
1.21 ERROR: Could not delete DB checkpoint for REPLICAT
GGSCI (mail) 9> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT ABENDED REP1 00:00:00 00:58:10
GGSCI (mail) 10> delete REPLICAT REP1
ERROR: Could not delete DB checkpoint for REPLICAT REP1 (Database login required to delete database checkpoint).
GGSCI (mail) 11> dblogin userid ogg,password oracle
Successfully logged into database.
GGSCI (mail) 12> delete REP1
Deleted REPLICAT REP1.
GGSCI (mail) 13> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
If the group still cannot be deleted, proceed as follows:
GGSCI (rhel6_lhr) 23> delete REPLICAT RORA_HR
ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (Database login required to delete database checkpoint).
GGSCI (rhel6_lhr) 24> dblogin userid ggusr@ogg2, password lhr
Successfully logged into database.
GGSCI (rhel6_lhr) 25> delete RORA_HR
ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (OCI Error ORA-00942: table or view does not exist (status = 942). Deleting from checkpoint table ggusr.ggschkpt, group 'RORA_HR', key 293545198 (0x117f24ee), SQL <DELETE FROM ggusr.ggschkpt WHERE group_name = 'RORA_HR' AND group_key = 293545198>).
GGSCI (rhel6_lhr) 26> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED RORA_HR 00:00:00 00:13:32
GGSCI (rhel6_lhr) 27> delete REPLICAT RORA_HR
ERROR: Could not delete DB checkpoint for REPLICAT RORA_HR (OCI Error ORA-00942: table or view does not exist (status = 942). Deleting from checkpoint table ggusr.ggschkpt, group 'RORA_HR', key 293545198 (0x117f24ee), SQL <DELETE FROM ggusr.ggschkpt WHERE group_name = 'RORA_HR' AND group_key = 293545198>).
GGSCI (rhel6_lhr) 28> add checkpointtable ggusr.ggschkpt
Successfully created checkpoint table ggusr.ggschkpt.
GGSCI (rhel6_lhr) 29> delete REPLICAT RORA_HR
Deleted REPLICAT RORA_HR.
1.22 GoldenGate OGG-00717 unsupported in-memory undo record
2013-07-08 16:31:48 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, EXT1.prm: Positioning to begin time 2013-7-8 04:10:22 PM.
2013-07-08 16:31:48 INFO OGG-01516 Oracle GoldenGate Capture for Oracle, EXT1.prm: Positioned to Sequence 18, RBA 9212432, SCN 0.0, 2013-7-8 04:10:22 PM.
2013-07-08 16:31:48 ERROR OGG-00717 Oracle GoldenGate Capture for Oracle, EXT1.prm: Found unsupported in-memory undo record in sequence 18, at RBA 9212432, with SCN 0.1347542 (1347542) ... Minimum supplemental logging must be enabled to prevent data loss.
2013-07-08 16:31:48 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, EXT1.prm: PROCESS ABENDING.
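While building a GoldenGate environment, I ran into a puzzling problem: "Found unsupported in-memory undo record in sequence 18", even though minimal supplemental logging was in fact already enabled in Oracle.
Restarting the mgr and extract processes solved the problem; if it persists, delete and recreate the process.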
1.23 中文表/中文字段處理
Take, for example, the following table with Chinese table and column names:
Example 9-40:
create table 測試表(
ID NUMBER,
姓名 VARCHAR2(30),
FLAG CHAR(1),
CONSTRAINT PK_TESTD PRIMARY KEY (ID) USING INDEX);
--On the source, create the MV log and the MV:
drop materialized view log on "測試表";
create materialized view log on "測試表" with primary key;
drop materialized view mv_cn_table;
create materialized view mv_cn_table refresh fast on commit as select id, 姓名 as en_name,flag from "測試表";
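On the target, create the table and the view:
Example 9-41:
create or replace view v_cn_table as select id,姓名 as en_name,flag from 測試表;
In GoldenGate, NLS_LANG for both the extract and the replicat must be set to match the target character set:
Example 9-42: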
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
Extract parameters:
Example 9-43:
extract ODISC
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
userid custom_src, password custom_src
exttrail D:/GoldenGate/dirdat/ODISoc/oc
TABLE CUSTOM_SRC.MV_CN_TABLE;
Pump parameters:
Example 9-44:
extract ODIT1P
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
PASSTHRU
rmthost localhost, mgrport 7909
rmttrail D:/gg_stg/dirdat/ODIT1op/op
TABLE CUSTOM_SRC.MV_CN_TABLE;
Replicat parameters:
Example 9-45:
replicat ODIT1A1
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
userid odi_staging, password odi_staging
discardfile D:/gg_stg/dirrpt/ODIT1.dsc, purge
ASSUMETARGETDEFS
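The APPLYNOOPUPDATES parameter must be specified here, otherwise UPDATEs are not applied correctly; KEYCOLS must also be specified, otherwise DELETEs and UPDATEs are not applied correctly:
Example 9-46:
APPLYNOOPUPDATES
map CUSTOM_SRC.MV_CN_TABLE, TARGET ODI_STAGING.V_CN_TABLE, KEYCOLS (ID);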
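The rule above (extract, pump and replicat all setting NLS_LANG to the target character set) is easy to break when parameter files are edited one at a time. A minimal Python sketch that reads the SETENV (NLS_LANG = ...) line from the text of each parameter file and reports whether they agree. The helper names are hypothetical, and the check only confirms that the files agree with each other, not that the value matches the target's NLS_CHARACTERSET (shown in the Replicat report header).

```python
import re

# Accepts both SETENV (NLS_LANG = "X") and setenv (NLS_LANG=X), case-insensitively.
SETENV_NLS = re.compile(r'SETENV\s*\(\s*NLS_LANG\s*=\s*"?([^")]+?)"?\s*\)', re.IGNORECASE)

def nls_lang_of(param_text: str):
    """Return the NLS_LANG value set in a parameter file's text, or None."""
    m = SETENV_NLS.search(param_text)
    return m.group(1).strip() if m else None

def check_nls(params: dict[str, str]) -> dict[str, str]:
    """params maps group name to parameter-file text; raise if NLS_LANG differs or is missing."""
    found = {group: nls_lang_of(text) for group, text in params.items()}
    values = set(found.values())
    if len(values) != 1 or None in values:
        raise ValueError(f"inconsistent NLS_LANG: {found}")
    return found
```

Pass the text of each .prm file (ODISC, ODIT1P, ODIT1A1 in this example) keyed by group name.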