crsctl start crs fails to start CRS


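Dear customer,

1. After how many failed start attempts does the gpnpd process stop trying to restart and remain OFFLINE?

This is determined by the RESTART_ATTEMPTS attribute of the ora.gpnpd resource. The default is 10.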

crsctl stat res ora.gpnpd -p -init | grep RESTART_ATTEMPTS

RESTART_ATTEMPTS=10
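
To read just the value instead of the whole KEY=VALUE line, a small filter can be put behind the same command. This is a minimal sketch: get_attr is a hypothetical helper name, and it assumes the attribute output is printed one KEY=VALUE pair per line, as in the line above.

```shell
#!/bin/sh
# get_attr NAME < "crsctl stat res <resource> -p -init" output
# Prints the value of one attribute from KEY=VALUE lines read on stdin.
# get_attr is a hypothetical helper, not part of crsctl.
get_attr() {
    # Split on "=", match the key exactly, print everything after the first "=".
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# On a cluster node the intended usage would be:
#   crsctl stat res ora.gpnpd -p -init | get_attr RESTART_ATTEMPTS
```

Matching the key exactly avoids picking up other attributes whose names merely contain the same text, which a plain grep would also return.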

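2. When the gpnpd process remains OFFLINE, where can that OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start at all, and crsctl status res -t -init also failed with an error saying it could not communicate with the cluster. crsctl status res -t -init ran normally only on the healthy node, rac2.

Normally, crsctl status res -t -init shows it.
If crsctl status res -t -init cannot show it, check the ohasd trace file (ohasd.trc) for the period when the problem occurred to confirm the state.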
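
One way to narrow a large trace file down to the problem period is to filter by the leading timestamp and by the resource name. The sketch below assumes that ohasd.trc lines start with a "YYYY-MM-DD HH:MM:SS" timestamp like the other trace excerpts in this thread, and that the lines of interest mention ora.gpnpd. The exact wording of the state messages is not shown in this thread, so the pattern is only a starting point. trace_window is a hypothetical helper name.

```shell
#!/bin/sh
# trace_window FROM TO [PATTERN] < tracefile
# Keeps lines whose first 19 characters ("YYYY-MM-DD HH:MM:SS") fall within
# [FROM, TO] and that contain PATTERN (default: ora.gpnpd).
# Continuation lines without a leading timestamp are dropped.
trace_window() {
    awk -v from="$1" -v to="$2" -v pat="${3:-ora.gpnpd}" '
        { ts = substr($0, 1, 19) }
        ts >= from && ts <= to && index($0, pat) > 0
    '
}

# Intended usage (the path to ohasd.trc is site-specific):
#   trace_window "2021-12-28 15:00:00" "2021-12-28 15:10:00" < ohasd.trc
```

The comparison is a plain string comparison, which works because this timestamp format sorts in chronological order.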

Thank you.

Oracle Support	- 21 days ago		[Notes]

Dear customer,

We have received your update and will review it as soon as possible. Thank you for your patience.

Thank you.

ZUPENG_LI@YMTC.COM - 21 days ago [Update from Customer]

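Hello,

"After several failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05 and remained OFFLINE.
After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remained OFFLINE, which prevented ocssd and ASM from starting."
<<<<<<
1. After how many failed start attempts does the gpnpd process stop trying to restart and remain OFFLINE?
2. When the gpnpd process remains OFFLINE, where can that OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start at all, and crsctl status res -t -init also failed with an error saying it could not communicate with the cluster. crsctl status res -t -init ran normally only on the healthy node, rac2.

Thank you!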

Oracle Support	- 25 days ago		[ODM Answer]

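Dear customer,

On December 28, the network on host 1 was offline.

Understood.

Once the network is back to normal, how should this situation be handled in order to start CRS?

In this situation, the resource needs to be started manually: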

crsctl start res ora.gpnpd -init

Check the resource status:
crsctl stat res -t -init
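
The two steps above can be combined into a start-and-wait sketch. It assumes that crsctl stat res ora.gpnpd -init prints KEY=VALUE lines including a STATE= line (for example "STATE=ONLINE on rac1"). The CRSCTL variable, the function names, and the retry count and delay are illustrative choices, not Oracle settings.

```shell
#!/bin/sh
# Start ora.gpnpd manually and wait until it reports STATE=ONLINE.
# CRSCTL may point at the full path of crsctl in the Grid home (assumption).
CRSCTL=${CRSCTL:-crsctl}

gpnpd_state() {
    # Print the value of the STATE= line, e.g. "ONLINE on rac1" or "OFFLINE".
    "$CRSCTL" stat res ora.gpnpd -init | awk -F= '$1 == "STATE" { print $2; exit }'
}

start_gpnpd() {
    # $1: number of status checks (default 12), $2: seconds between checks (default 5)
    tries=${1:-12}
    delay=${2:-5}
    "$CRSCTL" start res ora.gpnpd -init || return 1
    i=0
    while [ "$i" -lt "$tries" ]; do
        case "$(gpnpd_state)" in
            ONLINE*) return 0 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Intended usage on the affected node, after the network has been fixed:
#   start_gpnpd && "$CRSCTL" stat res -t -init
```

If the function returns non-zero, gpnpd did not come ONLINE within the wait window, and gpnpd.trc should be checked again for the same "no interfaces to filter in net data" error.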

Thank you.

Oracle Support	- 25 days ago		[ODM Question]

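On December 28, the network on host 1 was offline.

"After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remained OFFLINE,
which prevented ocssd and ASM from starting."
<<<<<<<<
Once the network is back to normal, how should this situation be handled in order to start CRS?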

ZUPENG_LI@YMTC.COM - 25 days ago [Update from Customer]

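Hello,

On December 28, the network on host 1 was offline.

"After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remained OFFLINE,
which prevented ocssd and ASM from starting."
<<<<<<<<
Once the network is back to normal, how should this situation be handled in order to start CRS?

Thank you!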

Best Regards

Oracle Support	- 28 days ago		[ODM Action Plan]

-------------------- ACTION PLAN DETAILS BELOW---------------------

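Dear customer,

Thank you for your patience. Here is the progress of the investigation.

The gpnpd trace file shows that on 12/28 the gpnpd process failed repeatedly,
reporting the error "no interfaces to filter in net data":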

<gpnpd.trc>

2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data <<<<<<<

2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd" <<<<<<

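After several failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05
and remained OFFLINE.

After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remained OFFLINE,
which prevented ocssd and ASM from starting.

In summary, we suspect that the private network interface failed around 12/28 15:05. To understand
what happened in detail, please provide the OSWatcher data from around 12/28 15:05. If no data from
that time is available, please work with your OS and network administrators to check whether the
network interface or network communication had problems at that time.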
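
To confirm how many times gpnpd hit this error on a given day and when the last attempt occurred, the trace file can be summarized as follows. This is a sketch that assumes each failure logs the exact text "no interfaces to filter in net data" on a line starting with the date, as in the excerpt above. count_net_failures is a hypothetical helper name.

```shell
#!/bin/sh
# count_net_failures YYYY-MM-DD < gpnpd.trc
# Prints "<count> <date> <time>" for the last matching line of that day,
# or "0" when the error does not occur on that day.
count_net_failures() {
    awk -v day="$1" '
        index($0, day) == 1 && index($0, "no interfaces to filter in net data") > 0 {
            n++
            last = $1 " " $2   # date and time fields of the most recent match
        }
        END { if (n) print n, last; else print 0 }
    '
}

# Intended usage:
#   count_net_failures 2021-12-28 < gpnpd.trc
```

Comparing the count with RESTART_ATTEMPTS and the last timestamp with the time of the network fault helps the OS and network administrators focus on the right window.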

Best Regards, 高 健, Oracle Customer Service, China Database Group

Oracle Support	- 28 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = gpnpd.trc

2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data

2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd"

Filename = gpnpd.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = crsd.trc

2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Done for ctx=0x7fff1c062d40
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Informing CSS of successful CRS shutdown...
2021-12-28 13:13:17.574 :UiServer:1870640896: [ INFO] {1:39877:29678} Flushing repository write requests...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Exiting on request of the Policy Engine...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Done. <<<< last line

Filename = crsd.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = alert_+ASM1.log

2021-12-28T13:13:28.533348+08:00
freeing rdom 4
freeing the fusion rht of pdb 4
freeing rdom 3
freeing the fusion rht of pdb 3
freeing rdom 2
freeing the fusion rht of pdb 2
freeing rdom 1
freeing the fusion rht of pdb 1
freeing rdom 0
freeing the fusion rht of pdb 0
2021-12-28T13:13:33.788148+08:00
Instance shutdown complete (OS id: 71392) <<<<<< last line

Filename = alert_+ASM1.log

Oracle Support	- 29 days ago		[ODM Issue Verification]

Verified the issue in the log file as noted below:

LOG FILE

Filename = node#1\alert.log
See the following error:

2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = crsctl_120577.trc

Trace file /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.10.1.0.0 Copyright 1996, 2021 Oracle. All rights reserved.
default:4160564992: u_set_comp_error: comptype '103' : error '29' <<<<<<<<<<<
2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfnInitEnv env=0x7ffffffefef8 flags=0x0

2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfoCreateCtxExt2 trcflg: 0 [trclvl_in:3] ctx:0x5555562d16b0

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: CLSS init failed with status 3

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: return status 3 (0 SKGXN not av) from CLSS

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode01: ORA-29701

2021-12-30 03:10:02.725*:kgfn.c@1381: kgfnFindLocalNode: ORA-29701 nmret=2
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode: not ok

2021-12-30 03:10:02.725*:kgfn.c@1485: kgfnFindLocalNode: not ok
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnTgtInit: local node not found, free kgfnpds

2021-12-30 03:10:02.725*:kgfn.c@2271: kgfnTgtInit: not found
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnGetBeqData failed init target; inst=(null) flags=0x6000

2021-12-30 03:10:02.725*:kgfn.c@5993: kgfnGetBeqData: kgfnTgtInit failed, inst=NULL flags=0x6000
2021-12-30 03:10:02.729 : CLSNS:4160564992: clsns_SetTraceLevel:trace level set to 1.
2021-12-30 03:10:02.847 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS <<<

2021-12-30 03:10:02.851 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2021-12-30 03:10:02.885 : OCRRAW:4160564992: 9325 Error 4 opening dom root in 0x555556559a50
......

2021-12-30 03:10:08.902*:kgfn.c@5513: kgfnConnect2: failed to connect
2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfnConnect2Retry: failed to connect connect after 2 attempts, 151s elapsed

2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup <<<<<<<<<<<

2021-12-30 03:10:08.902*:kgfo.c@1014: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump on error exit --

2021-12-30 03:10:08.902 : OCRRAW:4160564992: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180

2021-12-30 03:10:08.902 : OCRRAW:4160564992: ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRRAW:4160564992: Category: 7

2021-12-30 03:10:08.902 : OCRRAW:4160564992: DepInfo: 15077

2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump end --

OCRASM:4160564992: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge

2021-12-30 03:10:08.902 : OCRASM:4160564992: ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: kgfoCheckMount returned [7]
2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: The ASM instance is down
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: Failed to open [+DG_CRS_FEFL/p-rac/OCRFILE/registry.255.1078051179]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: No OCR/OLR devices are usable
OCRUTL:4160564992: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
default:4160564992: u_set_gbl_comp_error: comptype '107' : error '0'
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprinit: Could not open raw device
2021-12-30 03:10:08.980 : default:4160564992: a_init:7!: Backend init unsuccessful : [26]
2021-12-30 03:10:08.982 : default:4160564992: clsvactversion:4: Retrieving Active Version from local storage.

Filename = crsctl_120577.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = node#1\alert.log

2021-12-29 15:17:26.195 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-29 15:17:26.221 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.
2021-12-30 03:10:22.969 [CRSCTL(120872)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120872.trc.
2021-12-31 14:02:53.207 [CRSCTL(119476)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119476.trc.
2021-12-31 14:03:00.371 [CRSCTL(119668)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119668.trc.
2021-12-31 14:03:06.842 [OCRCONFIG(119799)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_119799.trc.
2021-12-31 14:03:18.172 [OCRDUMP(121314)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_121314.trc.
2021-12-31 14:04:23.398 [CRSCTL(129904)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_129904.trc.
2021-12-31 14:04:30.579 [CRSCTL(130934)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_130934.trc.
2021-12-31 14:04:37.047 [OCRCONFIG(131208)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_131208.trc.
2021-12-31 14:04:48.418 [OCRDUMP(132888)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_132888.trc.
2021-12-31 15:10:09.557 [CRSCTL(47630)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_47630.trc.

Filename = node#1\alert.log
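
A per-day summary of the CRS message codes makes the sequence in this alert.log easier to see: CRS-42216 on 12/29 (no interface matching the bond1 definition), followed by repeated CRS-1013 from 12/30 onward. The sketch below assumes each alert.log entry is a single line that starts with the date and contains one "CRS-nnnn:" code. summarize_crs_codes is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_crs_codes < alert.log
# Prints "<date> <CRS code> <count>" for every date/code pair, sorted by date.
summarize_crs_codes() {
    awk '
        match($0, /CRS-[0-9]+:/) {
            code = substr($0, RSTART, RLENGTH - 1)   # drop the trailing colon
            count[$1 " " code]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Intended usage:
#   summarize_crs_codes < alert.log
```

Entries that were wrapped across two lines in a copied excerpt are counted once, on the line that carries the code.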

Oracle Support	- 29 days ago		[Notes]

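Dear customer,

Regarding the error messages found in the host's messages file, we cannot yet determine whether they are related to CRS failing to start.

The messages resemble what is described in the following document:

Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages ( Doc ID 1268895.1 )

You can try the method in that document and see whether it makes the messages disappear.

I will continue to investigate why CRS cannot start and will report back when there is progress.

Best Regards, 高 健, Oracle Customer Service, China Database Group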

