Planned CRS restart: can CRS fail to come back up?


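According to the document on MOS reproduced below, this is a bug on the server (OS) side, and the server has to be rebooted to resolve it.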
 

GI/CRS Will Not Start On 2nd Node after a planned reboot (Doc ID 2013790.1)

 


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.

SYMPTOMS

After a planned reboot, CRS and ASM will not start on the second node.

alertdrblade2.log
-----------------------
[ohasd(18392)]CRS-2112:The OLR service started on node drblade2.
2015-05-11 15:48:00.005:
[ohasd(18392)]CRS-1301:Oracle High Availability Service started on node drblade2.
2015-05-11 15:48:00.854:
[ohasd(18392)]CRS-8011:reboot advisory message from host: drblade2, component: cssmonit, with time stamp: L-2015-05-11-11:37:37.483
[ohasd(18392)]CRS-8013:reboot advisory message text: Rebooting after limit 28020 exceeded; disk timeout 27500, network timeout 28020, last heartbeat from CSSD at epoch seconds 1431358629.371, 28111 milliseconds ago based on invariant clock value of 3341653
2015-05-11 15:48:00.885:
[ohasd(18392)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 1 were announced and 0 errors occurred
2015-05-11 15:48:02.167:
[/opt/crs/11.2.0.3/grid/bin/oraagent.bin(18432)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/opt/crs/11.2.0.3/grid/log/drblade2/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-05-11 15:53:31.237:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdagent_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 15:53:31.249:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdmonitor_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 15:53:34.651:
[gpnpd(20617)]CRS-2328:GPNPD started on node drblade2.
2015-05-11 15:53:38.880:
[ohasd(18392)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2015-05-11 15:59:05.906:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdagent_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 16:04:35.966:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdagent_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 16:04:35.968:
[ohasd(18392)]CRS-2758:Resource 'ora.cssd' is in an unknown state.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.cluster_interconnect.haip' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.cssd' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.ctssd' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
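
When an alert log is long, it can help to pull out just the two message types that matter here: CRS-5828 (an agent could not be started) and CRS-2807 (a resource failed to start automatically). The following is a minimal Python sketch, not an Oracle tool. The function name summarize_alert_log and the regular expressions are assumptions based only on the message layout visible in the excerpt above; other Grid Infrastructure versions may format these lines differently.

```python
import re
from collections import Counter

# CRS-5828: an agent could not be started; capture the agent binary path.
AGENT_FAIL = re.compile(r"CRS-5828:Could not start agent '([^']+)'")
# CRS-2807: a resource failed to start automatically; capture the resource name.
RES_FAIL = re.compile(r"CRS-2807:Resource '([^']+)' failed to start automatically")

# A few lines copied from the alert log excerpt above, used as sample input.
SAMPLE = """\
2015-05-11 15:53:31.237:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdagent_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 15:53:31.249:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdmonitor_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 15:59:05.906:
[ohasd(18392)]CRS-5828:Could not start agent '/opt/crs/11.2.0.3/grid/bin/cssdagent_root'. Details at (:CRSAGF00130:) {0:0:2} in /opt/crs/11.2.0.3/grid/log/drblade2/ohasd/ohasd.log.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2015-05-11 16:04:36.001:
[ohasd(18392)]CRS-2807:Resource 'ora.cssd' failed to start automatically.
"""


def summarize_alert_log(text):
    """Return (Counter of agent start failures, sorted list of failed resources)."""
    agent_failures = Counter(AGENT_FAIL.findall(text))
    failed_resources = sorted(set(RES_FAIL.findall(text)))
    return agent_failures, failed_resources


if __name__ == "__main__":
    agents, resources = summarize_alert_log(SAMPLE)
    for path, count in agents.most_common():
        print(f"{count} failed start(s): {path}")
    print("Resources that failed to start:", ", ".join(resources))
```

Repeated CRS-5828 entries for cssdagent_root and cssdmonitor_root, followed by CRS-2807 for ora.cssd and everything that depends on it, is the pattern this note describes.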


ohasd.log
-------------
2015-05-11 15:53:31.238: [    AGFW][2566100736] {0:0:2} Created alert : (:CRSAGF00130:) :  Failed to start the agent /opt/crs/11.2.0.3/grid/bin/cssdagent_root            
2015-05-11 15:53:31.238: [    AGFW][2566100736] {0:0:2} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.cssd 1 1] ID 4097:38
2015-05-11 15:53:31.238: [    AGFW][2566100736] {0:0:2} Can not stop the agent: /opt/crs/11.2.0.3/grid/bin/cssdagent_root because pid is not initialized
2015-05-11 15:53:31.238: [   CRSPE][2555594496] {0:0:2} Received reply to the intial check for: ora.cssd 1 1 on drblade2
2015-05-11 15:53:31.238: [   CRSPE][2555594496] {0:0:2} Initial CHECK has failed for ora.cssd 1 1 on drblade2
2015-05-11 15:53:31.238: [    AGFW][2566100736] {0:0:2} Agfw Proxy Server received the message: RESOURCE_DELETE[ora.cssd 1 1] ID 4358:180
2015-05-11 15:53:31.239: [    AGFW][2566100736] {0:0:2} ora.cssd 1 1 marked as deleted.
2015-05-11 15:53:31.239: [    AGFW][2566100736] {0:0:2} Agfw config version set to: 31
2015-05-11 15:53:31.239: [    AGFW][2566100736] {0:0:2} Could not forward message [RESOURCE_DELETE[ora.cssd 1 1] ID 4358:180] to agent. /opt/crs/11.2.0.3/grid/bin/cssdagent_root is not running
2015-05-11 15:53:31.239: [    AGFW][2566100736] {0:0:2} Deleting the resource: ora.cssd 1 1
2015-05-11 15:53:31.239: [    AGFW][2566100736] {0:0:2} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_DELETE[ora.cssd 1 1] ID 4358:180
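
To follow a single resource such as ora.cssd through ohasd.log, the entries can be split into timestamp, component, thread id, and message. The sketch below assumes the line layout seen in the excerpt above (timestamp, bracketed component, bracketed thread id, optional {n:n:n} tag, then free text). The names parse_ohasd_line and entries_for are hypothetical helpers and not part of any Oracle utility.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt:
# "2015-05-11 15:53:31.238: [   CRSPE][2555594496] {0:0:2} message text"
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"\[\s*(\w+)\]\[(\d+)\]\s*(?:\{[^}]*\})?\s*(.*)$"
)


def parse_ohasd_line(line):
    """Return (timestamp, component, thread_id, message), or None if no match."""
    match = LINE.match(line.rstrip())
    if match is None:
        return None
    stamp, component, thread_id, message = match.groups()
    timestamp = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return timestamp, component, int(thread_id), message


def entries_for(text, needle):
    """Yield parsed entries whose message mentions `needle` (e.g. 'ora.cssd')."""
    for line in text.splitlines():
        parsed = parse_ohasd_line(line)
        if parsed is not None and needle in parsed[3]:
            yield parsed


if __name__ == "__main__":
    sample = (
        "2015-05-11 15:53:31.238: [   CRSPE][2555594496] {0:0:2} "
        "Initial CHECK has failed for ora.cssd 1 1 on drblade2"
    )
    for timestamp, component, thread_id, message in entries_for(sample, "ora.cssd"):
        print(timestamp.isoformat(), component, message)
```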


CAUSE

The following tasks are hung in kernel calls. The kernel messages below show cssdagent and cssdmonitor stuck in D state (uninterruptible sleep).

May 12 13:35:55 drblade2 kernel: INFO: task cssdagent:39921 blocked for more than 120 seconds.
May 12 13:35:55 drblade2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 13:35:55 drblade2 kernel: cssdagent D ffff885f19288860 0 39921 1 0x00000080                                  <<<<<<<<<< D state !

May 12 13:35:55 drblade2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 13:35:55 drblade2 kernel: cssdmonitor     D ffff882f032887a0     0 39929      1 0x00000080                  <<<<<<<<<< D state ! 
 
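Both cssdagent and cssdmonitor are blocked in D state inside the kernel, which is consistent with the repeated CRS-5828 "Could not start agent" errors in the alert log.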
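
A quick way to confirm the same condition on a live Linux node is to look for hung-task messages in the kernel log and for processes currently in D state under /proc. The sketch below is an illustration only. The helper names are hypothetical, the regular expression is based on the kernel message format shown above, and the /proc/<pid>/stat parsing relies on the standard Linux layout "pid (comm) state ...".

```python
import os
import re

# Matches kernel hung-task reports such as:
# "INFO: task cssdagent:39921 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks_from_kernel_log(text):
    """Return a list of (task name, pid, seconds) found in kernel log text."""
    return [(name, int(pid), int(secs)) for name, pid, secs in HUNG_TASK.findall(text)]


def parse_stat(stat_line):
    """Parse the first three fields of /proc/<pid>/stat: (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' to find where the state field begins.
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(f"D state: pid={pid} comm={comm}")
```

Processes pass through D state briefly during normal I/O, so a single sighting proves little. What matters is the same PID staying in D state across repeated checks, as the kernel's "blocked for more than 120 seconds" message indicates here.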
 

SOLUTION

Engage the Linux OS team to understand why these tasks are hanging. In this case the cause appears to have been a bug involving the OS and E5 processors. A cold reboot (full power off) fixed it.

Please open a ticket with the OS vendor.
 


