ORA-12547: TNS:lost contact Prevents a Database from Starting


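Editor's introduction: a strange case in which ORA-12547: TNS:lost contact prevented a database from starting and even blocked sqlplus logins. This article works through the diagnosis step by step.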

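1. Background

A customer's database would not start, and the customer asked Enmotech (雲和恩墨) to help analyze and resolve the problem.

Enmotech engineers diagnosed the fault, identified its cause, proposed and applied a fix, and restored service.

The diagnostic process and the solution are described in detail below.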


2. Fault Analysis

2.1. Symptoms

The database could not be started, and the database listener was in an abnormal state.

Thu Apr 30 15:40:20 2020
NOTE: ASMB terminating
Errors in file /oracle/app/oracle/diag/rdbms/****/****/trace/****_asmb_8258020.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 595 Serial number: 9
Errors in file /oracle/app/oracle/diag/rdbms/****/****/trace/****_asmb_8258020.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 595 Serial number: 9
ASMB (ospid: 8258020): terminating the instance due to error 15064
Thu Apr 30 15:40:20 2020
System state dump requested by (instance=1, osid=8258020 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/****/****/trace/****_diag_8389092_20200430154020.trc
Dumping diagnostic data in directory=[cdmp_20200430154020], requested by (instance=1, osid=8258020 (ASMB)), summary=[abnormal instance termination].
Instance terminated by ASMB, pid = 8258020

The database alert log shows that the ASM instance serving the database had also failed.

2.2. Root Cause

We next examined the ASM alert log.

SQL> ALTER DISKGROUP DATA DISMOUNT  /* asm agent *//* {0:0:49022} */
Thu Apr 30 15:40:19 2020
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_gmon_7405636.trc:
ORA-29746: Cluster Synchronization Service is being shut down.
GMON (ospid: 7405636): terminating the instance due to error 29746
Thu Apr 30 15:40:20 2020
System state dump requested by (instance=1, osid=7405636 (GMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_diag_7406038_20200430154020.trc
Dumping diagnostic data in directory=[cdmp_20200430154020], requested by (instance=1, osid=7405636 (GMON)), summary=[abnormal instance termination].
Instance terminated by GMON, pid = 7405636
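Both alert logs name the background process that brought the instance down in a line of the form "<PROCESS> (ospid: <pid>): terminating the instance due to error <code>". As a minimal sketch, assuming the alert log is available as text with one entry per line, such lines can be extracted with a regular expression. The function name find_terminations is hypothetical and is not part of any Oracle tool.

```python
import re

# Matches alert-log lines such as:
#   GMON (ospid: 7405636): terminating the instance due to error 29746
TERMINATION_RE = re.compile(
    r"(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(alert_log_text):
    """Return (process, ospid, error_code) for each instance termination found."""
    return [
        (m.group("process"), int(m.group("ospid")), int(m.group("error")))
        for m in TERMINATION_RE.finditer(alert_log_text)
    ]


if __name__ == "__main__":
    # Two lines copied from the alert logs quoted in this article.
    sample = (
        "ASMB (ospid: 8258020): terminating the instance due to error 15064\n"
        "GMON (ospid: 7405636): terminating the instance due to error 29746\n"
    )
    for process, ospid, error in find_terminations(sample):
        print(f"{process} (ospid {ospid}) terminated the instance: ORA-{error:05d}")
```

Reading the two results side by side makes the order of events easier to see: GMON terminated the ASM instance with error 29746 at 15:40:19, and ASMB terminated the database instance with error 15064 one second later.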

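The ASM instance had terminated abnormally.


The ora.asm resource was in the OFFLINE state.

We stopped HAS and then restarted it to see whether ASM would come up.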

#/oracle/grid/bin/crsctl start has
Could not send msg exec /oracle/grid/perl/bin/perl -I/oracle/grid/perl/lib /oracle/grid/bin/crswrapexece.pl /oracle/grid/crs/install/s_crsconfig_***_env.txt /oracle/grid/bin/ohasd.bin "reboot" .
Please retry
2020-05-02 11:16:25
Changing directory to /oracle/grid/log/***/ohasd

HAS failed to start, so we checked the ASM agent log.

2020-05-02 11:18:57.898: [ora.asm][3343]{0:0:2} [clean] InstAgent::stop: connect2 oracleHome /oracle/grid oracleSid +ASM
2020-05-02 11:18:57.898: [ora.asm][3343]{0:0:2} [clean] InstConnection::connectInt: server not attached
2020-05-02 11:18:57.936: [ora.asm][3343]{0:0:2} [clean] ORA-12547: TNS:lost contact
2020-05-02 11:18:57.936: [ora.asm][3343]{0:0:2} [clean] InstConnection::connectInt (1) Exception OCIException
2020-05-02 11:18:57.936: [ora.asm][3343]{0:0:2} [clean] InstConnection:connect:excp OCIException OCI error 12547
2020-05-02 11:18:57.937: [ora.asm][3343]{0:0:2} [clean] InstConnection::connectInt: server not attached
2020-05-02 11:18:57.975: [ora.asm][3343]{0:0:2} [clean] ORA-12547: TNS:lost contact
2020-05-02 11:18:57.975: [ora.asm][3343]{0:0:2} [clean] InstConnection::connectInt (1) Exception OCIException
2020-05-02 11:18:57.975: [ora.asm][3343]{0:0:2} [clean] InstAgent::stop: connect2 errcode 12547
2020-05-02 11:18:57.976: [ora.asm][3343]{0:0:2} [clean] clsnUtils::error Exception type=2 string=ORA-12547: TNS:lost contact
2020-05-02 11:18:57.976: [    AGFW][3343]{0:0:2} sending status msg [ORA-12547: TNS:lost contact] for clean for resource: ora.asm **** 1
2020-05-02 11:18:57.976: [ora.asm][3343]{0:0:2} [clean] ConnectionPool::removeConnection connection count 1
2020-05-02 11:18:57.976: [ora.asm][3343]{0:0:2} [clean] ConnectionPool::removeConnection sid  +ASM, InstConnection 11471d30
2020-05-02 11:18:57.976: [ USRTHRD][3343]{0:0:2} InstConnection::breakCall pConnxn:11471d30  DetachLock:1059c2f0 m_pSvcH:00000000
2020-05-02 11:18:57.976: [ USRTHRD][3343]{0:0:2} InstConnection:~InstConnection: this 11471d30

During ASM startup the agent reported InstConnection:connect:excp OCIException OCI error 12547, after which the startup failed.
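Agent logs like the one above are long and repetitive, so it can help to summarize which ORA- errors occur and how often before reading them line by line. The following is a minimal sketch, assuming the log is available as text. The function name count_ora_errors is hypothetical, and the sample lines are copied from the agent log quoted above.

```python
import re
from collections import Counter

# Oracle error codes are written as "ORA-" followed by five digits.
ORA_RE = re.compile(r"ORA-\d{5}")


def count_ora_errors(log_text):
    """Count each distinct ORA-nnnnn code that appears in the log text."""
    return Counter(ORA_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "2020-05-02 11:18:57.936: [ora.asm][3343]{0:0:2} [clean] "
        "ORA-12547: TNS:lost contact\n"
        "2020-05-02 11:18:57.975: [ora.asm][3343]{0:0:2} [clean] "
        "ORA-12547: TNS:lost contact\n"
        "2020-05-02 11:18:57.976: [ora.asm][3343]{0:0:2} [clean] "
        "clsnUtils::error Exception type=2 string=ORA-12547: TNS:lost contact\n"
    )
    for code, count in count_ora_errors(sample).most_common():
        print(code, count)
```

The pattern matches only the ORA-nnnnn form, so bare numbers such as "OCI error 12547" or "errcode 12547" are not counted.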

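We found that logging in with sqlplus / as sysdba also failed with ORA-12547.

We then traced sqlplus with truss.

The trace showed errors while reading and writing sqlnet.log, which suggested a problem with the Oracle binaries themselves. The My Oracle Support (Metalink) note Troubleshooting ORA-12547 TNS: Lost Contact [ID 555565.1] supported this suspicion, so we attempted a relink.

The relink log contained the error ksh: /dev/null: 0403-005 Cannot create the specified file. The IBM support article at https://www.ibm.com/support/pages/file-access-permissions-do-not-allow-specified-action attributes this message to file access permissions that do not allow the requested action.


After running chmod 660 /dev/null, sqlplus / as sysdba no longer returned ORA-12547: TNS:lost contact.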
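Because the fault came down to the permissions on /dev/null, a quick pre-check of that device can save time in similar cases. The sketch below is an illustration only, not an Oracle or IBM utility: it assumes it is run as the operating-system user that owns the Oracle software, and the function name check_null_device is hypothetical. It tests whether the current user can actually read and write the device rather than comparing against one fixed mode. On most Unix systems /dev/null is a character device with mode crw-rw-rw-.

```python
import os
import stat


def check_null_device(path="/dev/null"):
    """Return a list of problems found with the null device (empty if none)."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"cannot stat {path}: {exc}"]

    problems = []
    # A healthy /dev/null is a character special file, not a regular file.
    if not stat.S_ISCHR(st.st_mode):
        problems.append(f"{path} is not a character device")
    # os.access checks the permissions of the user running this script.
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    issues = check_null_device()
    if issues:
        for issue in issues:
            print("PROBLEM:", issue)
    else:
        print("/dev/null looks usable for the current user")
```

A "not writable" result for the Oracle software owner would be consistent with the ksh 0403-005 error seen in the relink log.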

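2.3. Resolution

We restarted HAS again. The ASM instance now started normally, but the database instance still did not start.


We then tried to start the database manually, which produced further errors, among them one beginning CRS-5016: Process "/oracle/grid/bin/setasmgidwrap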

2020-05-02 13:25:00.251: [ora.****.db][1800]{0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2020-05-02 13:25:00.251: [ora.****.db][1800]{0:0:2} [start] InstConnection:connect:excp OCIException OCI error 1034
2020-05-02 13:25:00.251: [ora.****.db][1800]{0:0:2} [start] InstAgent::stop: connect1 errcode 1034
2020-05-02 13:25:00.251: [ora.****.db][1800]{0:0:2} [start] InstAgent::stop: connect2 oracleHome /oracle/app/oracle/product/11.2.0/dbhome_1 oracleSid ****
2020-05-02 13:25:00.251: [ora.****.db][1800]{0:0:2} [start] InstConnection::connectInt: server not attached
2020-05-02 13:25:00.319: [ora.****.db][1800]{0:0:2} [start] ORA-01017: invalid username/password; logon denied
2020-05-02 13:25:00.319: [ora.****.db][1800]{0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2020-05-02 13:25:00.319: [ora.****.db][1800]{0:0:2} [start] InstConnection:connect:excp OCIException OCI error 1017
2020-05-02 13:25:00.319: [ora.****.db][1800]{0:0:2} [start] InstAgent::stop: connect2 errcode 1017
2020-05-02 13:25:00.319: [ora.****.db][1800]{0:0:2} [start] clsnUtils::error Exception type=2 string=ORA-01017: invalid username/password; logon denied
2020-05-02 13:25:00.319: [    AGFW][1800]{0:0:2} sending status msg [ORA-01017: invalid username/password; logon denied] for start for resource: ora.****.db 1 1

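The problem now appeared to be ORA-01017: invalid username/password; logon denied.

Running sqlplus / as sysdba manually produced the same error.

Workaround:

Log in with sqlplus sys as sysdba, supplying the SYS password instead of relying on operating-system authentication, and then run startup to start the database manually.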


3. Root-Cause Fix and Recommendations

The root cause was incorrect permissions on /dev/null.

Fix:

chmod 660 /dev/null


Original article on Modb (墨天輪): https://www.modb.pro/db/26889
