一、現象描述
使用ssh登錄到 Oracle 數據庫
[root@localhost ~]# su - oracle [oracle@localhost ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Thu Oct 29 10:38:47 2020 Copyright (c) 1982, 2013, Oracle. All rights reserved. ERROR: ORA-00020: maximum number of processes (1000) exceeded Enter user-name: ERROR: ORA-01017: invalid username/password; logon denied
查看 Oracle的錯誤日志:
[oracle@localhost ~]$ cd ${ORACLE_BASE}/diag/rdbms/ora11g/ora11g/trace/ [oracle@localhost trace]$ pwd /data/oracle/diag/rdbms/ora11g/ora11g/trace [oracle@localhost trace]$ tail -n 2000 alert_ora11g.log |more ............... .............. *********************************************************************** Fatal NI connect error 12514, connecting to: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.247)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ORA11G)(CID=(PROGRAM=oracle)(H OST=localhost.localdomain)(USER=root)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 28-OCT-2020 15:05:04 Tracing not turned on. Tns error struct: ns main err code: 12564 TNS-12564: TNS:connection refused ns secondary err code: 0 nt main err code: 0 nt secondary err code: 0 nt OS err code: 0 Wed Oct 28 15:06:01 2020 *********************************************************************** Fatal NI connect error 12170. VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 28-OCT-2020 15:06:01 Tracing not turned on. Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out ns secondary err code: 12560 nt main err code: 505 TNS-00505: Operation timed out nt secondary err code: 110 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.61)(PORT=42817)) Wed Oct 28 15:14:26 2020 *********************************************************************** Fatal NI connect error 12170. VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 28-OCT-2020 15:14:26 Tracing not turned on. Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out ns secondary err code: 12606 nt main err code: 0 nt secondary err code: 0 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.105)(PORT=49354)) Wed Oct 28 16:20:02 2020 Thread 1 advanced to log sequence 40970 (LGWR switch) Current log# 2 seq# 40970 mem# 0: /data/oracle/data/ora11g/redo02.log Wed Oct 28 17:51:47 2020 Thread 1 advanced to log sequence 40971 (LGWR switch) Current log# 3 seq# 40971 mem# 0: /data/oracle/data/ora11g/redo03.log Wed Oct 28 19:22:52 2020 Thread 1 advanced to log sequence 40972 (LGWR switch) Current log# 1 seq# 40972 mem# 0: /data/oracle/data/ora11g/redo01.log Wed Oct 28 20:48:09 2020 Thread 1 advanced to log sequence 40973 (LGWR switch) Current log# 2 seq# 40973 mem# 0: /data/oracle/data/ora11g/redo02.log Wed Oct 28 21:17:33 2020 ORA-00020: maximum number of processes (1000) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Process m000 submission failed with error = 20 Wed Oct 28 21:22:33 2020 ORA-00020: maximum number of processes (1000) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Process m000 submission failed with error = 20 Wed Oct 28 21:27:33 2020 ORA-00020: maximum number of processes (1000) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Process m000 submission failed with error = 20 Wed Oct 28 21:37:33 2020 ORA-00020: maximum number of processes (1000) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Process m000 submission failed with error = 20 Process m001 submission failed with error = 20 ...................
通過 alert_ora11g.log 可以看出:
當前數據庫通過dblink連接192.168.1.247數據庫,由於192.168.1.247數據庫故障造成無法連接。所以產生大量的連接數,最終報錯“ORA-00020: maximum number of processes (1000) exceeded”
此時所有的pl/sql、toad等客戶端、web站點都無法連接到該oracle數據庫。
二、解決辦法
查看連接數情況:
[root@localhost ~]# ps -ef|grep ora|wc -l 1015 [root@localhost ~]# ps -ef|grep LOCAL=NO|wc -l 976
在root用戶下殺掉帶 LOCAL=NO 的進程
[root@localhost ~]# kill -9 `ps -ef|grep LOCAL=NO|grep -v grep|awk '{print $2}'` [root@localhost ~]# ps -ef|grep LOCAL=NO root 24863 24695 0 10:38 pts/0 00:00:00 grep LOCAL=NO 或者 ps -ef|grep LOCAL=NO|grep -v grep|awk '{print $2}'|head|xargs kill -9
此時,就可以恢復連接了。
網上有歸檔滿了,報錯ORA-00020
使用 kill 命令殺掉 LOCAL=NO 的進程之后。
登陸后台數據庫進一步觀察會話使用情況
SQL> select event,program,count(*) from v$session group by event,program order by 3; EVENT PROGRAM COUNT(*) ---------------------------------------- ------------------------------ ---------- DIAG idle wait oracle@lf1 (DIAG) 1 class slave wait oracle@lf1 (GCR0) 1 buffer busy waits oracle@lf1 (J004) 1 library cache lock OMS 59
從結果輸出發現大量的會話處於library cache lock等待事件中,接下來 定位阻塞者是哪個會話
SQL> select event,p1raw,p2 from v$session where event ='library cache lock'; EVENT P1RAW P2 -------------------- ---------------- ---------- library cache lock 000000007C384160 1737585656 library cache lock 000000007C384160 1737462296 library cache lock 000000007C384160 1737338936 library cache lock 000000007C384160 1737215576 library cache lock 000000007C384160 1737092216 library cache lock 000000007C384160 1736968856 。。。。
SQL> select event from v$session where saddr=( select kgllkses from x$kgllk where KGLLKhdl ='000000007C384160' and KGLLKMOD>0 ); EVENT ---------------------------------------------------------------- log file switch (archiving needed)
至此已經看到了導致ORA-00020錯誤的源頭,歸檔滿了,刪除多余日志問題解決!