Using SQLNET.EXPIRE_TIME to Clear Dead Connections


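    When a database client disconnects abnormally, the resources it held are not released. The session still appears in v$session with status INACTIVE and the corresponding server process keeps running, so resources stay tied up for a long time. How should this be handled? SQLNET.EXPIRE_TIME addresses exactly this problem: it exists to clean up connections that ended abnormally, for example after a network failure, a client power loss, or an unexpected reboot. This article describes how to set SQLNET.EXPIRE_TIME and demonstrates a dead connection and the release of its resources.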

 

1. Understanding the SQLNET.EXPIRE_TIME parameter
   Use parameter SQLNET.EXPIRE_TIME to specify the time interval, in minutes, to send a probe to verify that client/server
   connections are active.
   Setting a value greater than 0 ensures that connections are not left open indefinitely, due to an abnormal client termination.
   If the probe finds a terminated connection, or a connection that is no longer in use, it returns an error, causing the
   server process to exit.
   This parameter is primarily intended for the database server, which typically handles multiple connections at any one time.
   
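   Setting the parameter to a non-zero value (in minutes) makes the server send probe packets to detect clients that have
   disconnected abnormally. When a probe finds such a connection, it returns an error and the corresponding server process is cleaned up.
   The limitations on using the parameter are listed below. (Default 0, minimum 0, recommended 10, i.e. SQLNET.EXPIRE_TIME=10.)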
   Limitations on using this terminated connection detection feature are:
   
      It is not allowed on bequeathed connections.
      Though very small, a probe packet generates additional traffic that may downgrade network performance.
      Depending on which operating system is in use, the server may need to perform additional processing to distinguish
      the connection probing event from other events that occur. This can also result in degraded network performance.

 

2. Dead Connection Detection (DCD) and Inactive Sessions

Dead connections:
   These are previously valid connections with the database but the connection between the client and server processes has
   terminated abnormally.
   Examples of a dead connection:
   - A user reboots/turns-off their machine without logging off or disconnecting from the database.
   - A network problem prevents communication between the client and the server.
   
   In these cases, the shadow process running on the server and the session in the database may not terminate.
   
   Implemented by
         * adding SQLNET.EXPIRE_TIME = <MINUTES> to the sqlnet.ora file
   
   When DCD is enabled, the server-side process sends a small 10-byte packet to the client process after the duration of
   the time interval specified in minutes by the SQLNET.EXPIRE_TIME parameter.
   
   If the client side connection is still connected and responsive, the client sends a response packet back to the database
   server, resetting the timer, and another packet will be sent when the next interval expires (assuming no other activity on
   the connection).
   
   If the client fails to respond to the DCD probe packet
        * the Server side process is marked as a dead connection and
        * PMON performs the clean up of the database processes / resources
        * The client OS processes are terminated
   
   NOTE: SQLNET.RECV_TIMEOUT can be set on the SERVER side sqlnet.ora file. This will set a timeout for the server process
         to wait for data from the client process.

Inactive Sessions:
   These are sessions that remain connected to the database with a status in v$session of INACTIVE.
   Example of an INACTIVE session:
   - A user starts a program/session, then leaves it running and idle for an extended period of time.

 

3. Configuring SQLNET.EXPIRE_TIME

# To configure SQLNET.EXPIRE_TIME, edit sqlnet.ora and add a SQLNET.EXPIRE_TIME entry
[oracle@orasrv admin]$ more sqlnet.ora
sqlnet.expire_time = 1     # only this entry is required; the entries below merely generate trace logs and can be omitted
TRACE_LEVEL_SERVER = 16 
TRACE_FILE_SERVER = SERVER
TRACE_DIRECTORY_SERVER= /u01/app/oracle/network/trace 
TRACE_TIMESTAMP_SERVER = ON 
TRACE_UNIQUE_SERVER = ON
DIAG_ADR_ENABLED=OFF
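
To confirm which value is actually present in the file, the setting can be read back from sqlnet.ora. The following is a minimal sketch, not an Oracle utility: the function name get_expire_time is hypothetical, it assumes one "name = value" pair per line with "#" starting a comment, and letting the last occurrence win is an assumption of this sketch rather than documented Oracle behavior. It prints 0 (the default, DCD disabled) when the parameter is absent.

```shell
#!/bin/sh
# Print the SQLNET.EXPIRE_TIME value (minutes) found in a sqlnet.ora file.
# Hypothetical helper; assumes one "name = value" pair per line.
get_expire_time() {
    awk '
        {
            sub(/#.*/, "")                  # drop trailing comments
            n = index($0, "=")
            if (n == 0) next                # not a name = value line
            name = toupper(substr($0, 1, n - 1))
            gsub(/[ \t]/, "", name)         # parameter names are case-insensitive here
            if (name == "SQLNET.EXPIRE_TIME") {
                value = substr($0, n + 1)
                gsub(/[ \t\r]/, "", value)
                found = value               # last occurrence wins (assumption)
            }
        }
        END { print (found == "" ? 0 : found) }
    ' "$1"
}
```

A typical call would be get_expire_time "$ORACLE_HOME/network/admin/sqlnet.ora", or the file under $TNS_ADMIN if that variable is set on the server.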

4. Simulating and testing a dead connection

C:\Users\robinson.cheng>sqlplus scott/tiger@ora11g    --->connect from a Windows client

SQL*Plus: Release 11.2.0.1.0 Production on Tue Jun 25 09:57:59 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

---Issued the sql to hold a lock
SQL> update emp set sal=sal*1.1 where deptno=20;   

5 rows updated.

--Disable the network adapter in the VM settings, then run a query
SQL> select * from dual;
select * from dual
       *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 29522
Session ID: 15 Serial number: 447

--Server-side environment
SQL> select * from v$version where rownum<2;  
  
BANNER  
--------------------------------------------------------------------------------   
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production  

--Check the sessions on the server side; the SCOTT session has status INACTIVE
SQL> @comm_sess_users;

+----------------------------------------------------+
| User Sessions (All)                                |
+----------------------------------------------------+
Instance     SID Serial ID    Status Oracle User     O/S User O/S PID Session Program              Terminal       Machine
--------- ------ --------- --------- ----------- ------------ ------- -------------------------- ---------- -------------
ora11g        15       447  INACTIVE       SCOTT Robinson.Che   29522 sqlplus.exe                      PC39  TRADESZ\PC39
             125         5  INACTIVE         SYS       oracle    4734 sqlplus@orasrv.com (TNS V1      pts/0    orasrv.com
             139         9    ACTIVE         SYS       oracle   29447 sqlplus@orasrv.com (TNS V1      pts/4    orasrv.com

--Get the spid for user scott by SID
SQL> @my_spid_from_sid
Enter value for input_sid: 15
old   4: AND s.sid = &input_sid
new   4: AND s.sid = 15

   SID    SERIAL# SPID
------ ---------- ------------------------
    15        447 29522

--To find the locked object
SQL> @lock_obj

OBJECT_NAME||''||LOCKED_MODE||''||CTIME||''||C.SID||''||SERIAL#
------------------------------------------------------------------
EMP   3   14   15  447
EMP   3   83   15  447

--The trace file exists
SQL> ho ls -hltr /u01/app/oracle/network/trace/s*29522*
-rw-r----- 1 oracle oinstall 241K Jun 25 09:59 /u01/app/oracle/network/trace/server_29522.trc

--->Issue another statement from the server side; it blocks on the row lock and is cancelled manually (ORA-01013)
SQL> set time on;
10:03:46 SQL> delete scott.emp where deptno=20;  
delete scott.emp where deptno=20
             *
ERROR at line 1:
ORA-01013: user requested cancel of current operation

--Check the server process for scott 
10:04:37 SQL> ho ps -ef | grep 29522 | grep -v grep
oracle   29522     1  0 09:58 ?        00:00:00 oracleora11g (LOCAL=NO)

--The client cannot be reached from the server
10:06:51 SQL> ho ping 192.168.7.133
PING 192.168.7.133 (192.168.7.133) 56(84) bytes of data.
From 192.168.7.40 icmp_seq=2 Destination Host Unreachable
From 192.168.7.40 icmp_seq=3 Destination Host Unreachable
From 192.168.7.40 icmp_seq=4 Destination Host Unreachable
From 192.168.7.40 icmp_seq=6 Destination Host Unreachable
From 192.168.7.40 icmp_seq=7 Destination Host Unreachable
From 192.168.7.40 icmp_seq=8 Destination Host Unreachable

--At this point the total number of processes is 27
10:15:08 SQL> select count(*) from v$process;

  COUNT(*)
----------
        27

--From process start at 09:58 until 10:17:59 the server process has still not been released
10:17:59 SQL> ho ps -ef | grep 29522 | grep -v grep
oracle   29522     1  0 09:58 ?        00:00:00 oracleora11g (LOCAL=NO)

-->At this time the server process was released
10:18:08 SQL> ho ps -ef | grep 29522 | grep -v grep

--After the process is released the total number of processes drops to 26
10:19:45 SQL> select count(*) from v$process;

  COUNT(*)
----------
        26

-->the lock was released
10:19:54 SQL> @lock_obj

no rows selected

--Author : Robinson
--Blog   : http://blog.csdn.net/robinson_0612

--The scott session has been removed from v$session
10:20:03 SQL> @comm_sess_users;

+----------------------------------------------------+
| User Sessions (All)                                |
+----------------------------------------------------+

Instance     SID Serial ID    Status    Oracle User     O/S User O/S PID Session Program            Terminal    Machine
--------- ------ --------- --------- -------------- ------------ ------- -------------------------- -------- ----------
ora11g       125         5  INACTIVE            SYS       oracle    4734 sqlplus@orasrv.com (TNS V1    pts/0 orasrv.com
             139         9    ACTIVE            SYS       oracle   29447 sqlplus@orasrv.com (TNS V1    pts/4 orasrv.com

5. Checking whether SQLNET.EXPIRE_TIME is enabled

# Filtering the trace log shows that dead connection detection was enabled at 09:58:02:853
[oracle@orasrv trace]$ cat -n server_29522.trc |grep dead
    78  [25-JUN-2013 09:58:02:853] niotns: Enabling dead connection detection (1 min)

# In the output below the timer is started at 09:58:03; after 10:18:26 the connection is closed completely (including the server process)
[oracle@orasrv trace]$ cat -n server_29522.trc |grep timer
   447  [25-JUN-2013 09:58:03:050] nstimstart: starting timer at 25-JUN-2013 09:58:03
   451  [25-JUN-2013 09:58:03:051] nsconbrok: timer created for connection
  4092  [25-JUN-2013 10:18:26:173] nstimarmed: timer is armed, with value 3833

# Details around the "starting timer" entry
[oracle@orasrv trace]$ head -451 server_29522.trc | tail -5
[25-JUN-2013 09:58:03:050] nstimstart: starting timer at 25-JUN-2013 09:58:03
[25-JUN-2013 09:58:03:051] nstimset: entry
[25-JUN-2013 09:58:03:051] nstimset: normal exit
[25-JUN-2013 09:58:03:051] nstimstart: normal exit
[25-JUN-2013 09:58:03:051] nsconbrok: timer created for connection 

# Details after the timer is cleared (nstimclear: normal exit)
[oracle@orasrv trace]$ head -4097 server_29522.trc | tail -7
[25-JUN-2013 10:18:26:173] nstimarmed: entry
[25-JUN-2013 10:18:26:173] nstimarmed: timer is armed, with value 3833
[25-JUN-2013 10:18:26:173] nstimarmed: normal exit
[25-JUN-2013 10:18:26:173] nstimclear: entry
[25-JUN-2013 10:18:26:173] nstimclear: normal exit
[25-JUN-2013 10:18:26:173] nttctl: entry
[25-JUN-2013 10:18:26:173] nttctl: entry 
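
The gap between the timer start and the final trace entries can also be computed rather than read off by eye. The sketch below is a hypothetical helper (the name trace_elapsed is not part of any Oracle tool). It assumes trace lines begin with [DD-MON-YYYY HH:MM:SS:mmm] as in the excerpts above and that the whole trace falls within one calendar day. For the two timestamps quoted above (09:58:03 and 10:18:26) the difference is 1223 seconds, a little over 20 minutes.

```shell
#!/bin/sh
# Seconds between the first "starting timer" entry and the last timestamped
# entry of a SQL*Net server trace file. Hypothetical helper; assumes all
# entries fall on the same calendar day.
trace_elapsed() {
    awk '
        /^\[[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9] / {
            split($2, t, ":")               # $2 looks like 09:58:03:050]
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (start == "" && /nstimstart: starting timer/) start = secs
            last = secs                     # remember the most recent entry
        }
        END {
            if (start == "") { print "no timer start found"; exit 1 }
            print last - start
        }
    ' "$1"
}
```

For example, trace_elapsed /u01/app/oracle/network/trace/server_29522.trc would report how long the connection lived after the DCD timer was created, provided the last entry in that file is the close of the connection.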

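6. Summary
a. A dead connection (the kind DCD targets) typically arises when a user reboots or shuts down the client without disconnecting cleanly, or when a network problem prevents the client from communicating with the server.
b. By contrast, an INACTIVE session is one where the user has connected but has not yet executed anything, or has finished working without disconnecting; it is simply idle.
c. Both dead connections and idle INACTIVE sessions appear with status INACTIVE in v$session.
d. When resource_limit and a profile are configured and a session exceeds idle_time, the session appears with status SNIPED in v$session.
e. When SQLNET.EXPIRE_TIME is set to a non-zero value in sqlnet.ora, dead connections are cleaned up once a probe sent after the EXPIRE_TIME interval goes unanswered.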
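f. In the demonstration EXPIRE_TIME was set to only 1 minute, yet the actual release took close to 20 minutes. The cause was not determined here and needs further testing. A plausible explanation (not verified in this test) is that the interval only controls when the probe is sent, and the connection is declared dead only when the operating system's TCP stack gives up retransmitting the unanswered probe, which can take many minutes with default settings.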
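g. Setting SQLNET.EXPIRE_TIME to a non-zero value incurs extra overhead on the system and may degrade network performance.
h. Where OS and DB resources must be released promptly, Oracle recommends limiting user connections with resource_limit and a profile while also setting SQLNET.EXPIRE_TIME to a non-zero value.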
i、Reference: [ID 206007.1] [ID 395505.1] [ID 601605.1] [ID 151972.1]
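
Items d and h rely on profile-based idle limits, which the demonstration does not show. As a rough sketch, the shell function below prints the SQL statements involved so they can be reviewed before being run. The function name idle_profile_sql, the profile name, and the 30-minute limit in the usage note are illustrative assumptions, not values from the test above.

```shell
#!/bin/sh
# Print the statements that enable a profile-based idle limit for one user.
# Hypothetical helper: arguments are profile name, idle minutes, user name.
idle_profile_sql() {
    profile=$1 minutes=$2 user=$3
    # Profile resource limits are only enforced when resource_limit is TRUE.
    printf 'ALTER SYSTEM SET resource_limit = TRUE;\n'
    printf 'CREATE PROFILE %s LIMIT IDLE_TIME %s;\n' "$profile" "$minutes"
    printf 'ALTER USER %s PROFILE %s;\n' "$user" "$profile"
    # Sessions that exceeded IDLE_TIME are reported with status SNIPED.
    printf "SELECT sid, serial#, username FROM v\$session WHERE status = 'SNIPED';\n"
}
```

Running idle_profile_sql idle_limit 30 scott prints four statements; once they look right, the output could be piped to sqlplus -s "/ as sysdba" on the server.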



