1.現象:
客戶10.2.0.4 RAC環境,出現大量的library cache lock和cursor: pin S wait on X等待,經分析是由於統計信息收集僵死導致的。
數據庫在8點到9點期間,數據庫兩個節點都存在明顯的cursor: pin S wait on X和library cache lock的等待:
TOP 5 EVENT:
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
cursor: pin S wait on X |
1,573,056 |
30,651 |
19 |
146.2 |
Concurrency |
library cache lock |
31,757 |
7,009 |
221 |
33.4 |
Concurrency |
CPU time |
6,416 |
30.6 |
|||
DFS lock handle |
12,381 |
2,979 |
241 |
14.2 |
Other |
latch: library cache |
1,646 |
1,974 |
1,199 |
9.4 |
Concurrency |
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
cursor: pin S wait on X |
17,586,451 |
342,437 |
19 |
812.1 |
Concurrency |
library cache lock |
63,657 |
30,153 |
474 |
71.5 |
Concurrency |
CPU time |
3,820 |
9.1 |
|||
db file sequential read |
241,761 |
1,094 |
5 |
2.6 |
User I/O |
inactive session |
兩個節點的等待現象基本一致,而節點1上還存在明顯的DFS lock handle等待事件。
通過分析ASH信息,發現library cache lock和cursor: pin S wait on X等待基本上都是6點之后才開始出現:
2.分析原因

SELECT trunc(sample_time, 'hh24') TIME, COUNT(*) FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en WHERE ash.event_id = en.event_id AND sample_time>= to_timestamp('20130703', 'yyyymmdd') AND event_name IN ('cursor: pin S wait on X', 'library cache lock') GROUP BY trunc(sample_time, 'hh24') ORDER BY 1;
而觀察出現cursor: pin S wait on X和library cache lock等待會話的BLOCKING SESSION,發現大部分處於等待的會話被會話37鎖定:

SELECT to_char(sample_time, 'hh24:mi:ss.ff') TIME, session_id sid, decode(en.event_name, 'cursor: pin S wait on X', to_number(substr(ltrim(to_char(p2, '0xxxxxxxxxxxxxxx')), 1, 8), 'xxxxxxxx'), blocking_session) b_sid, en.event_name FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en WHERE ash.event_id = en.event_id(+) AND instance_number = 1 AND sample_time >= to_timestamp('201407140800', 'yyyymmddhh24mi') AND sample_time <= to_timestamp('201407140801', 'yyyymmddhh24mi') ORDER BY 1;

TIME SID B_SID EVENT_NAME ------------------ ---------- ---------- ------------------------------ 08:00:09.569 457 4294967291 latch: library cache 08:00:09.569 52 981 cursor: pin S wait ON X 08:00:09.569 168 73 library cache LOCK 08:00:09.569 1763 4294967291 SQL*Net message FROM client 08:00:09.569 123 4294967295 kksfbc child completion 08:00:09.569 476 4294967291 kst: async disk IO 08:00:09.569 500 4294967292 db file sequential READ 08:00:09.569 327 4294967291 SQL*Net message TO client 08:00:09.569 233 4294967291 db file scattered READ 08:00:09.569 73 73 DFS LOCK handle 08:00:09.569 1889 4294967291 gc CURRENT block 2-way 08:00:09.569 1969 981 cursor: pin S wait ON X 08:00:09.569 857 981 cursor: pin S wait ON X 08:00:09.569 812 168 cursor: pin S wait ON X 08:00:09.569 981 73 library cache LOCK 08:00:19.669 52 981 cursor: pin S wait ON X 08:00:19.669 812 168 cursor: pin S wait ON X 08:00:19.669 857 981 cursor: pin S wait ON X 08:00:19.669 1969 981 cursor: pin S wait ON X 08:00:19.669 233 4294967291 db file scattered READ 08:00:19.669 500 4294967292 db file sequential READ 08:00:19.669 1763 4294967291 db file sequential READ 08:00:19.669 476 4294967291 kst: async disk IO 08:00:19.669 457 4294967291 kksfbc child completion 08:00:19.669 73 73 DFS LOCK handle 08:00:19.669 654 4294967291 SQL*Net message TO client 08:00:19.669 168 73 library cache LOCK 08:00:19.669 981 73 library cache LOCK 08:00:29.769 1969 981 cursor: pin S wait ON X 08:00:29.769 52 981 cursor: pin S wait ON X 08:00:29.769 457 4294967291 kksfbc child completion 08:00:29.769 981 73 library cache LOCK 08:00:29.769 168 73 library cache LOCK 08:00:29.769 500 4294967292 db file sequential READ 08:00:29.769 73 73 DFS LOCK handle 08:00:29.769 654 4294967291 SQL*Net message TO client 08:00:29.769 327 4294967291 SQL*Net message TO client 08:00:29.769 1763 4294967291 db file sequential READ 08:00:29.769 857 981 cursor: pin S wait ON X 08:00:29.769 123 4294967295 latch: library cache 08:00:29.769 196 4294967295 latch: library cache 08:00:29.769 233 4294967295 latch: library cache 08:00:29.769 812 168 cursor: pin S wait ON X 08:00:29.769 476 4294967291 kst: async disk IO 08:00:39.879 476 4294967291 kst: async disk IO 08:00:39.879 658 4294967291 db file sequential READ 08:00:39.879 500 4294967292 db file sequential READ 08:00:39.879 1969 981 cursor: pin S wait ON X 08:00:39.879 857 981 cursor: pin S wait ON X 08:00:39.879 812 168 cursor: pin S wait ON X 08:00:39.879 52 981 cursor: pin S wait ON X 08:00:39.879 981 73 library cache LOCK 08:00:39.879 168 73 library cache LOCK 08:00:39.879 1849 4294967291 SQL*Net message TO client 08:00:39.879 327 4294967291 SQL*Net message TO client 08:00:39.879 235 4294967291 rdbms ipc reply 08:00:39.879 19 4294967291 rdbms ipc reply 08:00:39.879 73 73 DFS LOCK handle 08:00:39.879 123 4294967294 enq: WF - contention 08:00:49.987 476 4294967291 kst: async disk IO 08:00:49.987 500 4294967292 db file sequential READ 08:00:49.987 382 4294967291 gc cr block busy 08:00:49.987 1969 981 cursor: pin S wait ON X 08:00:49.987 857 981 cursor: pin S wait ON X 08:00:49.987 812 168 cursor: pin S wait ON X 08:00:49.987 235 4294967295 kksfbc child completion 08:00:49.987 658 4294967295 latch: library cache 08:00:49.987 19 4294967291 latch: library cache 08:00:49.987 981 73 library cache LOCK 08:00:49.987 168 73 library cache LOCK 08:00:49.987 1889 4294967291 SQL*Net message FROM client 08:00:49.987 73 73 DFS LOCK handle 08:00:49.987 52 981 cursor: pin S wait ON X
可以判斷,導致數據庫實例HANG死的原因是因為SID為73的進程:

SELECT to_char(sample_time, 'hh24:mi:ss'), program, action, event_name FROM wrh$_active_session_history ash, wrh$_event_name en WHERE session_id = 73 AND ash.event_id = en.event_id(+) AND sample_time >= to_date('201307140550', 'yyyymmddhh24mi') AND sample_time <= to_date('201307140601', 'yyyymmddhh24mi') ORDER BY 1;

TO_CHAR( PROGRAM ACTION EVENT_NAME -------- ---------------------------------------- ------------------------------ --------------------------- 05:50:07 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:50:17 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:50:27 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:50:37 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:50:47 oracle@orcldb5 (J002) GATHER_STATS_JOB gc cr block 2-way 05:50:57 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:08 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:18 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:51:28 oracle@orcldb5 (J002) GATHER_STATS_JOB gc cr multi block request 05:51:38 oracle@orcldb5 (J002) GATHER_STATS_JOB direct path WRITE temp 05:51:48 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:58 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:08 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:18 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:52:28 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:52:38 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:49 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:59 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:53:09 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:19 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:53:29 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:39 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:49 oracle@orcldb5 (J002) GATHER_STATS_JOB gc CURRENT GRANT 2-way 05:53:59 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:09 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:20 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:30 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:40 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:50 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:00 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:10 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:55:20 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:30 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:40 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:51 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:01 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:11 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:21 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:31 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:41 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:51 oracle@orcldb5 (J002) GATHER_STATS_JOB gc CURRENT GRANT 2-way 05:57:01 oracle@orcldb5 (J002) GATHER_STATS_JOB library cache pin 05:57:11 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:57:21 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:57:32 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:57:42 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:57:52 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:58:02 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:58:12 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:58:22 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:32 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:42 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:52 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:03 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:13 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:23 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:33 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:43 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:59:53 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 06:00:03 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:13 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 06:00:23 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:34 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:44 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:54 oracle@orcldb5 (J002) GATHER_STATS_JOB
根據MOS文檔,Bug 6011045 – DBMS_STATS causes deadlock between ‘cursor: pin S wait on X’ and ‘library cache lock’ [ID 6011045.8],這個問題是RAC環境中Oracle收集統計信息的BUG,在自動收集數據字典信息時,可能會出現進程HANG死,並導致大量的library cache lock和cursor: pin S wait on X等待,此外還可能出現DFS lock handle以及row cache lock的等待。
這個BUG確認影響版本包括10.2.0.4和10.2.0.5,這個BUG Oracle在10.2.0.5.5 PSU中進行了修復。