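Over the last two days, a job on one of our Oracle databases has been running the delete_ob_get_epps.sh script to clean up expired backups, and every time it executes the SQL statement below, the statement is blocked. Partial screenshots from the monitoring tool DPA are shown below (the image is split into several pieces):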
sql 'alter system archive log current';
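As the screenshots above show, the session with ID 650 has the event Log archive I/O, and the blocked session 303 is waiting on the event enq: WL - contention. The documentation describes Log archive I/O as follows: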
Log archive I/O
Used local archiving of online redo logs (for a production database) or standby redo logs (for a standby database). When the archiving process exhausts its I/O buffers because all of them are being used for on-going I/O's, the wait for an available I/O buffer is captured in this system wait event.
Wait Time: Depends on the speed of the disks
Parameters: None
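I later found a relevant note on Metalink: ALTER SYSTEM ARCHIVELOG CURRENT hangs on WL-enqueue (Doc ID 1209896.1). The note describes this behavior as a bug. This production system runs Oracle Database 10g Release 10.2.0.4.0 - 64bit Production, whereas the note states that it applies to Oracle Database - Enterprise Edition - Version 10.2.0.5 and later. I suspect 10.2.0.4 may be affected as well. The details from the note are as follows: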
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.5 and later
Information in this document applies to any platform.
ALTER SYSTEM ARCHIVE LOG CURRENT hangs via SQL*Plus, but also during the RMAN-backup.
Therefore the BACKUP ARCHIVELOG ALL never completes.
Another symptom is that V$ARCHIVE_LOG.APPLIED is not updated
The root-cause is unpublished bug 6113783 - ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
The session which is executing the ALTER SYSTEM ARCHIVE LOG CURRENT is waiting for the event :
'enq: WL - contention'
The session holding this enqueue seems to be hanging and is therefore blocking the ARCHIVE LOG CURRENT from continuing.
Get the blocker with :
SQL> select * from v$lock
where v$lock.type = 'WL'
and v$lock.lmode > 0
and v$lock.block = 1;
The related process is :
SQL> select v$session.machine, v$session.process, v$session.program
from v$session, v$lock
where v$lock.sid = v$session.sid
and v$lock.type = 'WL'
and v$lock.lmode > 0
and v$lock.block = 1;
If the blocker is an archiver process (ARCx) then the issue is related to the unpublished bug 6113783 and is fixed in 11g Release 2 (11.2.X).
Some patches exist for 11.1.0.7. Check Patch 6113783
The workaround for 10g is to kill the related archiver process on OS-level.
Unix:
% kill -9 <pid>
The archiver will be restarted automatically.
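The note's decision rule (an archiver blocker points to bug 6113783, anything else needs separate investigation) can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that archiver background processes appear in v$session.program with a suffix such as (ARC0), which should be confirmed on the actual system before acting on it.

```python
import re

# Assumption: archiver background processes show up in v$session.program
# as something like "oracle@host (ARC0)". Verify this on your own system.
_ARCHIVER_SUFFIX = re.compile(r"\(ARC[0-9A-Za-z]\)\s*$")


def is_archiver_program(program: str) -> bool:
    """Return True if the program string looks like an ARCx process."""
    return bool(_ARCHIVER_SUFFIX.search(program))


def suggest_action(program: str) -> str:
    """Map the blocker's program name to the note's advice (hypothetical helper)."""
    if is_archiver_program(program):
        return "archiver blocker: matches the note; on 10g kill the ARCx process at OS level"
    return "non-archiver blocker: the note's workaround does not directly apply"
```

The check is deliberately conservative: only a program string ending in an (ARCx) suffix is treated as an archiver, so client sessions such as SQL*Plus or RMAN fall into the second branch.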
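If the archive-current-log operation is cancelled, the blocking described above disappears; if alter system archive log current is executed again, the blocking reappears. The relevant details are as follows: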
SQL> select * from v$lock
     where v$lock.type = 'WL'
     and v$lock.lmode > 0
     and v$lock.block = 1;

ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0000000409D991D8 0000000409D991F8        615 WL -2.115E+09  980630802          5          0      35788          1
SQL> select v$session.machine, v$session.process, v$session.program
     from v$session, v$lock
     where v$lock.sid = v$session.sid
     and v$lock.type = 'WL'
     and v$lock.lmode > 0
     and v$lock.block = 1;

MACHINE                             PROCESS      PROGRAM
----------------------------------- ------------ -----------------------------
getlnx01.gfg1.esquel.com            10790        rman@xxx.xxx.xxx.com (TNS V1-V3)
SQL> select sid, program from v$session where sid in (select sid from v$lock where sid=615);

       SID PROGRAM
---------- ------------------------------------------------
       615 rman@xxx.xxx.xxx.com (TNS V1-V3)
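To make the filter in these queries explicit, here is a small Python sketch that applies the same conditions (type = 'WL', lmode > 0, block = 1) to in-memory rows and joins them to sessions by SID. The row values are copied from the output above; the dataclass and function names are hypothetical, and nothing here talks to a real database.

```python
from dataclasses import dataclass


@dataclass
class LockRow:
    """Subset of v$lock columns used by the blocker query."""
    sid: int
    type: str
    lmode: int
    request: int
    block: int


@dataclass
class SessionRow:
    """Subset of v$session columns used by the blocker query."""
    sid: int
    machine: str
    process: str
    program: str


def find_wl_blockers(locks, sessions):
    """Return sessions holding a WL lock that is blocking others."""
    by_sid = {s.sid: s for s in sessions}
    return [
        by_sid[lock.sid]
        for lock in locks
        if lock.type == "WL" and lock.lmode > 0 and lock.block == 1
        and lock.sid in by_sid
    ]


# Values taken from the SQL*Plus output shown above.
locks = [LockRow(sid=615, type="WL", lmode=5, request=0, block=1)]
sessions = [
    SessionRow(615, "getlnx01.gfg1.esquel.com", "10790",
               "rman@xxx.xxx.xxx.com (TNS V1-V3)"),
]
blockers = find_wl_blockers(locks, sessions)
```

With this sample data the only session returned is SID 615, whose program is an RMAN client rather than an ARCx process.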
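I then tested and verified this myself. Cancelling the archive-current-log operation made the blocking disappear immediately. Performing a redo log switch instead (alter system switch logfile) showed the redo log being archived successfully again, without triggering the problem. What is odd is that this problem had never occurred before, and it is unclear what condition triggered the bug.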
References:
ALTER SYSTEM ARCHIVELOG CURRENT hangs on WL-enqueue (Doc ID 1209896.1)