Resolving CPU exhaustion caused by heavy latch free waits on a production database


Around noon, CPU usage on one of our production databases stayed persistently high.

We used the following SQL to look at the waits and contention in the database at that moment:

select s.SID,
       s.SERIAL#,
       'kill -9 ' || p.SPID,
       s.MACHINE,
       s.OSUSER,
       s.PROGRAM,
       s.USERNAME,
       s.last_call_et,
       a.SQL_ID,
       s.LOGON_TIME,
       a.SQL_TEXT,
       a.SQL_FULLTEXT,
       w.EVENT,
       a.DISK_READS,
       a.BUFFER_GETS
  from v$process p, v$session s, v$sqlarea a, v$session_wait w
 where p.ADDR = s.PADDR
   and s.SQL_ID = a.sql_id
   and s.sid = w.SID
   and s.STATUS = 'ACTIVE'
 order by s.last_call_et desc;

The EVENT column shows that the cause is latch contention.
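When the active-session query returns many rows, counting them by EVENT makes the dominant wait obvious at a glance. A minimal sketch: the `active_sessions` rows below are hypothetical sample values standing in for the (SID, EVENT) columns of the query output, not data captured from this incident.

```python
from collections import Counter

# Hypothetical (SID, EVENT) rows standing in for the active-session query output.
active_sessions = [
    (101, "latch free"),
    (102, "latch free"),
    (103, "db file sequential read"),
    (104, "latch free"),
    (105, "latch: cache buffers chains"),
]

def dominant_event(rows):
    """Return the most common wait event and how many sessions are waiting on it."""
    counts = Counter(event for _sid, event in rows)
    event, n = counts.most_common(1)[0]
    return event, n

event, n = dominant_event(active_sessions)
print(f"{event}: {n} of {len(active_sessions)} active sessions")
```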


The following SQL shows which latch the sessions are waiting on:

select * from v$session_wait 
where event  like 'latch free';
 

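For the latch free event, P2 is the latch number (LATCH#), not its name. Looking that number up in the v$latchname view tells us exactly which latch it is.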

1:45:55 PM SQL> select * from v$latchname where latch#=164;
 
    LATCH# NAME                                                                   HASH
---------- ---------------------------------------------------------------- ----------
       164 simulator hash latch                                             2233208730
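Decoding a latch free wait by hand is simple enough to sketch. This is a minimal example assuming the parameter meanings of the latch free event (P1 = latch address, P2 = latch number, P3 = tries). The `LATCH_NAMES` dictionary is a hypothetical stand-in for v$latchname, and the address in the usage line is made up. Latch numbers differ between Oracle versions, so they should always be resolved against v$latchname on the instance being investigated.

```python
# Hypothetical stand-in for v$latchname. Latch numbers are version-specific;
# only the 164 -> "simulator hash latch" pair comes from the output above.
LATCH_NAMES = {164: "simulator hash latch"}

def describe_latch_wait(p1: int, p2: int, p3: int) -> str:
    """Decode a 'latch free' wait: P1 = latch address, P2 = LATCH#, P3 = tries."""
    name = LATCH_NAMES.get(p2, f"unknown latch#{p2}")
    return f"{name} at 0x{p1:X} (tries={p3})"

# Made-up address; P2 = 164 as in the session above.
print(describe_latch_wait(0x3A8F1C20, 164, 0))
```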


Next, check the cumulative latch statistics since instance startup:

2:11:59 PM SQL> select name,gets,misses,sleeps from v$latch where sleeps >0 order by sleeps desc;
 
NAME                                                                   GETS     MISSES     SLEEPS
---------------------------------------------------------------- ---------- ---------- ----------
simulator hash latch                                             4827860212  135426899   10890947
cache buffers chains                                             1619822817 2850976006    4747728
gc element                                                       4660052091   25748270     175073
resmgr:schema config                                               91872524     153968      95708
ges resource hash list                                            174151449    1070556      55459
Real-time plan statistics latch                                    40953155     651496      44527
call allocation                                                     3301878     265908      43501
row cache objects                                                 336300485    4970324      19366
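Ranking the rows above by their share of total sleeps quantifies how dominant the top latch is. The numbers in this sketch are copied from the v$latch output; the function name `sleep_shares` is hypothetical. Among the rows shown, the simulator hash latch accounts for about 67.8% of all sleeps and cache buffers chains for about 29.5%.

```python
# (name, gets, misses, sleeps) copied from the v$latch output above.
LATCH_STATS = [
    ("simulator hash latch", 4827860212, 135426899, 10890947),
    ("cache buffers chains", 1619822817, 2850976006, 4747728),
    ("gc element", 4660052091, 25748270, 175073),
    ("resmgr:schema config", 91872524, 153968, 95708),
    ("ges resource hash list", 174151449, 1070556, 55459),
    ("Real-time plan statistics latch", 40953155, 651496, 44527),
    ("call allocation", 3301878, 265908, 43501),
    ("row cache objects", 336300485, 4970324, 19366),
]

def sleep_shares(stats):
    """Return (name, sleeps, percent of all sleeps in `stats`), largest first."""
    total = sum(sleeps for _name, _gets, _misses, sleeps in stats)
    ranked = sorted(stats, key=lambda row: row[3], reverse=True)
    return [(name, sleeps, 100.0 * sleeps / total)
            for name, _gets, _misses, sleeps in ranked]

for name, sleeps, pct in sleep_shares(LATCH_STATS):
    print(f"{name:<32} {sleeps:>10} {pct:5.1f}%")
```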


The simulator hash latch clearly dominates here, with by far the most sleeps.

Eygle has an article on his website about this simulator:

http://www.eygle.com/archives/2011/11/simulator_lru_latch.html

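"Simulator" means simulation. When Oracle processes data blocks in memory, it also records related information, such as the DBA (data block address), in a pre-allocated buffer. After a block has been aged out, if a later read requests a block that is still recorded in the simulator memory, Oracle concludes that keeping that block cached would have been worthwhile. By monitoring and simulating these operations and weighting the results, Oracle can produce memory sizing advice.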
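The "remember what was aged out" idea can be illustrated with a small model. This is a minimal sketch under stated assumptions, not Oracle's implementation: the class name `CacheAdviceSimulator`, the LRU policy and the fixed-size ghost list are all hypothetical. What it does show is that every read also has to consult and update an extra shared bookkeeping structure, which is the kind of additional work that, in Oracle's case, has to be protected by latches.

```python
from collections import OrderedDict

class CacheAdviceSimulator:
    """Toy cache-size advisor: a real LRU cache plus a 'ghost' list that keeps
    only the addresses (DBAs) of recently aged-out blocks, not their contents."""

    def __init__(self, cache_blocks: int, ghost_blocks: int):
        self.cache = OrderedDict()   # DBA -> None, oldest first
        self.ghost = OrderedDict()   # DBAs of aged-out blocks, oldest first
        self.cache_blocks = cache_blocks
        self.ghost_blocks = ghost_blocks
        self.hits = self.misses = self.ghost_hits = 0

    def read(self, dba: int) -> None:
        if dba in self.cache:
            self.hits += 1
            self.cache.move_to_end(dba)
            return
        self.misses += 1
        if dba in self.ghost:
            # A cache larger by up to ghost_blocks would have kept this block.
            self.ghost_hits += 1
            del self.ghost[dba]
        self.cache[dba] = None
        if len(self.cache) > self.cache_blocks:
            aged_out, _ = self.cache.popitem(last=False)
            self.ghost[aged_out] = None
            if len(self.ghost) > self.ghost_blocks:
                self.ghost.popitem(last=False)

sim = CacheAdviceSimulator(cache_blocks=2, ghost_blocks=2)
for block in [1, 2, 3, 1, 2, 3]:
    sim.read(block)
print(sim.hits, sim.misses, sim.ghost_hits)  # 0 6 3
```

In the usage example a 2-block cache misses on every read of the cyclic pattern, while the ghost list reports that half of those misses would have been hits with a somewhat larger cache.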


The simulation itself is also coordinated through latches; the relevant ones include the simulator lru latch and the simulator hash latch.

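For the buffer cache, if this kind of contention is severe, you can consider turning off db_cache_advice to remove the performance impact of this internal work.
Below is a related bug, in which enabling DB_CACHE_ADVICE caused severe simulator lru latch contention: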

Bug 5918642  Heavy latch contention with DB_CACHE_ADVICE on

 This note gives a brief overview of bug 5918642.  
 The content was last updated on: 01-APR-2008

Affects:

Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: Versions < 11.2
Versions confirmed as being affected:
Platforms affected: Generic (all / most platforms affected)

Fixed:

This issue is fixed in

Symptoms:

Related To:

Description

High simulator lru latch contention can occur when db_cache_advice is
set to ON if there is a large buffer cache.


Workaround:
  Set db_cache_advice to OFF

Of course, this only treats the symptom, not the cause: it addresses what shows on the surface. The root problem was still the SQL statement.

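When a data block is read into the SGA, its buffer header is placed on the linked list (hash chain) of a hash bucket. This memory structure is protected by a set of cache buffers chains child latches (also called hash latches or CBC latches). Before a session can select, update, insert or delete a block in the buffer cache, it must first acquire the corresponding cache buffers chains child latch to get exclusive access to the chain.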
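The bucket-to-latch relationship explains why blocks that are not even on the same chain can still contend: one child latch covers many buckets. A minimal sketch under stated assumptions: the bucket and latch counts, the modulo hash and the bucket-to-latch assignment are all hypothetical (Oracle's real values and hash function are internal); only the DBA layout (10-bit relative file number above a 22-bit block number) follows the usual smallfile encoding.

```python
from collections import Counter

# Illustrative sizes only: Oracle sizes hash buckets and cache buffers chains
# child latches from the buffer cache size and hidden parameters.
N_BUCKETS = 1024
N_CHILD_LATCHES = 32  # so each child latch protects 1024 / 32 = 32 buckets

def dba(file_no: int, block_no: int) -> int:
    """Data block address: 10-bit relative file number above a 22-bit block number."""
    return (file_no << 22) | block_no

def bucket_for(block_addr: int) -> int:
    # Hypothetical hash; Oracle's real hash function is internal.
    return block_addr % N_BUCKETS

def latch_for(bucket: int) -> int:
    # Hypothetical bucket-to-child-latch assignment.
    return bucket % N_CHILD_LATCHES

def latch_gets(accesses):
    """Count child latch acquisitions for a list of (file#, block#) reads."""
    return Counter(latch_for(bucket_for(dba(f, b))) for f, b in accesses)

# Two hot blocks in different buckets that share child latch 4,
# plus a rarely read block that maps to child latch 5.
workload = [(5, 100)] * 1000 + [(5, 132)] * 1000 + [(5, 101)] * 10
print(latch_gets(workload))  # Counter({4: 2000, 5: 10})
```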

If contention occurs during this step, the session waits on the latch: cache buffers chains event.

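Causes:

1. Inefficient SQL (mainly visible as excessive logical reads). In some environments, the application opens several concurrent sessions that run the same inefficient SQL statement. These statements all try to read the same data set, and the underlying cause is SQL that performs high BUFFER_GETS (logical reads) on every execution.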

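Conversely, fewer logical reads mean fewer latch gets, which reduces latch contention and improves performance. Watch for statements in v$sql with a large BUFFER_GETS/EXECUTIONS ratio.

2. Hot blocks. A hot block arises when multiple sessions repeatedly access one or more blocks protected by the same cache buffers chains child latch.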

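This wait event appears when multiple sessions contend for a cache buffers chains child latch. Even after the SQL has been tuned, hot blocks can still occur if many sessions run it at the same time, even when each execution scans only a few specific blocks.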
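The BUFFER_GETS/EXECUTIONS check from cause 1 amounts to ranking statements by logical reads per execution. A minimal sketch: the `SqlStat` class and the sample rows (including the sql_id values) are hypothetical stand-ins for rows read from v$sql.

```python
from dataclasses import dataclass

@dataclass
class SqlStat:
    sql_id: str
    buffer_gets: int
    executions: int

    @property
    def gets_per_exec(self) -> float:
        # Statements that were parsed but never executed: report total gets.
        if self.executions == 0:
            return float(self.buffer_gets)
        return self.buffer_gets / self.executions

def top_by_gets_per_exec(stats, n=3):
    """Return the n statements with the most logical reads per execution."""
    return sorted(stats, key=lambda s: s.gets_per_exec, reverse=True)[:n]

# Hypothetical v$sql rows; the sql_id values are made up.
sample = [
    SqlStat("a1b2c3d4e5f6g", buffer_gets=90_000_000, executions=1_500),
    SqlStat("h7j8k9m0n1p2q", buffer_gets=4_000_000, executions=1_000_000),
    SqlStat("r3s4t5u6v7w8x", buffer_gets=500_000, executions=10),
]
for s in top_by_gets_per_exec(sample):
    print(s.sql_id, round(s.gets_per_exec))
```

Note that the statement with 4,000,000 total gets ranks last: it is executed very often but each execution is cheap, whereas the first one costs 60,000 logical reads every time it runs.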

SELECT P935.SEQUENCEID,
       null FA_SEQUENCEID,
       P935.ORDERID,
       P935.ORGORDERID,
       P935.PRODUCTNAME,
       P935.PRODUCTNUM,
       P935.ORDERTIME,
       P935.LASTUPDATETIME,
       P935.ORDERSTATUS,
       P935.MEMO,
       935 orderCode,
       P935.PAYERACCTCODE,
       P935.PAYERACCTTYPE,
       P935.PAYEEACCTCODE PLATACCTCODE,
       P935.PAYEEACCTTYPE PLATACCTTYPE,
       P936.PAYEEACCTCODE,
       P936.PAYEEACCTTYPE,
       EXT935.PAYER_DISPLAYNAME,
       EXT935.PAYER_NAME,
       EXT935.PAYER_IDC,
       EXT935.PAYER_MEMBERTYPE,
       EXT936.PAYER_DISPLAYNAME PLAT_DISPLAYNAME,
       EXT936.SUBMITNAME PLAT_NAME,
       EXT936.PAYER_IDC PLAT_IDC,
       EXT936.PAYER_MEMBERTYPE PLAT_MEMBERTYPE,
       EXT936.PAYEE_DISPLAYNAME,
       EXT936.PAYEE_NAME,
       EXT936.PAYEE_IDC,
       EXT936.PAYEE_MEMBERTYPE,
       P935.PAYEEDISPLAYNAME WEBSITENAME,
       CASE
         WHEN (SELECT count(*)
                 FROM PAYMENTORDER P936
                WHERE P936.Ordercode = 936
                  and P936.Orderstatus = 0
                  AND P936.Relatedsequenceid = P935.SEQUENCEID) > 0 THEN  -- VARCHAR2 = NUMBER: implicit conversion
          0
         ELSE
          1
       END AS SHARINGRESULT,
       CASE D935.Dealcode
         WHEN 210 then
          14
         else
          D935.DEALTYPE
       end PAYMETHOD,
       D935.DEALAMOUNT,
       G935.EXT1,
       G935.Ext2,
       G935.PAYERCONTACTTYPE,
       G935.PAYERCONTACT,
       NVL(D935.PAYEEFEE, 0) PAYEEFEE,
       NVL(D935.PAYERFEE, 0) PAYERFEE,
       nvl(MS936.PAYEEFEE, 0) PLATFORMFEE,
       P935.VERSION
  FROM PAYMENTORDER          P935,
       PAYMENTORDER          P936,
       DEAL                  D935,
       GATEWAYORDER          G935,
       MSGATEWAYSHARINGORDER MS936,
       PAYMENTORDEREXT       EXT935,
       PAYMENTORDEREXT       EXT936
 WHERE P936.ORDERCODE = 936
   AND P935.ORDERCODE = 935
   AND P936.RELATEDSEQUENCEID = to_char(P935.SEQUENCEID)
   AND P935.SEQUENCEID = G935.SEQUENCEID(+)
   AND P935.SEQUENCEID = D935.ORDERSEQID(+)
   AND P935.SEQUENCEID = EXT935.ORDERSEQID(+)
   AND P936.SEQUENCEID = EXT936.ORDERSEQID(+)
   AND P936.SEQUENCEID = MS936.SEQUENCEID(+)
   AND MS936.SHARINGTYPE = 1
   AND P935.SEQUENCEID = :1
UNION
SELECT P938.SEQUENCEID,
       P935.SEQUENCEID FA_SEQUENCEID,
       P938.ORDERID,
       P938.ORGORDERID,
       P935.PRODUCTNAME,
       P935.PRODUCTNUM,
       P938.ORDERTIME,
       P938.LASTUPDATETIME,
       P938.ORDERSTATUS,
       P938.MEMO,
       938 orderCode,
       P938.PAYERACCTCODE,
       P938.PAYERACCTTYPE,
       P938.PAYEEACCTCODE PLATACCTCODE,
       P938.PAYEEACCTTYPE PLATACCTTYPE,
       P938.PAYEEACCTCODE,
       P938.PAYEEACCTTYPE,
       EXT938.PAYER_DISPLAYNAME,
       EXT938.PAYER_NAME,
       EXT938.PAYER_IDC,
       EXT938.PAYER_MEMBERTYPE,
       EXT938.PAYEE_DISPLAYNAME PLAT_DISPLAYNAME,
       EXT938.SUBMITNAME PLAT_NAME,
       EXT938.PAYEE_IDC PLAT_IDC,
       EXT938.PAYEE_MEMBERTYPE PLAT_MEMBERTYPE,
       EXT938.PAYEE_DISPLAYNAME,
       EXT938.PAYEE_NAME,
       EXT938.PAYEE_IDC,
       EXT938.PAYEE_MEMBERTYPE,
       P935.PAYEEDISPLAYNAME WEBSITENAME,
       null SHARINGRESULT,
       D938.DEALTYPE PAYMETHOD,
       D938.DEALAMOUNT,
       G935.EXT1,
       G935.Ext2,
       G935.PAYERCONTACTTYPE,
       G935.PAYERCONTACT,
       NVL(D938.PAYEEFEE, 0) PAYEEFEE,
       NVL(D938.PAYERFEE, 0) PAYERFEE,
       0 PLATFORMFEE,
       P935.VERSION
  FROM PAYMENTORDER    P935,
       PAYMENTORDER    P938,
       DEAL            D938,
       GATEWAYORDER    G935,
       PAYMENTORDEREXT EXT938
 WHERE P935.ORDERCODE = 935
   AND P938.ORDERCODE = 938
   AND P938.RELATEDSEQUENCEID = to_char(P935.SEQUENCEID)
   AND P935.SEQUENCEID = G935.SEQUENCEID(+)
   AND P938.SEQUENCEID = D938.ORDERSEQID(+)
   AND P938.SEQUENCEID = EXT938.ORDERSEQID(+)
   AND P935.SEQUENCEID = :2

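Analyzing the SQL above: in the scalar subquery, the predicate P936.Relatedsequenceid = P935.SEQUENCEID compares a VARCHAR2 column on the left of the equals sign with a NUMBER on the right. This forces an implicit data type conversion: Oracle converts the character side to a number, i.e. it applies TO_NUMBER to the RELATEDSEQUENCEID column, so a regular index on that column cannot be used, with a severe performance impact. Note that the main query's join conditions already use to_char(P935.SEQUENCEID) explicitly; only the subquery omits it.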
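The cost difference between converting the column and converting the bind value can be modelled with a toy table. This is a minimal sketch under stated assumptions: `StringKeyedTable` is hypothetical, the index is modelled as a dict keyed by the raw string, and "rows examined" is a rough proxy for logical reads. It also assumes the stored strings match TO_CHAR's canonical output (no leading zeros), which is what makes the explicit to_char form equivalent.

```python
class StringKeyedTable:
    """Toy table with a string RELATEDSEQUENCEID column (standing in for
    VARCHAR2) and an index modelled as a dict keyed by the raw string."""

    def __init__(self, related_ids):
        self.rows = list(related_ids)
        self.index = {}
        for rowid, value in enumerate(self.rows):
            self.index.setdefault(value, []).append(rowid)
        self.rows_examined = 0

    def lookup_implicit(self, seq_id: int):
        """col = :number. The column side is converted (TO_NUMBER(col)),
        so the index on the raw strings is useless and every row is examined."""
        hits = []
        for rowid, value in enumerate(self.rows):
            self.rows_examined += 1
            # int() on a non-numeric string raises, much like ORA-01722.
            if int(value) == seq_id:
                hits.append(rowid)
        return hits

    def lookup_explicit(self, seq_id: int):
        """col = TO_CHAR(:number). The bind is converted once and the index is probed."""
        hits = self.index.get(str(seq_id), [])
        self.rows_examined += len(hits)
        return list(hits)

table = StringKeyedTable(str(n) for n in range(100_000))
table.lookup_implicit(4242)
print("implicit:", table.rows_examined)   # implicit: 100000
table.rows_examined = 0
table.lookup_explicit(4242)
print("explicit:", table.rows_examined)   # explicit: 1
```

Multiply the implicit case by many concurrent executions and every one of those row visits becomes a buffer get that needs a cache buffers chains latch, which matches the contention pattern described above.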

We contacted the developers, who changed the SQL statement, and the problem was resolved.

