Oracle systemstate dump介紹

本文轉載自查看原文 2016-03-02 23:16 4614 systemstate dump/ 數據庫技術(Oracle)

當數據庫出現嚴重的性能問題或者hang起的時候，那么我們非常需要通過systemstate dump來知道進程在做什么，在等待什么，誰是資源的持有者，誰阻塞了別人。在出現上述問題時，及時收集systemstate dump非常有助於問題原因的分析。一般Oracle Support工程是也是需要你提供systemstate dump生成的trace文件做分析，關於systemstate dump的資料，其實沒有非常詳細的官方介紹資料，都是一些零零散散的介紹。

當數據庫出現嚴重性能問題或hang起的時候，服務器端sqlplus連接數據庫要么非常慢，要么根本無法連接。ORACLE 10g 開始，sqlplus提供了這么一個功能參數-prelim,在sqlplus無法連接的情況下，連接登錄到數據庫。下面關於這些知識點的一個總結

There are two ways to connect to sqlplus using a preliminary connection.

 
          sqlplus -prelim / as sysdba 
           
          sqlplus /nolog 
           
          set _prelim on 
           
          connect / as sysdba

用sysdba登錄到數據庫上：

$sqlplus / as sysdba

或者

$sqlplus -prelim / as sysdba <==當數據庫已經很慢或者hang到無法連接

下面是在metalink上的介紹如何在單機或RAC環境下做Systemstate或Hanganalyze（詳細信息，請見下面參考資料）

Collection commands for Hanganalyze and Systemstate: Non-RAC:
Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

Systemstate
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit

Collection commands for Hanganalyze and Systemstate: RAC
There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly unadvisable to use these level

For information on these patches see:
Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance

Note: both bugs are fixed in 11.2.0.3.

Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
For 11g:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit
Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
For 10g, run oradebug setmypid instead of oradebug setorapname reco:
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.

那么我們現在來看一個例子吧：

 
          [oracle@DB-Server ~]$ sqlplus -prelim / as sysdba 
           
          SQL*Plus: Release 10.2.0.5.0 - Production on Wed Mar 2 16:31:03 2016 
           
          Copyright (c) 1982, 2010, Oracle.  All Rights Reserved. 
           
          SQL> oradebug setmypid 
           
          Statement processed. 
           
          SQL> oradebug unlimit 
           
          Statement processed. 
           
          SQL> oradebug dump systemstate 266 
           
          Statement processed. 
           
          SQL> oradebug dump systemstate 266 
           
          Statement processed. 
           
          SQL> oradebug tracefile_name 
           
          /u01/app/oracle/admin/SCM2/udump/scm2_ora_13598.trc 
           
          SQL> exit 
           
          Disconnected from ORACLE

告警日志里面會看到類似這樣的信息：

Wed Mar 02 16:32:08 CST 2016

System State dumped to trace file

Wed Mar 02 16:32:48 CST 2016

System State dumped to trace file /u01/app/oracle/admin/xxx/udump/scm2_ora_13598.trc

$ORACLE_BASE/admin/ORACLE_SID/udump/ 下找到對應的trc文件，如下所示，你會看到大量系統中所有進程的進程狀態等信息。每個進程對應跟蹤文件中的一段內容，反映該進程的狀態信息，包括進程信息，會話信息，enqueues信息（主要是lock的信息）等等。

systemstate dump有多個級別：

2: dump (不包括lock element)

10: dump

11: dump + global cache of RAC

256: short stack （函數堆棧）

258: 256+2 -->short stack +dump(不包括lock element)

266: 256+10 -->short stack+ dump

267: 256+11 -->short stack+ dump + global cache of RAC

level 11和 267會 dump global cache, 會生成較大的trace 文件，一般情況下不推薦。一般情況下，如果進程不是太多，推薦用266，因為這樣可以dump出來進程的函數堆棧，可以用來分析進程在執行什么操作。但是生成short stack比較耗時，如果進程非常多，比如2000個進程，那么可能耗時30分鍾以上。這種情況下，可以生成level 10 或者 level 258， level 258 比 level 10會多收集short short stack, 但比level 10少收集一些lock element data.

使用systemstate dump生成的trace文件可能會非常大，一般都會幾百兆甚至更大，雖然通過system state dump收集了進程的相關，但是如何有效的解讀相關信息，並診斷問題是一個不小的難題和挑戰！

參考資料：

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=352993211736965&parent=DOCUMENT&sourceId=68738.1&id=452358.1&_afrWindowMode=0&_adf.ctrl-state=z7hwh19s9_319

https://blogs.oracle.com/Database4CN/entry/systemstate_dump_%E4%BB%8B%E7%BB%8D

http://tech.e2sn.com/oracle/troubleshooting/hang/how-to-log-on-even-when-sysdba-can-t-do-so

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 SQL Server Dump介紹 Oracle dump函數的用法 Oracle導入dump文件 oracle 導出dump文件 oracle dump的使用心得 Oracle導入、導出dump文件 oracle dump數據庫【Oracle】Oracle中dump函數的用法抓取Dump文件的方法和工具介紹通過Oracle DUMP 文件獲取表的創建語句