log file sync等待超高案例淺析


監控工具DPA發現海外一台Oracle數據庫服務器DB Commit Time指標告警,超過紅色告警線(40毫秒左右,黃色告警是10毫秒,紅色告警線是20毫秒),如下截圖所示,生成了對應的時段的AWR報告,發現Top 5 Timed Events里面,log file sync等待事件的平均等待時間為37毫秒,log file parallel write等待事件的平均等待時間為40毫秒

 

clip_image001

 

clip_image002

 

 

如果對Tanel Poder::Understanding LGWR, Log File Sync Waits and Commit Performance這篇文章所講述的內容很熟悉的話(經典圖如下),那么通過等待事件log file sync與log file parallel write的Avg Wait(ms)指標: 37ms & 40ms,基本上可以判斷就是redo log所在的磁盤I/O出現了性能問題

 

 

clip_image003

 

 

然后在用lfsdiag.sql腳本分析一下詳細的統計數據,如下所示:

 

clip_image004

 

 

 

HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
 
APPROACH: Look at the wait distribution for log file sync waits
by looking at "wait_time_milli". Look at the high wait times then
see if you can correlate those with other related wait events.
 
   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file sync                                          2        835
         1 log file sync                                          4       1650
         1 log file sync                                          8       4937
         1 log file sync                                         16     146252
         1 log file sync                                         32     606674
         1 log file sync                                         64     263377
         1 log file sync                                        128     253254
         1 log file sync                                        256          2
         1 log file switch completion                             1        124
         1 log file switch completion                             2          9
         1 log file switch completion                             4         19
         1 log file switch completion                             8         21
         1 log file switch completion                            16         35
         1 log file switch completion                            32         97
         1 log file switch completion                            64        133
         1 log file switch completion                           128        326
         1 log file switch completion                           256       1736
         1 log file switch completion                           512       3042
         1 log file switch completion                          1024       2020
         1 log file parallel write                                1          0
         1 log file parallel write                                2          0
         1 log file parallel write                                4         80
         1 log file parallel write                                8       2142
         1 log file parallel write                               16     170987
         1 log file parallel write                               32     779205
         1 log file parallel write                               64     311463
         1 log file parallel write                              128      79688
         1 log file parallel write                              256      42763
         1 log file parallel write                              512      13052
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2
         1 LGWR wait for redo copy                                1       8516
         1 LGWR wait for redo copy                                2         20
         1 LGWR wait for redo copy                                4         19
         1 LGWR wait for redo copy                                8         20
         1 LGWR wait for redo copy                               16         11
 
ORDERED BY WAIT_TIME_MILLI
 
   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file switch completion                             1        124
         1 log file parallel write                                1          0
         1 LGWR wait for redo copy                                1       8516
         1 log file sync                                          2        835
         1 log file switch completion                             2          9
         1 log file parallel write                                2          0
         1 LGWR wait for redo copy                                2         20
         1 log file sync                                          4       1650
         1 log file switch completion                             4         19
         1 log file parallel write                                4         80
         1 LGWR wait for redo copy                                4         19
         1 log file sync                                          8       4937
         1 log file switch completion                             8         21
         1 log file parallel write                                8       2142
         1 LGWR wait for redo copy                                8         20
         1 log file sync                                         16     146252
         1 log file switch completion                            16         35
         1 log file parallel write                               16     170987
         1 LGWR wait for redo copy                               16         11
         1 log file sync                                         32     606674
         1 log file switch completion                            32         97
         1 log file parallel write                               32     779205
         1 log file sync                                         64     263377
         1 log file switch completion                            64        133
         1 log file parallel write                               64     311463
         1 log file sync                                        128     253254
         1 log file switch completion                           128        326
         1 log file parallel write                              128      79688
         1 log file sync                                        256          2
         1 log file switch completion                           256       1736
         1 log file parallel write                              256      42763
         1 log file switch completion                           512       3042
         1 log file parallel write                              512      13052
         1 log file switch completion                          1024       2020
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2
 
REDO WRITE STATS
 
"redo write time" in centiseconds (100 per second)
11.1: "redo write broadcast ack time" in centiseconds (100 per second)
11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
 
VERSION              INST_ID NAME                                                     VALUE        MILLISECONDS
----------------- ---------- ---------------------------------------- --------------------- -------------------
10.2.0.5.0                 1 redo write time                                        9551524        95515240.000
10.2.0.5.0                 1 redo writer latching time                                   51
10.2.0.5.0                 1 redo writes                                            1434931
 
AWR WORST AVG LOG FILE SYNC SNAPS:

 

上面數據可以看到,log file sync等待事件數量最多的是32ms這個區間的,log file parallel write等待事件發生最多的也是32ms這個區間的,其實這個值已經遠遠超過7ms,極其不正常。log file parallel write 事件是LGWR進程專屬的等待事件,發生在LGWR將log_buffer中的重做日志信息寫入聯機重做日志文件組的成員文件,LGWR在該事件上等待該寫入過程的完成。該事件等待時間過長,說明日志文件所在磁盤緩慢或存在爭用。log file synclog file parallel write是相互關聯的。換句話講,假設log file parallel write的時間非常長,那么必定導致log file sync等待時間拉長。如果log file parallel write 等待非常高,那么可能一般是物理磁盤I/O的問題

 

另外,我們也檢查了一下redo log的切換頻率,如下所示,redolog_sitch_time.sql查看發現redo log切換的次數並不頻繁,生成的歸檔日志的量也並不大。大部分時候一小時切換零次或一次。

 

image

 

 

然后我們找了一台機器(上述指標正常的服務器)簡單對測試了一下IO的速度,這個方法極其簡單,就是看看生成一個大文件需要多長時間,簡單測試一下I/O性能(沒有考慮cache等,測試采樣也不詳盡),但是對比數據也基本能驗證、反饋磁盤IO存在問題)。

 

 

 

問題服務器:

 

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 88.271 seconds, 12.2 MB/s

 

real    1m28.273s

user    0m0.010s

sys     0m0.655s

 

 

 

對比服務器(正常的服務器):

 

 

 

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 2.48344 seconds, 432 MB/s

 

real    0m2.485s

user    0m0.004s

sys     0m0.386s

 

 

  如上對比所示,兩台服務器生成同樣一個大小文件,耗費的時間,I/O性能差別非常大,完全驗證了告警的服務器所在的存儲I/O存在性能問題,但是公司分工非常明確,DBA也不清楚底層存儲出了什么問題,只能將這個問題反饋出來,等待海外負責維護系統和存儲的同事的回復。

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM