log file sync等待超高案例淺析

本文轉載自查看原文 2019-09-08 00:27 352 log file sync/ 數據庫技術(Oracle)

監控工具DPA發現海外一台Oracle數據庫服務器DB Commit Time指標告警，超過紅色告警線（40毫秒左右，黃色告警是10毫秒，紅色告警線是20毫秒），如下截圖所示，生成了對應的時段的AWR報告，發現Top 5 Timed Events里面，log file sync等待事件的平均等待時間為37毫秒，log file parallel write等待事件的平均等待時間為40毫秒

如果對Tanel Poder：:Understanding LGWR, Log File Sync Waits and Commit Performance這篇文章所講述的內容很熟悉的話（經典圖如下），那么通過等待事件log file sync與log file parallel write的Avg Wait(ms）指標： 37ms & 40ms,基本上可以判斷就是redo log所在的磁盤I/O出現了性能問題

然后在用lfsdiag.sql腳本分析一下詳細的統計數據，如下所示：

 
          HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS: 
           
          APPROACH: Look at the wait distribution for log file sync waits 
           
          by looking at "wait_time_milli". Look at the high wait times then 
           
          see if you can correlate those with other related wait events. 
           
             INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT 
           
          ---------- ---------------------------------------- --------------- ---------- 
           
                   1 log file sync                                          1       4363 
           
                   1 log file sync                                          2        835 
           
                   1 log file sync                                          4       1650 
           
                   1 log file sync                                          8       4937 
           
                   1 log file sync                                         16     146252 
           
                   1 log file sync                                         32     606674 
           
                   1 log file sync                                         64     263377 
           
                   1 log file sync                                        128     253254 
           
                   1 log file sync                                        256          2 
           
                   1 log file switch completion                             1        124 
           
                   1 log file switch completion                             2          9 
           
                   1 log file switch completion                             4         19 
           
                   1 log file switch completion                             8         21 
           
                   1 log file switch completion                            16         35 
           
                   1 log file switch completion                            32         97 
           
                   1 log file switch completion                            64        133 
           
                   1 log file switch completion                           128        326 
           
                   1 log file switch completion                           256       1736 
           
                   1 log file switch completion                           512       3042 
           
                   1 log file switch completion                          1024       2020 
           
                   1 log file parallel write                                1          0 
           
                   1 log file parallel write                                2          0 
           
                   1 log file parallel write                                4         80 
           
                   1 log file parallel write                                8       2142 
           
                   1 log file parallel write                               16     170987 
           
                   1 log file parallel write                               32     779205 
           
                   1 log file parallel write                               64     311463 
           
                   1 log file parallel write                              128      79688 
           
                   1 log file parallel write                              256      42763 
           
                   1 log file parallel write                              512      13052 
           
                   1 log file parallel write                             1024      20468 
           
                   1 log file parallel write                             2048      14020 
           
                   1 log file parallel write                             4096        921 
           
                   1 log file parallel write                             8192         96 
           
                   1 log file parallel write                            16384         18 
           
                   1 log file parallel write                            32768         18 
           
                   1 log file parallel write                            65536          8 
           
                   1 log file parallel write                           131072          2 
           
                   1 LGWR wait for redo copy                                1       8516 
           
                   1 LGWR wait for redo copy                                2         20 
           
                   1 LGWR wait for redo copy                                4         19 
           
                   1 LGWR wait for redo copy                                8         20 
           
                   1 LGWR wait for redo copy                               16         11 
           
          ORDERED BY WAIT_TIME_MILLI 
           
             INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT 
           
          ---------- ---------------------------------------- --------------- ---------- 
           
                   1 log file sync                                          1       4363 
           
                   1 log file switch completion                             1        124 
           
                   1 log file parallel write                                1          0 
           
                   1 LGWR wait for redo copy                                1       8516 
           
                   1 log file sync                                          2        835 
           
                   1 log file switch completion                             2          9 
           
                   1 log file parallel write                                2          0 
           
                   1 LGWR wait for redo copy                                2         20 
           
                   1 log file sync                                          4       1650 
           
                   1 log file switch completion                             4         19 
           
                   1 log file parallel write                                4         80 
           
                   1 LGWR wait for redo copy                                4         19 
           
                   1 log file sync                                          8       4937 
           
                   1 log file switch completion                             8         21 
           
                   1 log file parallel write                                8       2142 
           
                   1 LGWR wait for redo copy                                8         20 
           
                   1 log file sync                                         16     146252 
           
                   1 log file switch completion                            16         35 
           
                   1 log file parallel write                               16     170987 
           
                   1 LGWR wait for redo copy                               16         11 
           
                   1 log file sync                                         32     606674 
           
                   1 log file switch completion                            32         97 
           
                   1 log file parallel write                               32     779205 
           
                   1 log file sync                                         64     263377 
           
                   1 log file switch completion                            64        133 
           
                   1 log file parallel write                               64     311463 
           
                   1 log file sync                                        128     253254 
           
                   1 log file switch completion                           128        326 
           
                   1 log file parallel write                              128      79688 
           
                   1 log file sync                                        256          2 
           
                   1 log file switch completion                           256       1736 
           
                   1 log file parallel write                              256      42763 
           
                   1 log file switch completion                           512       3042 
           
                   1 log file parallel write                              512      13052 
           
                   1 log file switch completion                          1024       2020 
           
                   1 log file parallel write                             1024      20468 
           
                   1 log file parallel write                             2048      14020 
           
                   1 log file parallel write                             4096        921 
           
                   1 log file parallel write                             8192         96 
           
                   1 log file parallel write                            16384         18 
           
                   1 log file parallel write                            32768         18 
           
                   1 log file parallel write                            65536          8 
           
                   1 log file parallel write                           131072          2 
           
          REDO WRITE STATS 
           
          "redo write time" in centiseconds (100 per second) 
           
          11.1: "redo write broadcast ack time" in centiseconds (100 per second) 
           
          11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond) 
           
          VERSION              INST_ID NAME                                                     VALUE        MILLISECONDS 
           
          ----------------- ---------- ---------------------------------------- --------------------- ------------------- 
           
          10.2.0.5.0                 1 redo write time                                        9551524        95515240.000 
           
          10.2.0.5.0                 1 redo writer latching time                                   51 
           
          10.2.0.5.0                 1 redo writes                                            1434931 
           
          AWR WORST AVG LOG FILE SYNC SNAPS:

上面數據可以看到，log file sync等待事件數量最多的是32ms這個區間的，log file parallel write等待事件發生最多的也是32ms這個區間的，其實這個值已經遠遠超過7ms，極其不正常。log file parallel write 事件是LGWR進程專屬的等待事件，發生在LGWR將log_buffer中的重做日志信息寫入聯機重做日志文件組的成員文件，LGWR在該事件上等待該寫入過程的完成。該事件等待時間過長，說明日志文件所在磁盤緩慢或存在爭用。log file sync和log file parallel write是相互關聯的。換句話講，假設log file parallel write的時間非常長，那么必定導致log file sync等待時間拉長。如果log file parallel write 等待非常高，那么可能一般是物理磁盤I/O的問題

另外，我們也檢查了一下redo log的切換頻率，如下所示，redolog_sitch_time.sql查看發現redo log切換的次數並不頻繁，生成的歸檔日志的量也並不大。大部分時候一小時切換零次或一次。

然后我們找了一台機器(上述指標正常的服務器)簡單對測試了一下IO的速度，這個方法極其簡單，就是看看生成一個大文件需要多長時間，簡單測試一下I/O性能（沒有考慮cache等，測試采樣也不詳盡），但是對比數據也基本能驗證、反饋磁盤IO存在問題）。

問題服務器：

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 88.271 seconds, 12.2 MB/s

real 1m28.273s

user 0m0.010s

sys 0m0.655s

對比服務器（正常的服務器）：

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 2.48344 seconds, 432 MB/s

real 0m2.485s

user 0m0.004s

sys 0m0.386s

如上對比所示，兩台服務器生成同樣一個大小文件，耗費的時間，I/O性能差別非常大，完全驗證了告警的服務器所在的存儲I/O存在性能問題，但是公司分工非常明確，DBA也不清楚底層存儲出了什么問題，只能將這個問題反饋出來，等待海外負責維護系統和存儲的同事的回復。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 Oracle-"log file sync"等待分析完全揭秘log file sync等待事件-轉自itpub Oracle數據庫由dataguard備庫引起的log file sync等待 ORACLE等待事件: log file parallel write 超高逼格Log日志打印 Oracle db file parallel write 和 log file parallel write 等待事件 log file sync 等侍值高的一般通用解決辦法 golang 同步等待所有協程執行完畢sync WaitGroup Check the NativeLink log file control file sequential read 等待事件