General fixes for high log file sync wait times

Reposted article, 2016-07-01 17:03

The log file sync wait occurs while redo is being written from the log buffer to the online redo log file.

Below is a detailed explanation of log file sync.

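When is redo written to the log file?

1. On COMMIT or ROLLBACK

2. Every 3 seconds

3. When the log buffer is 1/3 full or already holds 1 MB of redo.

More precisely: _LOG_IO_SIZE defaults to 1/3 of LOG_BUFFER, and a log write happens when the redo in the log buffer reaches _LOG_IO_SIZE.

4. Before DBWR writes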
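The four triggers can be modeled roughly in Python. This is a simplified sketch, not Oracle's internal logic: it assumes the default threshold of 1/3 of LOG_BUFFER capped at 1 MB (the "whichever is smaller" default given under cause 3 later in this article), and the event labels such as "dbwr_write" are hypothetical names introduced only for illustration.

```python
# Simplified model of when LGWR writes redo (illustrative, not Oracle's code).
ONE_MB = 1024 * 1024
LGWR_TIMEOUT_SECONDS = 3  # LGWR writes at least every 3 seconds


def default_log_io_size(log_buffer_bytes: int) -> int:
    """Assumed default _LOG_IO_SIZE: 1/3 of LOG_BUFFER, capped at 1 MB."""
    return min(log_buffer_bytes // 3, ONE_MB)


def lgwr_should_write(event: str, redo_bytes: int, log_buffer_bytes: int,
                      seconds_since_last_write: float) -> bool:
    """Return True if one of the four triggers fires.

    event is a hypothetical label: "commit", "rollback", "dbwr_write" or "none".
    """
    if redo_bytes == 0:
        return False  # nothing in the log buffer to write
    if event in ("commit", "rollback", "dbwr_write"):
        return True   # triggers 1 and 4
    if seconds_since_last_write >= LGWR_TIMEOUT_SECONDS:
        return True   # trigger 2: 3-second timeout
    return redo_bytes >= default_log_io_size(log_buffer_bytes)  # trigger 3


print(default_log_io_size(16 * ONE_MB))  # 1048576 (the 1 MB cap applies)
print(default_log_io_size(1536 * 1024))  # 524288 (1/3 of a 1.5 MB buffer)
```

With any log buffer larger than 3 MB, this assumed default is simply 1 MB.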

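The hidden parameter _log_io_size:

When the amount of redo written into LOG_BUFFER exceeds _LOG_IO_SIZE, LGWR is triggered to write the log. The default is 1/3 of LOG_BUFFER or 1 MB.

However, this cannot be confirmed by querying the parameter: the query below returns 0, and the parameter description counts redo blocks rather than bytes. Avoid changing hidden parameters whenever possible.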

    -- run as SYS: the X$ fixed tables are visible only to SYS
    col name for a25
    col VALUE for a20
    col DESCRIB for a50
    SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
    FROM SYS.x$ksppi x, SYS.x$ksppcv y
    WHERE x.inst_id = USERENV ('Instance')
    AND y.inst_id = USERENV ('Instance')
    AND x.indx = y.indx
    AND x.ksppinm = '_log_io_size';

    NAME                      VALUE                DESCRIB
    ------------------------- -------------------- --------------------------------------------------
    _log_io_size              0                    automatically initiate log write if this many redo
                                                   blocks in buffer

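How log file sync happens:

This is the wait event a user session sees after issuing a COMMIT or ROLLBACK, while it waits for the operation to complete. The commit performs a log sync, that is, it writes the log buffer to the log file, and until the commit completes the user sees this wait event.

Note that it refers specifically to waits for writing the log buffer to the log file caused by a commit or rollback. It is sometimes accompanied by log file parallel write: because this wait involves writing the log buffer, a slow redo I/O subsystem inevitably causes log file parallel write waits as well. To tell whether log file sync waits are caused by slow redo I/O, compare the wait times of the two events. If they are close, redo I/O is slow or too much redo is being generated; log file sync is then caused by log file parallel write, and the remedies for log file parallel write apply.

If the log file sync wait time is high while the log file parallel write wait time is not, log file sync is not caused by slow redo I/O but usually by the application committing too often.

When the log file sync and log file parallel write wait times are roughly equal, the log file sync waits are caused by I/O.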

-----

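An easier way to understand it:

Recall the 'log file sync' wait event in a single-instance database. When a user session commits, it asks the LGWR process to write the contents of the redo buffer to the redo log file. When LGWR finishes the write, it posts (notifies) the user session that the write is complete, and the commit completes only once the session receives that post. Until LGWR's post arrives, the user session keeps waiting, and the event it waits on is 'log file sync'.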
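The post/wait handshake can be sketched with Python threads. This is a toy model under stated assumptions: the 50 ms write latency is made up, and the queue and `threading.Event` stand in for Oracle's internal post mechanism. The time the session spends blocked in `done.wait()` plays the role of 'log file sync'.

```python
import queue
import threading
import time

# Toy model of post/wait. The 50 ms write latency is an assumption.
WRITE_SECONDS = 0.05


def lgwr(requests: queue.Queue) -> None:
    """Stand-in for LGWR: write, then post the waiting session."""
    while True:
        done = requests.get()
        if done is None:           # shutdown signal
            break
        time.sleep(WRITE_SECONDS)  # pretend to write redo to the log file
        done.set()                 # post the session that the write completed


def commit(requests: queue.Queue) -> float:
    """Ask LGWR to write, block until posted, return seconds waited."""
    done = threading.Event()
    start = time.perf_counter()
    requests.put(done)
    done.wait()                    # this wait is the "log file sync"
    return time.perf_counter() - start


requests = queue.Queue()
writer = threading.Thread(target=lgwr, args=(requests,))
writer.start()
waited = commit(requests)
requests.put(None)
writer.join()
print(f"log file sync: {waited * 1000:.0f} ms")
```

The session's wait can never be shorter than the write itself, which is why log file sync is always at least as long as log file parallel write.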

-----

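Causes of log file sync:

1. Frequent commits or rollbacks. Check whether the application runs too many short, small transactions; if it does, batching them can help.

2. Slow OS I/O. Put the log files on raw devices or on RAID 0 or RAID 1+0 rather than RAID 5.

3. An overly large log buffer (log_buffer)

An overly large log_buffer lets LGWR become lazy: because the redo in the log buffer cannot reach _LOG_IO_SIZE, more redo entries pile up in the log buffer.

LGWR then writes everything to the redo log file only when a transaction commits or when it wakes up every 3 seconds.

With that much data to write, LGWR needs more time to finish writing the redo.

In this case you can lower the _LOG_IO_SIZE parameter, whose default is 1/3 of LOG_BUFFER or 1 MB, whichever is smaller.

In other words, you can keep a large log buffer, but a smaller _LOG_IO_SIZE increases the number of background writes and thus reduces log file sync wait time.

4. High CPU load; see the details below.

5. Poor performance of the RAC private interconnect, which makes LMS slow to synchronize the commit SCN.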
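To see why a smaller _LOG_IO_SIZE helps in cause 3, consider a transaction that generates some redo before committing. In the simplified sketch below (which assumes background writes finish instantly and ignores the 3-second timeout), LGWR writes one chunk each time _LOG_IO_SIZE bytes accumulate, so only the remainder is left for the commit to wait on. The 900 KB transaction is a hypothetical example.

```python
from typing import Tuple

KB = 1024


def redo_pending_at_commit(redo_bytes: int, log_io_size: int) -> Tuple[int, int]:
    """Return (background_writes, bytes_left_for_the_commit_to_write).

    Simplified: LGWR writes one chunk each time log_io_size bytes accumulate,
    every background write is assumed to finish instantly, and the 3-second
    timeout is ignored.
    """
    background_writes, pending = divmod(redo_bytes, log_io_size)
    return background_writes, pending


# Hypothetical 900 KB transaction under two _LOG_IO_SIZE settings.
print(redo_pending_at_commit(900 * KB, 1024 * KB))  # (0, 921600): all 900 KB left
print(redo_pending_at_commit(900 * KB, 256 * KB))   # (3, 135168): 132 KB left
```

With the 1 MB threshold the commit has to wait for all 900 KB; with 256 KB, three writes have already happened in the background and only 132 KB remains.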

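How to diagnose log file sync:

1. AWR: while log file sync is occurring, take a snapshot and then generate an AWR report covering a 10-30 minute window.

If the log file sync waits have already happened, AWR can still be used to analyze them; again keep the window to 10-30 minutes.

2. LGWR trace file (from 10.2.0.4 on): log writes that take longer than 500 ms are recorded in it.

If the trace file contains lines such as "Warning: log write time 1000ms, size 2KB", I/O is very likely slow.

3. Tools that analyze CPU usage: when the CPU is too busy, LGWR cannot get scheduled on a CPU in time and log file sync appears.

vmstat: check whether r is greater than the number of CPU cores; if it is, the CPU is busy.

OSW (OSWatcher): same as above.

4. Alert log: confirm that the log files switch once every 15 to 20 minutes.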

5.Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) [Document 1064487.1]
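The trace-file and vmstat checks in items 2 and 3 are easy to script. The sketch below assumes the warning lines look exactly like the sample quoted in item 2 and that vmstat prints a header line whose first column is `r`, as Linux vmstat does; other formats would need a different parser.

```python
import re
from typing import List, Tuple

# Assumed format, based on the sample warning quoted above.
WARN_RE = re.compile(r"Warning: log write time (\d+)ms, size (\d+)KB")


def slow_log_writes(trace_text: str, threshold_ms: int = 500) -> List[Tuple[int, int]]:
    """Return (write_ms, size_kb) for each LGWR warning at or above threshold_ms."""
    hits = []
    for m in WARN_RE.finditer(trace_text):
        write_ms, size_kb = int(m.group(1)), int(m.group(2))
        if write_ms >= threshold_ms:
            hits.append((write_ms, size_kb))
    return hits


def vmstat_run_queue(vmstat_text: str) -> List[int]:
    """Collect the 'r' (runnable processes) column from vmstat output."""
    values, seen_header = [], False
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":
            seen_header = True          # column header line
        elif seen_header and fields and fields[0].isdigit():
            values.append(int(fields[0]))
    return values


def cpu_busy(vmstat_text: str, cpu_cores: int) -> bool:
    """True if any sample has more runnable processes than CPU cores."""
    return any(r > cpu_cores for r in vmstat_run_queue(vmstat_text))
```

On the database host itself, `os.cpu_count()` can supply `cpu_cores`. A small write size combined with a long write time in the trace warnings points at the storage rather than at the amount of redo.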

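Solutions:

1. If log file sync really is caused by frequent commits, reduce the number of commits.

2. If it really is caused by I/O, put the log files on raw devices or on RAID 1+0 rather than RAID 5 (and remember: never put redo log files on SSD!).

3. Make sure there is enough CPU. When CPU is short, the user session cannot get scheduled in time after LGWR posts it, and cannot continue its work.

4. Check whether some tables can use NOLOGGING to reduce the amount of redo generated. NOLOGGING only avoids redo for direct-path operations such as CREATE TABLE ... AS SELECT or INSERT /*+ APPEND */; conventional DML is still fully logged, and unlogged changes cannot be recovered from the redo.

5. Make sure the redo log files are large enough that they switch once every 15 to 20 minutes.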
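For item 5, the required redo log size follows from the redo generation rate: size = redo per second × switch interval. The sketch assumes you take the peak "Redo size" per-second figure from the AWR Load Profile; the 1 MB/s rate in the example is hypothetical.

```python
def redo_log_size_mb(redo_bytes_per_sec: float, target_minutes: float = 20) -> float:
    """Redo log member size (MB) for roughly one log switch per target_minutes.

    redo_bytes_per_sec is assumed to be the peak "Redo size" per second from
    the AWR Load Profile.
    """
    return redo_bytes_per_sec * target_minutes * 60 / (1024 * 1024)


# Hypothetical peak of 1 MB of redo per second.
print(redo_log_size_mb(1024 * 1024, 15))  # 900.0
print(redo_log_size_mb(1024 * 1024, 20))  # 1200.0
```

Sizing against the peak rather than the average keeps switches from becoming too frequent during busy periods.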

Deeper analysis of log file sync:

If the analysis above has not resolved the log file sync waits, continue with the following analysis.

The log file sync wait may be broken down into the following components:

1. Wake up LGWR if it is idle

2. LGWR gathers the redo to be written and issues the I/O

3. Time for the log write I/O to complete

4. LGWR I/O post-processing

5. LGWR posts the foreground/user session that the write has completed

6. The foreground/user session wakes up

Steps 2 and 3 are accumulated in the "redo write time" statistic (found in the Instance Activity Stats section of Statspack and AWR).

Step 3 is the "log file parallel write" wait event (Document 34583.1, "log file parallel write" Reference Note).

In addition, with Data Guard in maximum protection mode (SYNC transport), this step also includes the network write and the time for RFS to write the redo into the standby redo log on the standby database.

Steps 5 and 6 can become very significant as system load increases, especially when the CPU is busy (check the vmstat r value). Even after the foreground has been posted, the OS may take some time to schedule it to run, so this may require monitoring at the OS level.
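The six components can be written down as a simple model. The per-step timings are hypothetical inputs (Oracle does not report all of them separately); the point is that log file sync is the sum of all six, "redo write time" covers steps 2 and 3, and log file parallel write is only step 3, so a large gap between log file sync and log file parallel write points at the non-I/O steps, such as CPU scheduling in steps 5 and 6.

```python
from dataclasses import dataclass


@dataclass
class LogFileSyncBreakdown:
    """Times in ms for the six steps above; all values are hypothetical inputs."""
    wakeup_lgwr: float        # step 1
    gather_and_issue: float   # step 2
    write_io: float           # step 3, "log file parallel write"
    post_processing: float    # step 4
    post_session: float       # step 5
    session_wakeup: float     # step 6

    def log_file_sync(self) -> float:
        """The whole wait seen by the committing session."""
        return (self.wakeup_lgwr + self.gather_and_issue + self.write_io
                + self.post_processing + self.post_session + self.session_wakeup)

    def redo_write_time(self) -> float:
        """Steps 2 and 3, as accumulated in the "redo write time" statistic."""
        return self.gather_and_issue + self.write_io

    def non_io_share(self) -> float:
        """Fraction of log file sync not spent in the write I/O itself."""
        return 1 - self.write_io / self.log_file_sync()


steps = LogFileSyncBreakdown(0.5, 0.5, 4.0, 0.5, 1.5, 3.0)  # hypothetical ms
print(steps.log_file_sync())           # 10.0
print(steps.redo_write_time())         # 4.5
print(round(steps.non_io_share(), 2))  # 0.6
```

In this made-up case only 4 ms of a 10 ms log file sync is I/O, so 60% of the wait would have to be explained by LGWR overhead and process scheduling.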

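Some rules of thumb:

An average log file sync wait below 20 ms is normal.

An average log file parallel write wait below 20 ms is normal.

If the log file parallel write and log file sync wait times are very close, the problem is I/O, because most of the time is spent writing the log to disk.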
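These rules of thumb can be combined into a small classifier. The 20 ms limit comes from the text above; what counts as "very close" is not defined here, so the 80% ratio is an assumption to tune for your own system.

```python
NORMAL_MS = 20.0   # rule of thumb from the text above
CLOSE_RATIO = 0.8  # assumption: "close" means lfpw >= 80% of log file sync


def classify(avg_sync_ms: float, avg_lfpw_ms: float) -> str:
    """Rough verdict from the average log file sync and log file parallel write."""
    if avg_sync_ms < NORMAL_MS and avg_lfpw_ms < NORMAL_MS:
        return "normal"
    if avg_lfpw_ms >= CLOSE_RATIO * avg_sync_ms:
        return "io"      # most of the wait is the redo write itself
    return "non-io"      # e.g. commit frequency or CPU scheduling


print(classify(5.0, 3.0))    # normal
print(classify(30.0, 27.0))  # io
print(classify(40.0, 4.0))   # non-io
```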

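Related scripts:

    -- total and average waits (TIME_WAITED and AVERAGE_WAIT are in centiseconds)
    select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
    from   v$system_event
    where  event in ('log file sync', 'log file parallel write');

    -- current log buffer size in bytes
    select value from v$parameter where name = 'log_buffer';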
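Because TIME_WAITED and AVERAGE_WAIT in V$SYSTEM_EVENT are in centiseconds, they need converting before being compared with the 20 ms guideline. A minimal sketch of the conversion, with hypothetical sample numbers. These columns are cumulative since instance startup, so for a current picture you would subtract two samples; V$SYSTEM_EVENT also has a TIME_WAITED_MICRO column with finer resolution.

```python
def avg_wait_ms(total_waits: int, time_waited_cs: int) -> float:
    """Average wait in ms from V$SYSTEM_EVENT TOTAL_WAITS and TIME_WAITED.

    TIME_WAITED is in centiseconds (1/100 s), so multiply by 10 to get ms.
    """
    if total_waits == 0:
        return 0.0
    return time_waited_cs * 10.0 / total_waits


# Hypothetical: 200,000 waits totalling 50,000 cs (500 s).
print(avg_wait_ms(200000, 50000))  # 2.5
```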

--------------- New feature: two modes of log file sync ---------------

Adaptive Log File Sync

Adaptive log file sync was introduced in 11.2. The parameter controlling this feature, _use_adaptive_log_file_sync, is set to false by default in 11.2.0.1 and 11.2.0.2.

Starting with 11.2.0.3 the parameter defaults to true, which enables the adaptive log file sync mechanism.

In 11.2.0.1 and 11.2.0.2 the parameter can also be enabled manually:

ALTER SYSTEM SET "_use_adaptive_log_file_sync"= scope=;

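With the parameter enabled, log file sync switches between two modes.

The parameter determines how the foreground/user session learns from LGWR that a commit has completed (that is, that its redo has been written to the log file).

Post/wait, traditional method for posting completion of writes to redo log

The traditional, passive method, and the only one before 11.2.0.3: the user session waits for LGWR to notify it that the redo has been written to the log file.

Advantage: with post/wait, the user session learns almost immediately that its redo has been flushed to disk.

Polling, a new method where the foreground process checks if the LGWR has completed the write.

The new, active method: the session itself checks whether LGWR has finished the write. Its response time is slower than post/wait, but it saves CPU.

Advantage: when commits complete, LGWR has to post many user sessions, which consumes a lot of CPU. With polling, each session actively checks whether LGWR has finished writing its redo, which frees the CPU that LGWR would otherwise spend on posting.

Polling works better when the system is heavily loaded (CPU busy).

Post/wait works better when the system is lightly loaded (CPU idle), because it gives better response times than polling.

Oracle decides which mode to use based on internal statistics. Switching between post/wait and polling incurs overhead, so safeguards ensure that switches do not happen too frequently.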
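Polling can be sketched the same way as post/wait. In this toy model (an assumption, not Oracle's implementation), LGWR only records how far it has written, and the committing session sleeps for a fixed interval between checks. The sleep is where polling's extra latency comes from, while LGWR saves the work of posting each session.

```python
import threading
import time

# Toy model of polling. The "SCN" counters and both timings are hypothetical.
POLL_INTERVAL = 0.01  # assumed sleep between checks (seconds)
WRITE_SECONDS = 0.05  # assumed redo write latency (seconds)

written_scn = 0       # how far LGWR has written
lock = threading.Lock()


def lgwr_write(target_scn: int) -> None:
    """Write redo up to target_scn without posting anyone."""
    global written_scn
    time.sleep(WRITE_SECONDS)
    with lock:
        written_scn = target_scn


def commit_with_polling(commit_scn: int) -> int:
    """Check until the session's redo is on disk; return the number of checks."""
    polls = 0
    while True:
        polls += 1
        with lock:
            if written_scn >= commit_scn:
                return polls
        time.sleep(POLL_INTERVAL)


writer = threading.Thread(target=lgwr_write, args=(100,))
writer.start()
polls = commit_with_polling(100)
writer.join()
print(f"redo on disk after {polls} polls")
```

A shorter poll interval lowers latency but burns more CPU in the sessions, which mirrors the trade-off Oracle weighs when choosing between the two modes.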

LGWR records each switch in its trace file; search for the keyword "Log file sync switching to ...":

Switch to polling:

    *** 2015-01-21 08:19:04.077
    kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=5 sync=62 poll=1056 rw=454 ack=0 min_sleep=1056

    *** 2015-01-21 08:19:04.077
    Log file sync switching to polling
    Current scheduling delay is 1 usec
    Current approximate redo synch write rate is 1 per sec
    kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=1 switch_sync_count_delta=5

Switch to post/wait:

    *** 2015-01-21 08:46:09.428
    Log file sync switching to post/wait
    Current approximate redo synch write rate is 0 per sec

    *** 2015-01-21 08:47:46.473
    kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=11 sync=228 poll=1442 rw=721 ack=0 min_sleep=1056
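To review how often an instance switches modes, the switch lines can be pulled out of the LGWR trace. This sketch assumes the trace looks like the samples above: a `***` timestamp line followed, possibly after other lines, by a `Log file sync switching to ...` line.

```python
import re
from typing import List, Optional, Tuple

SWITCH_RE = re.compile(r"Log file sync switching to (polling|post/wait)")
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")


def mode_switches(trace_text: str) -> List[Tuple[Optional[str], str]]:
    """Return [(timestamp, new_mode), ...] in trace order."""
    switches, last_ts = [], None
    for line in trace_text.splitlines():
        line = line.strip()
        ts = TS_RE.match(line)
        if ts:
            last_ts = ts.group(1)  # remember the most recent timestamp
            continue
        sw = SWITCH_RE.search(line)
        if sw:
            switches.append((last_ts, sw.group(1)))
    return switches
```

Many switches within a short period would suggest the workload sits near the boundary between the two modes.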

Related scripts:

Check whether log file sync has used polling (the counters are cumulative since instance startup, so a non-zero value means polling has been used):

    SQL> select name,value from v$sysstat where name in ('redo synch poll writes','redo synch polls');

    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    redo synch polls                                                  325355850

Number of log file sync polls per hour (per AWR snapshot interval):

    col begin_interval_time format a25
    col instance_number format 99 heading INST
    col stat_name format a25
    select snap.BEGIN_INTERVAL_TIME, hist.instance_number, hist.stat_name, hist.redo_synch_polls
    from ( select snap_id, instance_number, stat_name,
                  value - lag(value,1,null) over (order by snap_id, instance_number, stat_name) redo_synch_polls
           from   dba_hist_sysstat
           where  stat_name = 'redo synch polls'
           and    dbid = (select dbid from v$database)
           and    instance_number = nvl('&instance_number',1)) hist,
         dba_hist_snapshot snap
    where redo_synch_polls > 0
    and   hist.snap_id = snap.snap_id
    and   hist.instance_number = snap.instance_number
    order by 1,2
    /

    BEGIN_INTERVAL_TIME       INST STAT_NAME                 REDO_SYNCH_POLLS
    ------------------------- ---- ------------------------- ----------------
    06-JAN-15 07.00.02.884 AM    2 redo synch polls                       734
    06-JAN-15 08.00.08.425 AM    2 redo synch polls                     23767
    06-JAN-15 09.00.13.770 AM    2 redo synch polls                     39827
    06-JAN-15 10.00.19.233 AM    2 redo synch polls                     48479
    06-JAN-15 11.00.24.431 AM    2 redo synch polls                     41541
    06-JAN-15 12.00.29.670 PM    2 redo synch polls                     47566
    06-JAN-15 01.00.35.029 PM    2 redo synch polls                     32169
    06-JAN-15 02.00.04.159 PM    2 redo synch polls                     37405
    06-JAN-15 02.59.04.536 PM    2 redo synch polls                     41469
    06-JAN-15 04.00.08.556 PM    2 redo synch polls                     38683
    06-JAN-15 05.00.12.523 PM    2 redo synch polls                     51618
    06-JAN-15 06.00.16.584 PM    2 redo synch polls                     52511
    06-JAN-15 07.00.03.352 PM    2 redo synch polls                     42229
    06-JAN-15 08.00.08.663 PM    2 redo synch polls                     35229
    06-JAN-15 09.00.13.882 PM    2 redo synch polls                     18499
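The LAG() in the query turns cumulative counters into per-snapshot deltas. The same logic in Python, as a sketch on hypothetical (snap_id, value) pairs: non-positive deltas, such as the drop after an instance restart resets the counter, are discarded just like `where redo_synch_polls > 0`.

```python
from typing import List, Tuple


def per_snapshot_deltas(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Turn cumulative (snap_id, value) samples into (snap_id, delta) pairs."""
    ordered = sorted(samples)  # order by snap_id, like the LAG() window
    deltas = []
    for (_, prev), (snap_id, cur) in zip(ordered, ordered[1:]):
        delta = cur - prev     # value minus the previous snapshot's value
        if delta > 0:          # same filter as "redo_synch_polls > 0"
            deltas.append((snap_id, delta))
    return deltas


# Hypothetical cumulative values; the counter resets after snap 102.
samples = [(100, 1000), (101, 1734), (102, 25501), (103, 300), (104, 900)]
print(per_snapshot_deltas(samples))  # [(101, 734), (102, 23767), (104, 600)]
```

As in the SQL, the first snapshot produces no delta because it has no predecessor.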
