Sometimes you will find warnings such as "Thread <number> cannot allocate new log, sequence <number> Checkpoint not complete" in the alert log of an Oracle database. A concrete example looks like this:
Thread 1 cannot allocate new log, sequence 279334
Checkpoint not complete
Current log# 4 seq# 279333 mem# 0: /u01/oradata/GSP/redo04.log
Current log# 4 seq# 279333 mem# 1: /u03/oradata/GSP/redo04.log
The Thread and sequence values will of course differ, but the warning basically looks like this:
Thread <number> cannot allocate new log, sequence <number>
Checkpoint not complete
The cause may also be that Oracle is waiting for a redo log to be archived, in which case the warning looks like this:
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
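So what exactly causes these warnings, and how can the problem be resolved?
Cause analysis:
Typically, when an online redo log fills up, Oracle switches to the next log group, and the switch triggers a checkpoint. The checkpoint makes the database writer process (DBWR) write the dirty blocks in the buffer cache back to the data files on disk, and until that write-back has finished the database cannot release the log group for reuse. In ARCHIVELOG mode, the ARCH process must also archive the redo log. If redo is generated too quickly, LGWR can fill all the other log groups and come back around to a group whose checkpoint or archiving has not yet completed. At that point there is a conflict: LGWR has to wait, the database hangs until the group becomes available, and messages like the ones above keep being written to alert.log.
In addition, the log switch frequency varies across business periods, so this error generally appears when the workload is heavy or when a large volume of DML is running.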
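To check whether the warnings line up with bursts of log switching, the switch count per hour can be pulled from V$LOG_HISTORY. This is a minimal sketch; the one-day window is an arbitrary assumption and not something taken from the alert log above.

```sql
-- Log switches per thread and hour over the last 24 hours (the window is an assumed example)
SELECT THREAD#,
       TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24') AS SWITCH_HOUR,
       COUNT(*) AS LOG_SWITCHES
FROM V$LOG_HISTORY
WHERE FIRST_TIME > SYSDATE - 1
GROUP BY THREAD#, TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24')
ORDER BY THREAD#, SWITCH_HOUR;
```

Hours whose switch count stands far above the rest are the periods to compare against the timestamps of the warnings in the alert log.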
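Solutions:
1: Increase the size of the redo log files
Increasing the redo log file size is easy to do, but what size is reasonable?
1: Refer to the OPTIMAL_LOGFILE_SIZE column in V$INSTANCE_RECOVERY. This column may be NULL unless the FAST_START_MTTR_TARGET parameter is set to a value greater than 0. The Oracle reference describes the column as follows: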
Redo log file size (in megabytes) that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. It is recommended that the user configure all online redo logs to be at least this value.
The official documentation recommends the following:
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine the size of your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.
Note, however, that the redo log file size affects the MTTR. In some cases, you may be able to refine your choice of the optimal FAST_START_MTTR_TARGET value by re-running the MTTR Advisor with your suggested optimal log file size.
SQL> SELECT OPTIMAL_LOGFILE_SIZE FROM V$INSTANCE_RECOVERY;
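If the query returns NULL, one way to populate the column is to set FAST_START_MTTR_TARGET. The following is a sketch only: the 300-second target is an illustrative assumption and should be derived from your own recovery-time requirement. Setting the parameter also changes how aggressively DBWR writes dirty buffers, so it is worth trying on a test system first.

```sql
-- Current MTTR target (0 means it is not set)
SELECT NAME, VALUE FROM V$PARAMETER WHERE NAME = 'fast_start_mttr_target';

-- 300 seconds is an assumed example value, not a recommendation
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 300;

-- Re-check after the instance has run under a representative workload for a while
SELECT TARGET_MTTR, ESTIMATED_MTTR, OPTIMAL_LOGFILE_SIZE
FROM V$INSTANCE_RECOVERY;
```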
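2: Judge from the number of log switches and the amount of redo generated
You can use the awr_redo_size_history script to analyze how much archived log is generated per hour and per day. Then take the archived log volume of the periods with the most frequent switching and size the redo logs so that a switch happens once every 15 to 20 minutes. (If switching is extremely frequent in some period, this rule becomes almost unusable, because the resulting redo log would be very large.) This is not a universal rule and has to be weighed against the actual workload, but it is a usable reference in most cases.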
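For a quick look without the AWR script, the archived redo volume per hour can be approximated from V$ARCHIVED_LOG. This is a sketch and not the awr_redo_size_history script itself. It assumes the database runs in ARCHIVELOG mode, and the DEST_ID = 1 filter and the seven-day window are assumptions; the filter is there to avoid counting the same log once per archive destination.

```sql
-- Archived redo per hour over the last 7 days (window and DEST_ID are assumed examples)
SELECT TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24') AS LOG_HOUR,
       COUNT(*) AS ARCHIVED_LOGS,
       ROUND(SUM(BLOCKS * BLOCK_SIZE) / 1024 / 1024, 2) AS ARCHIVED_MB
FROM V$ARCHIVED_LOG
WHERE FIRST_TIME > SYSDATE - 7
  AND DEST_ID = 1
GROUP BY TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24')
ORDER BY LOG_HOUR;
```

As a hypothetical reading of the result: if the busiest hour shows about 3,000 MB, one switch every 20 minutes (three per hour) corresponds to redo logs of roughly 1,000 MB each.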
A script for calculating the redo log size, for reference only:
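-- Current average redo log size, and the size that would give one switch every 20 minutes.
-- Enter the peak-hour bounds as quoted HH24:MI strings, for example '09:00' and '18:00'.
SELECT
    (SELECT ROUND(AVG(BYTES) / 1024 / 1024, 2) FROM V$LOG) AS "Redo size (MB)",
    ROUND((20 / AVERAGE_PERIOD) * (SELECT AVG(BYTES) FROM V$LOG) / 1024 / 1024, 2)
        AS "Recommended Size (MB)"
FROM (
    -- Average lifetime, in minutes, of the logs archived during peak hours of the last 3 days
    SELECT AVG((NEXT_TIME - FIRST_TIME) * 24 * 60) AS AVERAGE_PERIOD
    FROM V$ARCHIVED_LOG
    WHERE FIRST_TIME > SYSDATE - 3
      AND TO_CHAR(FIRST_TIME, 'HH24:MI') BETWEEN
          &START_OF_PEAK_HOURS AND &END_OF_PEAK_HOURS
);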
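Online redo log files cannot be resized in place; the usual approach is to add new, larger groups and drop the old ones once they are no longer needed. A sketch follows. The group numbers 5 and 6, the 1G size and the file names are hypothetical; the directories simply reuse those shown in the alert log example at the top.

```sql
-- Add larger groups (group numbers, size and file names are assumed examples)
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5
    ('/u01/oradata/GSP/redo05.log', '/u03/oradata/GSP/redo05.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6
    ('/u01/oradata/GSP/redo06.log', '/u03/oradata/GSP/redo06.log') SIZE 1G;

-- Move LGWR off the old groups and let the checkpoint complete
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- An old group can be dropped only when it is INACTIVE (and archived in ARCHIVELOG mode)
SELECT GROUP#, BYTES / 1024 / 1024 AS SIZE_MB, ARCHIVED, STATUS FROM V$LOG;

-- Repeat for each old, smaller group once it shows INACTIVE
ALTER DATABASE DROP LOGFILE GROUP 4;
```

DROP LOGFILE does not remove the operating system files unless Oracle Managed Files is in use, so the old files have to be deleted manually afterward.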
2: Increase the number of redo log groups
Adding log groups does not actually solve the "Thread <number> cannot allocate new log, sequence <number> Checkpoint not complete" problem, but it does solve the following one:
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
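This happens because the ARCH process has not yet finished copying the redo log file to the archive destination (archival required). Because log switches are too frequent or there are too few log groups, LGWR must wait for ARCH to finish archiving before it can cycle back and overwrite the log group.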
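Before adding groups, it can help to confirm that archiving really is what the log groups are waiting on. A sketch, in which group number 7, the 1G size and the file names are hypothetical (the size would normally match the existing groups):

```sql
-- Apart from the CURRENT group, ARCHIVED = 'NO' means the group is still waiting for ARCH
SELECT GROUP#, THREAD#, SEQUENCE#, ARCHIVED, STATUS FROM V$LOG ORDER BY GROUP#;

-- A full or failing archive destination stalls archiving regardless of the number of groups
SELECT DEST_ID, STATUS, DESTINATION, ERROR
FROM V$ARCHIVE_DEST
WHERE STATUS <> 'INACTIVE';

-- Add one more group to give ARCH more time before LGWR wraps around
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 7
    ('/u01/oradata/GSP/redo07.log', '/u03/oradata/GSP/redo07.log') SIZE 1G;
```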
3:Tune checkpoint
This one is harder. See the official document: Note 147468.1 Checkpoint Tuning and Troubleshooting Guide
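As a starting point, the checkpoint-related parameters can be inspected, and checkpoint begin and end times can be logged to the alert log so that their duration can be compared with the log switch times. This is a sketch using standard initialization parameters; whether any of them should be changed depends on the system.

```sql
-- Current checkpoint-related settings
SELECT NAME, VALUE
FROM V$PARAMETER
WHERE NAME IN ('fast_start_mttr_target',
               'log_checkpoint_interval',
               'log_checkpoint_timeout',
               'log_checkpoints_to_alert');

-- Record the beginning and completion of each checkpoint in the alert log
ALTER SYSTEM SET LOG_CHECKPOINTS_TO_ALERT = TRUE;
```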
4:Increase I/O speed for writing online REDO log/Archived REDO
This applies to Thread <number> cannot allocate new log, sequence <number>
Checkpoint not complete
- use ASYNC I/O if not already so
- use DBWR I/O slaves or multiple DBWR processes
Reference:
Oracle Database Performance Tuning Guide
Instance Tuning Using Performance Views
Consider Multiple Database Writer (DBWR) Processes or I/O Slaves
10.2 - http://docs.oracle.com/cd/B19306_01/server.102/b14211/instance_tune.htm#i42802
11.1 - http://docs.oracle.com/cd/B28359_01/server.111/b28274/instance_tune.htm#i42802
11.2 - http://docs.oracle.com/cd/E11882_01/server.112/e16638/instance_tune.htm#PFGRF94511
- consider the generic recommendations for REDO log files:
If the high I/O files are redo log files, then consider splitting the redo log files from the other files. Possible configurations can include the following:
1. Placing all redo logs on one disk without any other files. Also consider availability; members of the same group should be on different physical disks and controllers for recoverability purposes.
2. Placing each redo log group on a separate disk that does not store any other files.
3. Striping the redo log files across several disks, using an operating system striping tool. (Manual striping is not possible in this situation.)
4. Avoiding the use of RAID 5 for redo logs.
Reference:
Oracle Database Performance Tuning Guide
Redo Log Files
10.2 - http://docs.oracle.com/cd/B19306_01/server.102/b14211/iodesign.htm#sthref534
11.1 - http://docs.oracle.com/cd/B28359_01/server.111/b28274/iodesign.htm#CHDBCDHG
11.2 - http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#PFGRF94396
For
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
In the above document you may check section "Archived Redo Logs"
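The asynchronous I/O and DBWR settings mentioned in this item can be checked as follows. This is a sketch: the value 2 for DB_WRITER_PROCESSES is an assumed example, and a suitable value depends on CPU count and the I/O subsystem.

```sql
-- Current async I/O and database writer configuration
SELECT NAME, VALUE
FROM V$PARAMETER
WHERE NAME IN ('disk_asynch_io', 'filesystemio_options',
               'db_writer_processes', 'dbwr_io_slaves');

-- Static parameter: takes effect after an instance restart. The value 2 is an assumed example.
ALTER SYSTEM SET DB_WRITER_PROCESSES = 2 SCOPE = SPFILE;
```

I/O slaves and multiple DBWR processes are alternatives and are not combined: when DBWR_IO_SLAVES is set, only one DBWR process is started.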
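5: Find the SQL that generates large amounts of redo. If that SQL is unreasonable from a business or logic standpoint, change it, or set the related tables to NOLOGGING to reduce redo generation
To identify which SQL statements generate large amounts of redo, you can use the LogMiner tool, or refer to my blog post "如何定位那些SQL產生了大量的redo日志" (How to identify which SQL generated a large amount of redo).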
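As a lighter first pass before LogMiner, the sessions generating the most redo can be ranked from the session statistics. This is a sketch and not the method from the blog post mentioned above; the table name in the NOLOGGING statement is hypothetical.

```sql
-- Sessions ranked by redo generated since they connected
SELECT s.SID, s.USERNAME, s.PROGRAM, st.VALUE AS REDO_BYTES
FROM V$SESSTAT st
JOIN V$STATNAME sn ON sn.STATISTIC# = st.STATISTIC#
JOIN V$SESSION s ON s.SID = st.SID
WHERE sn.NAME = 'redo size'
  AND st.VALUE > 0
ORDER BY st.VALUE DESC;

-- Hypothetical table name; NOLOGGING only reduces redo for direct-path operations
ALTER TABLE APP_OWNER.STAGING_ORDERS NOLOGGING;
```

NOLOGGING only affects direct-path operations such as INSERT /*+ APPEND */; conventional INSERT, UPDATE and DELETE still generate full redo, and a database running in FORCE LOGGING mode ignores the attribute. Data loaded without logging cannot be recovered from the redo stream, so a backup should follow such a load.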
References:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:69012348056
Manual Log Switching Causing "Thread 1 Cannot Allocate New Log" Message in the Alert Log (Doc ID 435887.1)
Can Not Allocate Log (Doc ID 1265962.1)
https://gokhanatil.com/2009/08/optimum-size-of-the-online-redo-log-files.html