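Just a few days ago, yet another customer asked me about undo tablespace utilization.
It reminded me of a case from a few years ago in one of the provinces. The customer's hands-on operations engineer was a young woman fresh out of school who knew almost nothing about Oracle internals. Her project manager had given her basic operations tasks, one of which was to monitor the utilization of every tablespace in the database and extend any tablespace that went above 95%. Their Oracle version was 10gR2.
Because the customer's business dealt with telecom carrier call detail records, the data volume was large (tens of terabytes), so plenty of storage had been set aside.
Once, when the customer asked me to handle a different problem remotely, I was surprised to find that their undo tablespace was more than 2 TB in size. I asked the operations engineer what had happened, and you can probably guess the answer: her routine checks kept showing the undo tablespace above 95%, so she kept extending it until it had grown to over 2 TB. She even believed the undo tablespace belonged to one of the business applications, which was awkward.
So what exactly is undo, and what is it actually used for? The Oracle 10g official documentation describes it as follows: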
What Is Undo?
Every Oracle Database must have a method of maintaining information that is used to roll back, or undo, changes to the database. Such information consists of records of the actions of transactions, primarily before they are committed. These records are collectively referred to as undo.
Undo records are used to:
Roll back transactions when a ROLLBACK statement is issued
Recover the database
Provide read consistency
Analyze data as of an earlier point in time by using Oracle Flashback Query
Recover from logical corruptions using Oracle Flashback features
When a ROLLBACK statement is issued, undo records are used to undo changes that were made to the database by the uncommitted transaction. During database recovery, undo records are used to undo any uncommitted changes applied from the redo log to the datafiles. Undo records provide read consistency by maintaining the before image of the data for users who are accessing the data at the same time that another user is changing it.
Here are the default undo-related parameter settings in my 10.2.0.5 test environment:
SQL> show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1
As you can see, undo_management defaults to AUTO. The official documentation describes this value as follows:
Automatic undo management uses an undo tablespace. To enable automatic undo management, set the UNDO_MANAGEMENT initialization parameter to AUTO in your initialization parameter file. In this mode, undo data is stored in an undo tablespace and is managed by Oracle Database.
The default for undo_retention is 900, measured in seconds, which is 15 minutes. Many production environments set it higher, for example 10800 (3 hours).
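As a minimal sketch, the retention target can be changed online with ALTER SYSTEM. The value 10800 is the 3-hour example from the text; SCOPE = BOTH assumes the instance was started from an spfile, which is an assumption about the environment rather than something stated in this case.

```sql
-- Raise the minimum undo retention target to 3 hours (10800 seconds).
-- SCOPE = BOTH assumes an spfile is in use; with a plain pfile, use
-- SCOPE = MEMORY and edit the parameter file by hand.
ALTER SYSTEM SET undo_retention = 10800 SCOPE = BOTH;

-- Confirm the new value.
SELECT name, value
  FROM v$parameter
 WHERE name = 'undo_retention';
```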
Here is how the official documentation describes undo retention and the related retention guarantee:
Undo Retention
After a transaction is committed, undo data is no longer needed for rollback or transaction recovery purposes. However, for consistent read purposes, long-running queries may require this old undo information for producing older images of data blocks. Furthermore, the success of several Oracle Flashback features can also depend upon the availability of older undo information. For these reasons, it is desirable to retain the old undo information for as long as possible.
When automatic undo management is enabled, there is always a current undo retention period, which is the minimum amount of time that Oracle Database attempts to retain old undo information before overwriting it. Old (committed) undo information that is older than the current undo retention period is said to be expired. Old undo information with an age that is less than the current undo retention period is said to be unexpired.
Oracle Database automatically tunes the undo retention period based on undo tablespace size and system activity. You can specify a minimum undo retention period (in seconds) by setting the UNDO_RETENTION initialization parameter. The database makes its best effort to honor the specified minimum undo retention period, provided that the undo tablespace has space available for new transactions. When available space for new transactions becomes short, the database begins to overwrite expired undo. If the undo tablespace has no space for new transactions after all expired undo is overwritten, the database may begin overwriting unexpired undo information. If any of this overwritten undo information is required for consistent read in a current long-running query, the query could fail with the snapshot too old error message.
The following points explain the exact impact of the UNDO_RETENTION parameter on undo retention:
The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database may overwrite unexpired undo information when tablespace space becomes low.
For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information.
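To see which of the two cases above applies to a given database, it can help to check whether the undo data files autoextend and what retention the database has actually tuned itself to. The sketch below assumes the tablespace name UNDOTBS1 from the test environment shown earlier; substitute your own.

```sql
-- Is the undo tablespace fixed-size or auto-extending, and what is its cap?
SELECT file_name,
       autoextensible,
       ROUND(bytes / 1024 / 1024)    AS size_mb,
       ROUND(maxbytes / 1024 / 1024) AS max_mb
  FROM dba_data_files
 WHERE tablespace_name = 'UNDOTBS1';   -- assumed tablespace name

-- Retention (in seconds) the database tuned for each ten-minute interval,
-- shown next to the longest-running query seen in that interval.
SELECT TO_CHAR(begin_time, 'YYYY-MM-DD HH24:MI') AS interval_start,
       tuned_undoretention,
       maxquerylen
  FROM v$undostat
 ORDER BY begin_time DESC;
```

If tuned_undoretention sits well above maxquerylen, long-running queries are unlikely to hit the snapshot too old error with the current tablespace size.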
Retention Guarantee
To guarantee the success of long-running queries or Oracle Flashback operations, you can enable retention guarantee. If retention guarantee is enabled, the specified minimum undo retention is guaranteed; the database never overwrites unexpired undo data even if it means that transactions fail due to lack of space in the undo tablespace. If retention guarantee is not enabled, the database can overwrite unexpired undo when space is low, thus lowering the undo retention for the system. This option is disabled by default.
WARNING:
Enabling retention guarantee can cause multiple DML operations to fail. Use with caution.
You enable retention guarantee by specifying the RETENTION GUARANTEE clause for the undo tablespace when you create it with either the CREATE DATABASE or CREATE UNDO TABLESPACE statement. Or, you can later specify this clause in an ALTER TABLESPACE statement. You disable retention guarantee with the RETENTION NOGUARANTEE clause.
You can use the DBA_TABLESPACES view to determine the retention guarantee setting for the undo tablespace. A column named RETENTION contains a value of GUARANTEE, NOGUARANTEE, or NOT APPLY (used for tablespaces other than the undo tablespace).
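The statements described in the documentation excerpt look roughly like this in practice. This is only a sketch: the tablespace name undotbs1 is assumed from the test environment above, and the guarantee should not be enabled casually given the warning.

```sql
-- Show the retention setting of every undo tablespace.
SELECT tablespace_name, retention
  FROM dba_tablespaces
 WHERE contents = 'UNDO';

-- Enable the guarantee on an existing undo tablespace (assumed name) ...
ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;

-- ... and switch back to the default behavior.
ALTER TABLESPACE undotbs1 RETENTION NOGUARANTEE;
```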
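At this point it should be clear why, in the case at the start of this article, the undo tablespace kept filling up and the engineer kept extending it even though undo space is reused cyclically. Undo extents are not released when a transaction commits: they remain allocated, first as unexpired and then as expired extents, until new transactions need the space. A generic allocated-versus-free check therefore reports the undo tablespace as nearly full most of the time, and that figure says nothing about whether the tablespace is actually too small.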
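One way to see the gap on a live system is to put the generic "allocated" figure next to the space that is actually ACTIVE or UNEXPIRED. The query below is an illustrative sketch built only on the standard dictionary views; the column aliases are my own.

```sql
-- For each undo tablespace: total size, space a generic check counts as
-- used (total minus free), and space that is really ACTIVE or UNEXPIRED.
SELECT df.tablespace_name,
       ROUND(df.total_mb)                      AS total_mb,
       ROUND(df.total_mb - NVL(fs.free_mb, 0)) AS allocated_mb,
       ROUND(NVL(ue.needed_mb, 0))             AS active_unexpired_mb
  FROM (SELECT tablespace_name, SUM(bytes) / 1024 / 1024 AS total_mb
          FROM dba_data_files
         GROUP BY tablespace_name) df
  JOIN dba_tablespaces ts
    ON ts.tablespace_name = df.tablespace_name
  LEFT JOIN (SELECT tablespace_name, SUM(bytes) / 1024 / 1024 AS free_mb
               FROM dba_free_space
              GROUP BY tablespace_name) fs
    ON fs.tablespace_name = df.tablespace_name
  LEFT JOIN (SELECT tablespace_name, SUM(bytes) / 1024 / 1024 AS needed_mb
               FROM dba_undo_extents
              WHERE status IN ('ACTIVE', 'UNEXPIRED')
              GROUP BY tablespace_name) ue
    ON ue.tablespace_name = df.tablespace_name
 WHERE ts.contents = 'UNDO';
```

A large difference between allocated_mb and active_unexpired_mb is expired undo that the database can overwrite at any time.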
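Earlier I saw Maclean answer a related question from a user in a chat group with the following statement for querying the real utilization of undo: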
prompt
prompt ############## IN USE Undo Data ##############
prompt
select
  ((select (nvl(sum(bytes),0))
      from dba_undo_extents
     where tablespace_name in (select tablespace_name from dba_tablespaces
                                where retention like '%GUARANTEE' )
       and status in ('ACTIVE','UNEXPIRED')) *100) /
  (select sum(bytes)
     from dba_data_files
    where tablespace_name in (select tablespace_name from dba_tablespaces
                               where retention like '%GUARANTEE' )) "PCT_INUSE"
from dual;
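As you can see, this statement simply counts extents in ACTIVE and UNEXPIRED status as used. The filter retention like '%GUARANTEE' matches both GUARANTEE and NOGUARANTEE, so it selects every undo tablespace (other tablespaces show NOT APPLY). If retention guarantee is not set, a high figure here is not necessarily a problem, because Oracle will also reuse unexpired extents when space runs short.
Also note that on RAC, where each instance has its own undo tablespace, the query above lumps all of them into one combined percentage, whereas what we really want is a separate figure for each. So we can specify the undo tablespace name directly: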
select
  ((select (nvl(sum(bytes),0))
      from dba_undo_extents
     where tablespace_name = '&TABLESPACE_NAME'
       and status in ('ACTIVE','UNEXPIRED')) *100) /
  (select sum(bytes)
     from dba_data_files
    where tablespace_name = '&TABLESPACE_NAME') "PCT_INUSE"
from dual;
You can also monitor undo tablespace usage through dba_undo_extents, grouped by status:
select tablespace_name, status, sum(bytes/1024/1024) "MB"
from dba_undo_extents
group by tablespace_name, status
order by 1, 2;
Based on the above, we only need to watch how much space is in ACTIVE status in the result; if retention guarantee is set, we also need to watch how much is UNEXPIRED.
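When the ACTIVE figure is unexpectedly large, a natural next step is to find which sessions hold it. The following is a sketch using the standard v$transaction and v$session views, not part of the original queries.

```sql
-- Open transactions ordered by the undo they currently hold.
SELECT s.sid,
       s.username,
       t.start_time,
       t.used_ublk,   -- undo blocks held by the transaction
       t.used_urec    -- undo records held by the transaction
  FROM v$transaction t
  JOIN v$session s
    ON s.taddr = t.addr
 ORDER BY t.used_ublk DESC;
```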
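In addition, I found two handy SQL queries for monitoring the UNDO tablespace on Maclean's blog:
-- In Oracle 10g, the V$UNDOSTAT view can be used to monitor how transactions in the current instance use the UNDO tablespace. Each row holds the statistics collected from the instance for one ten-minute interval.
-- Together the rows form a rolling history of ten-minute snapshots (576 rows, about 4 days, in 10g) covering UNDO tablespace usage, transaction volume, query length, and so on.
-- UNDO tablespace usage varies with transaction volume, so sizing calculations generally consider both the average and the peak.
-- The following SQL statement estimates the UNDO space needed at the average undo generation rate over the period covered by V$UNDOSTAT:
select ur undo_retention,
       dbs db_block_size,
       ((ur * (ups * dbs)) + (dbs * 24)) / 1024 / 1024 as "M_bytes"
  from (select value as ur from v$parameter where name = 'undo_retention'),
       (select (sum(undoblks) / sum(((end_time - begin_time) * 86400))) ups
          from v$undostat),
       (select value as dbs from v$parameter where name = 'db_block_size');
-- The following SQL statement estimates the UNDO space needed at the peak rate instead: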
select ur undo_retention,
       dbs db_block_size,
       ((ur * (ups * dbs)) + (dbs * 24)) / 1024 / 1024 as "M_bytes"
  from (select value as ur from v$parameter where name = 'undo_retention'),
       (select (undoblks / ((end_time - begin_time) * 86400)) ups
          from v$undostat
         where undoblks in (select max(undoblks) from v$undostat)),
       (select value as dbs from v$parameter where name = 'db_block_size');
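To make the formula in both queries concrete, here is the same arithmetic with made-up inputs. The rate of 100 undo blocks per second and the 8192-byte block size are assumptions for illustration only, not figures from the customer's system.

```sql
-- undo_retention * undo_blocks_per_second * block_size + 24 blocks overhead,
-- converted to MB. Assumed inputs: 100 undo blocks/second, 8192-byte blocks.
SELECT (900   * 100 * 8192 + 8192 * 24) / 1024 / 1024 AS mb_at_900s,
       (10800 * 100 * 8192 + 8192 * 24) / 1024 / 1024 AS mb_at_10800s
  FROM dual;
-- mb_at_900s = 703.3125, mb_at_10800s = 8437.6875
```

Under these assumptions even a 3-hour retention target needs only a little over 8 GB, which puts a 2 TB undo tablespace in perspective.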
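Finally, this simple case shows that on many real projects there really are operations staff who start doing the job without the necessary background knowledge, and the result is that a system that was never complicated ends up riddled with pitfalls.
So whatever you study or work on, it pays to learn and think through the fundamentals in depth. As the saying goes, without accumulating small steps you cannot travel a thousand miles.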