oracle慎用基於on commit刷新物化視圖的方式(一)


 

oracle慎用基於on commit刷新物化視圖的方式(一)

 

背景

上周五,也就是2020-08-07那天快下班的時候,突然發生一件事。客戶反饋xxxx平台登錄不上去,當時一開始沒意識到是數據庫導致的問題。

業務人員在那懷疑這里那里,還是網絡ip哪里地區又被封了。后來都排除沒發生什么問題,還想着重啟Nginx服務器(我也不懂)。

但是維護Nginx的人員剛好請假了。

剛好當時大概是這個事情發生的半個小時之前,監控郵件報出如下錯誤,

 

 

當時以為只是歸檔目錄快滿了,簡單的備份了下歸檔,清理掉多余的歸檔。

 

排查

問題出現后,雖然沒懷疑到數據庫的問題,不過我也還是看了一眼數據庫。

發現警告日志閃回恢復區有在報使用率超標的問題,有問題,絕對有問題。

除了歸檔量暴增的問題和剛搭建起來沒多久的備庫歸檔也爆掉的問題,沒發現其他的報錯。

 

先看看日志切換的頻率,和平時同一時段相比,這歸檔多了100多倍。

讓我一度懷疑是應用出現了什么問題導致dml操作劇增導致的。

SELECT    TO_CHAR(FIRST_TIME,'YYYY-MM-DD') DAY,
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'00',1,0)),'999') "00",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'01',1,0)),'999') "01",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'02',1,0)),'999') "02",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'03',1,0)),'999') "03",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'04',1,0)),'999') "04",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'05',1,0)),'999') "05",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'06',1,0)),'999') "06",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'07',1,0)),'999') "07",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'08',1,0)),'999') "08",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'09',1,0)),'999') "09",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'10',1,0)),'999') "10",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'11',1,0)),'999') "11",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'12',1,0)),'999') "12",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'13',1,0)),'999') "13",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'14',1,0)),'999') "14",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'15',1,0)),'999') "15",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'16',1,0)),'999') "16",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'17',1,0)),'999') "17",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'18',1,0)),'999') "18",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'19',1,0)),'999') "19",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'20',1,0)),'999') "20",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'21',1,0)),'999') "21",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'22',1,0)),'999') "22",
                TO_CHAR(SUM(DECODE(TO_CHAR(FIRST_TIME,'HH24'),'23',1,0)),'999') "23"
FROM V$LOG_HISTORY
GROUP BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD') 
ORDER BY 1 DESC;
View Code

 

 這個時候我需要定位找出是哪條SQL或者哪些SQL引起的問題。

通過如下語句,可以找出哪些對象有大量數據塊變化情況,

SELECT TO_CHAR(BEGIN_INTERVAL_TIME, 'YYYY-MM-DD HH24') SNAP_TIME, 
       DHSO.OBJECT_NAME, 
       SUM(DB_BLOCK_CHANGES_DELTA)                     BLOCK_CHANGED 
FROM   DBA_HIST_SEG_STAT DHSS, 
       DBA_HIST_SEG_STAT_OBJ DHSO, 
       DBA_HIST_SNAPSHOT DHS 
WHERE  DHS.SNAP_ID = DHSS.SNAP_ID 
       AND DHS.INSTANCE_NUMBER = DHSS.INSTANCE_NUMBER 
       AND DHSS.OBJ# = DHSO.OBJ# 
       AND DHSS.DATAOBJ# = DHSO.DATAOBJ# 
       AND BEGIN_INTERVAL_TIME BETWEEN TO_DATE('2020-08-07 16:00', 
                                       'YYYY-MM-DD HH24:MI') 
                                       AND 
           TO_DATE('2020-08-07 17:30', 'YYYY-MM-DD HH24:MI') 
GROUP  BY TO_CHAR(BEGIN_INTERVAL_TIME, 'YYYY-MM-DD HH24'), 
          DHSO.OBJECT_NAME 
HAVING SUM(DB_BLOCK_CHANGES_DELTA) > 0 
ORDER  BY SUM(DB_BLOCK_CHANGES_DELTA) asc;
View Code

 部分關鍵結果如下,

SNAP_TIME                              |OBJECT_NAME                   |BLOCK_CHANGED
---------------------------------------|------------------------------|-------------
.............                          |..............                |       ......
2020-08-07 16                          |..............                |       100480
2020-08-07 16                          |..............                |       129184
2020-08-07 16                          |..............                |       137024
2020-08-07 17                          |..............                |       143520
2020-08-07 17                          |..............                |       145440
2020-08-07 16                          |..............                |       228624
2020-08-07 16                          |..............                |       256704
2020-08-07 17                          |..............                |       548336
2020-08-07 17                          |..............                |     55019536
2020-08-07 17                          |..............                |    103429408
2020-08-07 16                          |T_XXXXXXXXXX_INFO             |    103997488
2020-08-07 16                          |PK_X_XXXXXX_XXXX              |    195532752

210 rows selected.

Elapsed: 00:00:02.29

 

在2020-08-07 16點內,有兩個對象的塊變化是最多的,分別是

T_XXXXXXXXXX_INFO
PK_X_XXXXXX_XXXX

PK_X_XXXXXX_XXXX很容易就可以查出,這是一個主鍵索引,對應的表名為X_XXXXXX_XXXX。

趕緊取問題時間段的awr報告,看看是否存在相關的X_XXXXXX_XXXX以及T_XXXXXXXXXX_INFO的sql,順便也看看等待事件。

結果一眼就看出等待事件存在問題,

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
enq: TX - row lock contention 683 223.1K 326667 39.0 Application
gc buffer busy acquire 10,048 163.8K 16302 28.6 Cluster
enq: JI - contention 23,034 112.7K 4892 19.7 Other
buffer busy waits 1,770 20.9K 11784 3.6 Concurrency
DB CPU   20K   3.5  
enq: TX - index contention 591 6630.2 11219 1.2 Concurrency
log file switch (checkpoint incomplete) 481 6268.1 13031 1.1 Configuration
db file scattered read 482,552 4536 9 .8 User I/O
gc current block busy 10,845 3853.8 355 .7 Cluster
direct path read 832,639 3170 4 .6 User I/O

 

 

 

 

 

 

 

 

 

 

 

 

 

 

這個系統平時沒有存在那么高的TX鎖等待,趕緊看看哪個段的行鎖等待比較嚴重,

Segments by Row Lock Waits

Owner Tablespace Name Object Name Subobject Name Obj. Type Row Lock Waits % of Capture
ABCD_EFG ABCD X_XXXXXX_XXXX   TABLE 681 53.71
ABCD_XXXXXXXXS DYCK_FILETRANS PK_XXXX_XXXXXXXX_TMP   INDEX 321 25.32

 

 

 

 

剛好和前邊塊變化最多的對象是同一個,也可以看看塊變化,

Segments by DB Blocks Changes

Owner Tablespace Name Object Name Subobject Name Obj. Type DB Block Changes % of Capture
XXXXX_DATA USERS PK_X_XXXXXX_XXXX   INDEX 161,129,664 64.72
XXXXX_DATA USERS T_XXXXXXXXXX_INFO   TABLE 85,726,272 34.43

 

 

 

 

都對的上!!!

這個時候在從awr中找找含有X_XXXXXX_XXXX表的SQL,

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
226,451.96 1,075 210.65 39.58 0.01 0.00 95nn49n57dqza JDBC Thin Client update X_XXXXXX_XXXX set ID=:1...
166,564.13 1,368 121.76 29.12 0.14 0.00 08x3dymqbdbd4 JDBC Thin Client ......................
20,730.38 294 70.51 3.62 0.01 0.03 4vs91dcv7u1p6 JDBC Thin Client ......................
5,395.58 0   0.94 5.25 94.88 c7sa42jj690jw   ......................
5,395.54 7 770.79 0.94 5.25 94.88 1hsz4n5g5dz1j   ......................
5,205.90 38 137.00 0.91 40.58 58.45 66u44fa687q74 xxxxxxxxx.exe ......................
4,619.25 17,384 0.27 0.81 0.05 0.08 atwcm5hwwbtj3 xxxxxxxxxxxxxxxxxx.exe ......................
4,231.15 1 4,231.15 0.74 99.72 0.09 2d0r4v0y38vm1 xxxxxxxxxxx.exe ......................
4,049.48 47,215 0.09 0.71 0.14 0.20 cqfs0h217uuc3 xxxxxxxxxxxxxxxxxx.exe ......................
2,707.50 1 2,707.50 0.47 99.87 0.10 gjj8zbpgta9rv xxxx.exe ......................

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

查看95nn49n57dqza的歷史執行情況,

set lines 150 pages 150                                                                                        
col BEGIN_INTERVAL_TIME for a23                                                                                
col PLAN_HASH_VALUE for 9999999999                                                                             
col date_time for a30                                                                                          
col snap_id heading 'SnapId'                                                                                   
col executions_delta heading "No. of exec"                                                                     
col sql_profile heading "SQL|Profile" for a7                                                                   
col date_time heading 'Date time'                                                                              
col avg_lio heading 'LIO/exec' for 99999999999.99                                                              
col avg_cputime heading 'CPUTIM/exec' for 9999999.99                                                           
col avg_etime heading 'ETIME/exec' for 9999999.99                                                              
col avg_pio heading 'PIO/exec' for 9999999.99                                                                  
col avg_row heading 'ROWs/exec' for 9999999.99                                                                 
SELECT distinct                                                                                                
s.snap_id ,                                                                                                    
PLAN_HASH_VALUE,                                                                                               
to_char(s.BEGIN_INTERVAL_TIME,'yyyy-mm-dd hh24:mi:ss')|| to_char(s.END_INTERVAL_TIME,'-hh24:mi:ss') Date_Time, 
SQL.executions_delta,                                                                                          
SQL.buffer_gets_delta/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_lio,                    
--SQL.ccwait_delta,                                                                                            
(SQL.cpu_time_delta/1000000)/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_cputime ,        
(SQL.elapsed_time_delta/1000000)/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_etime,       
SQL.DISK_READS_DELTA/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_pio,                     
SQL.rows_processed_total/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_row                  
--,SQL.sql_profile                                                                                             
FROM                                                                                                           
dba_hist_sqlstat SQL,                                                                                          
dba_hist_snapshot s                                                                                            
WHERE                                                                                                          
SQL.instance_number =(select instance_number from v$instance)                                                  
and SQL.dbid =(select dbid from v$database)                                                                    
and s.snap_id = SQL.snap_id                                                                                    
AND sql_id in                                                                                                  
('&SQLID') order by s.snap_id                                                                                  
/  
View Code

 結果如下,

    SnapId|PLAN_HASH_VALUE|Date time                     |No. of exec|       LIO/exec|CPUTIM/exec| ETIME/exec|   PIO/exec|  ROWs/exec
----------|---------------|------------------------------|-----------|---------------|-----------|-----------|-----------|-----------
     16581|     2297901584|2020-08-07 13:30:02-14:00:09  |        370|           4.04|        .00|        .00|        .00|    5547.76
     16583|     2297901584|2020-08-07 14:30:16-15:00:24  |        563|           4.02|        .00|        .00|        .00|    3648.02
     16584|     2297901584|2020-08-07 15:00:24-15:30:31  |        567|           4.03|        .00|        .00|        .00|    3623.28
     16585|     2297901584|2020-08-07 15:30:31-16:00:39  |        442|           4.05|        .00|        .00|        .00|    4648.97
     16586|     2297901584|2020-08-07 16:00:39-16:30:47  |        429|           4.28|        .00|        .59|        .01|    4790.84
     16587|     2297901584|2020-08-07 16:30:47-17:01:31  |        366|           5.72|        .06|     169.81|        .07|    5616.50
     16587|     2297901584|2020-08-07 16:30:47-17:07:50  |        366|           5.72|        .06|     169.81|        .07|    5616.50
     16588|     2297901584|2020-08-07 17:01:31-17:30:32  |        278|          15.70|        .03|     423.38|        .15|    7395.38
     16588|     2297901584|2020-08-07 17:07:50-17:30:32  |        278|          15.70|        .03|     423.38|        .15|    7395.38
     16589|     2297901584|2020-08-07 17:30:32-18:00:41  |        369|          31.79|        .07|    1180.75|        .00|    5572.59
     16590|     2297901584|2020-08-07 18:00:41-18:30:51  |        535|          44.85|        .06|     909.64|        .00|    3844.53
     16591|     2297901584|2020-08-07 18:30:51-19:00:00  |        201|          55.29|        .01|     192.97|        .00|   10233.94
     16592|     2297901584|2020-08-07 19:00:00-19:30:08  |         26|           4.69|        .00|       1.26|        .00|   79117.23

從8月7日下午1點開始可以查詢到記錄,一直在下午4點半后開始,sql執行時間急劇增加,但是執行計划確是並沒有變化過。

並且從執行次數看,並沒有多太過於劇烈的波動。

 

到這里目前似乎沒有什么好辦法可以自己追查下去,能確定的是出問題的SQL語句就是95nn49n57dqza,由於update引起大量的TX鎖等待,導致xxxx平台登錄不上去或者響應很久才出現頁面。

而TX鎖嚴重的原因就是一次update的時間從不到1s變到幾百s甚至上千s。

這里存在一個問題,便是在看了awr發現tx鎖嚴重,再到確定SQL的過程,由於重心太過明確導致忽略掉了一開始找出哪些對象有大量數據塊變化情況那里,除了PK_X_XXXXXX_XXXX外其實還有另外一個對象T_XXXXXXXXXX_INFO。

 

暫且忽略T_XXXXXXXXXX_INFO順着一直下去的思路的話,單純從視圖間的關聯貌似已經無法繼續下去,可以選擇在數據庫級別做一次ssd,或者10046。

推薦10046,從結果反推回去,10046是可以看出會話除了有執行95nn49n57dqza之外,也是有對T_XXXXXXXXXX_INFO表做insert的操作的。

因為第一時間發現95nn49n57dqza的問題,立馬問了業務人員之后,他們有反饋在4點左右有創建一張物化視圖T_XXXXXXXXXX_INFO,基表就是X_XXXXXX_XXXX。因此沒做10046。

看了創建語句,用了on commit。大致找了下資料,基本確定是這個引起的。當即決定刪除這個物化視圖。

18:54:45 SYS@dyckdb1(2194)> drop MATERIALIZED VIEW MACRO_DATA.T_ENTERPRISE_INFO;
drop MATERIALIZED VIEW MACRO_DATA.T_ENTERPRISE_INFO
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object


Elapsed: 00:00:49.05

--撤銷授權操作,不然物化視圖刪不掉 18:57:45 SYS@dyckdb2(1269)> revoke select on dyck_pub.t_client_info from MACRO_DATA; ^Crevoke select on dyck_pub.t_client_info from MACRO_DATA * ERROR at line 1: ORA-01013: user requested cancel of current operation Elapsed: 00:02:49.57 19:02:04 SYS@dyckdb2(1269)> / Revoke succeeded. Elapsed: 00:00:09.58 19:02:46 SYS@dyckdb1(2194)> drop MATERIALIZED VIEW MACRO_DATA.T_ENTERPRISE_INFO; Materialized view dropped. Elapsed: 00:00:03.82

刪除視圖,問題解決。

這里提供另外一個查找產生指定時間段內產生redo最多的SQL做參考:

SELECT TO_CHAR(BEGIN_INTERVAL_TIME,'YYYY_MM_DD HH24') WHEN,
             DBMS_LOB.SUBSTR(SQL_TEXT,4000,1) SQL,
             DHSS.INSTANCE_NUMBER INST_ID,
             DHSS.SQL_ID,
             EXECUTIONS_DELTA EXEC_DELTA,
             ROWS_PROCESSED_DELTA ROWS_PROC_DELTA
FROM DBA_HIST_SQLSTAT DHSS,
         DBA_HIST_SNAPSHOT DHS,
         DBA_HIST_SQLTEXT DHST
WHERE UPPER(DHST.SQL_TEXT) LIKE '%T_CLIENT_INFO%'
  AND LTRIM(UPPER(DHST.SQL_TEXT)) NOT LIKE 'SELECT%'
  AND DHSS.SNAP_ID=DHS.SNAP_ID
  AND DHSS.INSTANCE_NUMBER=DHS.INSTANCE_NUMBER
  AND DHSS.SQL_ID=DHST.SQL_ID
  AND BEGIN_INTERVAL_TIME BETWEEN TO_DATE('2020-08-07 16:00','YYYY-MM-DD HH24:MI')
  AND TO_DATE('2020-08-07 17:00','YYYY-MM-DD HH24:MI')
  order by exec_delta;
View Code

通過上邊的語句查找是可以發現update基表和insert物化視圖的SQL語句的。

接下來可以通過模擬on commit對update影響有多大,順便看看是否由於物化視圖關系是否導致redo量激增。

oracle慎用基於on commit刷新物化視圖的方式(二)。


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM