ORA-00471: DBWR process terminated with error案例


 

   每年年底,系統管理員都要組織一次容災方案的測試、演練。會在一個與生產環境網絡隔離的DR環境中,啟動各個生產環境服務器,然后讓各路人員參與其中測試、演練容災方案是否可靠。這次演練中,一台Oracle數據庫服務器啟動的時候遇到了問題。如下所示,啟動的時候遇到ORA-03113: end-of-file on communication channel錯誤。

 

[oracle@mylnx6 ~]$ sqlplus / as sysdba
 
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Dec 21 09:42:11 2018
 
Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.
 
Connected to an idle instance.
 
SQL> startup
ORA-03113: end-of-file on communication channel
SQL> 

 

檢查告警日志,發現數據庫在啟動的時候,報ORA-00471: DBWR process terminated with error錯誤。如下所示:

 

PMON started with pid=2, OS id=25005

PSP0 started with pid=3, OS id=25007

MMAN started with pid=4, OS id=25009

DBW0 started with pid=5, OS id=25011

LGWR started with pid=6, OS id=25013

CKPT started with pid=7, OS id=25016

SMON started with pid=8, OS id=25018

RECO started with pid=9, OS id=25020

CJQ0 started with pid=10, OS id=25022

MMON started with pid=11, OS id=25024

Fri Dec 21 09:44:36 CST 2018

starting up 8 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

MMNL started with pid=12, OS id=25026

Fri Dec 21 09:45:12 CST 2018

starting up 24 shared server(s) ...

Fri Dec 21 09:46:43 CST 2018

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_25005.trc:

ORA-00471: DBWR process terminated with error

Fri Dec 21 09:46:43 CST 2018

PMON: terminating instance due to error 471

Instance terminated by PMON, pid = 25005

 

clip_image001

 

 

啟動數據庫實例的時候,報ORA-00471: DBWR process terminated with error這個錯誤,這個很蹊蹺,很有可能是進程被系統給Kill掉了,檢查操作系統的錯誤日志,發現出現了oom_kill_process,也就是說數據庫實例啟動的時候,由於系統內存資源緊張,DBWR進程被系統選作了犧牲品。具體錯誤日志如下所示:

 

Dec 21 09:46:39 mylnx6 kernel: oracle invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Dec 21 09:46:39 mylnx6 kernel: oracle cpuset=/ mems_allowed=0
Dec 21 09:46:39 mylnx6 kernel: Pid: 25026, comm: oracle Not tainted 2.6.32-200.13.1.el5uek #1
Dec 21 09:46:39 mylnx6 kernel: Call Trace:
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810da03f>] __out_of_memory+0x3d/0x86
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810da30f>] out_of_memory+0xfc/0x195
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d725a>] ? find_get_page+0x22/0x69
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff8106d7cd>] ? getrusage+0x2b1/0x2ce
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff8101270e>] ? common_interrupt+0xe/0x13
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81043696>] ? should_resched+0xe/0x2f
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81456006>] do_page_fault+0x210/0x299
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81453fd5>] page_fault+0x25/0x30
Dec 21 09:46:39 mylnx6 kernel: Mem-Info:
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA32 per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: Node 0 Normal per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: active_anon:1764 inactive_anon:209 isolated_anon:64
Dec 21 09:46:41 mylnx6 kernel:  active_file:349 inactive_file:1710 isolated_file:0
Dec 21 09:46:41 mylnx6 kernel:  unevictable:5377 dirty:0 writeback:4 unstable:0
Dec 21 09:46:41 mylnx6 kernel:  free:29838 slab_reclaimable:2400 slab_unreclaimable:119491
Dec 21 09:46:41 mylnx6 kernel:  mapped:2703 shmem:830 pagetables:9849 bounce:0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA free:15652kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15172kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 3000 24210 24210
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32 free:86296kB min:2464kB low:3080kB high:3696kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 21210 21210
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal free:17404kB min:17440kB low:21800kB high:26160kB active_anon:7056kB inactive_anon:836kB active_file:1396kB inactive_file:6840kB unevictable:21508kB isolated(anon):256kB isolated(file):0kB present:21719040kB mlocked:21504kB dirty:0kB writeback:16kB mapped:10812kB shmem:3320kB slab_reclaimable:9600kB slab_unreclaimable:477964kB kernel_stack:2800kB pagetables:39396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:544 all_unreclaimable? no
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 0 0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA: 1*4kB 2*8kB 1*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15652kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32: 12*4kB 13*8kB 2*16kB 5*32kB 5*64kB 11*128kB 3*256kB 7*512kB 6*1024kB 4*2048kB 16*4096kB = 86296kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal: 420*4kB 1917*8kB 49*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17800kB
Dec 21 09:46:41 mylnx6 kernel: 4722 total pagecache pages
Dec 21 09:46:41 mylnx6 kernel: 694 pages in swap cache
Dec 21 09:46:41 mylnx6 kernel: Swap cache stats: add 589182, delete 588488, find 343370/443306
Dec 21 09:46:41 mylnx6 kernel: Free swap  = 66723056kB
Dec 21 09:46:41 mylnx6 kernel: Total swap = 67108856kB
Dec 21 09:46:41 mylnx6 kernel: 6291440 pages RAM
Dec 21 09:46:41 mylnx6 kernel: 107316 pages reserved
Dec 21 09:46:41 mylnx6 kernel: 24060 pages shared
Dec 21 09:46:41 mylnx6 kernel: 77648 pages non-shared
Dec 21 09:46:41 mylnx6 kernel: Out of memory: kill process 25011 (oracle) score 8425150 or a child
Dec 21 09:46:41 mylnx6 kernel: Killed process 25011 (oracle)
Dec 21 09:47:20 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...

 

檢查這個系統的內存,發現DR環境下,這個服務器只分配了24G內存,而實際生產環境的內存為64G(設置了Linux標准大頁,而且SGA_MAX_SIZE大小為32G),而且這個環境是生產環境的“克隆體”,只是由於資源限制的緣故,系統管理員只分配24G內存。如下所示:

 

[root@mylnx6 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         24156      24033        123          0          0          6
-/+ buffers/cache:      24026        130
Swap:        65535         41      65494
[root@mylnx6 ~]# ps -ef | grep ora_
root     11759 11490  0 16:10 pts/1    00:00:00 grep ora_
[root@mylnx6 ~]# ipcs -m
 
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 3080192    root      644        80         2                       
0x00000000 3112961    root      644        16384      2                       
0x00000000 3145730    root      644        280        2                       
0x00000000 4096003    gdm       600        393216     0                       
0x2cd12178 3866628    oracle    640        34361835520 0                       
0x00000000 5210117    gdm       600        393216     2          dest   

 

如上所示,可以看到oracle用戶的共享內存段為34361835520字節。所以引起這個錯誤的原因是因為在系統層面配置了標准大頁的緣故(內存資源變化了,但是配置沒有隨之修改),為了快速解決問題,我們取消標准大頁的相關設置。如下所示:

 

修改limits.conf參數,注釋soft memlock和hard memlock參數。

 

vi /etc/security/limits.conf

 

clip_image002

 

然后修改sysctl.conf,將vm.nr_hugepages注釋掉。然后重啟一下(DR測試環境,可以隨時重啟)。然后啟動Oracle數據庫實例,一切正常,當然還需調整相關參數,繼續后續測試~。


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM