After installing a 12c RAC database on RHEL 7.2, one of the database instances frequently crashed on its own. The alert log showed the following errors:
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_j000_21047.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_rmv0_20798.trc:
ORA-27157: OS post/wait facility removed
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_q005_21328.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
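Cause:
In RHEL 7.2, the systemd-logind service introduced a new feature: once a user has fully logged out of the OS, all IPC objects owned by that user are removed.
The feature is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file. See man logind.conf(5) for details.
In RHEL 7.2, RemoveIPC defaults to yes.
As a result, when the last session of the oracle or grid user logs out, the operating system removes that user's shared memory segments and semaphores.
Oracle ASM and the database SGA depend on shared memory segments, so removing them crashes the Oracle ASM and database instances.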
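To see which IPC objects are exposed to this cleanup, the output of ipcs can be filtered by owner. The following is a minimal sketch, assuming the Linux (util-linux) ipcs layout in which the owner is the third column; ipc_owned_by is a hypothetical helper name, not something from the MOS note.

```shell
#!/bin/sh
# Hypothetical helper: keep only the ipcs rows whose owner column (field 3)
# equals the given user. Reads ipcs-style text on stdin.
ipc_owned_by() {
    awk -v user="$1" '$3 == user'
}
```

Typical use would be `ipcs -m | ipc_owned_by oracle` for shared memory segments and `ipcs -s | ipc_owned_by oracle` for semaphore arrays (likewise for the grid user). If the rows disappear right after the last oracle or grid session logs out, RemoveIPC is the likely cause.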
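See Red Hat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533
The problem affects every application that uses shared memory segments and semaphores, so both Oracle ASM instances and Oracle Database instances are affected.
To avoid the problem, OEL 7.2 explicitly sets RemoveIPC to no in the /etc/systemd/logind.conf configuration file.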
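A quick way to check which value a given host's logind.conf sets is sketched below. It is only an approximation under stated assumptions: removeipc_value is a hypothetical helper name, the fallback of yes reflects the RHEL 7.2 default described above (other releases may differ), and drop-in directories such as /etc/systemd/logind.conf.d/ are not consulted.

```shell
#!/bin/sh
# Hypothetical helper: print the RemoveIPC value set in a logind.conf-style
# file. Commented-out lines are ignored; if several lines set the option,
# the last one is reported.
# Assumption: falls back to "yes" (the RHEL 7.2 default described in the text)
# when the option is absent or the file cannot be read.
removeipc_value() {
    conf="${1:-/etc/systemd/logind.conf}"
    value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*$/\1/p' "$conf" 2>/dev/null | tail -n 1)
    printf '%s\n' "${value:-yes}"
}
```

Calling `removeipc_value` with no argument reads /etc/systemd/logind.conf; a result of yes on a host running Oracle ASM or database instances means the host is exposed to the problem.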
Symptoms caused by this problem:
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.
systemd-logind can remove the IPC objects at any time, so the errors that get logged differ from case to case. For example:
The most common error is the following, found in the ASM or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1

The second observed error occurs during installation and upgrade, when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile 'init@.ora'

The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: /d12/app/12.1.0/grid/bin/orapwd
Enter password for SYS:
OPW-00009: Could not establish connection to Automatic Storage Management instance
2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM

The fourth observed error is the following message in the /var/log/messages file around the time that the ASM or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
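Since the first signature is the most common one, an alert log can be screened for it with grep. This is a rough sketch rather than a complete diagnostic: ipc_removal_hits is a hypothetical helper name, and it counts matching lines, so it assumes the log has one message per line.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that carry the
# "IPC object removed" signature quoted above (ORA-27157, or ORA-27301
# with the "Identifier removed" OS message). Prints 0 when nothing matches.
ipc_removal_hits() {
    grep -c -E 'ORA-27157|ORA-27301: OS failure message: Identifier removed' "$1"
}
```

For example, `ipc_removal_hits /path/to/alert.log` (the path here is a placeholder) prints a non-zero count when the instance has hit this error.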
Fix:
1) Set RemoveIPC=no in /etc/systemd/logind.conf
2) Reboot the server or restart systemd-logind
To restart systemd-logind:
# systemctl daemon-reload
# systemctl restart systemd-logind
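The edit to logind.conf can also be scripted. The sketch below is one possible approach, not the procedure from the MOS note: set_removeipc_no is a hypothetical helper name, it rewrites any existing RemoveIPC line (commented out or not), and when no such line exists it appends one, which assumes [Login] is the last or only section of the file, as in a stock logind.conf.

```shell
#!/bin/sh
# Hypothetical helper: force RemoveIPC=no in a logind.conf-style file.
set_removeipc_no() {
    conf="$1"
    if grep -q '^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=' "$conf"; then
        # An active or commented-out RemoveIPC line exists: rewrite it.
        tmp=$(mktemp) || return 1
        sed 's/^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"    # overwrite contents, keep the file's owner and mode
        rm -f "$tmp"
    else
        # No RemoveIPC line: append one.
        # Assumption: [Login] is the last (or only) section in the file.
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}
```

Run as root, for example `set_removeipc_no /etc/systemd/logind.conf`, ideally after taking a backup copy of the file; the change takes effect only after the reboot or the systemd-logind restart described above.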
MOS Doc:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)