Symptom: the server rebooted for no apparent reason.
Checking the system log:
Aug 2 20:26:25 localhost kernel: EDAC MC1: 26 CE memory read error on CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x3bc505 offset:0x9c0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0092 socket:1 ha:0 channel_mask:4 rank:0)
Aug 2 20:26:25 localhost kernel: EDAC MC1: 29 CE memory read error on CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x3ba088 offset:0x5c0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0092 socket:1 ha:0 channel_mask:4 rank:0)
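Each EDAC line reports a batch of corrected errors (CE) and names the DIMM they came from. To see which module is accumulating them, the counts can be summed per DIMM label. This is a minimal sketch that assumes every log line follows the `EDAC MCn: <count> CE ... on <label> (` layout shown above. The function name `ce_per_dimm` is hypothetical.

```shell
#!/usr/bin/env bash
# Sum corrected-error (CE) counts per DIMM label from EDAC kernel log lines.
# Assumption: lines look like "EDAC MCn: <count> CE ... on <label> (...".
ce_per_dimm() {
  awk '
    /EDAC MC[0-9]+: [0-9]+ CE / {
      for (i = 1; i <= NF; i++) {
        if ($i == "CE") count = $(i - 1)          # number just before "CE"
        if ($i == "on") { label = $(i + 1); break } # DIMM label after "on"
      }
      total[label] += count
    }
    END { for (l in total) print l, total[l] }
  ' | LC_ALL=C sort
}
```

On a CentOS-style system the kernel log is assumed to be `/var/log/messages`, so a typical call would be `grep 'EDAC MC' /var/log/messages | ce_per_dimm`. For the two lines above it prints `CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 55` (26 + 29).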
Installing edac-utils:
yum install -y libsysfs edac-utils
The check reports 55 corrected errors, all on CPU_SrcID#1_Ha#0_Chan#2_DIMM#0:
[root@localhost ~]# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#1_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#2_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#0_Ha#0_Chan#3_DIMM#0: 0 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow0: 0 Uncorrected Errors
mc1: csrow0: CPU_SrcID#1_Ha#0_Chan#0_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU_SrcID#1_Ha#0_Chan#1_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU_SrcID#1_Ha#0_Chan#2_DIMM#0: 55 Corrected Errors
mc1: csrow0: CPU_SrcID#1_Ha#0_Chan#3_DIMM#0: 0 Corrected Errors
[root@localhost ~]#
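As far as I know, edac-util gets these numbers from the kernel's EDAC sysfs interface under /sys/devices/system/edac/mc. If the package cannot be installed, the same per-DIMM counters can be read directly. This is a minimal sketch that assumes a kernel exposing the per-DIMM files `dimmN/dimm_ce_count` and `dimmN/dimm_label`. Older kernels only have the csrow-based layout, and the function prints nothing there. The name `dimm_ce_counts` is hypothetical. The root directory is a parameter only so the function can be tried against a copy of the tree.

```shell
#!/usr/bin/env bash
# Print "<mcN/dimmM> <corrected error count> <DIMM label>" for every DIMM
# the EDAC driver exposes. Assumes the per-DIMM sysfs layout (dimm*/dimm_ce_count).
dimm_ce_counts() {
  local root="${1:-/sys/devices/system/edac/mc}"
  local dimm label count
  for dimm in "$root"/mc*/dimm*; do
    # Skip an unmatched glob or an entry without a CE counter.
    [ -r "$dimm/dimm_ce_count" ] || continue
    count="$(cat "$dimm/dimm_ce_count")"
    label="$(cat "$dimm/dimm_label" 2>/dev/null)"
    printf '%s %s %s\n' "${dimm#"$root"/}" "$count" "$label"
  done
}
```

`dimm_ce_counts | awk '$2 > 0'` would list only the modules that have logged corrected errors.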
The server's front panel also reported an error.
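I removed the memory module from slot 1 of the server and powered it back on, and the problem was gone.
Testing again with the software afterwards showed everything working normally.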
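To catch a recurrence after the faulty module was removed, one option is to re-read the controller-level counters periodically, for example from cron, and alert if they start climbing again. This is a minimal sketch that assumes the standard EDAC sysfs files `mcN/ce_count` and `mcN/ue_count`. The name `edac_check` is hypothetical. The counters are kept by the kernel, so they start from zero after each boot.

```shell
#!/usr/bin/env bash
# Sum corrected (ce_count) and uncorrected (ue_count) errors over all memory
# controllers, print the totals, and return non-zero if either is above zero.
edac_check() {
  local root="${1:-/sys/devices/system/edac/mc}"
  local mc ce=0 ue=0
  for mc in "$root"/mc*; do
    [ -r "$mc/ce_count" ] || continue
    ce=$(( ce + $(cat "$mc/ce_count") ))
    # Treat a missing ue_count file as zero.
    ue=$(( ue + $(cat "$mc/ue_count" 2>/dev/null || echo 0) ))
  done
  echo "CE=$ce UE=$ue"
  [ "$ce" -eq 0 ] && [ "$ue" -eq 0 ]
}
```

Because the exit status carries the result, `edac_check || logger "EDAC errors detected"` is enough to get a line in syslog when errors reappear.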