A temporary workaround is described in the article "XFS: possible memory allocation deadlock in kmem_alloc".
Related command:
echo 1 > /proc/sys/vm/drop_caches
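For context, the value written to drop_caches selects what the kernel releases: per the kernel's sysctl documentation, 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. A minimal sketch follows; the helper name `drop_caches` and its optional second argument (a target path, present only so the function can be tried without root) are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write a drop_caches mode to the kernel interface.
# mode:   1 = page cache, 2 = reclaimable slab (dentries, inodes), 3 = both.
# target: defaults to the real interface; overriding it is only for dry runs.
drop_caches() {
    local mode="${1:-1}"
    local target="${2:-/proc/sys/vm/drop_caches}"
    case "${mode}" in
        1|2|3) ;;
        *) echo "drop_caches: mode must be 1, 2 or 3" >&2; return 1 ;;
    esac
    sync    # flush dirty pages first so more of the cache is reclaimable
    echo "${mode}" > "${target}"
}

# Example (as root): drop_caches 2
```

Dropping caches only frees memory that can be rebuilt from disk, so it is a mitigation and not a fix for the underlying allocation problem.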
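I also learned that xfs_guard can be deployed as a background daemon to handle this automatically. The process is as follows:
I found an Ubuntu-based Ansible role for it: https://github.com/Rheinwerk/ansible-role-xfs_guard
Since the role is not suited to CentOS, I made some minor adjustments. The steps are:
1. Create the required files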
/usr/local/sbin/xfs-guard
#!/bin/bash
# vim: ft=sh ts=4 noet

# Forward metrics to emit_bosun if it is installed; otherwise do nothing.
function emit_bosun {
    if [[ -x /usr/local/bin/emit_bosun ]]; then
        # "$@" keeps quoted arguments (such as the -d description) intact.
        /usr/local/bin/emit_bosun "$@"
    fi
}

if [[ -z "${SEARCHTERM}" || -z "${MONITORED_LOG}" || -z "${STATEFILE}" || -z "${THRESHOLD_SECONDS}" ]]; then
    echo "${0}: Missing one or more required environment variables." >&2
    exit 1
fi

# Follow the log, keep only matching lines, and turn each match into a timestamp.
tail -n 1 -F "${MONITORED_LOG}" \
    | grep --line-buffered "${SEARCHTERM}" \
    | awk '{ print systime(); fflush(); }' \
    | while read -r found_at_time; do
        emit_bosun -m xfs-guard.events.xfs-allocation-deadlock.found -v 1 --tags service=centerdevice -r gauge -u None -d "Found a kernel log line regarding XFS memory allocation drop."

        # Time of the last cache drop, or 0 if none has been recorded yet.
        if [[ -r "${STATEFILE}" ]]; then
            last_dropped_caches_at=$(cat "${STATEFILE}")
        else
            last_dropped_caches_at=0
        fi

        difference=$(( found_at_time - last_dropped_caches_at ))
        if [[ ${difference} -ge ${THRESHOLD_SECONDS} ]]; then
            echo 2 > /proc/sys/vm/drop_caches
            date +%s > "${STATEFILE}"
            logger -t xfs-guard "Dropped slab cache"
            emit_bosun -m xfs-guard.events.drop.slab -v 1 -r gauge -u None -d "1 = dropped slab cache -- echo 2 > /proc/sys/vm/drop_caches"
        else
            logger -t xfs-guard "Withstood the urge to drop slab cache, because last time was less than ${THRESHOLD_SECONDS}s ago."
            emit_bosun -m xfs-guard.events.skipped.slab -v 1 -r gauge -u None -d "1 = didn't drop slab cache, because requested too often in quick succession"
        fi
    done
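The rate limiting in the loop comes down to one comparison: the cache is dropped only if at least THRESHOLD_SECONDS have passed since the timestamp stored in STATEFILE. Below is a small sketch of that decision in isolation; the function name `should_drop` is an assumption and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's threshold check.
# Returns 0 (drop) when now - last >= threshold, 1 (skip) otherwise.
should_drop() {
    local now="$1" last="$2" threshold="$3"
    local difference=$(( now - last ))
    [[ ${difference} -ge ${threshold} ]]
}

# With no state file the script uses last=0, so the first match always drops.
should_drop 1700000000 0 10          && echo "drop"
# Only 5 seconds after the previous drop: skipped.
should_drop 1700000005 1700000000 10 || echo "skip"
# Exactly at the threshold: dropped again.
should_drop 1700000010 1700000000 10 && echo "drop"
```

A larger THRESHOLD_SECONDS means fewer cache drops during a burst of kernel messages, at the cost of reacting more slowly if the condition persists.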
/etc/sysconfig/xfs-guard
SEARCHTERM='kernel: XFS.*possible memory allocation deadlock size.*in kmem_realloc'
MONITORED_LOG="/var/log/messages"
STATEFILE="/run/xfs-guard.txt"
THRESHOLD_SECONDS=10
ENABLED=1
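Before relying on the guard, it may be worth checking that SEARCHTERM matches what the kernel actually writes on the machine in question. Note that the pattern above looks for kmem_realloc, while the article title mentions kmem_alloc, so a line naming only kmem_alloc would not match. The sketch below uses grep the same way the script does; the helper name `matches_searchterm` is hypothetical, and both sample log lines are made up for illustration and not taken from a real system.

```shell
#!/bin/bash
SEARCHTERM='kernel: XFS.*possible memory allocation deadlock size.*in kmem_realloc'

# Hypothetical helper: succeed if the given line matches SEARCHTERM.
matches_searchterm() {
    printf '%s\n' "$1" | grep -q "${SEARCHTERM}"
}

# Invented sample lines (hostname, process name and numbers are placeholders).
line_realloc='Jan  1 00:00:00 host kernel: XFS: proc(123) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)'
line_alloc='Jan  1 00:00:00 host kernel: XFS: proc(123) possible memory allocation deadlock size 65552 in kmem_alloc (mode:0x250)'

matches_searchterm "${line_realloc}" && echo "kmem_realloc line: match"
matches_searchterm "${line_alloc}"   || echo "kmem_alloc line: no match"
```

To check against real logs, something like `grep -c "${SEARCHTERM}" /var/log/messages` shows how many existing lines the pattern would have caught; if the count is zero while the deadlock message is present, the pattern probably needs adjusting.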
Create the systemd service
/lib/systemd/system/xfs-guard.service
[Unit]
Description=Monitor and mitigate XFS memory allocation freezes
DefaultDependencies=no

[Service]
Type=simple
ExecStart=/usr/local/sbin/xfs-guard
EnvironmentFile=-/etc/sysconfig/xfs-guard
Reload systemd and enable the service at boot
systemctl daemon-reload
systemctl enable xfs-guard.service
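One caveat: systemctl enable acts on the [Install] section of a unit, and the unit file shown above has none, so enabling it may have no effect at boot. Also, enable by itself does not start the service in the current session. A hedged sketch of a pre-check follows; the function name `unit_has_install_section` is hypothetical, and `WantedBy=multi-user.target` is a common choice, not something taken from the original role.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the unit file contains an [Install] section.
unit_has_install_section() {
    grep -q '^\[Install\]' "$1"
}

UNIT=/lib/systemd/system/xfs-guard.service
if [[ -r "${UNIT}" ]] && ! unit_has_install_section "${UNIT}"; then
    echo "${UNIT} has no [Install] section; consider adding:"
    printf '%s\n' '[Install]' 'WantedBy=multi-user.target'
fi

# After adjusting the unit (as root):
#   systemctl daemon-reload
#   systemctl enable --now xfs-guard.service   # enable and start in one step
#   systemctl status xfs-guard.service
#   journalctl -t xfs-guard                    # messages written via logger -t xfs-guard
```

The script itself also needs to be executable (for example via `chmod +x /usr/local/sbin/xfs-guard`), otherwise ExecStart will fail.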