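Preface
Yesterday zabbix-server ran into a problem: it kept restarting. I fiddled with it until evening without working out the cause. I tried everything suggested online and changed CacheSize, Timeout, StartPollers and so on; some posts even blamed the Include setting without bothering to paste a log. I planned to deal with it first thing this morning, but a production incident got in the way.
As it happens, our hyper-converged infrastructure has been unstable lately. In the early hours, a production server (hosting the registry center and the configuration center) rebooted for no apparent reason and triggered a chain of problems. I won't go into that here; this post is about the Zabbix issue.
Environment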
CentOS Linux release 7.6.1810 (Core)
mysql 5.7 # started with Docker, data persisted to disk
Zabbix was installed following the official documentation: the 5.0 LTS + CentOS 7 + MySQL + Nginx variant.
zabbix_server (Zabbix) 5.0.5
Revision eaa427cf19 26 October 2020, compilation time: Oct 26 2020 12:20:11
Copyright (C) 2020 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).
Compiled with OpenSSL 1.0.2k-fips 26 Jan 2017
Running with OpenSSL 1.0.2k-fips 26 Jan 2017
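PS: I don't know Zabbix in depth. I can install and configure it by following the official and online documentation, and I set up some custom monitoring items myself.
Problem
zabbix-server kept restarting and the login page would not open. zabbix-server.log showed the following errors: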
2148:20210603:143421.801 Starting Zabbix Server. Zabbix 5.0.5 (revision eaa427cf19).
2148:20210603:143421.801 ****** Enabled features ******
2148:20210603:143421.801 SNMP monitoring: YES
2148:20210603:143421.801 IPMI monitoring: YES
2148:20210603:143421.801 Web monitoring: YES
2148:20210603:143421.801 VMware monitoring: YES
2148:20210603:143421.801 SMTP authentication: YES
2148:20210603:143421.801 ODBC: YES
2148:20210603:143421.801 SSH support: YES
2148:20210603:143421.801 IPv6 support: YES
2148:20210603:143421.801 TLS support: YES
2148:20210603:143421.801 ******************************
2148:20210603:143421.801 using configuration file: /etc/zabbix/zabbix_server.conf
...
...
2179:20210603:143423.081 ================================
2179:20210603:143423.081 Please consider attaching a disassembly listing to your bug report.
2179:20210603:143423.081 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
2179:20210603:143423.081 ================================
2148:20210603:143423.082 One child process died (PID:2179,exitcode/signal:1). Exiting ...
zabbix_server [2148]: Error waiting for process with PID 2179: [10] No child processes
2148:20210603:143423.088 syncing history data...
2148:20210603:143423.097 syncing history data... 100.000000%
2148:20210603:143423.097 syncing history data done
2148:20210603:143423.097 syncing trend data...
2148:20210603:143423.102 syncing trend data done
2148:20210603:143423.102 Zabbix Server stopped. Zabbix 5.0.5 (revision eaa427cf19).
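When the log is long, it helps to pull out which child process the main process reported as dead and what that child logged just before. The following is only a minimal sketch: it assumes the `PID:YYYYMMDD:HHMMSS.mmm message` line layout seen in the excerpt above, and the helper name `find_crashed_children` and the `SAMPLE` text are made up for illustration.

```python
import re
from collections import defaultdict

# Line layout assumed from the excerpt above: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s(.*)$")
DIED_RE = re.compile(r"One child process died \(PID:(\d+),exitcode/signal:(\d+)\)")


def find_crashed_children(log_text):
    """Return {pid: [messages]} for every child the main process reported dead."""
    by_pid = defaultdict(list)
    crashed = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip lines without the PID:date:time prefix
        pid, _date, _time, message = match.groups()
        by_pid[int(pid)].append(message)
        died = DIED_RE.search(message)
        if died:
            crashed.append(int(died.group(1)))
    return {pid: by_pid.get(pid, []) for pid in crashed}


# Two lines copied from the log above, used as a tiny sample.
SAMPLE = """\
2179:20210603:143423.081 Please consider attaching a disassembly listing to your bug report.
2148:20210603:143423.082 One child process died (PID:2179,exitcode/signal:1). Exiting ...
"""

if __name__ == "__main__":
    for pid, messages in find_crashed_children(SAMPLE).items():
        print(f"child {pid} logged {len(messages)} line(s) before dying")
```

Reading everything the crashed PID wrote (rather than only the last lines of the file) is usually the quickest way to see what that process was doing when it died.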
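Troubleshooting
The log gave no indication of a memory, cache, or MySQL problem, so I searched online and tried all sorts of things: restarting the whole stack, changing CacheSize, checking for deadlocked child processes, and cleaning out the database.
Later I reinitialized MySQL outright. zabbix-server then stayed up for a few minutes before going back to restarting nonstop. The login page also reported "Database error Connection timed out", yet zabbix_server.conf looked fine. I then went back to the official installation documentation and realized that the Zabbix frontend and server are separate components. At that point I seemed to have found the problem.
Checking the frontend configuration, I found that the MySQL settings in /etc/zabbix/web/zabbix.conf.php were wrong. I corrected them right away and then restarted the services: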
systemctl restart zabbix-server zabbix-agent rh-nginx116-nginx rh-php72-php-fpm
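Because the frontend and the server each keep their own database settings, a quick comparison of the two files can catch this kind of mismatch early. Below is a rough sketch, assuming the usual `$DB['KEY'] = 'value';` lines in zabbix.conf.php and `Key=value` lines in zabbix_server.conf. The function names and sample values are hypothetical. It only compares keys present in both files, so defaults (for example a frontend PORT of '0') and equivalent spellings such as localhost versus 127.0.0.1 still need a human look.

```python
import re

# Matches lines such as: $DB['SERVER']   = 'localhost';
PHP_RE = re.compile(r"""\$DB\['(\w+)'\]\s*=\s*'([^']*)'""")

# Frontend key -> server key
KEY_MAP = {
    "SERVER": "DBHost",
    "PORT": "DBPort",
    "DATABASE": "DBName",
    "USER": "DBUser",
    "PASSWORD": "DBPassword",
}


def parse_frontend(php_text):
    """Collect $DB[...] string assignments from zabbix.conf.php text."""
    return dict(PHP_RE.findall(php_text))


def parse_server(conf_text):
    """Collect Key=value pairs from zabbix_server.conf text, skipping comments."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_db_settings(php_text, conf_text):
    """Return (frontend_key, frontend_value, server_key, server_value) mismatches."""
    front = parse_frontend(php_text)
    server = parse_server(conf_text)
    mismatches = []
    for php_key, conf_key in KEY_MAP.items():
        if php_key in front and conf_key in server:
            if front[php_key] != server[conf_key]:
                mismatches.append(
                    (php_key, front[php_key], conf_key, server[conf_key])
                )
    return mismatches


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    php = "$DB['SERVER'] = '10.0.0.5';\n$DB['DATABASE'] = 'zabbix';\n"
    conf = "DBHost=10.0.0.9\nDBName=zabbix\n"
    for row in diff_db_settings(php, conf):
        print(row)
```

In practice the two text arguments would be read from /etc/zabbix/web/zabbix.conf.php and /etc/zabbix/zabbix_server.conf.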
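A few minutes later zabbix-server started restarting again. Then I remembered an article I had seen online and changed the "Connection security" setting of the mail media type to STARTTLS (an extension to plain-text communication protocols). That finally fixed it.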
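I never dug into why the connection security setting mattered here, but before changing it, it can be worth checking what the mail server actually advertises. The sketch below uses Python's standard `smtplib` to ask a server whether it offers STARTTLS. The function name `supports_starttls` and the `smtp_factory` parameter (there only so the check can be exercised without a real server) are my own, and the host in the usage comment is a placeholder.

```python
import smtplib


def supports_starttls(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Return True if the SMTP server advertises the STARTTLS extension."""
    with smtp_factory(host, port, timeout=timeout) as client:
        client.ehlo()  # populates the list of extensions the server offers
        return client.has_extn("starttls")


# Usage (placeholder host; replace with the SMTP server set in the media type):
#   print(supports_starttls("smtp.example.com", 587))
```

If this returns False, STARTTLS is unlikely to be the right choice for that server and port.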

PS:
When you use open-source software, it pays to understand how the software itself is architected; maintenance becomes much easier.
Special thanks:
https://blog.csdn.net/liuxiangyang_/article/details/100024641
https://yunwei365.blog.csdn.net/article/details/103677447
https://blog.csdn.net/h106140873/article/details/104311586
