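Background: the Prometheus server kept losing external connectivity within about two hours. When that happened, the web UIs served by Prometheus and Grafana became unreachable and SSH clients could no longer connect to the machine. Rebooting the instance did not help: the same thing happened again after the reboot.
So I investigated the operating system and traced the problem to the instance's network interface, which showed the following: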
[root@prometheus /var/log]# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2021-08-17 18:02:07 CST; 7h left
     Docs: man:systemd-sysv-generator(8)
  Process: 1172 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─1359 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid -H prometheus ens5
Aug 17 18:02:05 prometheus dhclient[1301]: DHCPACK from 172.31.32.1 (xid=0x41460a8e)
Aug 17 18:02:07 prometheus dhclient[1301]: bound to 172.31.44.100 -- renewal in 1762 seconds.
Aug 17 18:02:07 prometheus network[1172]: Determining IP information for ens5... done.
Aug 17 18:02:07 prometheus network[1172]: [  OK  ]
Aug 17 18:02:07 prometheus network[1172]: Bringing up interface eth0:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 does not seem to be present, delaying initialization.
Aug 17 18:02:07 prometheus network[1172]: [FAILED]
Aug 17 18:02:07 prometheus systemd[1]: network.service: control process exited, code=exited status=1
Aug 17 18:02:07 prometheus systemd[1]: Failed to start LSB: Bring up/down networking.
Aug 17 18:02:07 prometheus systemd[1]: Unit network.service entered failed state.
Aug 17 18:02:07 prometheus systemd[1]: network.service failed.

When I first noticed the problem, I tried restarting the network service, which produced the following:

[root@prometheus /var/log]# systemctl restart network
Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
[root@prometheus /var/log]# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2021-08-17 10:13:20 CST; 9s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 11979 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─1359 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid -H prometheus ens5
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus network[11979]: RTNETLINK answers: File exists
Aug 17 10:13:20 prometheus systemd[1]: network.service: control process exited, code=exited status=1
Aug 17 10:13:20 prometheus systemd[1]: Failed to start LSB: Bring up/down networking.
Aug 17 10:13:20 prometheus systemd[1]: Unit network.service entered failed state.
Aug 17 10:13:20 prometheus systemd[1]: network.service failed.
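The "Device eth0 does not seem to be present" line means an ifcfg file names a device the kernel does not have. One quick way to spot such leftovers is to compare every ifcfg-* file with the devices listed under /sys/class/net. The following is a minimal sketch, assuming the CentOS 7 initscripts layout; check_ifcfg_devices and ifcfg_device are hypothetical helper names, not part of any package, and the two directory arguments exist only so the check can be pointed at a copy of the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from initscripts): report, for each ifcfg-* file,
# which device it configures and whether that device exists in the kernel.

# Print the device an ifcfg file refers to: its DEVICE= value if set
# (last assignment wins, as when the file is sourced), else the filename suffix.
ifcfg_device() {
    local file=$1 dev
    dev=$(grep -E '^DEVICE=' "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")
    if [ -z "$dev" ]; then
        dev=$(basename "$file")
        dev=${dev#ifcfg-}
    fi
    printf '%s\n' "$dev"
}

# Usage: check_ifcfg_devices [IFCFG_DIR] [SYS_NET_DIR]
# Output: one tab-separated line per file: <file> <device> present|MISSING
check_ifcfg_devices() {
    local ifcfg_dir=${1:-/etc/sysconfig/network-scripts}
    local sys_net_dir=${2:-/sys/class/net}
    local file dev
    for file in "$ifcfg_dir"/ifcfg-*; do
        [ -e "$file" ] || continue          # the glob matched nothing
        dev=$(ifcfg_device "$file")
        if [ -e "$sys_net_dir/$dev" ]; then
            printf '%s\t%s\tpresent\n' "$(basename "$file")" "$dev"
        else
            printf '%s\t%s\tMISSING\n' "$(basename "$file")" "$dev"
        fi
    done
}
```

Running check_ifcfg_devices with no arguments on an instance like this one should flag ifcfg-eth0 as MISSING, because the only real interface is ens5.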
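So I searched online for fixes to the two errors "Device eth0 does not seem to be present, delaying initialization" and "RTNETLINK answers: File exists". The suggested fixes were roughly these:
Method 1: a conflict with the NetworkManager service. Stop it with service NetworkManager stop, disable it at boot with chkconfig NetworkManager off, then reboot. (I tried this, but the machine has no NetworkManager service, so it did not apply.)
Method 2: a MAC address mismatch with the configuration file. Make the MAC address in /etc/udev/rules.d/70-persistent-net.rules match the one in /etc/sysconfig/network-scripts/ifcfg-ens5. (I tried this, but the interface configuration on the AWS instance contains no MAC address at all, so it did not apply.)
Method 3: ip addr flush dev ens5. (I tried this; it did not solve the problem.)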
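Method 2 only applies when an ifcfg file pins a MAC address with HWADDR=. A minimal sketch for checking that, assuming the same CentOS 7 layout: the hypothetical check_ifcfg_hwaddr function compares each pinned HWADDR (case-insensitively) with the address the kernel reports in /sys/class/net/<device>/address. The directory arguments are assumptions added for testability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare HWADDR= in each ifcfg-* file with the real MAC.
# Usage: check_ifcfg_hwaddr [IFCFG_DIR] [SYS_NET_DIR]
check_ifcfg_hwaddr() {
    local ifcfg_dir=${1:-/etc/sysconfig/network-scripts}
    local sys_net_dir=${2:-/sys/class/net}
    local file name dev cfg_mac real_mac
    for file in "$ifcfg_dir"/ifcfg-*; do
        [ -e "$file" ] || continue
        name=$(basename "$file")
        # Device: last DEVICE= value with quotes stripped, else the filename suffix.
        dev=$(grep -E '^DEVICE=' "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")
        [ -n "$dev" ] || dev=${name#ifcfg-}
        # Pinned MAC (if any), lowercased so the comparison ignores case.
        cfg_mac=$(grep -E '^HWADDR=' "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"'" | tr 'A-F' 'a-f')
        if [ -z "$cfg_mac" ]; then
            printf '%s\tno HWADDR\n' "$name"
        elif [ ! -r "$sys_net_dir/$dev/address" ]; then
            printf '%s\tdevice %s missing\n' "$name" "$dev"
        else
            real_mac=$(tr 'A-F' 'a-f' < "$sys_net_dir/$dev/address")
            if [ "$cfg_mac" = "$real_mac" ]; then
                printf '%s\tHWADDR matches\n' "$name"
            else
                printf '%s\tHWADDR %s != %s\n' "$name" "$cfg_mac" "$real_mac"
            fi
        fi
    done
}
```

On the instance described here, every file would presumably report "no HWADDR", which is consistent with Method 2 not applying.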
Solution:
I noticed that the host's IP address was bound to the ens5 interface, yet the error said "Device eth0 does not seem to be present". My guess was that the leftover eth0 configuration on the AWS instance was preventing ens5 from being brought up cleanly: the instance's only interface is ens5, so ifcfg-eth0 refers to a device that does not exist, and the network init script reports failure for the whole service if any configured interface fails to come up. The repeated "RTNETLINK answers: File exists" lines were most likely ifup trying to re-add addresses or routes that ens5 already had from the dhclient left running by the failed start (PID 1359 in the CGroup line above). After the following steps, restarting the network succeeded:

[root@prometheus /etc/sysconfig/network-scripts]# mv ifcfg-eth0 /root
[root@prometheus /etc/sysconfig/network-scripts]# kill -9 1359
[root@prometheus /etc/sysconfig/network-scripts]# rm -rf /var/lib/dhclient/dhclient--ens5.lease /var/run/dhclient-ens5.pid /etc/udev/rules.d/70-persistent-net.rules
[root@prometheus /etc/sysconfig/network-scripts]# systemctl restart network    (the restart succeeded at this point)
[root@prometheus /etc/sysconfig/network-scripts]# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-17 11:34:47 CST; 8min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 25605 ExecStop=/etc/rc.d/init.d/network stop (code=exited, status=0/SUCCESS)
  Process: 25782 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/network.service
           └─25966 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid -H prometheus ens5
Aug 17 11:34:45 prometheus systemd[1]: Starting LSB: Bring up/down networking...
Aug 17 11:34:45 prometheus network[25782]: Bringing up loopback interface:  [  OK  ]
Aug 17 11:34:45 prometheus network[25782]: Bringing up interface ens5:
Aug 17 11:34:45 prometheus dhclient[25911]: DHCPREQUEST on ens5 to 255.255.255.255 port 67 (xid=0x3567fc41)
Aug 17 11:34:45 prometheus dhclient[25911]: DHCPACK from 172.31.32.1 (xid=0x3567fc41)
Aug 17 11:34:47 prometheus dhclient[25911]: bound to 172.31.44.100 -- renewal in 1363 seconds.
Aug 17 11:34:47 prometheus network[25782]: Determining IP information for ens5... done.
Aug 17 11:34:47 prometheus network[25782]: [  OK  ]
Aug 17 11:34:47 prometheus systemd[1]: Started LSB: Bring up/down networking.
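The first step above (moving ifcfg-eth0 out of the way) can be generalized to any ifcfg file whose device is absent. A minimal sketch, assuming the CentOS 7 layout: backup_stale_ifcfg is a hypothetical function, the default backup directory /root/ifcfg-backup is an assumption (the post simply used /root), and DRY_RUN=1 only prints what would be moved. It deliberately does not kill dhclient, delete lease files or restart the network. Those steps stay manual as in the post, because restarting networking over SSH can drop the session if something goes wrong.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move ifcfg-* files for devices the kernel does not have
# into a backup directory, mirroring the manual `mv ifcfg-eth0 /root` step.
# Usage: [DRY_RUN=1] backup_stale_ifcfg [IFCFG_DIR] [BACKUP_DIR] [SYS_NET_DIR]
backup_stale_ifcfg() {
    local ifcfg_dir=${1:-/etc/sysconfig/network-scripts}
    local backup_dir=${2:-/root/ifcfg-backup}
    local sys_net_dir=${3:-/sys/class/net}
    local file dev count=0
    for file in "$ifcfg_dir"/ifcfg-*; do
        [ -e "$file" ] || continue
        # Device: last DEVICE= value with quotes stripped, else the filename suffix.
        dev=$(grep -E '^DEVICE=' "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")
        if [ -z "$dev" ]; then
            dev=$(basename "$file")
            dev=${dev#ifcfg-}
        fi
        [ -e "$sys_net_dir/$dev" ] && continue    # device exists: keep its config
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'would move %s -> %s/\n' "$file" "$backup_dir"
        else
            mkdir -p "$backup_dir" || return 1
            mv "$file" "$backup_dir/" || return 1
            printf 'moved %s -> %s/\n' "$file" "$backup_dir"
        fi
        count=$((count + 1))
    done
    printf '%d stale ifcfg file(s)\n' "$count"
}
```

Running DRY_RUN=1 backup_stale_ifcfg first and reading its output before the real run keeps the change reviewable; afterwards systemctl restart network should no longer try to bring up eth0.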