Using systemd to Control Linux Service Startup Order and Restart on Failure


Background

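Our team built a LoRa gateway on Armbian. It must start the main program, packet_forwarder, as soon as the device powers on (packet_forwarder bridges LoRa and UDP so the gateway can communicate with the server).
This should have been a simple requirement: wrap the program in a service and load it into systemd. The rime_gateway.service unit file is as follows: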

[Unit]
Description=Rime LoRaWAN Gateway

[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always

[Install]
WantedBy=multi-user.target

For an explanation of the syntax, see "Systemd 入門教程:命令篇" (Systemd Tutorial for Beginners: Commands).
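
Loading the unit into systemd can be sketched as the shell function below. It is only an illustration: the function name install_unit and the UNIT_DIR default of /etc/systemd/system are assumptions (the status output later in this article shows the author's copy under /lib/systemd/system), while daemon-reload and enable are standard systemctl subcommands.

```shell
#!/bin/sh
# Sketch: install a unit file and enable it at boot.
# UNIT_DIR is an assumed default, not the location used in the article.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

install_unit() {
    unit_file="$1"                        # e.g. ./rime_gateway.service
    unit_name="$(basename "$unit_file")"
    cp "$unit_file" "$UNIT_DIR/$unit_name" || return 1
    systemctl daemon-reload || return 1   # make systemd re-read unit files
    systemctl enable "$unit_name"         # start the unit automatically on boot
}

# Usage (as root): install_unit ./rime_gateway.service
```

Enabling the unit is what makes the WantedBy=multi-user.target line take effect, because it links the service into that target.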

An Unstable Service

When started manually with systemctl start rime_gateway.service, the service works well.

However, when the service is started automatically as Armbian boots, systemctl status rime_gateway.service shows that it has stopped working:

rime_gateway.service - Rime LoRaWAN Gateway
   Loaded: loaded (/lib/systemd/system/rime_gateway.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2020-04-20 06:51:46 UTC; 29s ago
  Process: 1112 ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh (code=exited, status=1/FAILURE)
 Main PID: 1112 (code=exited, status=1/FAILURE)

Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.

The messages above show that the service was restarted too quickly, so systemd gave up restarting it.

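journalctl -u rime_gateway.service shows that systemd scheduled five restarts at 100 ms intervals and every attempt failed. This matches systemd's default start rate limit of 5 starts within 10 seconds; once the limit is exceeded, further start requests are refused and the unit stays in the failed state.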

-- Logs begin at Mon 2020-04-20 06:51:31 UTC, end at Mon 2020-04-20 06:55:01 UTC. --
Apr 20 06:51:40 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 06:51:40 orangepizero start_gateway.sh[572]: Reset start_gateway.sh
Apr 20 06:51:41 orangepizero start_gateway.sh[572]: Starting start_gateway.sh
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.

...

Apr 20 06:51:45 orangepizero start_gateway.sh[1112]: Reset start_gateway.sh
Apr 20 06:51:46 orangepizero start_gateway.sh[1112]: Starting start_gateway.sh
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.
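
To confirm which restart and rate-limit values systemd actually applies to a unit, the effective properties can be queried. The sketch below wraps that query in a hypothetical helper, show_restart_settings; the property names are the ones systemctl show reports, and the values printed depend on the system.

```shell
#!/bin/sh
# Sketch: print the restart and start-limit settings in effect for a unit.
# show_restart_settings is a hypothetical helper name.
show_restart_settings() {
    unit="${1:-rime_gateway.service}"
    systemctl show "$unit" \
        --property=Restart,RestartUSec,StartLimitBurst,StartLimitIntervalUSec
}

# Usage: show_restart_settings rime_gateway.service
```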

The gateway log (tail -f /tmp/start_gateway.sh.log) shows that the failure was caused by the network not being up yet:

ERROR: [up] connect returned Network is unreachable

Adjusting the Startup Order

Clearly, the service depends on the network being up, so the first step is to add the following line to the [Unit] section:

After=network.target
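
One caveat: according to the systemd documentation, network.target only signals that the network management stack has been started, not that a connection is available. A commonly used alternative is to order the service after network-online.target and pull that target in with Wants=. The sketch below writes such a setting as a drop-in file. The helper name and the file name network-online.conf are assumptions, and the target only waits for connectivity if a wait-online service (for example systemd-networkd-wait-online.service or NetworkManager-wait-online.service) is enabled, which was not verified for this Armbian image.

```shell
#!/bin/sh
# Sketch: add a drop-in that orders the service after network-online.target.
# DROPIN_DIR follows systemd's <unit>.d convention; the file name
# network-online.conf is an arbitrary choice.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/rime_gateway.service.d}"

write_network_online_dropin() {
    mkdir -p "$DROPIN_DIR" || return 1
    cat > "$DROPIN_DIR/network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage (as root):
#   write_network_online_dropin && systemctl daemon-reload
```

A drop-in leaves the original unit file untouched, so the change is easy to remove if it does not help on a given board.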

Did this ordering take effect? To find out, we exported the boot sequence and inspected it:

systemd-analyze plot > boot.svg

Opening boot.svg in Chrome shows that network.target starts first and rime_gateway.service starts after it.
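
On a headless gateway it may be more convenient to check the ordering as text instead of an SVG. The sketch below uses two existing commands, systemd-analyze critical-chain and systemctl list-dependencies --after; the wrapper name show_boot_order is hypothetical.

```shell
#!/bin/sh
# Sketch: text-only views of what a unit waits for during boot.
# show_boot_order is a hypothetical helper name.
show_boot_order() {
    unit="${1:-rime_gateway.service}"
    # Time-critical chain of units leading up to this unit, with timings.
    systemd-analyze critical-chain "$unit"
    # Units this unit is ordered after (its After= relationships).
    systemctl list-dependencies --after "$unit"
}

# Usage: show_boot_order rime_gateway.service
```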

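For more on startup order, see "Linux systemd啟動守護進程,service啟動順序分析及調整service啟動順序" (Linux systemd: starting daemons, analyzing service startup order, and adjusting it).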

Restarting on Failure

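To make the service more robust, it should be restarted automatically whenever it exits with a failure. The following settings were added for that purpose.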

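Setting StartLimitIntervalSec=0 disables the start rate limit, so systemd will keep trying to restart the service indefinitely: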

StartLimitIntervalSec=0

Waiting 1 second between restarts is a good idea, so that a persistent fault does not put too much load on the system:

RestartSec=1

For more on automatic restarts, see "使用systemd創建Linux服務" (Creating a Linux Service with systemd).

A Stable Service

The final rime_gateway.service is shown below:

[Unit]
Description=Rime LoRaWAN Gateway
After=network.target
StartLimitIntervalSec=0

[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

systemctl status rime_gateway.service and journalctl -u rime_gateway.service confirm that the service now starts normally.

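To test the failure case, we unplugged the network cable and then rebooted Armbian. systemd restarted the service 1 second after each failure until the network came back (78 restarts in this run):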

-- Logs begin at Mon 2020-04-20 07:32:09 UTC, end at Mon 2020-04-20 07:35:12 UTC. --
Apr 20 07:32:19 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Reset start_gateway.sh
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Starting start_gateway.sh
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.
Apr 20 07:32:21 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:32:21 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Reset start_gateway.sh
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Starting start_gateway.sh

...

Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 78.
Apr 20 07:34:55 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero start_gateway.sh[2644]: Reset start_gateway.sh
Apr 20 07:34:56 orangepizero start_gateway.sh[2644]: Starting start_gateway.sh
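
The restart count can also be pulled out of the journal without reading it line by line. The sketch below assumes the "restart counter is at N." message format shown in the logs above; max_restart_counter is a hypothetical helper that reads journal text on standard input and prints the highest counter it finds.

```shell
#!/bin/sh
# Sketch: print the highest restart counter found in journal text on stdin.
# Relies on the "restart counter is at N." wording seen in the logs above.
max_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' \
        | sort -n \
        | tail -n 1
}

# Usage:
#   journalctl -u rime_gateway.service -b --no-pager | max_restart_counter
```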

