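Background
Our team built a LoRa gateway on top of Armbian. After power-on it must start the main program, packet_forwarder, which bridges LoRa and UDP so that the gateway can communicate with the server.
This should be a simple requirement: wrap the program in a service and load it into systemd. The rime_gateway.service unit file looks like this: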
[Unit]
Description=Rime LoRaWAN Gateway
[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always
[Install]
WantedBy=multi-user.target
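For an explanation of the syntax, see "Systemd 入門教程:命令篇" (Systemd Tutorial for Beginners: Commands).
Unstable service
When started manually with systemctl start rime_gateway.service, the service works fine.
However, after Armbian powers on and starts it automatically, systemctl status rime_gateway.service shows that the service has already stopped: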
rime_gateway.service - Rime LoRaWAN Gateway
Loaded: loaded (/lib/systemd/system/rime_gateway.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2020-04-20 06:51:46 UTC; 29s ago
Process: 1112 ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh (code=exited, status=1/FAILURE)
Main PID: 1112 (code=exited, status=1/FAILURE)
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.
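The output above shows that the service was restarted too quickly, so systemd gave up restarting it.
The log from journalctl -u rime_gateway.service shows that systemd retried at 100 ms intervals (the default RestartSec) and stopped once the restart counter reached 5, with every attempt failing.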
-- Logs begin at Mon 2020-04-20 06:51:31 UTC, end at Mon 2020-04-20 06:55:01 UTC. --
Apr 20 06:51:40 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 06:51:40 orangepizero start_gateway.sh[572]: Reset start_gateway.sh
Apr 20 06:51:41 orangepizero start_gateway.sh[572]: Starting start_gateway.sh
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.
...
Apr 20 06:51:45 orangepizero start_gateway.sh[1112]: Reset start_gateway.sh
Apr 20 06:51:46 orangepizero start_gateway.sh[1112]: Starting start_gateway.sh
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.
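Why exactly five attempts? By default systemd allows at most 5 starts (StartLimitBurst) within a 10-second window (StartLimitIntervalSec). Here each attempt fails after about one second and is retried 100 ms later, so the limit is hit roughly six seconds after boot. The following is a minimal sketch for inspecting the values that apply to a unit; the helper name show_restart_policy is only an illustration and not part of the original setup.

```shell
#!/bin/sh
# Print the restart-related settings systemd has in effect for a unit.
# show_restart_policy is a hypothetical helper name used for illustration.
show_restart_policy() {
    systemctl show \
        -p Restart \
        -p RestartUSec \
        -p StartLimitBurst \
        -p StartLimitIntervalUSec \
        "$1"
}

# Usage (on the gateway itself):
#   show_restart_policy rime_gateway.service
```

On an unmodified unit this should report the defaults mentioned above, which matches the "restart counter is at 5" line in the log.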
The gateway's own log (tail -f /tmp/start_gateway.sh.log) shows that it failed because the network was not up yet:
ERROR: [up] connect returned Network is unreachable
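Adjusting the startup order
Clearly the service depends on the network being up, so the first step is to add the following line to the [Unit] section: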
After=network.target
Did this ordering take effect? To find out, we exported the boot sequence and inspected it:
systemd-analyze plot > boot.svg
Opening boot.svg in Chrome shows that network.target starts first and rime_gateway.service starts after it.

For more on startup ordering, see "Linux systemd啟動守護進程,service啟動順序分析及調整service啟動順序" (Linux systemd daemons: analyzing and adjusting service startup order).
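Note that network.target only indicates that the network management stack has been started; it does not mean that an interface is configured or that a route exists. systemd provides network-online.target for services that need a configured network at startup. The sketch below is an alternative that the original setup did not use: it writes a drop-in that orders the service after network-online.target. The function name and the drop-in file name 10-network-online.conf are assumptions chosen for illustration, and the target only waits if a wait-online service (for example systemd-networkd-wait-online.service or NetworkManager-wait-online.service) is enabled on the system.

```shell
#!/bin/sh
# Write a drop-in that pulls in network-online.target and orders the unit after it.
# $1 is the drop-in directory, e.g. /etc/systemd/system/rime_gateway.service.d
# (function name and file name are hypothetical, for illustration only).
write_network_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    printf '%s\n' \
        '[Unit]' \
        'Wants=network-online.target' \
        'After=network-online.target' \
        > "$dropin_dir/10-network-online.conf"
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_network_dropin /etc/systemd/system/rime_gateway.service.d
#   systemctl daemon-reload
```

Even with this ordering, the server may still be unreachable when the service starts, so a restart policy remains useful as a fallback.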
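Restarting on failure
To make the service more robust, it should be restarted automatically whenever it exits with a failure. We added the following settings for that purpose.
Setting StartLimitIntervalSec=0 in the [Unit] section disables the start rate limit, so systemd keeps trying to restart the service indefinitely: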
StartLimitIntervalSec=0
Waiting 1 second before each restart is a good idea, because it avoids putting too much load on the server when something goes wrong:
RestartSec=1
For more on automatic restarts, see "使用systemd創建Linux服務" (Creating a Linux service with systemd).
Stable service
The final rime_gateway.service is shown below:
[Unit]
Description=Rime LoRaWAN Gateway
After=network.target
StartLimitIntervalSec=0
[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
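After a unit file is edited, systemd has to reload its configuration before the new settings take effect. A minimal sketch of that step follows; the helper name reload_gateway_service is an assumption for illustration, while the systemctl subcommands are standard.

```shell
#!/bin/sh
# Reload unit files, make sure the unit starts at boot, and restart it
# so that the edited settings are applied.
# reload_gateway_service is a hypothetical helper name.
reload_gateway_service() {
    unit="${1:-rime_gateway.service}"
    systemctl daemon-reload &&
        systemctl enable "$unit" &&
        systemctl restart "$unit"
}

# Usage (as root):
#   reload_gateway_service
```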
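Checking with systemctl status rime_gateway.service and journalctl -u rime_gateway.service confirms that the service now starts normally.
To test the failure case, we unplugged the network cable and then rebooted Armbian. systemd kept restarting the service, waiting 1 second after each failure, until the network came back (78 restarts in this run).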
-- Logs begin at Mon 2020-04-20 07:32:09 UTC, end at Mon 2020-04-20 07:35:12 UTC. --
Apr 20 07:32:19 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Reset start_gateway.sh
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Starting start_gateway.sh
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.
Apr 20 07:32:21 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:32:21 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Reset start_gateway.sh
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Starting start_gateway.sh
...
Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 78.
Apr 20 07:34:55 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero start_gateway.sh[2644]: Reset start_gateway.sh
Apr 20 07:34:56 orangepizero start_gateway.sh[2644]: Starting start_gateway.sh
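The restart count quoted above can be read from the journal rather than counted by hand. The sketch below pulls the most recent "restart counter is at N" value out of journal text on standard input; the function name last_restart_counter is an assumption, and the pattern relies on the message format shown in the log above.

```shell
#!/bin/sh
# Read journal lines on stdin and print the most recent restart counter.
# last_restart_counter is a hypothetical helper name; it depends on the
# "restart counter is at N." wording seen in this log.
last_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\.$/\1/p' | tail -n 1
}

# Usage:
#   journalctl -u rime_gateway.service | last_restart_counter
```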
