1. Cause of the failure:
(1). A DocumentDB restart made the service unavailable for a period of time, and DocumentDB could not perform a primary/standby failover;
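(2). The three gen_servers statistic_record_service, thirdparty_control, and queue_message did no error handling when accessing DocumentDB, so their supervisor xxxxxx_sup restarted them repeatedly. Once the restart rate exceeded the configured limit ({one_for_one, 10, 10}, i.e. more than 10 restarts within 10 seconds), xxxxxx_sup stopped all of its supervised services, which ultimately made the entire IoT cloud service unavailable.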
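Note: if the services supervised by xxxxxx_sup are restarted more than MaxR times within the last MaxT seconds, the supervisor terminates all of its child processes and then terminates itself.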
Reference: https://erldoc.com/doc/otp-design-principles/supervisor.html
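To make the rule in the note concrete: {one_for_one, 10, 10} is the legacy tuple form of the supervisor flags #{strategy => one_for_one, intensity => 10, period => 10}, so xxxxxx_sup gives up on the 11th restart inside any 10-second window. The count is shared by all children of the supervisor, so three gen_servers failing on the same DocumentDB outage use up the budget together. The module below is a hypothetical, simplified model of that check (restart_intensity, flags_map/1, and gives_up/3 are illustrative names, not OTP functions); the real bookkeeping happens inside OTP's supervisor.

```erlang
%% restart_intensity.erl: simplified model of the supervisor restart rule.
%% Module and function names are hypothetical, not part of OTP.
-module(restart_intensity).
-export([flags_map/1, gives_up/3]).

%% Convert legacy supervisor flags {Strategy, MaxR, MaxT} to the map form.
flags_map({Strategy, MaxR, MaxT}) ->
    #{strategy => Strategy, intensity => MaxR, period => MaxT}.

%% Restarts: restart times in seconds, oldest first; the last element is the
%% restart being attempted now. Restarts of ALL children count together.
%% Returns true if more than MaxR restarts fall within the last MaxT seconds,
%% i.e. the supervisor would stop its children and then itself.
gives_up([], _MaxR, _MaxT) ->
    false;
gives_up(Restarts, MaxR, MaxT) ->
    Now = lists:last(Restarts),
    Recent = [T || T <- Restarts, Now - T =< MaxT],
    length(Recent) > MaxR.
```

For example, gives_up(lists:seq(0, 10), 10, 10) is true: eleven restarts at t = 0..10 s all fall inside the 10-second window. The same eleven restarts spread 20 seconds apart never trip the limit.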
(3). Key log message: reached_max_restart_intensity
2019-08-02 07:01:27.600 [error] <0.360.0> Supervisor xxxxxx_sup had child queue_message started with octopus_queue_message:start_link([]) at <0.16789.4084> exit with reason reached_max_restart_intensity in context shutdown
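Item (2) traces the outage to gen_servers that let a DocumentDB error crash them. A minimal sketch of the alternative, assuming a hypothetical module queue_message_sketch and a FetchFun that stands in for the real DB call and returns {ok, Docs} or {error, Reason}: the callback matches the error (and catches exceptions raised by the call), logs it, and schedules a retry with erlang:send_after/3, so a database outage no longer turns into supervisor restarts. The original code logs with lager; the sketch uses OTP's logger macros (OTP 21+) so it compiles without extra dependencies, and the 5-second retry delay is an arbitrary assumption.

```erlang
%% queue_message_sketch.erl: hypothetical illustration, not the real queue_message.
-module(queue_message_sketch).
-behaviour(gen_server).

-include_lib("kernel/include/logger.hrl").

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Assumed delay before retrying DocumentDB after a failure.
-define(RETRY_MS, 5000).

%% FetchFun stands in for the real DB call (assumption): it returns
%% {ok, Docs} or {error, Reason}, or may raise an exception.
start_link(FetchFun) when is_function(FetchFun, 0) ->
    gen_server:start_link(?MODULE, FetchFun, []).

init(FetchFun) ->
    {ok, #{fetch => FetchFun, timer => undefined}}.

handle_call(ping, _From, State) ->
    {reply, pong, State};
handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(resend_offline_message, State) ->
    {noreply, resend(State)};
handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(retry_resend, State) ->
    {noreply, resend(State#{timer := undefined})};
handle_info(_Info, State) ->
    {noreply, State}.

%% Match every DB outcome so an outage schedules a retry
%% instead of raising case_clause and crashing the process.
resend(#{fetch := Fetch} = State) ->
    try Fetch() of
        {ok, Docs} ->
            %% Real code would resend Docs here.
            ?LOG_INFO("resending ~p offline messages", [length(Docs)]),
            State;
        {error, Reason} ->
            ?LOG_WARNING("DocumentDB error: ~p; retrying in ~p ms",
                         [Reason, ?RETRY_MS]),
            schedule_retry(State)
    catch
        Class:Error ->
            ?LOG_WARNING("DocumentDB call raised ~p:~p; retrying in ~p ms",
                         [Class, Error, ?RETRY_MS]),
            schedule_retry(State)
    end.

%% Keep at most one retry timer pending.
schedule_retry(#{timer := undefined} = State) ->
    State#{timer := erlang:send_after(?RETRY_MS, self(), retry_resend)};
schedule_retry(State) ->
    State.
```

Because the process stays alive during the outage, xxxxxx_sup records no restarts and its {one_for_one, 10, 10} limit is never approached; the trade-off is that offline messages are delivered late while DocumentDB is down.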
2. Error log output on the server:
(1).queue_message
2019-08-02 07:01:20.771 [error] <0.609.0>@queue_message:terminate:137 _Reason={{case_clause,{error,{connection_failure,{cant_connect,econnrefused}}}},[{db_helper,fetch_list,1,[{file,"src/model/db_helper.erl"},{line,87}]},{queue_message,resend_offline_message_internal,2,[{file,"src/octopus_log/octopus_queue_message.erl"},{line,197}]},{queue_message,handle_cast,2,[{file,"src/log/queue_message.erl"},{line,110}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,686}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
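The trace shows the immediate bug: db_helper:fetch_list/1 (line 87) uses a case expression with no clause for {error, ...}, so a connection failure surfaces as case_clause instead of as a return value. A sketch of the fix, assuming a hypothetical module db_helper_sketch whose fetch_list/1 takes a zero-arity fun wrapping the driver call (the real signature and the {ok, Docs} success shape are not shown in this report and are assumptions): every outcome is matched and handed back to the caller.

```erlang
%% db_helper_sketch.erl: hypothetical; the real db_helper:fetch_list/1
%% is not shown in this report.
-module(db_helper_sketch).
-export([fetch_list/1]).

%% Query: assumed zero-arity fun wrapping the DocumentDB driver call.
%% Returns {ok, Docs} | {error, Reason}; never raises case_clause.
fetch_list(Query) ->
    case Query() of
        {ok, Docs} when is_list(Docs) ->
            {ok, Docs};
        {error, Reason} ->
            %% e.g. {connection_failure, {cant_connect, econnrefused}}
            {error, Reason};
        Other ->
            %% Unknown shapes become errors instead of crashes.
            {error, {unexpected_result, Other}}
    end.
```

Callers such as queue_message then match {error, Reason} explicitly and decide whether to retry, rather than crashing inside db_helper.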
(2).statistic_record_service
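1). Because terminate(_Reason, _State) does not log the reason with lager:error("_Reason=~p", [_Reason]), the most important information (the full exit reason with its stack trace, as logged for queue_message above) is missing;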
2). Recommendation: every gen_server should log an error in its terminate function;
error.log.6:2019-08-02 07:01:20.750 [error] <0.667.0>@db_helper:fetch:132 gen_server statistic_record_service terminated with reason: no case clause matching {error,{connection_failure,{cant_connect,econnrefused}}} in db_helper:fetch/1 line 132
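A minimal sketch of recommendation 2), assuming a hypothetical gen_server terminate_logging_sketch. When a callback raises an error, gen_server calls terminate/2 with Reason = {Error, Stacktrace} before exiting, so logging Reason there captures the full stack trace, as in the queue_message entry above. The report's call is lager:error("_Reason=~p", [_Reason]), which requires the lager_transform parse transform; the sketch uses ?LOG_ERROR from OTP's logger (OTP 21+) to stay dependency-free. Logging normal and shutdown exits at info level is a design choice here, not part of the original recommendation.

```erlang
%% terminate_logging_sketch.erl: hypothetical minimal gen_server.
-module(terminate_logging_sketch).
-behaviour(gen_server).

-include_lib("kernel/include/logger.hrl").

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #{}}.

%% 'crash' simulates an unhandled DocumentDB error inside a callback.
handle_call(crash, _From, _State) ->
    erlang:error({case_clause,
                  {error, {connection_failure, {cant_connect, econnrefused}}}});
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Routine stops are logged quietly; anything else is an error.
terminate(Reason, _State) when Reason =:= normal; Reason =:= shutdown ->
    ?LOG_INFO("terminating: ~p", [Reason]);
terminate({shutdown, _} = Reason, _State) ->
    ?LOG_INFO("terminating: ~p", [Reason]);
terminate(Reason, _State) ->
    %% For a callback error, Reason is {Error, Stacktrace}.
    ?LOG_ERROR("_Reason=~p", [Reason]).
```

Note that terminate/2 is called for callback crashes regardless of trap_exit, but on a supervisor-initiated shutdown it runs only if the process traps exits.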
(3).thirdparty_control
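1). Because terminate(_Reason, _State) does not log the reason with lager:error("_Reason=~p", [_Reason]), the most important information (the full exit reason with its stack trace) is missing;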
2). Recommendation: every gen_server should log an error in its terminate function;
error.log.6:2019-08-02 07:01:20.750 [error] <0.661.0>@db_helper:fetch:132 CRASH REPORT Process thirdparty_control with 0 neighbours crashed with reason: no case clause matching {error,{connection_failure,{cant_connect,econnrefused}}} in db_helper:fetch/1 line 132