Background: how the worker timeout came up
import time

while True:
    ...  # work done on each iteration (elided in the original)
    time.sleep(1)
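After gunicorn started, it ran for only 30s and then hung, without reporting any exception or error. The official gunicorn docs explain why: by default, a worker that stays silent for 30s is killed and then restarted.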
http://docs.gunicorn.org/en/stable/settings.html
timeout
-t INT, --timeout INT
30
Workers silent for more than this many seconds are killed and restarted.
Generally set to thirty seconds. Only set this noticeably higher if you’re sure of the repercussions for sync workers. For the non sync workers it just means that the worker process is still communicating and is not tied to the length of time required to handle a single request.
With the root cause found, I added --timeout 120 to the gunicorn startup command, but the worker still timed out after 30s. I searched Stack Overflow and found no good solution.
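For reference, the same setting can also be put in a gunicorn config file instead of on the command line. This is only a sketch of the equivalent of --timeout 120; it does not explain why the flag had no effect here. The file name gunicorn.conf.py, the worker count and the module path myapp:app are assumptions, not taken from the original setup.

```python
# gunicorn.conf.py (assumed file name)
# Start with: gunicorn -c gunicorn.conf.py myapp:app   (myapp:app is hypothetical)

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 120

# Number of worker processes (an assumed value for illustration).
workers = 2
```

The config file is plain Python, so the value can also be computed or read from an environment variable if needed.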
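The best fix I have found so far is to change the application code: the function was originally called from the main thread, so wrap it with threading instead.
For example: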
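import threading

# General pattern: run func in a background thread instead of the main thread
t = threading.Thread(name='', target=func, kwargs={})
t.daemon = True
t.start()

# The same pattern applied to the actual function
t = threading.Thread(name='result_package', target=result_package, args=(pack_name, task, issue))
t.daemon = True
t.start()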
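This way the function is wrapped in a thread started from the main thread, so the main thread is no longer blocked by it.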
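The effect of the wrapping can be seen in a small self-contained sketch. The names long_task, handle_request and results are hypothetical stand-ins (long_task plays the role of result_package), and the 0.5s delay is only there to keep the demo short.

```python
import threading
import time

results = []  # hypothetical store for finished work


def long_task(name, delay):
    """Hypothetical stand-in for the slow function (e.g. result_package)."""
    time.sleep(delay)
    results.append(name)


def handle_request(name):
    """Hypothetical handler: starts the work in a thread and returns at once."""
    t = threading.Thread(name="long_task", target=long_task, args=(name, 0.5))
    t.daemon = True  # a daemon thread does not keep the process alive at exit
    t.start()
    return t


start = time.monotonic()
worker_thread = handle_request("pack-1")
elapsed = time.monotonic() - start  # time the caller was actually blocked

worker_thread.join()  # demo only, so the result can be inspected afterwards
```

Note that a daemon thread is stopped abruptly when its process exits, so this pattern suits work that can safely be abandoned or retried.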
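While at it, use
Event().wait(15) in place of time.sleep(16)
The advantage of this style is that the wait does not occupy the CPU, and unlike time.sleep it can be ended early by calling set() on the event.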
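A minimal sketch of the Event-based wait, assuming a polling loop that runs in a background thread. stop_event and poll_loop are hypothetical names; the 15s wait mirrors the value above.

```python
import threading
import time

stop_event = threading.Event()  # hypothetical name


def poll_loop(ticks):
    """Hypothetical polling loop that waits on an Event instead of sleeping."""
    while not stop_event.is_set():
        ticks.append(time.monotonic())  # placeholder for the real work
        # Blocks for up to 15s, but returns as soon as set() is called.
        stop_event.wait(15)


ticks = []
t = threading.Thread(target=poll_loop, args=(ticks,), daemon=True)
t.start()

time.sleep(0.1)           # let the loop reach its wait
start = time.monotonic()
stop_event.set()          # wake the loop early instead of waiting out 15s
t.join(timeout=5)
woke_after = time.monotonic() - start
```

With time.sleep the loop could only stop after the full interval had elapsed; with the event it can be stopped from another thread almost immediately.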
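At first, analysing the cause took quite a lot of time; in the end, a few lines of code resolved the worker timeout. I had tried map.thread before that, but it did not work.
I was planning to replace the original logic with a queue (celery + redis), but that is a fair amount of work and too heavyweight.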