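Resolving nginx's recv() failed (104: Connection reset by peer) error
First, a bit about how I ran into this problem and the pitfalls I hit along the way.
For business reasons, our company set up a load-balanced architecture. Once it was in place, pages on the site would occasionally return a 500. I went through the logs, but found nothing wrong on the back-end real servers.
The system was not built entirely by our own developers: it was purchased and then customized in-house, so a lot of parties are involved. In the debug log, each 500 coincided with an interface timeout, and the interface that timed out belonged to the system's vendor. At first I assumed the vendor was at fault, since the requests to their interface were the ones timing out, but when I asked them, they said everything was fine on their side. Time for a different approach.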
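Since the debug log pointed to timeouts, I raised every timeout in nginx and PHP to very large values. I had thought 500s was already plenty, but it did not fix the problem. After reading a lot of documentation, I set the timeouts (along with some buffer sizes) as high as I reasonably could: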
fastcgi_buffer_size 128k;
fastcgi_buffers 4 128k;
fastcgi_busy_buffers_size 256k;
fastcgi_connect_timeout 600;
fastcgi_send_timeout 600;
fastcgi_read_timeout 600;

proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
proxy_connect_timeout 600s;
proxy_read_timeout 1200;
proxy_send_timeout 1200;
keepalive_timeout 65s;
client_header_timeout 120s;
client_body_timeout 120s;
send_timeout 30s;
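But pages still returned 500 from time to time. Strangely, the error log contained nothing. The php-fpm error log, however, occasionally reported a busy pool, as shown below: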
WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 9 idle, and 62 total children
This happens because the number of processes configured for php-fpm is too low; raising it makes the warning go away. The main parameters are:
pm.max_children
pm.start_servers
pm.min_spare_servers
pm.max_spare_servers
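Appropriate values can be worked out from your own server's capacity. After the change, the php-fpm error log stopped complaining, but the 500 pages were still there. So frustrating!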
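As a rough illustration of working out such values, one common rule of thumb (an assumption here, not something taken from the setup described in this post) is to divide the memory you can spare for PHP by the average memory used by one php-fpm worker. The function names and the example figures below are hypothetical; measure your own workers before trusting any number.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_worker_mb):
    """Rule-of-thumb upper bound for pm.max_children.

    total_ram_mb:  total memory on the server
    reserved_mb:   memory kept back for the OS, nginx, databases, etc.
    avg_worker_mb: average resident memory of one php-fpm worker
    All three inputs are assumptions you must measure yourself.
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Never return less than 1: php-fpm needs at least one worker.
    return max(1, available_mb // avg_worker_mb)


def midpoint_start_servers(min_spare, max_spare):
    """Midpoint between the spare-server limits, a common starting value
    for pm.start_servers when pm = dynamic."""
    return (min_spare + max_spare) // 2


if __name__ == "__main__":
    # Hypothetical server: 8 GB RAM, 2 GB reserved, 64 MB per worker.
    print("pm.max_children  <=", estimate_max_children(8192, 2048, 64))
    print("pm.start_servers ~=", midpoint_start_servers(10, 30))
```

The result is only a ceiling to stay under, not a target; the spare-server values still need tuning against real traffic.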
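I kept digging and wondered whether the load balancing itself was the cause, so I took the load-balancing layer out entirely and served traffic straight from a single web server. After watching it for a full day and night, the problem still occurred, which ruled out the architecture. I put the load balancer back, and this time it occurred to me that the load-balancing scheduler keeps its own error log. I checked it and, wow, it was full of one and the same error: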
2019/06/19 07:10:53 [error] 6744#0: *7779178 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 104.206.96.10, server: c.bailitop.com, request: "GET /login HTTP/1.1", upstream: "http:/****/login", host: "c.bailitop.com", referrer: "http://c.bailitop.com/my/courses/learning"
2019/06/19 07:11:09 [error] 6746#0: *7779311 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 157.55.39.99, server: c.bailitop.com, request: "GET /user/3417/learn HTTP/1.1", upstream: "http://****/user/3417/learn", host: "c.bailitop.com"
2019/06/19 07:11:29 [error] 6747#0: *7779428 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 223.72.77.219, server: c.bailitop.com, request: "GET /my/course/1337 HTTP/1.1", upstream: "http://****/my/course/1337", host: "c.bailitop.com", referrer: "http://c.bailitop.com/"
2019/06/19 07:11:29 [error] 6742#0: *7744356 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 106.39.2.230, server: c.bailitop.com, request: "GET /course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60 HTTP/1.1", upstream: "http://****/course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60", host: "c.bailitop.com", referrer: "http://c.bailitop.com/course/1414/activity/50238/live_entry"
2019/06/19 07:11:58 [error] 6746#0: *7779597 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 40.77.167.145, server: c.bailitop.com, request: "GET /user/9470/teach HTTP/1.1", upstream: "http://****/user/9470/teach", host: "c.bailitop.com"
2019/06/19 07:12:13 [error] 6741#0: *7779646 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 40.77.167.145, server: c.bailitop.com, request: "GET /course/explore/djt?subCategory=mylg1&orderBy=latest HTTP/1.1", upstream: "http://****/course/explore/djt?subCategory=mylg1&orderBy=latest", host: "c.bailitop.com"
2019/06/19 07:12:28 [error] 6742#0: *7779697 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 104.206.96.10, server: c.bailitop.com, request: "GET /register?goto=%2Fcourse%2Fexplore%2Fact%3Ffilter%255BcurrentLevelId%255D%3Dall%26filter%255Bprice%255D%3Dall%26filter%255Btype%255D%3Dall%26orderBy%3DrecommendedSeq%26page%3D8%26tag%255Btags%255D%255B8%255D%3D34 HTTP/1.1", upstream: "http://****/register?goto=%2Fcourse%2Fexplore%2Fact%3Ffilter%255BcurrentLevelId%255D%3Dall%26filter%255Bprice%255D%3Dall%26filter%255Btype%255D%3Dall%26orderBy%3DrecommendedSeq%26page%3D8%26tag%255Btags%255D%255B8%255D%3D34", host: "c.bailitop.com", referrer: "http://www.bailitop.com"
2019/06/19 07:12:29 [error] 6742#0: *7779705 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 106.39.2.230, server: c.bailitop.com, request: "GET /course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60 HTTP/1.1", upstream: "http://****/course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60", host: "c.bailitop.com", referrer: "http://c.bailitop.com/course/1414/activity/50238/live_entry"
2019/06/19 07:13:29 [error] 6742#0: *7779705 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 106.39.2.230, server: c.bailitop.com, request: "GET /course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60 HTTP/1.1", upstream: "http://****/course/1414/activity/50238/live_trigger?eventName=doing&data%5BlastTime%5D=1560859104&data%5Bevents%5D%5Bwatching%5D%5BwatchTime%5D=60", host: "c.bailitop.com", referrer: "http://c.bailitop.com/course/1414/activity/50238/live_entry"
2019/06/19 07:13:32 [error] 6742#0: *7779959 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 14.116.141.249, server: c.bailitop.com, request: "GET /files/course/2019/03-01/1717233a72f9696815.jpg HTTP/1.1", upstream: "http://****/files/course/2019/03-01/1717233a72f9696815.jpg", host: "c.bailitop.com", referrer: "http://usa.bailitop.com/topics/20120712/8653.html"
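With this many entries, it can help to tally them by request path to see whether the resets cluster on particular URLs. The following is a minimal sketch that assumes the log lines follow the same layout as the entries above; the function names are made up for illustration, and the regular expression is not a complete parser for every nginx error-log format.

```python
import re
from collections import Counter

# Matches the leading part of an nginx error-log line such as the ones above.
LINE_RE = re.compile(
    r'^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] '
    r'(?P<pid>\d+)#\d+: \*(?P<conn>\d+) (?P<message>.*?), '
    r'client: (?P<client>[^,]+), server: (?P<server>[^,]+), '
    r'request: "(?P<method>\S+) (?P<uri>\S+) [^"]*"'
)


def parse_error_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def count_resets(lines):
    """Count 'Connection reset by peer' errors per request path (query string dropped)."""
    counts = Counter()
    for line in lines:
        entry = parse_error_line(line)
        if entry and "104: Connection reset by peer" in entry["message"]:
            path = entry["uri"].split("?", 1)[0]
            counts[path] += 1
    return counts
```

To use it, pass an open log file to `count_resets` and inspect `most_common()` on the result. In the excerpt above the errors are spread across pages, static files, and API-style URLs alike, which suggests the problem is not tied to one endpoint.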
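At last, an actual error message. I searched Baidu for it, and nearly every result offered the same fix: set request_terminate_timeout to 0 in the php-fpm configuration file. I did that and restarted php-fpm, but the problem remained and the error kept appearing. In the end I determined that the cause was buffer values that were set too small, and the fix was this: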
Increase the buffer values in the nginx configuration file. Here is a comparison of the settings before and after the change.
Before:
client_max_body_size 1024m;
client_body_buffer_size 512k;
client_header_buffer_size 512k;
proxy_buffers 4 64k;
proxy_busy_buffers_size 64k;
After:
client_max_body_size 1024m;
client_body_buffer_size 10m;
client_header_buffer_size 10m;
proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
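To make the difference between the two configurations concrete, the sizes can be converted to bytes and compared. This is only a small arithmetic sketch: the helper names are hypothetical, and it handles just the `k` and `m` suffixes that appear in the settings above.

```python
def parse_size(value):
    """Convert an nginx-style size such as '128k' or '10m' to bytes."""
    value = value.strip().lower()
    units = {"k": 1024, "m": 1024 * 1024}
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # plain number of bytes


def proxy_buffers_total(directive_args):
    """Total bytes for a 'proxy_buffers <number> <size>' setting, e.g. '4 128k'."""
    number, size = directive_args.split()
    return int(number) * parse_size(size)


if __name__ == "__main__":
    before = proxy_buffers_total("4 64k")
    after = proxy_buffers_total("4 128k")
    print("proxy_buffers before:", before, "bytes")
    print("proxy_buffers after: ", after, "bytes")
    print("client_header_buffer_size after:", parse_size("10m"), "bytes")
```

With the values above, the response buffers available per proxied connection go from 256 KiB to 512 KiB.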
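After raising these values, I watched the error log live: the recv() failed (104: Connection reset by peer) error no longer appeared, and no more pages returned 500. Problem solved. It took a long time because I kept looking in the wrong direction, so the lesson for next time is to diagnose the actual cause first and go straight at it, without so many detours.
This error can also occur when the port between the back-end real server and the load-balancing scheduler is not open. For example, if the back-end real server listens on port 8091 but the firewall does not allow 8091, the same error appears.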
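A quick way to rule this out is to try a plain TCP connection from the load-balancing scheduler to the back-end port. The sketch below uses only the Python standard library; the function name is made up, and the host in the usage comment is a placeholder, not an address from this setup.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (placeholder address): run this on the load balancer.
# port_is_reachable("10.0.0.12", 8091)
```

A result of False does not say why the connection failed (a firewall rule, a service that is not listening, or a routing problem), so it is only a first check before looking at the firewall configuration.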