問題描述
上一期的需求上線之后,線上多了一個異常:Connection reset。如下:
[2017-03-22 00:45:00 ERROR] [creativeAuditTaskScheduler_Worker-9] (cn.com.ServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:558) at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:511) at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:248) at cn.com.ServiceImpl.java:136) at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:95) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207) at com.sun.proxy.$Proxy28.getAuditResult(Unknown Source) at cn.com.QueryAuditResultTask.queryAuditResult(QueryAuditResultTask.java:117) at cn.com.QueryAuditResultTask.execute(QueryAuditResultTask.java:88) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at sun.security.ssl.InputRecord.readFully(InputRecord.java:442) at sun.security.ssl.InputRecord.read(InputRecord.java:480) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927) at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884) at sun.security.ssl.AppInputStream.read(AppInputStream.java:102) at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:87) at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139) at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155) at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165) at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87) at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48) at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:52) at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:542) ... 19 more
這里使用Spring RestTemplate調外部接口查詢結果。Spring RestTemplate 配置如下:
<bean id="ky.pollingConnectionManager" class="org.apache.http.impl.conn.PoolingHttpClientConnectionManager"> <!--整個連接池的並發--> <property name="maxTotal" value="1000" /> <!--每個路由的並發--> <property name="defaultMaxPerRoute" value="32" /> </bean> <bean id="ky.httpClientBuilder" class="org.apache.http.impl.client.HttpClientBuilder" factory-method="create"> <property name="connectionManager" ref="ky.pollingConnectionManager" /> <!--開啟重試--> <property name="retryHandler"> <!--新加的異常處理,只處理ConnectTimeoutException和UnknownHostException異常--> <!--上一版本使用的是org.apache.http.impl.client.DefaultHttpRequestRetryHandler--> <!--默認處理InterruptedIOException、UnknownHostException、ConnectException、SSLException 4種異常--> <bean class="cn.com.autodx.common.http.CustomRequestRetryHandler"> <constructor-arg value="3"/> <constructor-arg value="true"/> <constructor-arg value="200"/> </bean> </property> <!--開啟keep-Alive--> <property name="keepAliveStrategy"> <bean class="org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy" /> </property> <property name="defaultHeaders"> <list> <bean class="org.apache.http.message.BasicHeader"> <constructor-arg value="User-Agent"/> <constructor-arg value="Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36"/> </bean> <bean class="org.apache.http.message.BasicHeader"> <constructor-arg value="Accept-Encoding"/> <constructor-arg value="gzip,deflate"/> </bean> <bean class="org.apache.http.message.BasicHeader"> <constructor-arg value="Accept-Language"/> <constructor-arg value="zh-CN"/> </bean> <bean class="org.apache.http.message.BasicHeader"> <constructor-arg value="Connection"/> <constructor-arg value="keep-alive"/> </bean> </list> </property> </bean> <bean id="ky.httpClient" factory-bean="ky.httpClientBuilder" factory-method="build" /> <bean id="ky.clientHttpRequestFactory" class="org.springframework.http.client.HttpComponentsClientHttpRequestFactory"> <constructor-arg ref="ky.httpClient"/> <!--連接超時時間,毫秒--> <property name="connectTimeout" value="2000"/> <!--讀寫超時時間,毫秒--> <property name="readTimeout" value="5000"/> </bean> <bean id="restTemplate" class="org.springframework.web.client.RestTemplate"> <constructor-arg ref="ky.clientHttpRequestFactory"/> <property name="errorHandler"> <bean class="org.springframework.web.client.DefaultResponseErrorHandler"/> </property> <property name="messageConverters"> <list> <bean class="org.springframework.http.converter.FormHttpMessageConverter"/> <bean class="cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter"/> <bean class="org.springframework.http.converter.StringHttpMessageConverter"> <property name="supportedMediaTypes"> <list> <value>text/html;charset=UTF-8</value> <value>application/json</value> </list> </property> </bean> </list> </property> </bean>
本次需求,並沒有修改邏輯,為什么會出現這種情況呢?只是網絡關系,還是跟代碼有關呢。我有幾個疑問:
- 什么情況下會產生Connection reset?
- 長連接中,向server發請求,是先發送數據的,如果連接斷開,應該是寫數據異常,為什么是讀數據異常呢?請求是否發送成功?發送之前有校驗連接是否可用嗎?
- http連接池defaultMaxPerRoute什么意思?每個並發都建立一個長連接嗎?
- Connection reset之后,如何重新建立連接,繼而繼續進行業務交互?
- RestTemplate中配置了重試,為什么沒有重新發起連接?
我們便來解答上面的問題。
問題1,什么情況下會產生Connection reset?
網上搜一下,很多這樣的打包附送的答案,如下:
第1個異常是java.net.BindException:Address already in use: JVM_Bind。該異常發生在服務器端進行new ServerSocket(port)(port是一個0,65536的整型值)操作時。異常的原因是以為與port一樣的一個端口已經被啟動,並進行監聽。此時用netstat –an命令,可以看到一個Listending狀態的端口。只需要找一個沒有被占用的端口就能解決這個問題。
第2個異常是java.net.ConnectException: Connection refused: connect。該異常發生在客戶端進行new Socket(ip, port)操作時,該異常發生的原因是或者具有ip地址的機器不能找到(也就是說從當前機器不存在到指定ip路由),或者是該ip存在,但找不到指定的端口進行監聽。出現該問題,首先檢查客戶端的ip和port是否寫錯了,如果正確則從客戶端ping一下服務器看是否能ping通,如果能ping通(服務服務器端把ping禁掉則需要另外的辦法),則看在服務器端的監聽指定端口的程序是否啟動,這個肯定能解決這個問題。
第3個異常是java.net.SocketException: Socket is closed,該異常在客戶端和服務器均可能發生。異常的原因是己方主動關閉了連接后(調用了Socket的close方法)再對網絡連接進行讀寫操作。
第4個異常是java.net.SocketException: (Connection reset或者Connect reset by peer:Socket write error)。該異常在客戶端和服務器端均有可能發生,引起該異常的原因有兩個,第一個就是如果一端的Socket被關閉(或主動關閉或者因為異常退出而引起的關閉),另一端仍發送數據,發送的第一個數據包引發該異常(Connect reset by peer)。另一個是一端退出,但退出時並未關閉該連接,另一端如果在從連接中讀數據則拋出該異常(Connection reset)。簡單的說就是在連接斷開后的讀和寫操作引起的。
第5個異常是java.net.SocketException: Broken pipe。該異常在客戶端和服務器均有可能發生。在第4個異常的第一種情況中(也就是拋出SocketExcepton:Connect reset by peer:Socket write error后),如果再繼續寫數據則拋出該異常。前兩個異常的解決方法是首先確保程序退出前關閉所有的網絡連接,其次是要檢測對方的關閉連接操作,發現對方關閉連接后自己也要關閉該連接。
這里我們關心的是第四個異常,即server已經關閉了連接,client仍然在從連接中讀數據。
細節剖析
正常流程(成功日志)剖析
接下來,先逐步debug,分析調用成功的日志。
org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)
(org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domainName/creative/getAuditInfo" (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json] (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context
org.apache.http.impl.execchain.MainClientExec.execute
- 獲取ConnectionRequest connManager.requestConnection(route, userToken);
調用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
- 根據ConnectionRequest,獲取HttpClientConnection [調用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection]
獲取連接后,校驗連接不可用,則關閉連接connection。后面判斷connection沒有打開,就會重新建立連接
(org.apache.http.wire:?) - http-outgoing-798 << "end of stream" //isStale返回true (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-798: Close connection (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 799][route: {s}->https://domainName:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]
3.如果managedConn沒有打開, 則建立路由 establishRoute
(org.apache.http.impl.execchain.MainClientExec:?) - Opening connection {s}->https://domainName:443
調用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect
調用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect 遍歷地址集,新建socket,並與connection綁定,成功即返回。 以下是建立連接的過程
(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connecting to domainName/0.0.0.0:443
調用 sock = sf.connectSocket(connectTimeout, sock, host, remoteAddress, localAddress, context); //建立socket
conn.bind(sock); //綁定socket
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Connecting socket to domainName/0.0.0.0:443 with timeout 2000 (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled protocols: [TLSv1] (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled cipher suites:[] //session.getCipherSuite()
調用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket sslsock.startHandshake();
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Starting handshake
調用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket.verifyHostname(sslsock, host.getHostName()); session可用,打印信息
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Secure session established (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - negotiated protocol: TLSv1 (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - peer principal: CN=*.home.com.cn, OU=IT, O="Beijing Autohome Information Technology Co., Ltd.", L=Beijing, ST=Beijing, C=CN (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - issuer principal: CN=Symantec Class 3 Secure Server CA - G4, OU=Symantec Trust Network, O=Symantec Corporation, C=US
調用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect
(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connection established localhost:4357<->0.0.0.0:443
- 設置socket超時時間 managedConn.setSocketTimeout(timeout);
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-799: set socket timeout to 5000
- 開始處理請求 根據request頭部參數,作相應操作
(org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1 (org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED (org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
- 處理請求 requestExecutor.execute(request, managedConn, context);
調用:org.apache.http.protocol.HttpRequestExecutor.execute
主要功能:
HttpResponse response = doSendRequest(request, conn, context);
if (response == null) { response = doReceiveResponse(request, conn, context); }
6.1 doSendRequest
6.1.1 發送請求頭部
調用org.apache.http.impl.DefaultBHttpClientConnection.sendRequestHeader (確保socket有效,然后向緩存寫請求頭部,寫完后打印以下日志)
(org.apache.http.headers:?) - http-outgoing-799 >> GET /creative/getAuditInfo HTTP/1.1 (org.apache.http.headers:?) - http-outgoing-799 >> Accept: application/json, application/*+json, text/html, application/json (org.apache.http.headers:?) - http-outgoing-799 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36 (org.apache.http.headers:?) - http-outgoing-799 >> Accept-Encoding: gzip,deflate (org.apache.http.headers:?) - http-outgoing-799 >> Accept-Language: zh-CN (org.apache.http.headers:?) - http-outgoing-799 >> Connection: keep-alive (org.apache.http.headers:?) - http-outgoing-799 >> Host: domainName
6.1.2 發送請求頭部之后,如果請求帶有entity,則繼續發送entity,即conn.sendRequestEntity。這里對HTTP 1.0協議做了兼容判斷
6.1.3 通過connection將所有緩存數據發送到服務端,並記錄日志如下: (headers和wire的日志稍有不同。)
(org.apache.http.wire:?) - http-outgoing-799 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Encoding: gzip,deflate[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Language: zh-CN[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "Connection: keep-alive[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "Host: domainName[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 >> "[\r][\n]"
6.2 doReceiveResponse 如果response為空,則接收服務端的響應
response = conn.receiveResponseHeader();
if (canResponseHaveBody(request, response)) { conn.receiveResponseEntity(response); }
6.2.1 接收響應頭 receiveResponseHeader
--- 讀入數據,頭部和數據體,第一次讀取內容【這里數據包讀了兩次】 << 表示接收數據,從輸入流讀入
(org.apache.http.wire:?) - http-outgoing-799 << "HTTP/1.1 200 OK[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Server: 10.29[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Date: Wed, 22 Mar 2017 01:50:00 GMT[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Content-Type: application/json;charset=UTF-8[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Transfer-Encoding: chunked[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Connection: keep-alive[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Pragma: no-cache[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Cache-Control: no-cache, no-store, max-age=0[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Expires: Thu, 01 Jan 1970 00:00:00 GMT[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "Content-Language: zh-CN[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "6f7[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "{"data":{......... data1
--- 接收完之后,打印日志
(org.apache.http.headers:?) - http-outgoing-799 << HTTP/1.1 200 OK (org.apache.http.headers:?) - http-outgoing-799 << Server: 10.29 (org.apache.http.headers:?) - http-outgoing-799 << Date: Wed, 22 Mar 2017 01:50:00 GMT (org.apache.http.headers:?) - http-outgoing-799 << Content-Type: application/json;charset=UTF-8 (org.apache.http.headers:?) - http-outgoing-799 << Transfer-Encoding: chunked (org.apache.http.headers:?) - http-outgoing-799 << Connection: keep-alive (org.apache.http.headers:?) - http-outgoing-799 << Pragma: no-cache (org.apache.http.headers:?) - http-outgoing-799 << Cache-Control: no-cache, no-store, max-age=0 (org.apache.http.headers:?) - http-outgoing-799 << Expires: Thu, 01 Jan 1970 00:00:00 GMT (org.apache.http.headers:?) - http-outgoing-799 << Content-Language: zh-CN
6.2.2 如果響應含有消息體,則接收消息體
- 檢查reuse策略和keepAlive策略,設置connection屬性。 這里永久保留
(org.apache.http.impl.execchain.MainClientExec:?) - Connection can be kept alive indefinitely
org.springframework.web.client.RestTemplate
- 處理請求response = request.execute(); 日志如上
2.如果響應沒有出錯,則打印以下日志
(org.springframework.web.client.RestTemplate:?) - GET request for "https://domainName/creative/getAuditInfo?" resulted in 200 (OK)
- 讀取擴展字段 extractData
3.1使用HttpMessageConverter讀取並解析數據
調用org.springframework.http.converter.GenericHttpMessageConverter.read方法,從輸入流中讀取數據
(org.springframework.web.client.RestTemplate:?) - Reading [java.lang.String] as "application/json;charset=UTF-8" using [cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter@56a00a64]
3.2如果上一次的數據沒有讀全,這里會接着從輸入流讀取數據
(org.apache.http.wire:?) - http-outgoing-799 << "...... data2" (org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "0[\r][\n]" (org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"
3.3 數據讀取完畢,就釋放連接
調用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection [id: 799][route: {s}->https://domainName:443] can be kept alive indefinitely (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 799][route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
RestTemplate調用完畢
(cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:?) - getAuditResult .....
總結一下流程:
- 創建連接請求
- 根據連接請求的參數,從連接池中獲取一個連接
- 如果連接沒有打開,則創建一個底層的socket連接。
- 設置socket超時時間
- 發送請求頭部(如果請求中帶有entity,則發送)
- 接收響應(先接收頭部,如果有主體,則接收)
- 讀取擴展數據(使用HttpMessageConverter讀取並解析數據,讀取完成后,關閉輸入流及釋放連接池中的連接)
- 調用完畢,返回數據
本次異常日志剖析
接下來,查看失敗日志:
org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)
(org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domain/creative/getAuditInfo" (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json] (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context
org.apache.http.impl.execchain.MainClientExec.execute
- 獲取ConnectionRequest connManager.requestConnection(route, userToken);
調用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domain:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
2.根據ConnectionRequest,獲取HttpClientConnection
調用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection
其間獲取entry時,校驗connection().isStale()。
public boolean isStale() { if (!isOpen()) { return true; } try { //如果測試結果返回-1說明不可用 final int bytesRead = fillInputBuffer(1); return bytesRead < 0; } catch (final SocketTimeoutException ex) { //注意這里SocketTimeoutException時,認為是可用的 return false; } catch (final IOException ex) { //有I/O異常,不可用 return true; } }
這里讀超時,返回連接可用。於是后面,就沒有關閉連接。也就沒有重新建立新連接。具體參見這篇文章。
(org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Read timed out" (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]
設置socket超時時間 managedConn.setSocketTimeout(timeout);
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: set socket timeout to 5000
處理請求,並發送請求數據。處理過程同成功日志中的流程。
(org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
(org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
(org.apache.http.headers:?) - http-outgoing-766 >> GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.headers:?) - http-outgoing-766 >> Accept: application/json, application/*+json, text/html, application/json
(org.apache.http.headers:?) - http-outgoing-766 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
(org.apache.http.headers:?) - http-outgoing-766 >> Accept-Encoding: gzip,deflate
(org.apache.http.headers:?) - http-outgoing-766 >> Accept-Language: zh-CN
(org.apache.http.headers:?) - http-outgoing-766 >> Connection: keep-alive
(org.apache.http.headers:?) - http-outgoing-766 >> Host: domain
(org.apache.http.wire:?) - http-outgoing-766 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Encoding: gzip,deflate[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Language: zh-CN[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "Connection: keep-alive[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "Host: domain[\r][\n]" (org.apache.http.wire:?) - http-outgoing-766 >> "[\r][\n]"
接收相應數據時,出現I/O異常,關閉連接,並向上拋出異常。
(org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Connection reset" (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Close connection (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Shutdown connection (org.apache.http.impl.execchain.MainClientExec:?) - Connection discarded (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 0 of 32; total allocated: 0 of 1000] (cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://domain/creative/getAuditInfo":Connection reset; nested exception is java.net.SocketException: Connection reset
對比失敗與成功的日志,發現問題在於獲取connection的時候,校驗connection是否可用的操作上。此時服務器因為不可知的原因斷開了連接(服務端不可以向客戶端發數據),這里應該是沒有按照正常流程進行四次揮手,所以客戶端還保持着連接(可以向服務端發數據,但收不到數據)。測試連接時,客戶端讀超時(必然的),但此時認為連接可用,實際上不可用(不知道這里是不是認為給的1ms探測時間太短了,允許讀超時?),然后就沒有重新建立連接。將錯誤操作延遲到讀取請求這一步。
現在可以回答前面的問題
-
長連接中,向server發請求,是先發送數據的,如果連接斷開,應該是寫數據異常,為什么是讀數據異常呢?請求是否發送成功?發送之前有校驗連接是否可用嗎?
本次異常發生在發送完請求,讀取response的時候,所以是read異常。既然服務端連接斷掉,請求應該是沒有發送成功。發送之前有檢查連接是否可用,然而檢查認為連接可用。 - http連接池defaultMaxPerRoute什么意思?每個並發都建立一個長連接嗎?
為每一個路由建立一個連接池,連接數最大為defaultMaxPerRoute。每一個連接都是一個socket連接。如果配置為長連接,則是長連接。 - Connection reset之后,如何重新建立連接,繼而繼續進行業務交互?
上面的分析,connection reset之后,把有問題的連接關閉掉了,所以,后面不會再使用這個連接,只要重試,一般是可以成功的。 -
RestTemplate中配置了重試,為什么沒有重新發起連接?
在org.apache.http.impl.execchain.RetryExec的execute(調用MainClientExec的execute,但是被RestTemplate的doExecute調用)中,有如下重試處理:
public CloseableHttpResponse execute(final HttpRoute route, final HttpRequestWrapper request, final HttpClientContext context, final HttpExecutionAware execAware) throws IOException, HttpException { ...... //不停重試,停下的判斷在retryHandler.retryRequest for (int execCount = 1;; execCount++) { try { //執行操作 return this.requestExecutor.execute(route, request, context, execAware); } catch (final IOException ex) { ...... //判斷是否符合重試的條件 if (retryHandler.retryRequest(ex, execCount, context)) { //重試的話就打印日志 if (this.log.isInfoEnabled()) { this.log.info("I/O exception ("+ ex.getClass().getName() + ") caught when processing request to " + route + ": " + ex.getMessage()); } //是否可以重試 if (!RequestEntityProxy.isRepeatable(request)) { this.log.debug("Cannot retry non-repeatable request"); throw new NonRepeatableRequestException("Cannot retry request " + "with a non-repeatable request entity", ex); } request.setHeaders(origheaders); } else { //不符合重試條件,就拋出異常 if (ex instanceof NoHttpResponseException) { final NoHttpResponseException updatedex = new NoHttpResponseException( route.getTargetHost().toHostString() + " failed to respond"); updatedex.setStackTrace(ex.getStackTrace()); throw updatedex; } else { throw ex; } } } } }
上一個版本中,沒有connection reset的Error日志,這一個版本就有了。如果說新版本上線之后網絡才有問題,可能性小。所以,以前和現在都會偶爾有網絡問題產生。
如果之前版本有網絡問題,為什么沒有Error日志呢?是否有重試呢?是否通過重試補救了呢?
查看一下以前的info日志,發現的確有重試,也的確補救成功(沒有類似的Connection reset報出)。
//3.11 (org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset //3.12 (org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset
然后查看新版本上線之后的日志,沒有發現重試日志,說明SocketException能被DefaultHttpRequestRetryHandler處理,而不能被CustomRequestRetryHandler處理。在對比兩者的異常處理類型,發現DefaultHttpRequestRetryHandler處理的ConnectException extends SocketException,所以能處理。因為,我們只要在CustomRequestRetryHandler中,加入SocketException或者ConnectException,就可以通過重試解決網絡問題。
至此,我們搞清楚了這個Connection reset的問題,並且找到了解決方案,開心ing~~
補充與參考
幾個重要的類說明:
- MainClientExec
執行流程核心類,execute方法。這里有源碼分析,還不錯。 -
PoolingHttpClientConnectionManager
用來管理connection,方法如:requestConnection(從連接池中lease一個連接,連接的狀態可能是關閉的),connect(真正建立底層socket連接),releaseConnection(釋放連接池中的連接)等。這里有一篇分析。 -
如何檢查httpConnection連接是否有效
方法一,org.apache.http.HttpConnection的isStale()方法
public boolean isStale() { //沒有打開,即socket為空,則不可用 if (!isOpen()) { return true; } try { //socket鏈路有了,測試鏈路是否可用 //這里的測試方法是查看很短的時間內(這里是1ms),是否可以從輸入流中讀到數據 //如果測試結果返回-1說明不可用 final int bytesRead = fillInputBuffer(1); return bytesRead < 0; } catch (final SocketTimeoutException ex) { //注意這里SocketTimeoutException時,認為是可用的 return false; } catch (final IOException ex) { //有I/O異常,不可用 return true; } }
方法二、org.apache.http.HttpClientConnection的isResponseAvailable方法
有幾種實現方法,但大都被棄用。查看org.apache.http.impl.DefaultBHttpClientConnection的實現。
@Override public boolean isResponseAvailable(final int timeout) throws IOException { ensureOpen(); try { return awaitInput(timeout); } catch (final SocketTimeoutException ex) { //這里與isStale不同,SocketTimeoutException時認為不可用 return false; } } protected boolean awaitInput(final int timeout) throws IOException { if (this.inbuffer.hasBufferedData()) { return true; } //BHttpConnectionBase 中的fillInputBuffer方法,與isStale的測試方法一致 fillInputBuffer(timeout); return this.inbuffer.hasBufferedData(); }
比較兩種方法:isResponseAvailable暫時沒有找到使用的地方;isStale在MainClientExec.execute中有直接使用。使用的話需要配置staleConnectionCheckEnabled,即每次請求都要去檢測(最高耗時30ms)一次,看起來有一點悲觀鎖的意思,對性能影響比較大。所以4.4版本開始,默認值為false,且功能被標識為過時。這種方法不推薦。
if (config.isStaleConnectionCheckEnabled()) { // validate connection if (managedConn.isOpen()) { this.log.debug("Stale connection check"); if (managedConn.isStale()) { this.log.debug("Stale connection detected"); managedConn.close(); } } }
方法三:官方推薦的org.apache.http.impl.conn.PoolingHttpClientConnectionManager#getValidateAfterInactivity()
對非活動的永久連接,每個validateAfterInactivity毫秒(默認2s)做一次鏈路檢查,盡量確保在使用的時候是可用的。為什么說盡量呢?本文就是一個例子,2s檢查沒有問題,但在使用之前的2s內網絡出了問題,這就沒有辦法了。
validateAfterInactivity的使用,在org.apache.http.pool.AbstractConnPool中,getPoolEntryBlocking方法。
if (entry.isExpired(System.currentTimeMillis())) { entry.close(); } else if (this.validateAfterInactivity > 0) { if (entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()) { if (!validate(entry)) { entry.close(); } } }
詳細解釋在這里。
https://www.cnblogs.com/shoren/p/httpclient-connectionreset.html