1,解決的問題:ConcurrentWebSocketSessionDecorator The remote endpoint was in state [TEXT_PARTIAL_WRITING],
注:websocket sendMessage 發送基本邏輯: 在發送消息前,會校驗一下state,如果是State.OPEN或 State.TEXT_PARTIAL_READY,則可以發送,並把stat設置為TEXT_PARTIAL_WRITING,否則報錯。等發送完成,則又設置state為open,如此往復
2,現象
2020-10-14 11:16:08 880 [,] [WsClientSendThreadPool-9] WebsocketUtil.java 119 webSocketSendMessage ERROR c.e.wsgate.util.WebsocketUtil - userSessionInfo:sessionId=42aa19eb-65d3-e472-b244-6c59ac9479ad,room=3,uid=2,localAddr=[10.3.246.135:8081],remoteAddr=10.2.45.220:65057, WebSocketSession sendMessage err={} java.lang.IllegalStateException: The remote endpoint was in state [TEXT_PARTIAL_WRITING] which is an invalid state for called method at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1229) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textPartialStart(WsRemoteEndpointImplBase.java:1186) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:222) at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49) at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:215) at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:106) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:171) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:144) at com.example.wsgate.util.WebsocketUtil.webSocketSendMessage(WebsocketUtil.java:110) at com.example.wsgate.util.RoomStockUser.sendToAllUserClient(RoomStockUser.java:310) at com.example.wsgate.util.RoomStockUser.sendToClient(RoomStockUser.java:358) at com.example.wsgate.config.WsClientSendThreadPool.lambda$sendWorker$0(WsClientSendThreadPool.java:36) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
3,問題描述:在100QPS壓測下,在瀏覽器中創建一個websocket用戶用以觀察數據情況,最后在日志中發現頻繁報錯:java.lang.IllegalStateException: The remote endpoint was in state [TEXT_PARTIAL_WRITING] which is an invalid state for called method, 但是用java代碼創建的200個websocket鏈接正常
4,發送邏輯:
if (webSocketSession.isOpen()) { webSocketSession.sendMessage(textMessage); }
5,疑問
5.1 疑問1:webSocketSession.isOpen()是判斷了stat狀態了,為什么還會有這個state的異常?
釋疑:看代碼 org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1229),發現這個state非彼state,isOpen判斷的是websocket下的是state,感覺更多是指服務器端的websocket鏈接狀態
而報錯的state 是WsRemoteEndpointImplBase StateMachine 下的state,是服務器端用來標注和客戶端交互的一個狀態,跟多的是表示當前客戶端鏈接的狀態
5.2 疑問2:發送線程是單線程,且是支持並發的ConcurrentWebSocketSessionDecorator,為什么還會出現這樣的異常呢?
釋疑:如下先展示兩段核心代碼
ConcurrentWebSocketSessionDecorator:這是一個線程安全的
private boolean tryFlushMessageBuffer() throws IOException { if (this.flushLock.tryLock()) {//獲取鎖,獲取即執行發送,沒獲取,就算了,等下次再問一下,反正數據已經存入到消息隊列中了,和lock有區別 try { while (true) { WebSocketMessage<?> message = this.buffer.poll(); if (message == null || shouldNotSend()) { break; } this.bufferSize.addAndGet(-message.getPayloadLength()); this.sendStartTime = System.currentTimeMillis(); getDelegate().sendMessage(message);//用來發送消息的 this.sendStartTime = 0; } } finally { this.sendStartTime = 0; this.flushLock.unlock(); } return true; } return false; }
以及根據getDelegate().sendMessage(message)跟到的
WsRemoteEndpointImplBase:這個是在發送數據之前,檢查state,並設置state為TEXT_PARTIAL_WRITING
public synchronized void textPartialStart() { checkState(State.OPEN, State.TEXT_PARTIAL_READY); state = State.TEXT_PARTIAL_WRITING; }
WsRemoteEndpointImplBase:這一段應該是一個阻塞式
void sendMessageBlock(CharBuffer part, boolean last) throws IOException { long timeoutExpiry = getTimeoutExpiry(); boolean isDone = false; while (!isDone) { encoderBuffer.clear(); CoderResult cr = encoder.encode(part, encoderBuffer, true); if (cr.isError()) { throw new IllegalArgumentException(cr.toString()); } isDone = !cr.isOverflow(); encoderBuffer.flip(); sendMessageBlock(Constants.OPCODE_TEXT, encoderBuffer, last && isDone, timeoutExpiry);//發送消息 } stateMachine.complete(last);//發送完畢,將state設置為open }
從代碼上思考,想不出來有什么問題。
最后發現一個淹沒在 java.lang.IllegalStateException中的一個java.lang.NullPointerException: null異常,如下
2020-10-14 14:06:18 495 [,] [WsClientSendThreadPool-11] WebsocketUtil.java 121 webSocketSendMessage ERROR c.e.wsgate.util.WebsocketUtil - userSessionInfo:sessionId=cde27712-c591-f3e9-104b-1b4aa962f24e,room=3,uid=1,localAddr=[10.3.246.134:8081],remoteAddr=10.2.45.220:58031,type=true,threadId=199,threadName=WsClientSendThreadPool-11, WebSocketSession sendMessage err={} java.lang.NullPointerException: null at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:310) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250) //就是上一個方法中的發送消息的方法 at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223) at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49) at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:215) at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:106) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:171) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:144) at com.example.wsgate.util.WebsocketUtil.webSocketSendMessage(WebsocketUtil.java:110) at com.example.wsgate.util.RoomStockUser.sendToAllUserClient(RoomStockUser.java:314) at com.example.wsgate.util.RoomStockUser.sendToClient(RoomStockUser.java:362) at com.example.wsgate.config.WsClientSendThreadPool.lambda$sendWorker$0(WsClientSendThreadPool.java:36) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
根據錯誤棧,找到:!bsh.getSendResult().isOK() 這一句報空指針異常
//SendResult:The result of asynchronously sending a web socket message. A SendResult is either ok indicating there was no problem, or is not OK in which case there was a problem and it carries an exception to indicate what the problem was
private void sendMessageBlock(byte opCode, ByteBuffer payload, boolean last, long timeoutExpiry) throws IOException { wsSession.updateLastActive(); ... for (MessagePart mp : messageParts) { writeMessagePart(mp); if (!bsh.getSendResult().isOK()) { // messagePartInProgress.release(); Throwable t = bsh.getSendResult().getException(); wsSession.doClose(new CloseReason(CloseCodes.GOING_AWAY, t.getMessage()), new CloseReason(CloseCodes.CLOSED_ABNORMALLY, t.getMessage())); throw new IOException (t); } // The BlockingSendHandler doesn't call end message so update the // flags. fragmented = nextFragmented; text = nextText; } if (payload != null) { payload.clear(); } endMessage(null, null); }
6,異常現象的結論就是:
1,服務器端給客戶端發送數據,出現了某種問題(可能是底層的一個實現邏輯,根據api的解釋,主要標注發送消息是否成功或失敗,參照:SendResult),報錯。導致WsRemoteEndpointImplBase 下sendMessageBlock函數中stateMachine.complete(last)(將state設置為open)沒法執行,使state一直為TEXT_PARTIAL_WRITING,
后來的sender線程在調用這個websocket時,就一直報錯,
7,總的結論:
1,是tomcat的一個bug,Bug 64061 ,參照https://bz.apache.org/bugzilla/show_bug.cgi?id=64061
8,解決方案
1,在pom中指定新的tomcat版本
<properties> <tomcat.version>9.0.30</tomcat.version> </properties> <dependency> <groupId>org.apache.tomcat</groupId> <artifactId>tomcat-juli</artifactId> <version>${tomcat.version}</version> </dependency>
2,在sendMessage catch IllegalStateException 並close條這個鏈接,使客戶端重連
9,帶來的問題:
tomcat升級之后,同樣壓測,會有java.io.IOException: java.net.SocketTimeoutException 導致的 websocket handleTransportError,觸發退群邏輯,關閉掉鏈接,客戶端會再重連
2020-10-14 17:37:04 763 [,] [WsClientSendThreadPool-16] LiveRoomWebSocketHandler.java 166 handleTransportError ERROR c.e.w.h.LiveRoomWebSocketHandler - handleTransportError. sessionId=eb92bb44-0d6d-3882-2f95-63030d49b10a,localAddr=[10.3.246.134:8081],remoteAddr=10.2.45.220:52145,group=3,uid=1,joinedRoom=true,exception={} java.io.IOException: java.net.SocketTimeoutException at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:258) at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:612) at org.apache.tomcat.websocket.WsSession.doClose(WsSession.java:497) at org.apache.tomcat.websocket.WsSession.doClose(WsSession.java:459) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:313) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223) at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49) at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:215) at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:106) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:171) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:144) at com.example.wsgate.util.WebsocketUtil.webSocketSendMessage(WebsocketUtil.java:110) at com.example.wsgate.util.RoomStockUser.sendToAllUserClient(RoomStockUser.java:314) at com.example.wsgate.util.RoomStockUser.sendToClient(RoomStockUser.java:362) at com.example.wsgate.config.WsClientSendThreadPool.lambda$sendWorker$0(WsClientSendThreadPool.java:36) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketTimeoutException: null at org.apache.tomcat.util.net.NioEndpoint$Poller.timeout(NioEndpoint.java:977) at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:753) ... 1 common frames omitted 2020-10-14 17:37:04 772 [,] [WsClientSendThreadPool-16] WebsocketUtil.java 123 webSocketSendMessage ERROR c.e.wsgate.util.WebsocketUtil - userSessionInfo:sessionId=eb92bb44-0d6d-3882-2f95-63030d49b10a,room=3,uid=1,localAddr=[10.3.246.134:8081],remoteAddr=10.2.45.220:52145,threadId=154,threadName=WsClientSendThreadPool-16, WebSocketSession sendMessage err={} java.io.IOException: java.net.SocketTimeoutException at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223) at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49) at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:215) at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:106) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:171) at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:144) at com.example.wsgate.util.WebsocketUtil.webSocketSendMessage(WebsocketUtil.java:110) at com.example.wsgate.util.RoomStockUser.sendToAllUserClient(RoomStockUser.java:314) at com.example.wsgate.util.RoomStockUser.sendToClient(RoomStockUser.java:362) at com.example.wsgate.config.WsClientSendThreadPool.lambda$sendWorker$0(WsClientSendThreadPool.java:36) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketTimeoutException: null at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1435) at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:1353) at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:1324) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:90) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:499) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java
10,思考:
1,用java代碼創建了200個鏈接,一個瀏覽器鏈接,為什么唯獨瀏覽器這個鏈接會有問題呢?瀏覽器這個在交互中有什么不同,會觸發這個問題。