Tomcat Request Parsing: Request Line and Request Headers

This article is a repost. 2020-06-09 19:48 · tomcat / JAVA IO

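I. Introduction

The earlier article at https://www.cnblogs.com/runnable/p/12905401.html outlined how Tomcat handles a single request: receiving it, processing the request data, and writing the response. The next two articles examine request data parsing in detail: first reading the request line and request headers, then reading the request body.

Before analyzing request data processing, let's review three points.

1. The buffer buf that Tomcat uses to read socket data. It is a byte array with a default length of 8 KB. Two indexes matter: pos marks the next position to read, and lastValid marks the end of the valid data.

The four cases in the figure are: the initial array; data just read from the operating system into buf; Tomcat has consumed the first byte while parsing; and all data from this read has been parsed.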

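Tomcat's handling of request data is essentially a repetition of these four steps: read data from the operating system into Tomcat's buffer, then parse it byte by byte. We analyze this in detail later.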
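The pos/lastValid cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Tomcat's implementation: the class name SketchInputBuffer and its methods next() and countBytes() are hypothetical, and the stream is an in-memory one so that nothing blocks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of a pos/lastValid buffer; this is not Tomcat's class. */
class SketchInputBuffer {
    private final InputStream in;
    private final byte[] buf = new byte[8 * 1024]; // the 8 KB default cited above
    private int pos = 0;       // next index to parse
    private int lastValid = 0; // one past the last byte read from the stream

    SketchInputBuffer(InputStream in) {
        this.in = in;
    }

    /** Reads more bytes in after lastValid; returns false when the stream has ended. */
    boolean fill() throws IOException {
        if (lastValid == buf.length) {
            throw new IllegalStateException("buffer full");
        }
        int n = in.read(buf, lastValid, buf.length - lastValid);
        if (n > 0) {
            lastValid += n;
        }
        return n > 0;
    }

    /** Returns the next byte (0-255), refilling once parsed data runs out, or -1 at end of stream. */
    int next() throws IOException {
        if (pos >= lastValid && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    /** Walks the given bytes one at a time and returns how many were seen. */
    static int countBytes(byte[] data) {
        SketchInputBuffer b = new SketchInputBuffer(new ByteArrayInputStream(data));
        try {
            int count = 0;
            while (b.next() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] request = "GET / HTTP/1.1\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countBytes(request));
    }
}
```

The point of the sketch is the guard in next(): parsing only touches the stream when pos has caught up with lastValid, which is the same check that appears at the top of every parsing loop in the Tomcat code shown later.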

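2. The byte chunk (ByteChunk), a data structure with three important fields: the byte array buff, start, and end. As the fields suggest, a byte chunk uses two indexes to mark a run of bytes inside a byte array. The marked bytes are converted to a string only when the data is actually used, and once a string exists for a given byte range it is reused instead of being converted again. The main benefit is higher efficiency and lower memory use. The figure below marks the bytes at indexes 1-4 of a byte chunk.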
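To make the "mark now, convert later" idea concrete, here is a minimal sketch. It assumes a hypothetical class SketchChunk that only imitates the three fields named above plus a cached string; Tomcat's real ByteChunk and MessageBytes carry much more state.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for a byte chunk; not Tomcat's ByteChunk. */
class SketchChunk {
    private byte[] buff;
    private int start;
    private int end;
    private String cached; // filled in on the first toString() call

    /** Marks a region of b without copying any bytes. */
    void setBytes(byte[] b, int off, int len) {
        buff = b;
        start = off;
        end = off + len;
        cached = null;
    }

    int length() {
        return end - start;
    }

    /** Converts the marked bytes to a String once and reuses the result afterwards. */
    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buff, start, end - start, StandardCharsets.ISO_8859_1);
        }
        return cached;
    }

    /** Marks the method of a sample request line and returns it as a String. */
    static String demo() {
        byte[] buf = "GET /index.html HTTP/1.1\r\n".getBytes(StandardCharsets.ISO_8859_1);
        SketchChunk method = new SketchChunk();
        method.setBytes(buf, 0, 3);
        return method.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling setBytes(b, 1, 4) on a six-byte array marks indexes 1-4, matching the figure's example; no bytes are copied until toString() is called.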

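3. The HTTP request data format is as follows

Parsing the request data amounts to analyzing it byte by byte according to the HTTP specification and converting it into a request object, so some familiarity with the HTTP format is necessary.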
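As a reminder of the layout, the following sketch spells out a small request as a Java string and splits it into its three parts. Everything in it is illustrative: the class SketchHttpLayout, the host, the path, and the body are made-up placeholders, and the helper assumes the complete request is already available as one string, which is not how Tomcat works.

```java
/** Hypothetical helper that splits a complete raw request held in a String; for illustration only. */
class SketchHttpLayout {
    /** A made-up sample request; host, path, and body are placeholders. */
    static final String RAW =
            "POST /login?from=home HTTP/1.1\r\n"   // request line: method SP URL SP version CRLF
          + "Host: example.com\r\n"                // header: name ":" value CRLF
          + "Content-Length: 9\r\n"
          + "\r\n"                                 // an empty line ends the headers
          + "user=tom1";                           // request body

    /** Everything before the first CRLF. */
    static String requestLine(String raw) {
        return raw.substring(0, raw.indexOf("\r\n"));
    }

    /** The lines between the request line and the empty line. */
    static String[] headerLines(String raw) {
        int from = raw.indexOf("\r\n") + 2;
        int to = raw.indexOf("\r\n\r\n");
        return from > to ? new String[0] : raw.substring(from, to).split("\r\n");
    }

    /** Everything after the empty line. */
    static String body(String raw) {
        return raw.substring(raw.indexOf("\r\n\r\n") + 4);
    }

    public static void main(String[] args) {
        System.out.println(requestLine(RAW));
        for (String h : headerLines(RAW)) {
            System.out.println(h);
        }
        System.out.println(body(RAW));
    }
}
```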

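Now to the main topic: analyzing, through the source code, how the request line and request headers are parsed.

Start at the entry point where the HTTP11 processor handles a request: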

  1 @Override
  2     public SocketState process(SocketWrapper<S> socketWrapper)
  3         throws IOException {
  4         RequestInfo rp = request.getRequestProcessor();
  5         rp.setStage(org.apache.coyote.Constants.STAGE_PARSE);
  6 
  7         // Setting up the I/O
  8         setSocketWrapper(socketWrapper);
  9         /**
 10          * Set up the socket's InputStream and OutputStream for reading data and writing the response later
 11          */
 12         getInputBuffer().init(socketWrapper, endpoint);
 13         getOutputBuffer().init(socketWrapper, endpoint);
 14 
 15         // Flags
 16         keepAlive = true;
 17         comet = false;
 18         openSocket = false;
 19         sendfileInProgress = false;
 20         readComplete = true;
 21         if (endpoint.getUsePolling()) {
 22             keptAlive = false;
 23         } else {
 24             keptAlive = socketWrapper.isKeptAlive();
 25         }
 26 
 27         /**
 28          * Keep-alive related: decide whether this socket continues to handle subsequent requests
 29          */
 30         if (disableKeepAlive()) {
 31             socketWrapper.setKeepAliveLeft(0);
 32         }
 33 
 34         /**
 35          * Process the requests on the socket; in keep-alive mode each loop iteration is one HTTP request
 36          */
 37         while (!getErrorState().isError() && keepAlive && !comet && !isAsync() &&
 38                 upgradeInbound == null &&
 39                 httpUpgradeHandler == null && !endpoint.isPaused()) {
 40 
 41             // Parsing the request header
 42             try {
 43                 /**
 44                  * 1. Set the socket timeout
 45                  * 2. Read from the socket for the first time
 46                  */
 47                 setRequestLineReadTimeout();
 48 
 49                 /**
 50                  * Read the request line
 51                  */
 52                 if (!getInputBuffer().parseRequestLine(keptAlive)) {
 53                     if (handleIncompleteRequestLineRead()) {
 54                         break;
 55                     }
 56                 }
 57 
 58                 // Process the Protocol component of the request line
 59                 // Need to know if this is an HTTP 0.9 request before trying to
 60                 // parse headers.
 61                 prepareRequestProtocol();
 62 
 63                 if (endpoint.isPaused()) {
 64                     // 503 - Service unavailable
 65                     response.setStatus(503);
 66                     setErrorState(ErrorState.CLOSE_CLEAN, null);
 67                 } else {
 68                     keptAlive = true;
 69                     // Set this every time in case limit has been changed via JMX
 70                     // Set the maximum number of request headers
 71                     request.getMimeHeaders().setLimit(endpoint.getMaxHeaderCount());
 72                     // Set the maximum number of cookies
 73                     request.getCookies().setLimit(getMaxCookieCount());
 74                     // Currently only NIO will ever return false here
 75                     // Don't parse headers for HTTP/0.9
 76                     /**
 77                      * Read the request headers
 78                      */
 79                     if (!http09 && !getInputBuffer().parseHeaders()) {
 80                         // We've read part of the request, don't recycle it
 81                         // instead associate it with the socket
 82                         openSocket = true;
 83                         readComplete = false;
 84                         break;
 85                     }
 86                     if (!disableUploadTimeout) {
 87                         setSocketTimeout(connectionUploadTimeout);
 88                     }
 89                 }
 90             } catch (IOException e) {
 91                 if (getLog().isDebugEnabled()) {
 92                     getLog().debug(
 93                             sm.getString("http11processor.header.parse"), e);
 94                 }
 95                 setErrorState(ErrorState.CLOSE_NOW, e);
 96                 break;
 97             } catch (Throwable t) {
 98                 ExceptionUtils.handleThrowable(t);
 99                 UserDataHelper.Mode logMode = userDataHelper.getNextMode();
100                 if (logMode != null) {
101                     String message = sm.getString(
102                             "http11processor.header.parse");
103                     switch (logMode) {
104                         case INFO_THEN_DEBUG:
105                             message += sm.getString(
106                                     "http11processor.fallToDebug");
107                             //$FALL-THROUGH$
108                         case INFO:
109                             getLog().info(message, t);
110                             break;
111                         case DEBUG:
112                             getLog().debug(message, t);
113                     }
114                 }
115                 // 400 - Bad Request
116                 response.setStatus(400);
117                 setErrorState(ErrorState.CLOSE_CLEAN, t);
118                 getAdapter().log(request, response, 0);
119             }
120 
121             if (!getErrorState().isError()) {
122                 // Setting up filters, and parse some request headers
123                 rp.setStage(org.apache.coyote.Constants.STAGE_PREPARE);
124                 try {
125                     prepareRequest();
126                 } catch (Throwable t) {
127                     ExceptionUtils.handleThrowable(t);
128                     if (getLog().isDebugEnabled()) {
129                         getLog().debug(sm.getString(
130                                 "http11processor.request.prepare"), t);
131                     }
132                     // 500 - Internal Server Error
133                     response.setStatus(500);
134                     setErrorState(ErrorState.CLOSE_CLEAN, t);
135                     getAdapter().log(request, response, 0);
136                 }
137             }
138 
139             if (maxKeepAliveRequests == 1) {
140                 keepAlive = false;
141             } else if (maxKeepAliveRequests > 0 &&
142                     socketWrapper.decrementKeepAlive() <= 0) {
143                 keepAlive = false;
144             }
145 
146             // Process the request in the adapter
147             if (!getErrorState().isError()) {
148                 try {
149                     rp.setStage(org.apache.coyote.Constants.STAGE_SERVICE);
150                     /**
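151                      * Hand the populated request and response objects to the container
152                      * service-->host-->context-->wrapper-->servlet
153                      * This step is the key one: the servlet code we write is invoked from here, following the Servlet specification
154                      * When this call returns, the developer's servlet has finished executing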
155                      */
156                     adapter.service(request, response);
157                     // Handle when the response was committed before a serious
158                     // error occurred.  Throwing a ServletException should both
159                     // set the status to 500 and set the errorException.
160                     // If we fail here, then the response is likely already
161                     // committed, so we can't try and set headers.
162                     if(keepAlive && !getErrorState().isError() && (
163                             response.getErrorException() != null ||
164                                     (!isAsync() &&
165                                     statusDropsConnection(response.getStatus())))) {
166                         setErrorState(ErrorState.CLOSE_CLEAN, null);
167                     }
168                     setCometTimeouts(socketWrapper);
169                 } catch (InterruptedIOException e) {
170                     setErrorState(ErrorState.CLOSE_NOW, e);
171                 } catch (HeadersTooLargeException e) {
172                     getLog().error(sm.getString("http11processor.request.process"), e);
173                     // The response should not have been committed but check it
174                     // anyway to be safe
175                     if (response.isCommitted()) {
176                         setErrorState(ErrorState.CLOSE_NOW, e);
177                     } else {
178                         response.reset();
179                         response.setStatus(500);
180                         setErrorState(ErrorState.CLOSE_CLEAN, e);
181                         response.setHeader("Connection", "close"); // TODO: Remove
182                     }
183                 } catch (Throwable t) {
184                     ExceptionUtils.handleThrowable(t);
185                     getLog().error(sm.getString("http11processor.request.process"), t);
186                     // 500 - Internal Server Error
187                     response.setStatus(500);
188                     setErrorState(ErrorState.CLOSE_CLEAN, t);
189                     getAdapter().log(request, response, 0);
190                 }
191             }
192 
193             // Finish the handling of the request
194             rp.setStage(org.apache.coyote.Constants.STAGE_ENDINPUT);
195 
196             if (!isAsync() && !comet) {
197                 if (getErrorState().isError()) {
198                     // If we know we are closing the connection, don't drain
199                     // input. This way uploading a 100GB file doesn't tie up the
200                     // thread if the servlet has rejected it.
201                     getInputBuffer().setSwallowInput(false);
202                 } else {
203                     // Need to check this again here in case the response was
204                     // committed before the error that requires the connection
205                     // to be closed occurred.
206                     checkExpectationAndResponseStatus();
207                 }
208                 /**
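209                  * Wrap up the request
210                  * Check whether the request body has been fully read; if not, read the rest and fix up pos
211                  * The request body can be read in two ways:
212                  * 1. By the developer: the servlet reads it explicitly and may not read all of it, depending on business needs
213                  * 2. By Tomcat itself: if the servlet did not read the body, or did not finish, Tomcat reads the remainder
214                  * The difference: in case 2 the data is only read from the operating system into buf (still marked with a byte chunk) and nothing more, while in case 1 the marked bytes are also copied into the target array
215                  * This method handles the body reading of case 2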
216                  */
217                 endRequest();
218             }
219 
220             rp.setStage(org.apache.coyote.Constants.STAGE_ENDOUTPUT);
221 
222             // If there was an error, make sure the request is counted as
223             // and error, and update the statistics counter
224             if (getErrorState().isError()) {
225                 response.setStatus(500);
226             }
227             request.updateCounters();
228 
229             if (!isAsync() && !comet || getErrorState().isError()) {
230                 if (getErrorState().isIoAllowed()) {
231                     /**
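232                      * Using the corrected pos and lastValid, reset the array indexes so the next request can be processed
233                      * Two cases:
234                      * 1. The body read consumed exactly all data: set pos=lastValid=0, both pointing at the start of buf, and read afresh
235                      * 2. The body read also pulled in bytes of the next request: move that data to the head of buf for the next request
236                      * Note: the data in buf is not erased, it is simply overwritten, which is how buf gets reused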
237                      */
238                     getInputBuffer().nextRequest();
239                     getOutputBuffer().nextRequest();
240                 }
241             }
242 
243             if (!disableUploadTimeout) {
244                 if(endpoint.getSoTimeout() > 0) {
245                     setSocketTimeout(endpoint.getSoTimeout());
246                 } else {
247                     setSocketTimeout(0);
248                 }
249             }
250 
251             rp.setStage(org.apache.coyote.Constants.STAGE_KEEPALIVE);
252 
253             if (breakKeepAliveLoop(socketWrapper)) {
254                 break;
255             }
256         }
257 
258         rp.setStage(org.apache.coyote.Constants.STAGE_ENDED);
259 
260         if (getErrorState().isError() || endpoint.isPaused()) {
261             return SocketState.CLOSED;
262         } else if (isAsync() || comet) {
263             return SocketState.LONG;
264         } else if (isUpgrade()) {
265             return SocketState.UPGRADING;
266         } else if (getUpgradeInbound() != null) {
267             return SocketState.UPGRADING_TOMCAT;
268         } else {
269             if (sendfileInProgress) {
270                 return SocketState.SENDFILE;
271             } else {
272                 if (openSocket) {
273                     if (readComplete) {
274                         return SocketState.OPEN;
275                     } else {
276                         return SocketState.LONG;
277                     }
278                 } else {
279                     return SocketState.CLOSED;
280                 }
281             }
282         }
283     }


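Analysis:

The method above shows the core flow of handling a request. Line 52 is where request line processing starts: getInputBuffer().parseRequestLine(keptAlive)

II. Request Line Parsing

The method is as follows: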

  1 /**
  2      * Read the request line. This function is meant to be used during the
  3      * HTTP request header parsing. Do NOT attempt to read the request body
  4      * using it.
  5      *
  6      * @throws IOException If an exception occurs during the underlying socket
  7      * read operations, or if the given buffer is not big enough to accommodate
  8      * the whole line.
  9      */
 10     /**
 11      * Method that reads the request line
 12      * The request line has the following format:
 13      * ========================================
 14      * method SP URL SP protocol-version CRLF
 15      * ========================================
 16      * @param useAvailableDataOnly
 17      * @return
 18      * @throws IOException
 19      */
 20     @Override
 21     public boolean parseRequestLine(boolean useAvailableDataOnly)
 22 
 23         throws IOException {
 24 
 25         int start = 0;
 26 
 27         //
 28         // Skipping blank lines
 29         //
 30 
 31         /**
 32          * Skip carriage return (CR) and line feed (LF) bytes to find the start position
 33          */
 34         do {
 35 
 36             // Read new bytes if needed
 37             if (pos >= lastValid) {
 38                 if (!fill())
 39                     throw new EOFException(sm.getString("iib.eof.error"));
 40             }
 41             // Set the start time once we start reading data (even if it is
 42             // just skipping blank lines)
 43             if (request.getStartTime() < 0) {
 44                 request.setStartTime(System.currentTimeMillis());
 45             }
 46             /**
 47              * chr holds the first non-CRLF byte; it is used later when reading the request headers
 48              */
 49             chr = buf[pos++];
 50         } while (chr == Constants.CR || chr == Constants.LF);
 51 
 52         pos--;
 53 
 54         // Mark the current buffer position
 55         start = pos;
 56 
 57         //
 58         // Reading the method name
 59         // Method name is a token
 60         //
 61 
 62         boolean space = false;
 63 
 64         /**
 65          * Read the HTTP request method: get/post/put....
 66          */
 67         while (!space) {
 68 
 69             // Read new bytes if needed
 70             if (pos >= lastValid) {
 71                 if (!fill())
 72                     throw new EOFException(sm.getString("iib.eof.error"));
 73             }
 74 
 75             // Spec says method name is a token followed by a single SP but
 76             // also be tolerant of multiple SP and/or HT.
 77             if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
 78                 space = true;
 79                 /**
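 80                  * Set the HTTP request method. No string is set directly; a ByteChunk is used instead.
 81                  * ByteChunk has a byte-array field buff; setBytes points buff at Tomcat's buffer buf, and start and end are set from
 82                  * the last two arguments. The method is thus marked inside buf without being converted to a string. Conversion via ByteBuffer.wrap
 83                  * happens only on use and sets hasStrValue=true, so later reads return the cached string without converting again. Presumably done for efficiency.
 84                  * So even if Tomcat later allocates a new buf array to read a long request body, the old buf is not collected, because the ByteChunk's buff still references it.
 85                  * When does the old array become collectable? Once this request is over and nothing references it any more.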
 86                  */
 87                 request.method().setBytes(buf, start, pos - start);
 88             } else if (!HttpParser.isToken(buf[pos])) {
 89                 String invalidMethodValue = parseInvalid(start, buf);
 90                 throw new IllegalArgumentException(sm.getString("iib.invalidmethod", invalidMethodValue));
 91             }
 92 
 93             pos++;
 94 
 95         }
 96 
 97         // Spec says single SP but also be tolerant of multiple SP and/or HT
 98         /**
 99          * Skip the whitespace (SP or HT) after the request method
100          */
101         while (space) {
102             // Read new bytes if needed
103             if (pos >= lastValid) {
104                 if (!fill())
105                     throw new EOFException(sm.getString("iib.eof.error"));
106             }
107             if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
108                 pos++;
109             } else {
110                 space = false;
111             }
112         }
113 
114         // Mark the current buffer position
115         start = pos;
116         int end = 0;
117         int questionPos = -1;
118 
119         //
120         // Reading the URI
121         //
122 
123         boolean eol = false;
124 
125         /**
126          * Read the URL
127          */
128         while (!space) {
129 
130             // Read new bytes if needed
131             if (pos >= lastValid) {
132                 if (!fill())
133                     throw new EOFException(sm.getString("iib.eof.error"));
134             }
135 
136             /**
137              * CR not followed by LF: not HTTP/0.9, throw an exception
138              */
139             if (buf[pos -1] == Constants.CR && buf[pos] != Constants.LF) {
140                 // CR not followed by LF so not an HTTP/0.9 request and
141                 // therefore invalid. Trigger error handling.
142                 // Avoid unknown protocol triggering an additional error
143                 request.protocol().setString(Constants.HTTP_11);
144                 String invalidRequestTarget = parseInvalid(start, buf);
145                 throw new IllegalArgumentException(sm.getString("iib.invalidRequestTarget", invalidRequestTarget));
146             }
147 
148             // Spec says single SP but it also says be tolerant of HT
149             if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
150                 /**
151                  * Whitespace (SP or HT) reached: the URL is complete
152                  */
153                 space = true;
154                 end = pos;
155             } else if (buf[pos] == Constants.CR) {
156                 // HTTP/0.9 style request. CR is optional. LF is not.
157             } else if (buf[pos] == Constants.LF) {
158                 // HTTP/0.9 style request
159                 // Stop this processing loop
160                 space = true;
161                 // Set blank protocol (indicates HTTP/0.9)
162                 request.protocol().setString("");
163                 // Skip the protocol processing
164                 eol = true;
165                 if (buf[pos - 1] == Constants.CR) {
166                     end = pos - 1;
167                 } else {
168                     end = pos;
169                 }
170             } else if ((buf[pos] == Constants.QUESTION) && (questionPos == -1)) {
171                 questionPos = pos;
172             } else if (questionPos != -1 && !httpParser.isQueryRelaxed(buf[pos])) {
173                 // %nn decoding will be checked at the point of decoding
174                 String invalidRequestTarget = parseInvalid(start, buf);
175                 throw new IllegalArgumentException(sm.getString("iib.invalidRequestTarget", invalidRequestTarget));
176             } else if (httpParser.isNotRequestTargetRelaxed(buf[pos])) {
177                 // This is a general check that aims to catch problems early
178                 // Detailed checking of each part of the request target will
179                 // happen in AbstractHttp11Processor#prepareRequest()
180                 String invalidRequestTarget = parseInvalid(start, buf);
181                 throw new IllegalArgumentException(sm.getString("iib.invalidRequestTarget", invalidRequestTarget));
182             }
183             pos++;
184         }
185         /**
186          * Mark the HTTP URL
187          */
188         request.unparsedURI().setBytes(buf, start, end - start);
189         if (questionPos >= 0) {
190             /**
191              * When the request has query parameters:
192              * mark the query string
193              * mark the URI
194              */
195             request.queryString().setBytes(buf, questionPos + 1,
196                                            end - questionPos - 1);
197             request.requestURI().setBytes(buf, start, questionPos - start);
198         } else {
199             /**
200              * No query parameters: mark the URI directly
201              */
202             request.requestURI().setBytes(buf, start, end - start);
203         }
204 
205         // Spec says single SP but also says be tolerant of multiple SP and/or HT
206         while (space && !eol) {
207             // Read new bytes if needed
208             if (pos >= lastValid) {
209                 if (!fill())
210                     throw new EOFException(sm.getString("iib.eof.error"));
211             }
212             if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
213                 pos++;
214             } else {
215                 space = false;
216             }
217         }
218 
219         // Mark the current buffer position
220         start = pos;
221         end = 0;
222 
223         //
224         // Reading the protocol
225         // Protocol is always "HTTP/" DIGIT "." DIGIT
226         //
227         /**
228          * Read the HTTP protocol version
229          */
230         while (!eol) {
231 
232             // Read new bytes if needed
233             if (pos >= lastValid) {
234                 if (!fill())
235                     throw new EOFException(sm.getString("iib.eof.error"));
236             }
237 
238             if (buf[pos] == Constants.CR) {
239                 // Possible end of request line. Need LF next.
240             } else if (buf[pos - 1] == Constants.CR && buf[pos] == Constants.LF) {
241                 end = pos - 1;
242                 eol = true;
243             } else if (!HttpParser.isHttpProtocol(buf[pos])) {
244                 String invalidProtocol = parseInvalid(start, buf);
245                 throw new IllegalArgumentException(sm.getString("iib.invalidHttpProtocol", invalidProtocol));
246             }
247 
248             pos++;
249 
250         }
251 
252         /**
253          * Mark the protocol version with a byte chunk
254          */
255         if ((end - start) > 0) {
256             request.protocol().setBytes(buf, start, end - start);
257         }
258 
259         /**
260          * Without a protocol version the request cannot be handled: throw an exception
261          */
262         if (request.protocol().isNull()) {
263             throw new IllegalArgumentException(sm.getString("iib.invalidHttpProtocol"));
264         }
265 
266         return true;
267     }


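This method simply reads the parts of the request line: the request method, the URL, and the protocol version.

Analysis:

Lines 34-50: this do-while loop skips carriage returns and line feeds at the start of the line. As long as the byte is CR or LF, pos advances by one; the loop exits at the first byte that is neither. Because pos++ runs before the condition is checked, pos has already moved one past that byte when the loop exits, so line 52 steps it back by one. That is the position where the request method actually starts, and line 55 records it as start.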
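The step-back after the loop is easy to misread, so here is the same pattern isolated. This is a sketch under assumptions: the class SketchSkipBlankLines and the method firstNonCrlf are hypothetical, and where the real code calls fill() to fetch more data, the sketch simply returns -1.

```java
/** Hypothetical isolation of the blank-line skipping loop; not Tomcat's code. */
class SketchSkipBlankLines {
    /**
     * Returns the index of the first byte in buf[pos, lastValid) that is neither CR nor LF,
     * or -1 when the data runs out first (the real parser would call fill() at that point).
     */
    static int firstNonCrlf(byte[] buf, int pos, int lastValid) {
        byte chr;
        do {
            if (pos >= lastValid) {
                return -1;
            }
            chr = buf[pos++]; // pos is advanced before the loop condition is tested
        } while (chr == '\r' || chr == '\n');
        return pos - 1; // step back onto the byte that ended the loop
    }

    public static void main(String[] args) {
        byte[] data = { '\r', '\n', 'G', 'E', 'T' };
        System.out.println(firstNonCrlf(data, 0, data.length));
    }
}
```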

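Lines 37-40: these few lines are crucial and recur throughout request processing. When Tomcat receives a request it is receiving data from the client: the data travels over the network into the operating system buffer of the server host, and Tomcat reads it from the operating system into its own buffer buf. That is what these lines do. As described earlier, reads from buf are controlled by pos and lastValid. Line 37 checks pos >= lastValid, which means everything read from the operating system into buf has been parsed, so fill() is called to read from the operating system again. The code is as follows: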

 1 @Override
 2     protected boolean fill(boolean block) throws IOException {
 3 
 4         int nRead = 0;
 5 
 6         /**
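 7          * The core job is to read socket data into the buffer buf, in a loop. Two cases:
 8          * 1. Request line and headers: must fit in the buffer (8 KB by default), otherwise an exception is thrown; parsingHeader is set to false once they are read
 9          * 2. Request body: no size limit, read in a loop; if fewer than 4500 bytes remain, a new buf array is created and reading restarts from its beginning until done. Note: the arrays buf used to reference wait for GC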
10          */
11         if (parsingHeader) {
12 
13             /**
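14              * If the data read from the socket exceeds the length of Tomcat's buffer buf, throw an exception. Two points:
15              * 1. This is the reason often cited for GET URLs not being allowed to be too long; in fact the request line plus headers must not exceed 8 KB in total
16              * 2. buf is very important: it is a field of InternalInputBuffer, a byte array that temporarily holds data read from the socket, e.g. request line, headers, and body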
17              */
18             if (lastValid == buf.length) {
19                 throw new IllegalArgumentException
20                     (sm.getString("iib.requestheadertoolarge.error"));
21             }
22 
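23             // Read data from the socket into buf. Note: this is the blocking call that makes BIO hard to follow.
24             // If no data is available it blocks until some arrives; when data arrives, lastValid is advanced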
25             nRead = inputStream.read(buf, pos, buf.length - lastValid);
26             if (nRead > 0) {
27                 lastValid = pos + nRead;
28             }
29 
30         } else {
31             /**
32              * parsingHeader==false: the request line and headers are done, start reading the request body
33              */
34 
35             if (buf.length - end < 4500) {
36                 // In this case, the request header was really large, so we allocate a
37                 // brand new one; the old one will get GCed when subsequent requests
38                 // clear all references
39                 /**
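40                  * If fewer than 4500 bytes remain in buf after the request line and headers, allocate a new byte array buf for the request body
41                  * Why? Presumably because with little free space each read from the operating system buffer returns few bytes, so more reads would be needed.
42                  * Will the array buf used to point to be collected? Probably not yet, since many byte chunks (ByteChunk) for the request line and headers still point to it.
43                  * When will it be collected? Probably after this request has been fully processed.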
44                  */
45                 buf = new byte[buf.length];
46                 end = 0;
47             }
48             /**
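49              * end here is the position just after the header data; the request body is read starting from it.
50              * Data is read from the operating system into buf, starting at pos and ending at lastValid
51              * Note: every body read resets pos to end (just after the headers).
52              * What does that imply?
53              * Body data read from the operating system into buf and then copied to the developer's own array is overwritten by the next read into buf
54              * So data from end onward can be read only once. This is important.
55              * Why? My understanding is that a body can be very large, and overwriting keeps a single request from using too much memory. Neat.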
56              */
57             pos = end;
58             lastValid = pos;
59 
60             /**
61              * In principle this call either blocks or returns nRead>0 (end of stream aside)
62              */
63             nRead = inputStream.read(buf, pos, buf.length - lastValid);
64             if (nRead > 0) {
65                 lastValid = pos + nRead;
66             }
67 
68         }
69 
70         /**
71          * Note: barring an error or end of stream, this returns true
72           */
73         return (nRead > 0);
74 
75     }


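This method has two branches, selected by parsingHeader being true or false. The flag indicates whether the request line and headers are being read, or the request body. The name is slightly misleading: it covers the request line as well as the headers.

Lines 11-30: reading request line and header data. The logic is simple: read from the operating system into the byte array buf and advance lastValid to just past the bytes that were read. Once Tomcat has parsed this part, it sets parsingHeader to false and points end at the byte after the headers, so that the request body can be read next.

Lines 35-66: reading request body data, which is slightly more involved. First check whether at least 4500 bytes remain in buf; if not, allocate a new array. On every read, pos and lastValid are set to end, data is read into buf, and lastValid advances. The body may be large and in principle has no upper bound, so the free space in buf must not be too small, or too many reads would be needed. Each read stores data starting at end and overwrites what the previous read put there. From this we can infer that the request body can be read only once; code that needs it more than once must save it itself. I assume this is done to limit memory use. What do you think?

One more key point, lines 25 and 63: nRead = inputStream.read(buf, pos, buf.length - lastValid). This line reads bytes from the operating system. Anyone who has done socket programming knows that read may block: when the operating system buffer has no data yet and the bytes are still in transit, read blocks until data arrives and then continues.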
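The "read only once" consequence of the body branch can be demonstrated with a small sketch. It is an illustration under assumptions rather than Tomcat's code: the class SketchBodyFill and the helper readBody are hypothetical, the buffer is deliberately tiny, and an in-memory stream stands in for the socket so nothing blocks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of body reads that reuse the buffer region after the headers. */
class SketchBodyFill {
    private final byte[] buf;
    private final int end;   // index just after the header bytes
    private final InputStream in;
    private int pos;
    private int lastValid;

    SketchBodyFill(byte[] buf, int end, InputStream in) {
        this.buf = buf;
        this.end = end;
        this.in = in;
        this.pos = end;
        this.lastValid = end;
    }

    /** Mirrors the body branch: rewind to end, then read over the previous chunk. */
    boolean fill() throws IOException {
        pos = end;
        lastValid = pos;
        int nRead = in.read(buf, pos, buf.length - lastValid);
        if (nRead > 0) {
            lastValid = pos + nRead;
        }
        return nRead > 0;
    }

    /** Places headers at the front of a bufSize-byte buffer and copies the body out chunk by chunk. */
    static String readBody(int bufSize, String headers, String body) {
        byte[] buf = new byte[bufSize];
        byte[] h = headers.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(h, 0, buf, 0, h.length);
        InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.ISO_8859_1));
        SketchBodyFill b = new SketchBodyFill(buf, h.length, in);
        StringBuilder out = new StringBuilder();
        try {
            while (b.fill()) {
                // The caller must copy the chunk now; the next fill() overwrites it.
                out.append(new String(buf, b.pos, b.lastValid - b.pos, StandardCharsets.ISO_8859_1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readBody(16, "HDR\r\n\r\n", "abcdefghijklmnopqrstuvwxyz"));
    }
}
```

With a 16-byte buffer and 7 header bytes, only a 9-byte window is available for the body, so the 26-byte body passes through the same window several times. The header bytes in front of end stay untouched, which is why byte chunks that mark the request line and headers remain valid.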

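Back to the code that reads the request line. With start determined, reading the request method begins.

Lines 67-95: another while loop. Reaching SP or HT means the request method has been read completely.

Line 87: the bytes from start up to the byte before pos are marked with a byte chunk. They are only marked, not converted to a string. The code: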

 1 /**
 2      * Sets the buffer to the specified subarray of bytes.
 3      *
 4      * @param b the ascii bytes
 5      * @param off the start offset of the bytes
 6      * @param len the length of the bytes
 7      */
 8     public void setBytes(byte[] b, int off, int len) {
 9         buff = b;
10         start = off;
11         end = start + len;
12         isSet = true;
13         hasHashCode = false;
14     }


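Lines 101-112: skip further SP or HT bytes and reset start, in preparation for reading the URL.

Lines 128-184: read all bytes of the URL and exit at whitespace. The URL is not yet marked here.

Lines 188-203: using the positions found above, mark the URI, the URL, and the query parameters with separate byte chunks.

Lines 206-217: skip whitespace again and reset start, ready to read the protocol version.

Lines 230-250: read the remaining bytes; the two consecutive bytes CRLF determine where the request line ends.

Line 256: mark the protocol version with a byte chunk.

At this point the request line is fully parsed, and each part is marked by its own byte chunk (ByteChunk). Note that every loop calls fill() to read from the operating system into Tomcat's buffer whenever needed. The data of one request does not necessarily arrive in a single transmission, so Tomcat must be ready to keep reading at any point. This is the key idea; without it, Tomcat's parsing of request data is hard to follow.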
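The sequence of steps above (skip blank lines, method, whitespace, target with optional query, whitespace, protocol) can be condensed into a short sketch. It is deliberately simplified and rests on several assumptions: the class SketchRequestLine is hypothetical, the whole request line is assumed to be in the array already (so there is no fill()), HTTP/0.9 and character validation are left out, and the parts are converted to strings eagerly instead of being marked with byte chunks.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified request-line parser; assumes the whole line is already in buf. */
class SketchRequestLine {
    final String method;
    final String uri;
    final String query;    // null when the target has no '?'
    final String protocol;

    private SketchRequestLine(String method, String uri, String query, String protocol) {
        this.method = method;
        this.uri = uri;
        this.query = query;
        this.protocol = protocol;
    }

    private static boolean isBlank(byte b) {
        return b == ' ' || b == '\t';
    }

    private static String str(byte[] buf, int from, int to) {
        return new String(buf, from, to - from, StandardCharsets.ISO_8859_1);
    }

    static SketchRequestLine parse(byte[] buf) {
        int n = buf.length;
        int pos = 0;
        while (pos < n && (buf[pos] == '\r' || buf[pos] == '\n')) pos++; // skip blank lines
        int start = pos;
        while (pos < n && !isBlank(buf[pos])) pos++;                      // method
        String method = str(buf, start, pos);
        while (pos < n && isBlank(buf[pos])) pos++;                       // SP/HT after the method
        start = pos;
        int questionPos = -1;
        while (pos < n && !isBlank(buf[pos])) {                           // request target
            if (buf[pos] == '?' && questionPos == -1) questionPos = pos;
            pos++;
        }
        int end = pos;
        String uri = str(buf, start, questionPos >= 0 ? questionPos : end);
        String query = questionPos >= 0 ? str(buf, questionPos + 1, end) : null;
        while (pos < n && isBlank(buf[pos])) pos++;                       // SP/HT after the target
        start = pos;
        while (pos < n && buf[pos] != '\r') pos++;                        // protocol up to CR
        if (pos == start || pos + 1 >= n || buf[pos + 1] != '\n') {
            throw new IllegalArgumentException("incomplete or invalid request line");
        }
        return new SketchRequestLine(method, uri, query, str(buf, start, pos));
    }

    public static void main(String[] args) {
        byte[] line = "GET /a/b?x=1&y=2 HTTP/1.1\r\n".getBytes(StandardCharsets.ISO_8859_1);
        SketchRequestLine r = parse(line);
        System.out.println(r.method + " | " + r.uri + " | " + r.query + " | " + r.protocol);
    }
}
```

The real method differs mainly in two ways: each loop can pause to pull more bytes from the socket, and the results are offsets into buf rather than new strings.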

III. Request Header Parsing

Back in the request-handling entry code, line 79 is where header processing starts: getInputBuffer().parseHeaders().

 1 /**
 2      * Parse the HTTP headers.
 3      */
 4     @Override
 5     public boolean parseHeaders()
 6         throws IOException {
 7         /**
 8          * Flag for request line and header reading; entering this method outside that phase throws an exception
 9          */
10         if (!parsingHeader) {
11             throw new IllegalStateException(
12                     sm.getString("iib.parseheaders.ise.error"));
13         }
14 
15         /**
16          * Read the request headers in a loop; each iteration reads one key:value pair
17          */
18         while (parseHeader()) {
19             // Loop until we run out of headers
20         }
21 
22         /**
23          * Headers are done: the flag becomes false and end=pos marks where the request line and headers end
24          */
25         parsingHeader = false;
26         end = pos;
27         return true;
28     }


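The parseHeaders method is fairly simple and has three parts: 1. Check that parsingHeader=true, and throw an exception otherwise. 2. The while loop. 3. When done, set parsingHeader=false and end=pos in preparation for reading the request body.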
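As a rough picture of what the loop in part 2 accomplishes, the sketch below reads key:value lines until the empty line and returns the offset where the body would start, which corresponds to end=pos in part 3. It is only an approximation under assumptions: the class SketchHeaders is hypothetical, all header bytes are assumed to be in the array already, and validation, line folding, and byte-chunk marking are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified header parser; assumes all header bytes are already in buf. */
class SketchHeaders {
    private static String str(byte[] buf, int from, int to) {
        return new String(buf, from, to - from, StandardCharsets.ISO_8859_1);
    }

    /**
     * Reads "name: value CRLF" lines starting at pos until an empty CRLF line.
     * Stores the pairs in out and returns the index just after the empty line.
     */
    static int parse(byte[] buf, int pos, Map<String, String> out) {
        int n = buf.length;
        while (true) {
            if (pos + 1 < n && buf[pos] == '\r' && buf[pos + 1] == '\n') {
                return pos + 2; // empty line: end of all headers
            }
            int start = pos;
            while (pos < n && buf[pos] != ':') pos++;                   // header name
            if (pos >= n) {
                throw new IllegalArgumentException("incomplete header name");
            }
            String name = str(buf, start, pos);
            pos++;                                                       // skip the colon
            while (pos < n && (buf[pos] == ' ' || buf[pos] == '\t')) pos++;
            start = pos;
            while (pos < n && buf[pos] != '\r') pos++;                  // header value
            if (pos + 1 >= n || buf[pos + 1] != '\n') {
                throw new IllegalArgumentException("header line not terminated by CRLF");
            }
            out.put(name, str(buf, start, pos));
            pos += 2;                                                    // skip CRLF
        }
    }

    public static void main(String[] args) {
        String raw = "Host: example.com\r\nAccept: text/html\r\n\r\nBODY";
        Map<String, String> headers = new LinkedHashMap<>();
        int end = parse(raw.getBytes(StandardCharsets.ISO_8859_1), 0, headers);
        System.out.println(headers + " body starts at " + end);
    }
}
```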

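Focus on the loop in part 2, line 18: while (parseHeader()). It is just a loop with an empty body. Each iteration attempts to read one key:value pair of the request headers. The code is as follows: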

  1 /**
  2      * Reads header information. Each call reads one key-value pair, i.e. one header line in the format below
  3      * The request headers have the following format
  4      * ===================================
  5      * key:SP value CR LF
  6      * ...
  7      * key:SP value CR LF
  8      * CR LF
  9      * ===================================
 10      *
 11      * Parse an HTTP header.
 12      *
 13      * @return false after reading a blank line (which indicates that the
 14      * HTTP header parsing is done
 15      */
 16     @SuppressWarnings("null") // headerValue cannot be null
 17     private boolean parseHeader() throws IOException {
 18 
 19         /**
 20          * This loop locates the first byte of each header line before the line is read
 21          */
 22         while (true) {
 23 
 24             // Read new bytes if needed
 25             /**
 26              * No unread data left in Tomcat's buffer buf: read another batch from the operating system
 27              */
 28             if (pos >= lastValid) {
 29                 if (!fill())
 30                     throw new EOFException(sm.getString("iib.eof.error"));
 31             }
 32 
 33             /**
 34              * chr was first assigned while reading the request line, where it received the first non-CRLF byte
 35              */
 36             prevChr = chr;
 37             chr = buf[pos];
 38 
 39             /**
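 40              * If the first byte is a carriage return (CR), there are two cases:
 41              * 1. CR + (not LF): advance one position to test whether the second byte is LF. If it is, this is case 2; if not, step pos back. A key may start with CR, but its second byte cannot be LF, because a bare CRLF line marks the end of the headers
 42              * 2. CR + LF: end-of-headers marker, stop reading headers
 43              * If the first byte is not CR, leave the loop and start reading the key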
 44              */
 45             if (chr == Constants.CR && prevChr != Constants.CR) {
 46                 /**
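 47                  * The first time an iteration enters this branch, prevChr is not CR. If the byte at pos is CR, advance one position and decide based on the next byte
 48                  * If the next byte is LF, header reading ends
 49                  * If the next byte is not LF, step pos back by one and treat the bytes as a key.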
 50                  */
 51                 // Possible start of CRLF - process the next byte.
 52             } else if (prevChr == Constants.CR && chr == Constants.LF) {
 53                 /**
 54                  * End of the headers. Note: the end of all headers, not of the current key-value pair. The marker is a line with no other data, just CRLF
 55                  */
 56                 pos++;
 57                 return false;
 58             } else {
 59                 /**
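 60                  * If the first byte of the line is not CR, break and start reading the key
 61                  * If the first byte is CR but the second is not LF, step pos back by one and start reading the key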
 62                  */
 63                 if (prevChr == Constants.CR) {
 64                     // Must have read two bytes (first was CR, second was not LF)
 65                     pos--;
 66                 }
 67                 break;
 68             }
 69 
 70             pos++;
 71         }
 72 
 73         // Mark the current buffer position
 74         /**
 75          * Mark where the current key:value line starts
 76          */
 77         int start = pos;
 78         int lineStart = start;
 79 
 80         //
 81         // Reading the header name
 82         // Header name is always US-ASCII
 83         //
 84 
 85         /**
 86          * colon is a flag that records whether the colon separator has been reached
 87          */
 88         boolean colon = false;
 89         MessageBytes headerValue = null;
 90 
 91         /**
 92          * Read the key until the current byte is a colon (:), then leave the loop with pos pointing at the byte after the colon
 93          */
 94         while (!colon) {
 95 
 96             // Read new bytes if needed
 97             /**
 98              * Refill the buffer once all buffered data has been consumed
 99              */
100             if (pos >= lastValid) {
101                 if (!fill())
102                     throw new EOFException(sm.getString("iib.eof.error"));
103             }
104 
105 
106             if (buf[pos] == Constants.COLON) {
107                 /**
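108                  * The current byte is a colon: set colon=true so that the loop ends after this iteration
109                  * Mark the header name (key) inside Tomcat's buffer byte array buf:
110                  * each key:value pair has two MessageBytes objects, and each MessageBytes holds a ByteChunk that marks a byte range in buf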
111                  */
112                 colon = true;
113                 headerValue = headers.addValue(buf, start, pos - start);
114             } else if (!HttpParser.isToken(buf[pos])) {
115                 // Non-token characters are illegal in header names
116                 // Parsing continues so the error can be reported in context
117                 // skipLine() will handle the error
118                 /**
119                  * Non-token character such as '(' or '?', so skip this line
120                  */
121                 skipLine(lineStart, start);
122                 return true;
123             }
124 
125             /**
126              * Convert upper-case letters to lower case; chr holds the most recently read byte of the key
127              */
128             chr = buf[pos];
129             if ((chr >= Constants.A) && (chr <= Constants.Z)) {
130                 buf[pos] = (byte) (chr - Constants.LC_OFFSET);
131             }
132 
133             /**
134              * Advance the index and continue with the next iteration
135              */
136             pos++;
137 
138         }
139 
140         // Mark the current buffer position
141         /**
142          * Reset start and begin reading the header value
143          */
144         start = pos;
145         int realPos = pos;
146 
147         //
148         // Reading the header value (which can be spanned over multiple lines)
149         //
150 
151         boolean eol = false;
152         boolean validLine = true;
153 
154         while (validLine) {
155 
156             boolean space = true;
157 
158             // Skipping spaces
159             /**
160              * Skip spaces (SP) and horizontal tabs (HT)
161              */
162             while (space) {
163 
164                 // Read new bytes if needed
165                 if (pos >= lastValid) {
166                     if (!fill())
167                         throw new EOFException(sm.getString("iib.eof.error"));
168                 }
169 
170                 if ((buf[pos] == Constants.SP) || (buf[pos] == Constants.HT)) {
171                     pos++;
172                 } else {
173                     space = false;
174                 }
175 
176             }
177 
178             int lastSignificantChar = realPos;
179 
180             // Reading bytes until the end of the line
181             /**
182              *
183              */
184             while (!eol) {
185 
186                 // Read new bytes if needed
187                 if (pos >= lastValid) {
188                     if (!fill())
189                         throw new EOFException(sm.getString("iib.eof.error"));
190                 }
191 
192                 /**
193                  * On the first iteration prevChr is the colon left in chr; afterwards it is the chr of the previous iteration
194                  * chr is the byte at the current pos
195                  */
196                 prevChr = chr;
197                 chr = buf[pos];
198                 if (chr == Constants.CR) {
199                     /**
200                      * The current byte is CR: go straight to the next iteration to check whether the following byte is LF
201                      */
202                     // Possible start of CRLF - process the next byte.
203                 } else if (prevChr == Constants.CR && chr == Constants.LF) {
204                     /**
205                      * The current byte is LF and the previous byte was CR: the current key:value line is complete
206                      */
207                     eol = true;
208                 } else if (prevChr == Constants.CR) {
209                     /**
210                      * The previous byte was CR but the current byte is not LF: this key:value pair is invalid, so remove it
211                      * Return true so that the next key:value pair is read
212                      */
213                     // Invalid value
214                     // Delete the header (it will be the most recent one)
215                     headers.removeHeader(headers.size() - 1);
216                     skipLine(lineStart, start);
217                     return true;
218                 } else if (chr != Constants.HT && HttpParser.isControl(chr)) {
219                     // Invalid value
220                     // Delete the header (it will be the most recent one)
221                     headers.removeHeader(headers.size() - 1);
222                     skipLine(lineStart, start);
223                     return true;
224                 } else if (chr == Constants.SP) {
225                     /**
226                      * The current byte is a space: copy it and advance realPos, without updating lastSignificantChar
227                      */
228                     buf[realPos] = chr;
229                     realPos++;
230                 } else {
231                     /**
232                      * The current byte is an ordinary character: copy it, advance realPos and mark it as the last significant character
233                      */
234                     buf[realPos] = chr;
235                     realPos++;
236                     lastSignificantChar = realPos;
237                 }
238 
239                 pos++;
240 
241             }
242 
243             realPos = lastSignificantChar;
244 
245             // Checking the first character of the new line. If the character
246             // is a LWS, then it's a multiline header
247 
248             // Read new bytes if needed
249             if (pos >= lastValid) {
250                 if (!fill())
251                     throw new EOFException(sm.getString("iib.eof.error"));
252             }
253 
254             /**
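255              * Special case:
256              * after the current key:value line has been read,
257              * if the next byte is SP (space) or HT (tab), the value is not finished and continues on the following line, so set eol=false and keep reading until CRLF.
258              * If the next byte is neither SP nor HT, set validLine=false to leave the loop; the value is complete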
259              */
260             byte peek = buf[pos];
261             if (peek != Constants.SP && peek != Constants.HT) {
262                 validLine = false;
263             } else {
264                 eol = false;
265                 // Copying one extra space in the buffer (since there must
266                 // be at least one space inserted between the lines)
267                 buf[realPos] = peek;
268                 realPos++;
269             }
270 
271         }
272 
273         // Set the header value
274         /**
275          * Use a new ByteChunk to mark the value of the current key:value pair
276          */
277         headerValue.setBytes(buf, start, realPos - start);
278 
279         return true;
280 
281     }


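Lines 22-71: before each header line is read, the loop locates its first byte. A leading CR followed by LF marks the end of the headers, while a leading CR followed by any other byte is rolled back and treated as the start of a header name. The logic is fairly intricate, so see the code comments for the details.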
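The line-start check can be tried in isolation. The following is only a minimal sketch, assuming the whole header block is already in one byte array, so there is no `fill()` call. `HeaderLineStart` and `scan` are hypothetical names introduced for illustration and are not part of Tomcat.

```java
/** Hypothetical helper for illustration only; it is not a Tomcat class. */
final class HeaderLineStart {
    static final byte CR = '\r';
    static final byte LF = '\n';

    /**
     * Returns -1 when a bare CRLF starts at pos (end of the header block),
     * otherwise the index at which the header name begins.
     * Assumption: buf already holds all the bytes needed, so nothing is refilled.
     */
    static int scan(byte[] buf, int pos) {
        byte prev = 0;
        byte cur = 0;
        while (pos < buf.length) {
            prev = cur;
            cur = buf[pos];
            if (cur == CR && prev != CR) {
                // Possible start of CRLF: decide on the next byte.
            } else if (prev == CR && cur == LF) {
                return -1; // blank line, the headers are finished
            } else {
                if (prev == CR) {
                    pos--; // CR followed by a non-LF byte: step back onto the CR
                }
                return pos;
            }
            pos++;
        }
        throw new IllegalStateException("ran out of data before the line start was decided");
    }

    public static void main(String[] args) {
        System.out.println(scan(new byte[] {'H', 'o', 's', 't'}, 0));
        System.out.println(scan(new byte[] {'\r', '\n'}, 0));
    }
}
```

Unlike the real method, the sketch starts with `prev` set to 0 on every call, whereas Tomcat carries `prevChr` over from whatever was parsed before.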

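Lines 94-138: read the header name (key) until a colon is reached, lower-casing it in place. As elsewhere, a ByteChunk marks the name's bytes inside buf.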
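A small sketch of the name-reading step is shown here, under the same single-array assumption. `HeaderNameSketch`, `readName` and the simplified `isToken` are hypothetical stand-ins; in particular `isToken` only approximates what `HttpParser.isToken` does, using the token characters of RFC 7230.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; it is not a Tomcat class. */
final class HeaderNameSketch {

    /** Simplified stand-in for HttpParser.isToken: letters, digits and RFC 7230 token symbols. */
    static boolean isToken(byte b) {
        boolean alnum = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9');
        return alnum || "!#$%&'*+-.^_`|~".indexOf(b) >= 0;
    }

    /**
     * Lower-cases the header name in place and returns the index of the colon.
     * Returns -1 if a non-token byte or the end of the data is reached first.
     */
    static int readName(byte[] buf, int start) {
        for (int pos = start; pos < buf.length; pos++) {
            byte b = buf[pos];
            if (b == ':') {
                return pos; // the name occupies buf[start .. pos - 1]
            }
            if (!isToken(b)) {
                return -1; // illegal byte in a header name
            }
            if (b >= 'A' && b <= 'Z') {
                buf[pos] = (byte) (b + ('a' - 'A')); // upper case to lower case
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] buf = "Content-Type: text/html".getBytes(StandardCharsets.US_ASCII);
        int colon = readName(buf, 0);
        System.out.println(new String(buf, 0, colon, StandardCharsets.US_ASCII));
    }
}
```

The sketch returns the colon index instead of building a `MessageBytes`, and it reports an illegal byte with -1, whereas the real method calls `skipLine` and returns true so that parsing continues with the next header.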

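Line 144 to the end: start is reset first, then leading spaces are skipped again, and bytes are read until a consecutive CR and LF mark the end of the current key:value pair. The header value is likewise marked with a ByteChunk.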

Reading the value involves one special case:

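Lines 260-271: once a value line has been read, if the next line begins with a space or a tab, the value is not finished and continues on that line (a multi-line header). eol is set back to false and the loop keeps reading until the next CRLF. This form is rarely used in practice, so it is enough to know that it exists.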
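The value-reading loop, including the multi-line case, can be sketched as follows. This is an illustration under simplifying assumptions: the input is well formed, every line ends with CRLF, a blank line follows the headers, and everything sits in one array. It omits the control-character and lone-CR checks of the real code, and `HeaderValueSketch` and `readValue` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; it is not a Tomcat class. */
final class HeaderValueSketch {

    /**
     * Reads the value that starts at pos (the byte after the colon), compacting it in place.
     * Assumption: well-formed input terminated by CRLF plus at least one further byte.
     */
    static String readValue(byte[] buf, int pos) {
        int start = pos;
        int realPos = pos; // write position of the compacted value
        boolean validLine = true;
        while (validLine) {
            // Skip leading spaces and tabs.
            while (buf[pos] == ' ' || buf[pos] == '\t') {
                pos++;
            }
            int lastSignificant = realPos;
            // Copy bytes until CRLF, remembering the last non-space byte.
            while (!(buf[pos] == '\r' && buf[pos + 1] == '\n')) {
                buf[realPos++] = buf[pos];
                if (buf[pos] != ' ') {
                    lastSignificant = realPos;
                }
                pos++;
            }
            pos += 2;                  // step over CRLF
            realPos = lastSignificant; // drop trailing spaces
            byte peek = buf[pos];
            if (peek == ' ' || peek == '\t') {
                buf[realPos++] = peek; // continuation line: keep one separator
            } else {
                validLine = false;     // next line is a new header or the blank line
            }
        }
        return new String(buf, start, realPos - start, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buf = "Host: example.com\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readValue(buf, 5));
    }
}
```

As in the original loop, the value is compacted inside the same array: `realPos` never overtakes `pos`, so bytes are only overwritten after they have been read.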

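This completes the detailed walk-through of how the request line and the request headers of an HTTP request are read. The key points are Tomcat's read buffer and the way data is pulled from the socket into it; the parsing itself then follows the HTTP specification. The process is low-level and somewhat convoluted, so it takes patience. The next article continues with the handling of the request body.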

Once more: all the source code above is from Tomcat 7 and uses the BIO model.
