An In-Depth Analysis of the Official HttpClient Sample Code (Connection Pooling)

Preface

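I have long used Apache HttpClient (4.5.x) for HTTP interaction, with the HttpClient instance backed by an HTTP connection pool. Whenever a connection pool is involved, one has to wonder whether it hides deep pitfalls. After digging into the HttpClient source code, I found that it solves this problem elegantly while hiding every connection-pool detail from the caller. These are my notes.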

The official code snippet

Here is a code snippet from the official Apache HttpClient website:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://targethost/homepage");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
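// The underlying connection is held by the response object so that the content is consumed through it
// Make sure CloseableHttpResponse#close is called in a finally block
// Note that if the content is not fully consumed, the connection cannot be safely reused: it will be closed and discarded by the connection pool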
try {
    System.out.println(response1.getStatusLine());
    HttpEntity entity1 = response1.getEntity();
    // do something useful with the response body
    // and ensure it is fully consumed
    EntityUtils.consume(entity1);
} finally {
    response1.close();
}

The code is remarkably concise: there is not the slightest trace of any connection-pool operation. How is it designed, and how does it pull this off?

Caveats of a conventional connection pool

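Using a connection pool requires guaranteeing the following, especially for a home-grown pool:
1. Every connection get is paired with a release.
2. Each HTTP exchange processes its request and response completely and cleanly (cleanup).
For example, if the response content of one exchange is not consumed for some reason, that content stays in the socket buffer. The next exchange on the same connection then reads the first exchange's response as its own, which is a frightening outcome. Back when I did C++ development and wrote a Redis connection pool, I ran into exactly this kind of problem, and it left a deep impression on me.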
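To make the two rules concrete, here is a toy pool in Java. It is not HttpClient code: `MiniPool`, `Conn` and its `unread` counter are hypothetical names, and `unread` merely stands in for bytes left in a socket buffer. It pairs get/release through try/finally and closes a dirty connection instead of returning it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy pool (not HttpClient code) illustrating the two rules above.
public class MiniPool {
    // Hypothetical connection; 'unread' models response bytes still sitting in the socket buffer.
    static final class Conn {
        int unread;
        boolean open = true;
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private int leased;

    Conn get() {
        Conn c = idle.pollFirst();
        if (c == null) {
            c = new Conn();
        }
        leased++;
        return c;
    }

    // Rule 1: every get() must be matched by exactly one release().
    // Rule 2: only a clean connection goes back; a dirty one is closed and dropped.
    void release(Conn c) {
        leased--;
        if (c.unread == 0) {
            idle.addFirst(c);
        } else {
            c.open = false;
        }
    }

    int leased() { return leased; }

    int idleCount() { return idle.size(); }

    public static void main(String[] args) {
        MiniPool pool = new MiniPool();

        Conn clean = pool.get();
        try {
            clean.unread = 0; // pretend the whole response body was read
        } finally {
            pool.release(clean);
        }

        Conn dirty = pool.get(); // reuses the idle connection from above
        try {
            dirty.unread = 42; // pretend the caller stopped reading early
        } finally {
            pool.release(dirty);
        }

        // prints: leased=0 idle=0 open=false
        System.out.println("leased=" + pool.leased() + " idle=" + pool.idleCount() + " open=" + dirty.open);
    }
}
```

Discarding rather than draining is the simplest safe choice for a toy; as shown later, HttpClient instead drains the stream on close so the connection can still be reused.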

連接封裝

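HttpClient introduces the ConnectionHolder class, which bridges the real connection (HttpClientConnection) and the connection pool (HttpClientConnectionManager), while tracking whether the connection is reusable and whether it has been released.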

class ConnectionHolder implements ConnectionReleaseTrigger, 
        Cancellable, Closeable {
    private final Log log;
    private final HttpClientConnectionManager manager;
    private final HttpClientConnection managedConn;
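    private final AtomicBoolean released;  // whether the connection has been released back to the pool
    private volatile boolean reusable;     // whether the connection can be reused
    // ... other fields used below (state, validDuration, tunit) omitted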
}

The most important method of this class is releaseConnection; most of the execution paths discussed below touch it in one way or another.

private void releaseConnection(boolean reusable) {
    // *) Atomically flip released from false to true; if the connection was already released, skip everything below
    if(this.released.compareAndSet(false, true)) {
        synchronized(this.managedConn) {
            // *) Branch on reusability; either way the connection is handed back to the pool
            if(reusable) {
                this.manager.releaseConnection(this.managedConn, 
                        this.state, this.validDuration, this.tunit);
            } else {
                try {
                    // *) Close the connection
                    this.managedConn.close();
                    this.log.debug("Connection discarded");
                } catch (IOException var9) {
                    if(this.log.isDebugEnabled()) {
                        this.log.debug(var9.getMessage(), var9);
                    }
                } finally {
                    this.manager.releaseConnection(this.managedConn, 
                        (Object)null, 0L, TimeUnit.MILLISECONDS);
                }
            }
        }
    }

}
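The core of this method is the `compareAndSet` guard. The stripped-down model below assumes a hypothetical `ReleaseOnceHolder` that records what would happen instead of calling a real `HttpClientConnectionManager`. Its no-arg `releaseConnection()` passes the current `reusable` flag, and `close()` calls `releaseConnection(false)`, mirroring as far as I can tell the corresponding ConnectionHolder paths. Whichever call comes first wins; later calls do nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of ConnectionHolder's release-once logic.
public class ReleaseOnceHolder {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private volatile boolean reusable;
    // Records what would have happened instead of talking to a real connection manager.
    final List<String> events = new ArrayList<>();

    void markReusable() {
        reusable = true;
    }

    // Path taken when the entity stream is closed after full consumption.
    void releaseConnection() {
        releaseConnection(reusable);
    }

    // Path taken by response.close(): anything still unreleased is not reused.
    void close() {
        releaseConnection(false);
    }

    private void releaseConnection(boolean reuse) {
        // Only the first caller flips false -> true; every later call falls through.
        if (released.compareAndSet(false, true)) {
            events.add(reuse ? "returned-to-pool" : "closed-and-discarded");
        }
    }

    public static void main(String[] args) {
        ReleaseOnceHolder consumed = new ReleaseOnceHolder();
        consumed.markReusable();
        consumed.releaseConnection(); // like EntityUtils.consume(entity)
        consumed.close();             // like response.close() in finally: now a no-op
        System.out.println(consumed.events); // [returned-to-pool]

        ReleaseOnceHolder abandoned = new ReleaseOnceHolder();
        abandoned.markReusable();
        abandoned.close();            // body never consumed
        System.out.println(abandoned.events); // [closed-and-discarded]
    }
}
```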

CloseableHttpResponse (implemented by HttpResponseProxy) in turn holds the ConnectionHolder, and its close method essentially just calls ConnectionHolder's releaseConnection indirectly:

class HttpResponseProxy implements CloseableHttpResponse {

    public void close() throws IOException {
        if(this.connHolder != null) {
            this.connHolder.close();
        }
    }
}

class ConnectionHolder 
        implements ConnectionReleaseTrigger, Cancellable, Closeable {

    public void close() throws IOException {
        this.releaseConnection(false);
    }

}

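It follows that the officially recommended practice, calling CloseableHttpResponse#close in a finally block, guarantees that every connection taken from the pool is paired with a release. If the connection has not yet been released when close is called (released is still false), releaseConnection(false) explicitly marks it as non-reusable: it is closed and then handed back to the manager.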

Reusability decisions

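Whether an HTTP persistent connection can be reused is decided by two kinds of rules:
1. Protocol support plus what the request/response headers specify.
2. Completeness of a single exchange (the response content has been fully consumed).
For the former, HttpClient introduces ConnectionReuseStrategy, which by default follows these conventions:

HTTP/1.0 indicates support for persistent connections by adding Connection: Keep-Alive to the headers.
HTTP/1.1 uses persistent connections by default; only an explicit Connection: Close header makes the connection short-lived.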

The relevant snippet in the MainClientExec class:

// response: the HttpResponse; connHolder: the ConnectionHolder wrapping managedConn
response = this.requestExecutor.execute(request, managedConn, context);
if(this.reuseStrategy.keepAlive(response, context)) {
    long duration = this.keepAliveStrategy.getKeepAliveDuration(response, context);
    if(this.log.isDebugEnabled()) {
        String s;
        if(duration > 0L) {
            s = "for " + duration + " " + TimeUnit.MILLISECONDS;
        } else {
            s = "indefinitely";
        }

        this.log.debug("Connection can be kept alive " + s);
    }

    connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
    connHolder.markReusable();
} else {
    connHolder.markNonReusable();
}
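`getKeepAliveDuration` decides how long the connection may sit idle in the pool. As far as I know, the default `DefaultConnectionKeepAliveStrategy` looks for a `timeout` parameter in the response's `Keep-Alive` header, converts it from seconds to milliseconds, and otherwise returns -1, which the log line above reports as "indefinitely". The sketch below reproduces that rule on a plain header string; `KeepAliveDuration` and its methods are hypothetical, and the comma split is a simplification of HttpCore's real header-element parser.

```java
// Hypothetical sketch of the default keep-alive duration rule, on a raw header value.
public class KeepAliveDuration {
    // Returns milliseconds, or -1 when the header is absent or has no usable timeout.
    static long keepAliveMillis(String keepAliveHeader) {
        if (keepAliveHeader == null) {
            return -1;
        }
        for (String element : keepAliveHeader.split(",")) {
            String[] kv = element.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(kv[1].trim()) * 1000; // seconds -> milliseconds
                } catch (NumberFormatException ignore) {
                    // malformed value: keep scanning the remaining elements
                }
            }
        }
        return -1;
    }

    // Same wording as the debug log in MainClientExec.
    static String describe(long millis) {
        return millis > 0 ? "for " + millis + " MILLISECONDS" : "indefinitely";
    }

    public static void main(String[] args) {
        System.out.println(describe(keepAliveMillis("timeout=5, max=100"))); // for 5000 MILLISECONDS
        System.out.println(describe(keepAliveMillis(null)));                 // indefinitely
    }
}
```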

The concrete reuse strategy, DefaultClientConnectionReuseStrategy, is implemented as follows:

public class DefaultClientConnectionReuseStrategy 
            extends DefaultConnectionReuseStrategy {
    public static final DefaultClientConnectionReuseStrategy INSTANCE 
            = new DefaultClientConnectionReuseStrategy();

    public DefaultClientConnectionReuseStrategy() {
    }

    public boolean keepAlive(HttpResponse response, HttpContext context) {
        HttpRequest request = (HttpRequest)context
              .getAttribute("http.request");
        if(request != null) {
            // *) Look for a Connection: Close token in the request
            Header[] connHeaders = request.getHeaders("Connection");
            if(connHeaders.length != 0) {
                BasicTokenIterator ti = new BasicTokenIterator(
                        new BasicHeaderIterator(connHeaders, (String)null)
                    );

                while(ti.hasNext()) {
                    String token = ti.nextToken();
                    if("Close".equalsIgnoreCase(token)) {
                        return false;
                    }
                }
            }
        }

        return super.keepAlive(response, context);
    }
}

The parent class's keepAlive method is implemented as follows:

public class DefaultConnectionReuseStrategy 
        implements ConnectionReuseStrategy {

    public boolean keepAlive(HttpResponse response, HttpContext context) {
        // ... some code omitted
        if(headerIterator1.hasNext()) {
            try {
                BasicTokenIterator px1 = new BasicTokenIterator(headerIterator1);
                boolean keepalive1 = false;

                while(px1.hasNext()) {
                    String token = px1.nextToken();
                    // *) A Close token means the connection cannot be reused
                    if("Close".equalsIgnoreCase(token)) {
                        return false;
                    }
                    // *) A Keep-Alive token means the connection can be reused
                    if("Keep-Alive".equalsIgnoreCase(token)) {
                        keepalive1 = true;
                    }
                }

                if(keepalive1) {
                    return true;
                }
            } catch (ParseException var11) {
                return false;
            }
        }
        // Otherwise, reuse the connection only if the version is newer than HTTP/1.0
        return !ver1.lessEquals(HttpVersion.HTTP_1_0);
    }

}

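To sum up:

If the request headers contain Connection: Close, the connection is not reused.
If the response's Content-Length is set incorrectly, the connection is not reused.
If the response headers contain Connection: Close, the connection is not reused.
If the response headers contain Connection: Keep-Alive, the connection is reused.
If none of the above applies, the connection is reused when the HTTP version is newer than 1.0.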
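The five rules above can be condensed into one decision function. This is a hypothetical sketch: the flattened parameters (version numbers, raw `Connection` header values, and a `contentLengthValid` flag standing in for the omitted Content-Length checks) are assumptions for illustration, whereas the real strategy reads everything from `HttpRequest`/`HttpResponse`.

```java
// Hypothetical condensation of the reuse rules, checked in the order of the summary list.
public class ReuseDecision {
    // Case-insensitive search for a token in a comma-separated header value (null = header absent).
    static boolean hasToken(String headerValue, String token) {
        if (headerValue == null) {
            return false;
        }
        for (String t : headerValue.split(",")) {
            if (t.trim().equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }

    static boolean keepAlive(int major, int minor,
                             String requestConnection,
                             String responseConnection,
                             boolean contentLengthValid) {
        if (hasToken(requestConnection, "Close")) {
            return false;
        }
        if (!contentLengthValid) {
            return false;
        }
        if (hasToken(responseConnection, "Close")) {
            return false;
        }
        if (hasToken(responseConnection, "Keep-Alive")) {
            return true;
        }
        // Fallback: reuse only if the version is newer than HTTP/1.0.
        return major > 1 || (major == 1 && minor > 0);
    }

    public static void main(String[] args) {
        System.out.println(keepAlive(1, 1, null, null, true));         // true
        System.out.println(keepAlive(1, 0, null, null, true));         // false
        System.out.println(keepAlive(1, 0, null, "Keep-Alive", true)); // true
        System.out.println(keepAlive(1, 1, "close", null, true));      // false
    }
}
```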

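How, then, is the latter (completeness of the exchange) determined? Quite simply: once close is explicitly called on the InputStream returned by the response (HttpEntity#getContent), which does not close the socket itself, the response is considered fully consumed.
Let's take a quick look at the EntityUtils.consume method.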

public final class EntityUtils {

    public static void consume(HttpEntity entity) throws IOException {
        if(entity != null) {
            if(entity.isStreaming()) {
                InputStream instream = entity.getContent();
                if(instream != null) {
                    instream.close();
                }
            }
        }
    }

}

Let's set a breakpoint in the releaseConnection method of the ConnectionHolder class.

Then execute an actual HTTP request. When the program hits the breakpoint, the thread's call stack looks like this:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:97)
      at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:120)
      at org.apache.http.impl.execchain.ResponseEntityProxy.releaseConnection(ResponseEntityProxy.java:76)
      at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:145)
      at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
      at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
      at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:97)
      at org.apache.http.util.EntityUtils.consume(EntityUtils.java:90)

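As you can see, calling InputStream#close triggers the return of the connection to the pool, and at that point reusable is true (provided ConnectionReuseStrategy judged the connection reusable).
To dispel any remaining doubt, here is the close implementation of ContentLengthInputStream (defined in Apache HttpCore, which HttpClient builds on), showing that close consumes the remaining data first.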
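The chain in the stack trace (`EofSensorInputStream` → `ResponseEntityProxy` → `ConnectionHolder`) boils down to one idea: closing the content stream drains what is left, then releases the connection as reusable. The hypothetical `ReleasingInputStream` below sketches that with plain `java.io`; the callback stands in for the real `releaseConnection()` call, and the wrapped stream is deliberately left open because it models the socket that stays alive for reuse.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

// Hypothetical simplification: close() drains the body, then fires a release callback once.
public class ReleasingInputStream extends FilterInputStream {
    private final Consumer<Boolean> onRelease; // hypothetical hook, receives the 'reusable' flag
    private boolean released;

    public ReleasingInputStream(InputStream in, Consumer<Boolean> onRelease) {
        super(in);
        this.onRelease = onRelease;
    }

    @Override
    public void close() throws IOException {
        if (released) {
            return; // a second close is a no-op
        }
        released = true;
        try {
            byte[] buffer = new byte[2048];
            while (in.read(buffer) >= 0) {
                // discard unread bytes so the next exchange starts clean
            }
        } catch (IOException e) {
            onRelease.accept(false); // could not drain: the connection must be discarded
            throw e;
        }
        // 'in' is deliberately not closed: it models the socket stream kept for reuse
        onRelease.accept(true);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream body = new ByteArrayInputStream(new byte[5000]);
        ReleasingInputStream content = new ReleasingInputStream(body,
                reusable -> System.out.println("released, reusable=" + reusable));
        content.read();  // the caller reads a single byte only
        content.close(); // drains the other 4999 bytes, prints "released, reusable=true"
        content.close(); // no-op
        System.out.println("bytes left: " + body.available()); // bytes left: 0
    }
}
```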

public class ContentLengthInputStream extends InputStream {

    // *) This close reads and discards all remaining bytes before marking itself closed
    public void close() throws IOException {
        if(!this.closed) {
            try {
                if(this.pos < this.contentLength) {
                    byte[] buffer = new byte[2048];

                    while(this.read(buffer) >= 0) {
                        // discard; the loop ends at end of stream (read returns -1)
                    }
                }
            } finally {
                this.closed = true;
            }
        }

    }

}

Summary

Let's go back to the official sample code we started with.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://targethost/homepage");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
try {
    System.out.println(response1.getStatusLine());
    HttpEntity entity1 = response1.getEntity();

    // *) Triggers releaseConnection(); reusable depends on ConnectionReuseStrategy's verdict, and released is set to true
    EntityUtils.consume(entity1);
} finally {
    // *) If released is still false, releaseConnection(false) is called: the connection is explicitly not reused, and released is set to true
    // *) If released is already true, this does nothing
    response1.close();
}

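C++ would use the RAII idiom, relying on an object's constructor and destructor to acquire and release resources automatically; in Java, you still need to put the release code explicitly in a finally block ^_^.
All in all, this snippet is close to perfect, and you can use the officially recommended pattern with confidence.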
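Since Java 7, try-with-resources comes close to the RAII style mentioned above, and `CloseableHttpResponse` is `Closeable`, so the sample could also be written as `try (CloseableHttpResponse response1 = httpclient.execute(httpGet)) { ... }`. The self-contained sketch below uses a hypothetical `FakeResponse` in place of a real response to show that `close()` runs on both the normal path and the exception path.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: FakeResponse stands in for CloseableHttpResponse.
public class TryWithResourcesDemo {
    static final List<String> log = new ArrayList<>();

    static final class FakeResponse implements Closeable {
        @Override
        public void close() {
            log.add("closed"); // where the real response would release its connection
        }
    }

    static void handle(boolean fail) {
        try (FakeResponse response = new FakeResponse()) {
            log.add("consume");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } // close() runs here on both paths, like the explicit finally block
    }

    public static void main(String[] args) {
        handle(false);
        try {
            handle(true);
        } catch (IllegalStateException expected) {
            log.add("caught");
        }
        System.out.println(log); // [consume, closed, consume, closed, caught]
    }
}
```

Note that the resource is closed before any catch clause of an enclosing try runs, which is why "closed" precedes "caught".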
