UTF-8 Encoding Issues with HttpClient POST


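http://www.360doc.com/content/09/0915/15/61497_6003890.shtml

In practice, however, I found that calling HttpClient in the most basic way does not support UTF-8 encoding. The articles I found online did not get to the heart of the matter, so I read through some of the commons-httpclient 3.0.1 source code. First, I found the generateRequestEntity() method in PostMethod: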
    /**
     * Generates a request entity from the post parameters, if present.   Calls
     * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
     *
     * @since 3.0
     */
    protected RequestEntity generateRequestEntity() {
        if (!this.params.isEmpty()) {
            // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
            // This is to avoid potential encoding issues.   Form url encoded strings
            // are ASCII by definition but the content type may not be.   Treating the content
            // as bytes allows us to keep the current charset without worrying about how
            // this charset will effect the encoding of the form url encoded string.
            String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
            ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
                EncodingUtil.getAsciiBytes(content),
                FORM_URL_ENCODED_CONTENT_TYPE
            );
            return entity;
        } else {
            return super.generateRequestEntity();
        }
    }

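So the HTTP request parameters added through NameValuePair are ultimately converted into a RequestEntity and submitted to the HTTP server. Next, in EntityEnclosingMethod, the parent class of PostMethod, I found the following code: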
    /**
     * Returns the request's charset.   The charset is parsed from the request entity's
     * content type, unless the content type header has been set manually.
     *
     * @see RequestEntity#getContentType()
     *
     * @since 3.0
     */
    public String getRequestCharSet() {
        if (getRequestHeader("Content-Type") == null) {
            // check the content type from request entity
            // We can't call getRequestEntity() since it will probably call
            // this method.
            if (this.requestEntity != null) {
                return getContentCharSet(
                    new Header("Content-Type", requestEntity.getContentType()));
            } else {
                return super.getRequestCharSet();
            }
        } else {
            return super.getRequestCharSet();
        }
    }
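
To illustrate roughly why the basic call falls short: PostMethod builds its entity with FORM_URL_ENCODED_CONTENT_TYPE, and when a content type carries no charset parameter there is nothing to parse, so the lookup falls back to a default. The sketch below uses a hypothetical charsetOf helper (not HttpClient's own code), assumes the content type value is "application/x-www-form-urlencoded", and assumes ISO-8859-1 as the fallback, which I believe is the default content charset in HttpClient 3.x.

```java
class ContentTypeCharsetDemo {

    // Hypothetical helper, not part of HttpClient: returns the charset
    // parameter of a Content-Type value, or the fallback if none is present.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // Case-insensitive match on the parameter name.
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = param.substring(8).trim();
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumption: the default content charset of HttpClient 3.x.
        String fallback = "ISO-8859-1";
        // No charset parameter: the fallback is used.
        System.out.println(charsetOf("application/x-www-form-urlencoded", fallback));
        // Explicit charset parameter: it wins over the fallback.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", fallback));
    }
}
```

Under these assumptions the first call yields the fallback and the second yields UTF-8, which matches the behaviour seen when the charset is never stated anywhere in the request.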


Solution

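The two code snippets above show how HttpClient derives the request encoding (charset) from "Content-Type", and how that encoding is then applied when the submitted content is encoded. Following this principle, we only need to override getRequestCharSet() so that it returns the name of the encoding (charset) we need. This fixes the garbled text that appears when submitting POST requests in UTF-8 or any other non-default encoding.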
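
The effect of the charset on the encoded form body can be seen with a small stand-alone sketch. It uses java.net.URLEncoder from the JDK as a stand-in for HttpClient's own EncodingUtil.formUrlEncode (an assumption made so the example runs without the library), and the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingDemo {

    // URL-encodes a form value using the named charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The two Chinese characters used in the test (U+4E2D U+6587),
        // written as escapes so the source file encoding does not matter.
        String text = "\u4e2d\u6587";

        // UTF-8 can represent both characters: three bytes each.
        System.out.println(encode(text, "UTF-8"));      // %E4%B8%AD%E6%96%87

        // ISO-8859-1 cannot, so each character is replaced by '?'
        // before percent-encoding and the original text is lost.
        System.out.println(encode(text, "ISO-8859-1")); // %3F%3F
    }
}
```

With a charset that cannot represent the characters, the information is already gone before the request leaves the client, so no server-side setting can recover it.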

Test

First, deploy a page named test.jsp under Tomcat's ROOT web application to serve as the test page. The main code is as follows:
<%@ page contentType="text/html;charset=UTF-8"%>
<%@ page session="false" %>
<%
request.setCharacterEncoding("UTF-8");
String val = request.getParameter("TEXT");
System.out.println(">>>> The result is " + val);
%>
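
The request.setCharacterEncoding("UTF-8") call matters because the server has to decode the percent-encoded bytes with the same charset the client used. The sketch below mimics that step with java.net.URLDecoder from the JDK. It is not what Tomcat runs internally, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {

    // Decodes a percent-encoded form value using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 form encoding of the two test characters (U+4E2D U+6587).
        String body = "%E4%B8%AD%E6%96%87";

        // Matching charset: the original two characters come back.
        System.out.println(decode(body, "UTF-8").equals("\u4e2d\u6587")); // true

        // Mismatched charset: each of the six bytes becomes its own
        // character, which shows up as garbled text.
        System.out.println(decode(body, "ISO-8859-1").length()); // 6
    }
}
```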


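Next, write a test class. The main code is as follows:
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8080/test.jsp";
        PostMethod postMethod = new UTF8PostMethod(url);
        // Fill in the value of each form field
        NameValuePair[] data = {
                new NameValuePair("TEXT", "中文"),
        };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);
        // Execute postMethod
        HttpClient httpClient = new HttpClient();
        try {
            httpClient.executeMethod(postMethod);
        } finally {
            // Release the connection once the request has completed
            postMethod.releaseConnection();
        }
    }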
    
    // Inner class for UTF-8 support
    public static class UTF8PostMethod extends PostMethod {
        public UTF8PostMethod(String url) {
            super(url);
        }

        @Override
        public String getRequestCharSet() {
            // Always report UTF-8 so the form parameters are encoded with it
            return "UTF-8";
        }
    }


Run this test program, and Tomcat's console output correctly prints ">>>> The result is 中文".

