HTTP協議 HttpWebRequest和 Socket的一點總結
相信接觸過網絡開發的人對HTTP、HttpWebRequest、Socket這些東西都不陌生吧。它們之間的一些介紹和關系我這里都忽略了。開我們平時開發過程中也是很少有機會接觸大什么大並發這個東東,一般大並發我們都需要考慮異步和多線程以及對象池,這個我們以一個簡單demo來講解吧。
主要的調用關系圖如下:
類的結構圖如下:
一:這里我們依次對這些類做一個簡單的說明
HttpRequestInfo:
public string Url:http請求的url字符串,如http://www.baidu.com/
public byte[] PostData:Post請求中的數據
public WebHeaderCollection Headers:請求的頭部數據
public bool AllowAutoRedirect :是否允許301、302自動跳轉,如果你想獲取請求返回的頭部信息,建議一般設置為false
public Dictionary<string, string> ExternalData :請求過程中附加的數據(如數據記錄的ID),便於在成功或失敗回調函數中調用
public Action<HttpContextInfo> ActionCompleted :請求成功后所調用的函數
public Action<HttpRequestException> ActionException:請求失敗所調用函數
public HttpRequestInfo Clone():返回當前對象的一個副本。
HttpResponseInfo:
public Stream ResponseContent :Http請求返回內容(除頭部信息)的對象流
public HttpStatusCode StatusCode:Http返回的狀態
public string StatusDescription :Http狀態描述
public WebHeaderCollection Headers:Http返回的頭部信息
public string GetString(Encoding coding):把http返回體中數據流轉換為字符串,轉換編碼就是我們所傳參數。
public interface IHttpRequest
{
void GetResponseAsync(HttpRequestInfo request);
bool IsBusy { set; get; }
}
在IHttpRequest接口中,IsBusy屬性主要是判斷當前對象是否正在使用中,GetResponseAsync方法是真正完成Http請求的方法。
這里我們主要看看HttpRequestFactory的封裝吧,管理對象實例的個數,相當於一個對象池,這里的代碼主要是基於。net framework2.0的,
首先我們需要2個集合分別管理HttpRequestInfo請求實例和IHttpRequest處理請求實例,
static Queue<HttpRequestInfo> requestTask = new Queue<HttpRequestInfo>();
static List<IHttpRequest> Handlers = new List<IHttpRequest>();
而我們暴露給外部的一個主要方法就是AddRequestTask,
public static void AddRequestTask(HttpRequestInfo info)
{
if (!string.IsNullOrEmpty(info.Url))
{
lock (Lockobj)
{
Interlocked.Increment(ref requestCount);
requestTask.Enqueue(info);
}
}
}
那么這些請求在什么時候被處理了,在一個叫Process方法中處理,
private static void Process(object obj) { while (true) { IHttpRequest handler = GetAvailableHttpRequest(); while (handler == null) { Thread.Sleep(100); handler = GetAvailableHttpRequest(); } HttpRequestInfo task = GetTask(); while (task == null) { Thread.Sleep(100); task = GetTask(); } if (task != null && handler != null) { Interlocked.Decrement(ref requestCount); handler.GetResponseAsync(task); } // Thread.Sleep(10); } }
在這個方法中我們需要調用GetAvailableHttpRequest來獲取IHttpRequest處理對象實例,調用GetTask來獲取HttpRequestInfo 請求實例。如果這2個實例都存在我們調用 IHttpRequest.GetResponseAsync(HttpRequestInfo );方法開始處理http請求。
GetAvailableHttpRequest如下:
private static IHttpRequest GetAvailableHttpRequest() { lock (Lockobj) { for (int i = 0; i < Handlers.Count; i++) { if (!Handlers[i].IsBusy) { return Handlers[i]; } } if (Handlers.Count <= MaxRequestCount) { IHttpRequest handler = (IHttpRequest)Activator.CreateInstance(_httpRequestType); Handlers.Add(handler); return handler; } } return null; //return GetAvailableHttpRequest(); }
在GetAvailableHttpRequest方法中,我們首先在處理對象集合中查找是否有空閑對象,如果有就返回,否則檢查當前對象實例個數個數是否達到最大個數,如果沒有我們則創建新實例,且加入到集合中,再返回,否者返回null。所以在Process方法中有一個檢查,看啊看你返回的IHttpRequest是否為null,請注意這里一定不要用遞歸來返回有效的IHttpRequest,建議用一個死循環來處理,如果用遞歸一般不會出現什么問題,但是遞歸層次嵌套太深就會出現棧溢出錯誤,我在測試的時候曾經出現過這個問題。GetTask和GetAvailableHttpRequest處理一樣。
那么這里的Process方法有在什么地方調用了,在HttpRequestFactory的靜態構造函數中調用。
static HttpRequestFactory()
{
MaxRequestCount = 10;
ThreadPool.QueueUserWorkItem(new WaitCallback(Process));
}
到這里我們的一個對象池就構造玩了。
二 現在我們來看看RequestHttpWebRequest是如何處理HTTP請求的。它主要使用HttpWebRequest來處理請求。
這里我們主要使用HttpWebRequest的異步方法,因此我們需要構造一個狀態對象StateObjectInfo
class StateObjectInfo : HttpContextInfo
{
internal byte[] Buffer { set; get; } //把返回的流寫到HttpResponseInfo.ResponseContent 時用到的暫存數組
internal Stream ReadStream { set; get; }//把返回的流寫到HttpResponseInfo.ResponseContent
internal HttpWebRequest HttpWebRequest { set; get; }
internal RequestHttpWebRequest RequestHandler { set; get; }//主要便於后面改IsBusy屬性。
}
其GetResponseAsync實現如下:
public void GetResponseAsync(HttpRequestInfo info) { HttpWebRequest webRequest; StateObjectInfo state; InitWebRequest(info, out webRequest, out state); try { if (IsHttpPost) { webRequest.Method = "POST"; webRequest.ContentType = "application/x-www-form-urlencoded"; webRequest.BeginGetRequestStream(EndRequest, state); } else { webRequest.BeginGetResponse(EndResponse, state); } } catch (Exception ex) { HandException(ex, state); } }
其中InitWebRequest的實現如下:
private void InitWebRequest(HttpRequestInfo info, out HttpWebRequest webRequest, out StateObjectInfo state) { IsBusy = true; if (info.PostData != null && info.PostData.Length > 0) { IsHttpPost = true; } else { IsHttpPost = false; } if (info.Url.ToLower().Trim().StartsWith("https")) { IsHttps = true; ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CheckValidationResult); ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3; } else { IsHttps = false; } webRequest = HttpWebRequest.CreateDefault(new Uri(info.Url)) as HttpWebRequest; if (IsHttps) { /*基礎連接已經關閉: 發送時發生錯誤 */ /*無法從傳輸連接中讀取數據: 遠程主機強迫關閉了一個現有的連接*/ webRequest.KeepAlive = false; webRequest.ProtocolVersion = HttpVersion.Version10; webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"; } webRequest.AllowAutoRedirect = info.AllowAutoRedirect; if (info.Headers != null && info.Headers.Count > 0) { foreach (string key in info.Headers.Keys) { webRequest.Headers.Add(key, info.Headers[key]); } } //webRequest.Proxy = WebProxy.GetDefaultProxy(); //webRequest.Proxy.Credentials = CredentialCache.DefaultCredentials; //webResponse.Headers.Get("Set-Cookie"); state = new StateObjectInfo { Buffer = new byte[1024 * 1024], HttpWebRequest = webRequest, RequestHandler = this, RequestInfo = info, ResponseInfo = new HttpResponseInfo() }; }
關於該類的EndRequest、EndResponse我想就沒什么說的了,其中ReadCallBack的實現如下:
void ReadCallBack(IAsyncResult ar) { StateObjectInfo state = ar.AsyncState as StateObjectInfo; try { int read = state.ReadStream.EndRead(ar); if (read > 0) { state.ResponseInfo.ResponseContent.Write(state.Buffer, 0, read); state.ReadStream.BeginRead(state.Buffer, 0, state.Buffer.Length, ReadCallBack, state); } else { state.ReadStream.Close(); state.HttpWebRequest.Abort(); if (state.RequestInfo.ActionCompleted != null) { state.ResponseInfo.ResponseContent.Seek(0, SeekOrigin.Begin); state.RequestInfo.ActionCompleted(state); } state.Buffer = null; state.RequestHandler.IsBusy = false; } } catch (Exception ex) { HandException(ex, state); } }
這里還有一個HandException方法需要我們注意:
private void HandException(Exception ex, StateObjectInfo state) { if (state.ReadStream != null) state.ReadStream.Close(); if (state.HttpWebRequest != null) state.HttpWebRequest.Abort(); state.Buffer = null; if (state.RequestInfo.ActionException != null) { state.RequestInfo.ActionException(new HttpRequestException(state, ex)); } state.RequestHandler.IsBusy = false; }
這里我們在使用HttpWebRequest的時候,在完成使用后一定要關閉請求流。
在我們來看看一個簡單的調用把:
public static void DownLoadFile(string remoteurl, string destinationFilePath, string id) { try { if (HasIllegalCharacters(destinationFilePath, false)) { SetFileCopyed(id, "400", "HasIllegalCharacters"); return; } DirectoryInfo dir = new DirectoryInfo(destinationFilePath); FileInfo destinationFile = new FileInfo(destinationFilePath); if (!destinationFile.Directory.Exists) { destinationFile.Directory.Create(); } HttpRequestInfo request = new HttpRequestInfo(remoteurl); request.ActionCompleted = new Action<HttpContextInfo>(x => { if (x.ResponseInfo.StatusCode == HttpStatusCode.OK) { using (Stream wr = File.Open(destinationFilePath, FileMode.OpenOrCreate, FileAccess.Write), sr = x.ResponseInfo.ResponseContent) { byte[] data = new byte[1024 * 1024]; int readcount = sr.Read(data, 0, data.Length); while (readcount > 0) { wr.Write(data, 0, readcount); readcount = sr.Read(data, 0, data.Length); } } SetFileCopyed(id, "200", string.Empty); } else { SetFileCopyed(id, ((int)x.ResponseInfo.StatusCode).ToString(), x.ResponseInfo.StatusDescription); string message = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " : " + remoteurl + " : " + x.ResponseInfo.StatusDescription; LogManager.LogException(message); } }); request.ActionException = new Action<HttpRequestException>(ex => { Regex reg = new Regex(@"\d{3}",RegexOptions.Compiled); string message = ex.Message; Match m = reg.Match(message); if (m.Success) { SetFileCopyed(id, m.Value, message); } else { SetFileCopyed(id, "503", message); HttpRequestInfo newRequest = ex.HttpContextInfo.RequestInfo.Clone(); request.ActionCompleted = null; request.ActionException = null; HttpRequestFactory.AddRequestTask(newRequest); } message = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " : " + ex.HttpContextInfo.RequestInfo.Url + " : " + message; LogManager.LogException(message); }); HttpRequestFactory.AddRequestTask(request); } catch (Exception ex) { SetFileCopyed(id, "-1", ex.Message); string message = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " : " + remoteurl + " : " + ex.Message; LogManager.LogException(message); } } internal static bool HasIllegalCharacters(string path, bool checkAdditional) { for (int i = 0; i < path.Length; i++) { int num2 = path[i]; if (((num2 == 0x22) || (num2 == 60)) || (((num2 == 0x3e) || (num2 == 0x7c)) || (num2 < 0x20))) { return true; } if (checkAdditional && ((num2 == 0x3f) || (num2 == 0x2a))) { return true; } } return false; }
對於這個調用的demo我這里就不多說,不過在調用的時候偶爾會出現:
/*基礎連接已經關閉: 發送時發生錯誤 */
/*無法從傳輸連接中讀取數據: 遠程主機強迫關閉了一個現有的連接*/
這樣的錯誤,網上有一些什么改良方法,我測試后都不管用,個人猜測是與網絡有關的,即使我用socket來做偶爾也會有一些問題。所以當我們遇到這些網絡問題的時候,我們把我們的請求再次加入請求隊列中 HttpRequestFactory.AddRequestTask(newRequest);。這一很重要的哦。
HttpWebRequest類對我們做http請求做了很多封裝,我們使用也很方便。但是它的性能比我們自己用socket要低很多,同時在一些處理上違背了我們的操作習慣。如我們上面的調用代碼:
如果我們http返回狀態是403、404...(除200以外)程序沒有進入我的else,而是進入我的ActionException方法里面了,這點讓我很是不爽。於是萌生了用socket來做http請求的念頭。
三 現在我們來看看SocketHttpRequest是如何處理HTTP請求的。它主要使用Socket來處理請求。
SocketHttpRequest和RequestHttpWebRequest一樣都是采用對象的異步模式,那么也需要一個狀態對象:
class RequestSockeStateObject : HttpContextInfo
{
internal SocketHttpRequest RequestHandler { set; get; }
internal Socket _socket { set; get; } //普通http請求采用socket
internal List<byte> HeaderBuffer { set; get; }
internal byte[] Buffer { set; get; }
internal int ContentLength { set; get; }//http需要接收的數據長度
internal int ReceiveLength { set; get; }//http已經接收的數據長度
internal SslStream SslStream { set; get; }//https請求采用TcpClient,這里需要用到SslStream
}
public void GetResponseAsync(HttpRequestInfo info)
{
RequestSockeStateObject _state;
InitRequestSockeStateObject(info, out _state);
SocketConnection(_state);
}
這里的InitRequestSockeStateObject和RequestHttpWebRequest的InitWebRequest方法差不多,就不在累贅了。
主要看看SocketConnection方法:
void SocketConnection(RequestSockeStateObject _state) { try { Uri uri = new Uri(_state.RequestInfo.Url); IPHostEntry hostEntry = Dns.GetHostEntry(uri.Host); if (IsHttps) { TcpClient tcpclient = new TcpClient(); tcpclient.Connect(hostEntry.AddressList, uri.Port); _state._socket = tcpclient.Client; SslStream sslStream = new SslStream(tcpclient.GetStream(), false, new RemoteCertificateValidationCallback(ValidateServerCertificate), null); sslStream.AuthenticateAsClient(hostEntry.HostName, new X509CertificateCollection(), SslProtocols.Ssl3 | SslProtocols.Tls, false); _state.SslStream = sslStream; Begin_Write(_state); } else { Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp); client.Connect(hostEntry.AddressList, uri.Port); _state._socket = client; BeginSend(_state); } } catch (Exception ex) { HandException(ex, _state); } }
socket連接是需要IP和端口的,這里我們借用 Uri來獲取所需端口,但是一台計算機的ip可能有很多個,實際只有一兩個可以連接,所以我們這里需要調用 client.Connect(hostEntry.AddressList, uri.Port)方法,傳遞一個ip集合。如果是https的話,直接用socket我沒有搞定,最后用SslStream 搞定,不知道大家有沒有其他方法。
其中Begin_Write、End_Write、Complete_Read方法是sslStream異步中所必需的方法,BeginSend、Send_Completed、Receive_Completed、RepeatReceive是socket異步中所需方法。其中Complete_Read和Receive_Completed方法相似。
protected virtual void Complete_Read(IAsyncResult ar) { RequestSockeStateObject state = ar.AsyncState as RequestSockeStateObject; try { int byteCount = state.SslStream.EndRead(ar); if (state.ResponseInfo.Headers.Count < 1) { SetResponseHeaders(state, byteCount); if ((state.ReceiveLength == state.ContentLength && state.ContentLength > 0)) { EndReceive(state); } else { state.SslStream.BeginRead(state.Buffer, 0, state.Buffer.Length, Complete_Read, state); } } else { if (byteCount > 0 && byteCount==state.Buffer.Length) { state.ResponseInfo.ResponseContent.Write(state.Buffer, 0, byteCount); state.SslStream.BeginRead(state.Buffer, 0, state.Buffer.Length, Complete_Read, state); } else { state.ResponseInfo.ResponseContent.Write(state.Buffer, 0, byteCount); EndReceive(state); } } } catch (Exception ex) { HandException(ex, state); } }
如果是第一次接收數據流,我們必須把數據流中http頭部信息取出來,再把頭部信息以外的數據流寫到HttpResponseInfo.ResponseContent中,如果我們已經接收的數據等於我們需要接收的數據,就表示我們已經接收完畢了。如果沒有就繼續接收數據。在第二次及以后所接收數據過程中,我們需要判斷接收數據長度是否小於接收數組的長度,如果小於就表示接收完畢了,否則繼續接收。這里的EndReceive方法如下:
void EndReceive(RequestSockeStateObject state) { /* * if (state.RequestInfo.AllowAutoRedirect && (state.ResponseInfo.StatusCode == HttpStatusCode.Found || state.ResponseInfo.StatusCode == HttpStatusCode.MovedPermanently)) { string location = state.ResponseInfo.Headers["Location"]; state.RequestInfo.Url = location; state.RequestInfo.Headers = state.ResponseInfo.Headers; state.RequestInfo.Headers.Remove("Location"); state.RequestInfo.Headers.Add("Referer", location); Begin_Write(state); } */ if (IsHttps) { state.SslStream.Close(); state.SslStream = null; } else { state._socket.Shutdown(SocketShutdown.Both); state._socket.Close(); state._socket = null; } if (state.RequestInfo.ActionCompleted != null) { state.ResponseInfo.ResponseContent.Seek(0, SeekOrigin.Begin); state.RequestInfo.ActionCompleted(state); } state.RequestHandler.IsBusy = false; }
EndReceive方法主要是關閉socket或則SslStream數據流,然后調用ActionCompleted方法。在這里state.ResponseInfo.ResponseContent.Seek(0, SeekOrigin.Begin);這個方法非常重要,不然在外面的調用方法就必須調用Stream.Seek(0, SeekOrigin.Begin)來吧數據流定位開始位置。
在SocketHttpRequest這個類中,我們是如何來獲取發送的http請求信息以及如何解析http返回的header信息了?
首先來看一個GetRequestData方法,它主要是通過RequestInfo實例來獲取請求信息:
byte[] GetRequestData(RequestSockeStateObject _state) { StringBuilder bufRequest = new StringBuilder(); Uri uri = new Uri(_state.RequestInfo.Url); if (!IsHttpPost) { bufRequest.Append("GET ").Append(uri.OriginalString).AppendLine(" HTTP/1.1"); } else { bufRequest.Append("POST ").Append(uri.OriginalString).AppendLine(" HTTP/1.1"); string contentLengthkey = "Content-Length"; string contentTypekey = "Content-Type"; List<string> headerKeys = new List<string>(_state.RequestInfo.Headers.AllKeys); if (headerKeys.Contains(contentLengthkey)) { _state.RequestInfo.Headers.Remove(contentLengthkey); } if (headerKeys.Contains(contentTypekey)) { _state.RequestInfo.Headers.Remove(contentTypekey); } _state.RequestInfo.Headers.Add(contentTypekey, "application/x-www-form-urlencoded"); _state.RequestInfo.Headers.Add(contentLengthkey, _state.RequestInfo.PostData.Length.ToString()); } _state.RequestInfo.Headers.Add("Host", uri.Host); _state.RequestInfo.Headers.Add("Connection", "keep-alive"); if (_state.RequestInfo.Headers.Count > 0) { bufRequest.Append(_state.RequestInfo.Headers.ToString()); } byte[] byteData = Encoding.ASCII.GetBytes(bufRequest.ToString()); if (!IsHttpPost) { return byteData; } else { byte[] sendData = new byte[byteData.Length + _state.RequestInfo.PostData.Length]; Array.Copy(byteData, 0, sendData, 0, byteData.Length); Array.Copy(_state.RequestInfo.PostData, 0, sendData, byteData.Length, _state.RequestInfo.PostData.Length); return sendData; } }
有關請求和header信息的字符串建議不要自己拼接,用WebHeaderCollection實例的ToString方法來生成,並且它把最后那個回車換行也生成。
那么如何更具換回的流中的數據來設置它的返回頭信息了,這里我們有一個方法SetResponseHeaders:
void SetResponseHeaders(RequestSockeStateObject state, int bytesRead) { try { byte[] tempArray = new byte[bytesRead]; Array.Copy(state.Buffer, 0, tempArray, 0, bytesRead); state.HeaderBuffer.AddRange(tempArray); tempArray = state.HeaderBuffer.ToArray(); string headerSpilt = "\r\n\r\n"; byte[] headerbyte = Encoding.ASCII.GetBytes(headerSpilt); int contentindex = DestTag(tempArray, headerbyte, 0, tempArray.Length); if (contentindex > 0) { string headerStr = Encoding.ASCII.GetString(tempArray, 0, contentindex); int startIndex = contentindex + headerbyte.Length; SetResponseHeaders(headerStr, state); state.ReceiveLength = tempArray.Length - startIndex; state.ResponseInfo.ResponseContent.Write(tempArray, startIndex, tempArray.Length - startIndex); state.HeaderBuffer.Clear(); } } catch (Exception ex) { HandException(ex, state); } }
這里的bytesRead時我第一次接收的數據流長度,首先我們需要在返回流中找到連續的\r\n\r\n 信息,它前面是返回頭信息,后面的時返回體信息。這里我們用自定義的DestTag方法來查找。SetResponseHeaders方法如下:
void SetResponseHeaders(string headerStr, RequestSockeStateObject state) { try { string[] headers = headerStr.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries); string statline = headers[0]; state.ResponseInfo.StatusCode = GetStatusCode(statline); for (int i = 1; i < headers.Length; i++) { int index = headers[i].IndexOf(":"); if (index > 1) { string key = headers[i].Substring(0, index); string value = headers[i].Substring(index + 1); state.ResponseInfo.Headers.Add(key.Trim(), value.Trim()); } } string contentLength = state.ResponseInfo.Headers["Content-Length"]; state.ContentLength = int.Parse(contentLength); state.ReceiveLength = 0; } catch (Exception ex) { HandException(ex, state); } }
以上就是這個類的主要方法,目前SocketHttpRequest對301、302暫不支持。
完整的代碼如下:
SocketHttpRequest代碼:
調用代碼:
有不對的地方還請大家拍磚!