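The httplib module
1. Introduction
httplib is a library for making HTTP requests (it is the Python 2 name; Python 3 renamed it to http.client). It consists mainly of four classes: HTTPMessage, HTTPResponse, HTTPConnection and HTTPSConnection. HTTPMessage represents the HTTP headers, HTTPResponse an HTTP response, HTTPConnection an HTTP connection, and HTTPSConnection an HTTPS connection. HTTPConnection and HTTPSConnection build and send the HTTP request, then return the response as an HTTPResponse. An HTTPResponse consists of headers and an entity body; the headers are represented by an HTTPMessage.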
2. Example
2.1. Code example
    import httplib, urllib

    param = urllib.urlencode({'@number': 12524, '@type': 'issue', '@action': 'show'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}

    # httplib.HTTPConnection(host[, port[, strict[, timeout]]])
    #   host:    the server host, e.g. www.csdn.net
    #   port:    the port number, default 80
    #   strict:  default False; whether to raise BadStatusLine when the status
    #            line returned by the server cannot be parsed
    #            (a typical status line looks like: HTTP/1.0 200 OK)
    #   timeout: optional timeout
    conn = httplib.HTTPConnection("bugs.python.org")

    # request() sends one request to the server.
    #   method:  the request method, most commonly GET or POST
    #   url:     the URL of the requested resource
    #   body:    the data submitted to the server, normally a string (a
    #            file-like object also works, see the send method in section 5);
    #            for a POST it can be thought of as the data of an HTML form
    #   headers: the HTTP headers of the request
    conn.request("POST", "", param, headers)

    # getresponse() returns the HTTP response
    response = conn.getresponse()
    # response.status is the status code of the response
    # response.reason is the server's short description of the result
    print response.status, response.reason
    # Output:
    # 301 Moved Permanently

    data = response.read()
    print data
    # Output:
    # <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    # <html><head>
    # <title>301 Moved Permanently</title>
    # </head><body>
    # <h1>Moved Permanently</h1>
    # <p>The document has moved <a href="https://bugs.python.org/">here</a>.</p>
    # <hr>
    # <address>Apache/2.2.16 (Debian) Server at bugs.python.org Port 80</address>
    # </body></html>

    conn.close()
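2.2. Parameter explanation
An HTTPConnection must be initialized with a server location. The intent is that one HTTPConnection object represents, and can only send requests to, a single location.
The user calls conn.request with method, path, body and headers to issue the request.
Calling conn.getresponse returns the response as an HTTPResponse.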
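The one-connection-per-location idea can be tried without touching the public internet. The following is a minimal sketch, assuming Python 3 (where httplib is named http.client) and a throwaway server on the loopback interface; EchoPathHandler and fetch_paths are hypothetical names made up for this illustration.

```python
import http.client
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the requested path."""

    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_paths(paths):
    """Request several paths over ONE connection; return (status, body) pairs."""
    server = socketserver.TCPServer(("127.0.0.1", 0), EchoPathHandler)
    port = server.server_address[1]  # port 0 above means "any free port"
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    # The host and port are fixed here; only the path varies per request.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            results.append((response.status, response.read().decode("ascii")))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results
```

Calling fetch_paths(["/a", "/b?x=1"]) issues both requests through the same connection object. Each response has to be read completely before the next request is sent.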
3. Going further (other interfaces)
3.1. Interfaces
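    connect      # updates the self.sock attribute
    putrequest   # builds the start line plus the Host and Accept-Encoding headers, since these two depend on the HTTP version
    putheader    # builds one header line
    endheaders   # sends the start line, headers and body
    close        # closes the connection
    set_tunnel   # sets up a tunnel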
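3.2. Workflow view
First the socket connection is established,
then the start line is built,
the headers are built,
the request is sent,
and finally the HTTP response is returned.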
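The request half of this workflow can be sketched with the lower-level interfaces. This sketch assumes Python 3's http.client and replaces the real socket with a hypothetical RecordingSocket, so nothing goes over the network; OfflineConnection and build_request are likewise names invented for the illustration.

```python
import http.client


class RecordingSocket:
    """Hypothetical stand-in for a real socket: records what is sent."""

    def __init__(self):
        self.sent = b""

    def sendall(self, data):
        self.sent += data

    def close(self):
        pass


class OfflineConnection(http.client.HTTPConnection):
    """HTTPConnection whose connect() installs the recording socket."""

    def connect(self):
        self.sock = RecordingSocket()


def build_request():
    conn = OfflineConnection("example.com")
    conn.connect()                                # 1. set up self.sock
    conn.putrequest("POST", "/submit")            # 2. start line, Host, Accept-Encoding
    conn.putheader("Content-Type", "text/plain")  # 3. one header line per call
    conn.putheader("Content-Length", "5")
    conn.endheaders(b"hello")                     # 4. send buffered headers, then body
    sent = conn.sock.sent
    conn.close()
    return sent
```

The bytes returned by build_request() begin with the start line, followed by the automatically added Host and Accept-Encoding headers, the two explicit headers, a blank line and the body.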
4. How HTTPSConnection is implemented on top of HTTPConnection
    def connect(self):
        "Connect to a host on a given (SSL) port."

        sock = socket.create_connection((self.host, self.port),
                                        self.timeout, self.source_address)
        if self._tunnel_host:
            self.sock = sock
            self._tunnel()
        self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
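It overrides the connect method. HTTPS needs key_file and cert_file to establish the connection, but they are not passed as arguments to connect. Instead they are passed to the class's __init__ method and stored as attributes.
This is better than passing them to connect: an interface that has to cover many features ends up with many default parameters, which also makes later extension harder. The __init__ approach still has to deal with many default parameters, though, and the effect of each parameter is less direct to see.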
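The trade-off can be illustrated with two toy classes. This is only a sketch of the design idea, not httplib code: PlainConnection, SecureConnection and open_all are hypothetical, and the "socket" is a descriptive string rather than a real connection.

```python
class PlainConnection:
    """Toy connection: all configuration is given to __init__."""

    def __init__(self, host, port=80, timeout=None):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.sock = None

    def connect(self):
        # A real class would open a socket here; the sketch records a description.
        self.sock = "tcp://%s:%d" % (self.host, self.port)


class SecureConnection(PlainConnection):
    """Toy TLS variant: the extra settings also live on the instance."""

    def __init__(self, host, port=443, key_file=None, cert_file=None, timeout=None):
        super().__init__(host, port, timeout)
        self.key_file = key_file
        self.cert_file = cert_file

    def connect(self):
        # Same signature as the base class; the TLS settings come from attributes.
        super().connect()
        self.sock = "tls(%s, cert=%s)" % (self.sock, self.cert_file)


def open_all(connections):
    """Generic code can call connect() without knowing the subclass."""
    for conn in connections:
        conn.connect()
    return [conn.sock for conn in connections]
```

Because connect() takes no arguments in either class, shared code such as the send method in section 5, which calls self.connect() when no socket is open yet, works unchanged for both.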
5. Methods for sending data
    def _output(self, s):
        """Add a line of output to the current request buffer.

        Assumes that the line does *not* end with \\r\\n.
        """
        self._buffer.append(s)
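self._buffer starts out as an empty list ([]). Each element is one line of the HTTP header. The _send_output method formats these lines into the standard HTTP format.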
    def _send_output(self, message_body=None):
        """Send the currently buffered request and clear the buffer.

        Appends an extra \\r\\n to the buffer.
        A message_body may be specified, to be appended to the request.
        """
        self._buffer.extend(("", ""))
        msg = "\r\n".join(self._buffer)
        del self._buffer[:]
        # If msg and message_body are sent in a single send() call,
        # it will avoid performance problems caused by the interaction
        # between delayed ack and the Nagle algorithm.
        if isinstance(message_body, str):
            msg += message_body
            message_body = None
        self.send(msg)
        if message_body is not None:
            #message_body was not a string (i.e. it is a file) and
            #we must run the risk of Nagle
            self.send(message_body)
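As shown, msg is built by joining the elements of self._buffer with \r\n, which produces a standard HTTP header block. The send method is then called to send the HTTP headers and the entity body.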
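The effect of the two empty strings appended before the join is easy to miss: the first terminates the last header line, the second produces the blank line that ends the header block. A small standalone sketch (format_request_head is a hypothetical helper, not part of httplib):

```python
def format_request_head(lines):
    """Join buffered header lines the way _send_output does."""
    buffer = list(lines)
    buffer.extend(("", ""))      # two empty items -> trailing "\r\n\r\n"
    return "\r\n".join(buffer)


head = format_request_head(["GET / HTTP/1.1", "Host: example.com"])
```

Here head ends with "\r\n\r\n", so a body can be appended directly after it.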
    def send(self, data):
        """Send `data' to the server."""
        if self.sock is None:
            if self.auto_open:
                self.connect()
            else:
                raise NotConnected()

        if self.debuglevel > 0:
            print "send:", repr(data)
        blocksize = 8192
        if hasattr(data,'read') and not isinstance(data, array):
            if self.debuglevel > 0: print "sendIng a read()able"
            datablock = data.read(blocksize)
            while datablock:
                self.sock.sendall(datablock)
                datablock = data.read(blocksize)
        else:
            self.sock.sendall(data)
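The send method is only responsible for writing data to the socket. It also accepts a data object that has a read attribute: it repeatedly reads blocks from data and sends them until nothing is left.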
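The block-wise branch can be sketched in isolation. The following is an illustration in Python 3, not the library code: send_in_blocks and demo are hypothetical names, and a plain callable stands in for sock.sendall.

```python
import io


def send_in_blocks(sink, data, blocksize=8192):
    """Pass data to sink, reading file-like objects block by block."""
    if hasattr(data, "read"):
        block = data.read(blocksize)
        while block:                 # an empty read means end of data
            sink(block)
            block = data.read(blocksize)
    else:
        sink(data)                   # plain bytes go out in one call


def demo():
    chunks = []
    send_in_blocks(chunks.append, io.BytesIO(b"abcdefghij"), blocksize=4)
    send_in_blocks(chunks.append, b"xyz", blocksize=4)
    return chunks
```

With a block size of 4, the ten-byte stream is delivered in three pieces, while the bytes object is delivered whole regardless of its length.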
    def putheader(self, header, *values):
        """Send a request header line to the server.

        For example: h.putheader('Accept', 'text/html')
        """
        if self.__state != _CS_REQ_STARTED:
            raise CannotSendHeader()

        hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
        self._output(hdr)
The putheader method is simple: it just builds one header line and appends it to the buffer.
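One detail worth seeing in isolation is how several values are handled: they are joined with "\r\n\t", so every value after the first lands on a tab-indented continuation line. A minimal sketch of just the formatting step (format_header is a hypothetical helper that leaves out the state check):

```python
def format_header(header, *values):
    """Format one header the way putheader does, without the state check."""
    return "%s: %s" % (header, "\r\n\t".join([str(v) for v in values]))


single = format_header("Accept", "text/html")
folded = format_header("X-Items", "a", 2)   # non-string values go through str()
```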
    def request(self, method, url, body=None, headers={}):
        """Send a complete request to the server."""
        self._send_request(method, url, body, headers)
The _send_request method is defined as follows:
    def _send_request(self, method, url, body, headers):
        # Honor explicitly requested Host: and Accept-Encoding: headers.
        header_names = dict.fromkeys([k.lower() for k in headers])
        skips = {}
        if 'host' in header_names:
            skips['skip_host'] = 1
        if 'accept-encoding' in header_names:
            skips['skip_accept_encoding'] = 1

        self.putrequest(method, url, **skips)

        if body is not None and 'content-length' not in header_names:
            self._set_content_length(body)
        for hdr, value in headers.iteritems():
            self.putheader(hdr, value)
        self.endheaders(body)
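First putrequest is called to build the start line,
then putheader is called once per header to build the headers,
and finally endheaders is called to append the body and send everything.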
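The first few lines of _send_request decide which automatic headers putrequest should leave out because the caller supplied them. That decision can be sketched as a standalone function; default_headers_to_skip is a hypothetical name, and the code is written for Python 3.

```python
def default_headers_to_skip(headers):
    """Return the keyword arguments _send_request would pass to putrequest."""
    header_names = {name.lower() for name in headers}  # case-insensitive match
    skips = {}
    if "host" in header_names:
        skips["skip_host"] = 1
    if "accept-encoding" in header_names:
        skips["skip_accept_encoding"] = 1
    return skips


none_given = default_headers_to_skip({"Accept": "text/plain"})
host_given = default_headers_to_skip({"HOST": "example.com"})
```

A caller-supplied Host or Accept-Encoding header therefore suppresses the automatic one instead of producing a duplicate.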
    def getresponse(self, buffering=False):
        "Get the response from the server."
        if self.__state != _CS_REQ_SENT or self.__response:
            raise ResponseNotReady()

        args = (self.sock,)
        kwds = {"strict":self.strict, "method":self._method}
        if self.debuglevel > 0:
            args += (self.debuglevel,)
        if buffering:
            #only add this keyword if non-default, for compatibility with
            #other response_classes.
            kwds["buffering"] = True;
        response = self.response_class(*args, **kwds)

        response.begin()
        assert response.will_close != _UNKNOWN
        self.__state = _CS_IDLE

        if response.will_close:
            # this effectively passes the connection to the response
            self.close()
        else:
            # remember this, so we can tell when it is complete
            self.__response = response

        return response
The getresponse method instantiates an HTTPResponse object with self.sock and then calls its begin method. HTTPResponse is mainly responsible for parsing the HTTP response read from the socket.
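The parsing role of HTTPResponse can be observed without a server. The sketch below assumes Python 3's http.client and feeds a canned response through a hypothetical CannedSocket. The only thing HTTPResponse needs from the socket at construction time is a makefile method; parse_response is also a made-up helper name.

```python
import io
import http.client


class CannedSocket:
    """Hypothetical socket stand-in that replays a fixed byte string."""

    def __init__(self, raw):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


def parse_response(raw):
    """Build an HTTPResponse the way getresponse does, then parse the head."""
    response = http.client.HTTPResponse(CannedSocket(raw))
    response.begin()   # reads the status line and the headers
    return response


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
response = parse_response(raw)
status, reason = response.status, response.reason
content_type = response.getheader("Content-Type")
headers_are_message = isinstance(response.msg, http.client.HTTPMessage)
body = response.read()
```

After begin() the status line and headers are available as attributes, with the headers held in an HTTPMessage. The body is only consumed when read() is called.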