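The previous post covered solutions to common urllib2 problems. This post looks specifically at the short-lived connection problem in urllib2.
1. urllib2 code
The code below sets a custom 'Connection': 'keep-alive' header, which asks the server not to close the connection once the exchange is finished, i.e. a persistent (keep-alive) connection.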
# Test 8: use urllib2 to test Connection=keep-alive
import urllib2

# debuglevel=1 prints the raw request and response headers
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)

loginHeaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.0 Chrome/30.0.1599.101 Safari/537.36',
    'Referer': 'https://www.baidu.com',
    'Connection': 'keep-alive'
}
request = urllib2.Request('http://www.suning.com.cn', headers=loginHeaders)
response = urllib2.urlopen(request)
page = response.read()
print response.info()
print page
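Note the two log lines listed below. Other request headers, such as User-Agent, were changed successfully, but Connection is still close in both the request and the response: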
- Connection: close
- header: Connection: close
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.suning.com.cn\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.0 Chrome/30.0.1599.101 Safari/537.36\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Connection: close
header: Transfer-Encoding: chunked
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Date: Sun, 15 May 2016 06:09:43 GMT
header: Content-Type: text/html; charset=utf-8
header: Server: nginx/1.2.9
header: Vary: Accept-Encoding
header: X-Powered-By: ThinkPHP
header: Set-Cookie: PHPSESSID=tv7ced9sbu7lph60o8ilfj4207; path=/
header: Cache-Control: private
header: Pragma: no-cache
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
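The same behaviour can be reproduced without touching a real site. The following is a minimal sketch, not the code used in this post: it assumes a throwaway server on 127.0.0.1 (the names `RecordingHandler` and `seen` are made up for illustration) that records the Connection header it receives. The import fallback lets it run on Python 3 as well, where `urllib.request` behaves the same way.

```python
import threading

try:  # Python 2
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3: urllib.request is the successor of urllib2
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # Connection header values received by the local test server


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: remembers the Connection header of each GET."""

    def do_GET(self):
        seen.append(self.headers.get('Connection'))
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), RecordingHandler)  # port 0: any free port
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

url = 'http://127.0.0.1:%d/' % server.server_address[1]
request = urllib2.Request(url, headers={'Connection': 'keep-alive'})
# An empty ProxyHandler bypasses any proxy configured in the environment.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
opener.open(request).read()

server.shutdown()
server.server_close()

print(seen)  # the value the server actually received
```

Although the request asks for keep-alive, the server should record close, matching the log above.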
2. The httplib2 version
Next, the same request rewritten with the httplib2 library. This is one workaround for urllib2's lack of keep-alive support; another is Requests.
# Test 9: use httplib2 to test Connection=keep-alive
import httplib2

ghttp = httplib2.Http()
httplib2.debuglevel = 1  # print the raw request and response headers
loginHeaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.0 Chrome/30.0.1599.101 Safari/537.36',
    'Connection': 'keep-alive'
}

response, page = ghttp.request('http://www.suning.com.cn', headers=loginHeaders)
print page.decode('utf-8')
The output shows that the persistent connection was set up successfully.
- header: Connection: Keep-Alive
connect: (www.suning.com.cn, 80) ************
send: 'GET / HTTP/1.1\r\nHost: www.suning.com.cn\r\nconnection: keep-alive\r\naccept-encoding: gzip, deflate\r\nuser-agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.0 Chrome/30.0.1599.101 Safari/537.36\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Connection: Keep-Alive
header: Transfer-Encoding: chunked
header: Expires: Thu, 19 Nov 1981 08:52:00 GMT
header: Date: Sun, 15 May 2016 06:15:16 GMT
header: Content-Type: text/html; charset=utf-8
header: Server: nginx/1.2.9
header: Vary: Accept-Encoding
header: X-Powered-By: ThinkPHP
header: Set-Cookie: PHPSESSID=ep580j8bq3uba0ud3fe3rgu5i5; path=/
header: Cache-Control: private
header: Pragma: no-cache
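To see what a persistent connection means at the socket level, here is a rough standard-library sketch. It uses httplib (`http.client` on Python 3) rather than httplib2 or Requests, and again assumes a throwaway local server; `KeepAliveHandler` and `client_ports` are illustrative names. If both requests arrive from the same client port, they travelled over one TCP connection.

```python
import threading

try:  # Python 2
    import httplib
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3
    import http.client as httplib
    from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # source port of each request, as seen by the server


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that allows persistent connections."""

    protocol_version = 'HTTP/1.1'  # the server side must speak HTTP/1.1

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b'ok'
        self.send_response(200)
        # Content-Length tells the client where the response ends,
        # so the socket can stay open for the next request.
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), KeepAliveHandler)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

conn = httplib.HTTPConnection('127.0.0.1', server.server_address[1])
for _ in range(2):
    conn.request('GET', '/')
    conn.getresponse().read()  # drain the body before reusing the connection
conn.close()

server.shutdown()
server.server_close()

same_socket = client_ports[0] == client_ports[1]
print(same_socket)
```

Reading each response completely before sending the next request is what makes the reuse safe.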
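3. Root cause
Let's go to the urllib2 source. In the core do_open method, Connection is hard-coded to close.
The reason is given in the comment block in the excerpt below: the addinfourl class is not prepared to deal with a persistent connection. It tries to read all remaining data from the socket, and on a connection that stays open this read would block while the server waits for the next request. To avoid that, urllib2 forces Connection to close so the connection ends after the single request.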
def do_open(self, http_class, req, **http_conn_args):
    ……
    # We want to make an HTTP/1.1 request, but the addinfourl
    # class isn't prepared to deal with a persistent connection.
    # It will try to read all remaining data from the socket,
    # which will block while the server waits for the next request.
    # So make sure the connection gets closed after the (only)
    # request.
    headers["Connection"] = "close"
    headers = dict((name.title(), val) for name, val in headers.items())
    if req._tunnel_host:
        tunnel_headers = {}
        proxy_auth_hdr = "Proxy-Authorization"
        if proxy_auth_hdr in headers:
            tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
            # Proxy-Authorization should not be sent to origin
            # server.
            del headers[proxy_auth_hdr]
        h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
    try:
        h.request(req.get_method(), req.get_selector(), req.data, headers)
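
The effect of the hard-coded assignment can be isolated in a few lines. This is a simplified stand-in, not the library code: `merge_headers` is a hypothetical helper, and the merging of the two header dictionaries before the assignment is an assumption about the part elided from the excerpt above.

```python
def merge_headers(unredirected_hdrs, regular_hdrs):
    """Simplified, hypothetical imitation of the header handling in do_open."""
    # Assumed step from the elided part: combine both header dictionaries.
    headers = dict(unredirected_hdrs)
    headers.update((k, v) for k, v in regular_hdrs.items() if k not in headers)
    # Same assignment as in the excerpt: any user-supplied value is overwritten.
    headers["Connection"] = "close"
    # Normalize the capitalization of header names, as in the excerpt.
    return dict((name.title(), val) for name, val in headers.items())


user_headers = {'User-agent': 'Mozilla/5.0', 'Connection': 'keep-alive'}
final_headers = merge_headers({}, user_headers)
print(final_headers['Connection'])
```

Because the assignment runs after the user's headers are merged, no value passed through Request(headers=...) can survive it.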