Implementing HTTP Requests in Python (urlopen, Headers, Cookies, Timeouts, Redirects, Proxy Settings)

Reposted article, 2018-08-01 20:18. Category: Python crawlers.

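Python offers three ways to make HTTP requests: urllib2/urllib, httplib/urllib, and Requests.

urllib2/urllib implementation

urllib2 and urllib are two built-in Python 2 modules. For HTTP work, urllib2 does most of the job and urllib plays a supporting role.

1 A complete request and response model

urllib2 provides the basic function urlopen: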

import urllib2
response = urllib2.urlopen('http://www.cnblogs.com/guguobao')
html = response.read()
print html

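An improved version splits this into two steps, the request and the response:

# coding: utf-8
import urllib2
# Step 1: build the request
request = urllib2.Request('http://www.cnblogs.com/guguobao')
# Step 2: send it and read the response
response = urllib2.urlopen(request)
html = response.read()
print html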
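The listings in this article are Python 2. In Python 3, urllib2 was merged into urllib.request, so the same two-step pattern might look like the sketch below. To keep it runnable without network access, it talks to a throwaway local server; HelloHandler and fetch are hypothetical names introduced only for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler standing in for a real website."""

    def do_GET(self):
        body = b'<html>hello</html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def fetch(path='/'):
    """Start a local server, request `path` from it, return the body as text."""
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = 'http://127.0.0.1:%d%s' % (server.server_port, path)
        # An empty ProxyHandler ignores proxy environment variables for this local demo.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        request = urllib.request.Request(url)               # step 1: the request
        with opener.open(request, timeout=5) as response:   # step 2: the response
            return response.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(fetch())
```

Against a real site, the local server disappears and the call reduces to building a Request and passing it to urllib.request.urlopen.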

The examples above use GET. The next one switches to POST, with help from urllib:

# coding: utf-8
import urllib
import urllib2
url = 'http://www.cnblogs.com/login'
postdata = {'username' : 'qiye',
           'password' : 'qiye_pass'}
# postdata must be encoded into a format urllib2 understands; urllib.urlencode does that
data = urllib.urlencode(postdata)
# passing data makes this a POST request instead of a GET
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()

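- Running this produces no useful output, because the server rejects the request: it inspects the request headers to decide whether the request came from a browser.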
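For readers on Python 3, the encoding step changes slightly: urlencode lives in urllib.parse and the request body must be bytes. The sketch below only prepares the request and inspects it, so nothing is sent; the URL and credentials are the article's own placeholders.

```python
import urllib.parse
import urllib.request

url = 'http://www.cnblogs.com/login'
postdata = {'username': 'qiye', 'password': 'qiye_pass'}

# urlencode returns a str; Request in Python 3 needs bytes for the body.
data = urllib.parse.urlencode(postdata).encode('ascii')
req = urllib.request.Request(url, data=data)

# Nothing has been sent yet; these lines only inspect the prepared request.
method = req.get_method()  # 'POST' because a body is attached
body = req.data

# Without a body the same URL would be fetched with GET.
plain_method = urllib.request.Request(url).get_method()
```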

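2 Handling request headers

Add the User-Agent and Referer fields to the example above. Header fields that servers commonly check:
- User-Agent: some servers or proxies check this value to decide whether the request was sent by a browser.
- Content-Type: when calling a REST interface, the server checks this value to decide how to parse the HTTP body; with a wrong value it may report an error and refuse to respond. For the possible values, see http://www.runoob.com/http/http-content-type.html
- Referer: servers check this field for hotlink protection.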

# coding: utf-8
# Request header handling: set the User-Agent and Referer fields
import urllib
import urllib2
url = 'http://www.xxxxxx.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer='http://www.xxxxxx.com/'
postdata = {'username' : 'qiye',
           'password' : 'qiye_pass'}
# put user_agent and referer into the request headers
headers={'User-Agent':user_agent,'Referer':referer}
data = urllib.urlencode(postdata)
req = urllib2.Request(url, data,headers)
response = urllib2.urlopen(req)
html = response.read()
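The Content-Type field from the list above is not used in the listing, so here is a minimal Python 3 sketch that sets all three fields on a JSON request. Sending JSON to this login URL is an assumption made for illustration; the request is only built and inspected, not sent.

```python
import json
import urllib.request

url = 'http://www.xxxxxx.com/login'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Referer': 'http://www.xxxxxx.com/',
    'Content-Type': 'application/json',  # tells the server how to parse the body
}
payload = json.dumps({'username': 'qiye', 'password': 'qiye_pass'}).encode('utf-8')
req = urllib.request.Request(url, data=payload, headers=headers)

# add_header sets one more field after the Request has been constructed.
req.add_header('Accept', 'application/json')

# Request normalizes header names with str.capitalize(),
# so 'User-Agent' is stored as 'User-agent'.
sent_headers = dict(req.header_items())
```

Because of that normalization, lookups with get_header or has_header need the capitalized spelling, for example req.get_header('Content-type').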

3 Cookie handling

urllib2 also handles cookies automatically, using the CookieJar class to manage them. To read the value of a particular cookie:

import urllib2,cookielib

# the CookieJar collects every cookie the server sets
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.zhihu.com')
for item in cookie:
    # print each cookie as name:value
    print item.name+':'+item.value
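In Python 3 the same idea uses http.cookiejar together with urllib.request. The sketch below is one possible arrangement: it starts a local server that sets a single cookie, so it runs offline. CookieDemoHandler, fetch_cookies, and the cookie session=abc123 are hypothetical and exist only for this demonstration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that sets one cookie on every GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123')
        self.send_header('Content-Length', '2')
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def fetch_cookies():
    """Request the local server once and return the cookies it set."""
    server = HTTPServer(('127.0.0.1', 0), CookieDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),          # ignore proxy env vars locally
            urllib.request.HTTPCookieProcessor(jar),  # store cookies in the jar
        )
        url = 'http://127.0.0.1:%d/' % server.server_port
        opener.open(url, timeout=5).read()
        return {c.name: c.value for c in jar}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    for name, value in fetch_cookies().items():
        print(name + ':' + value)
```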

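Sometimes, however, we do not want urllib2 to handle cookies automatically and prefer to add the cookie content ourselves. This can be done by setting the Cookie field in the request headers: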

import urllib2,cookielib

opener = urllib2.build_opener()
# the cookie name (email) and its value can be anything, but they must not be omitted
opener.addheaders.append(('Cookie','email='+'helloguguobao@gmail.com'))
req = urllib2.Request('http://www.zhihu.com')
response = opener.open(req)
print response.headers
retdata = response.read()


4 Setting a timeout

In Python 2.6 and later, urlopen accepts a timeout argument, given in seconds:

import urllib2
request=urllib2.Request('http://www.zhihu.com')
response = urllib2.urlopen(request,timeout=2)
html=response.read()
print html
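What happens when the timeout expires is worth seeing once. The Python 3 sketch below connects to a local socket that accepts connections but never replies, which stands in for a slow server; request_times_out is a hypothetical helper name. Depending on the stage at which the timeout occurs, it surfaces either as socket.timeout or wrapped in urllib.error.URLError, so the sketch catches both.

```python
import socket
import urllib.error
import urllib.request


def request_times_out(timeout=0.5):
    """Return True if a request to a silent local socket times out."""
    # A listening socket that never answers stands in for a slow server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    url = 'http://127.0.0.1:%d/' % listener.getsockname()[1]
    # An empty ProxyHandler ignores proxy environment variables for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(url, timeout=timeout)
    except (socket.timeout, urllib.error.URLError):
        # No response arrived within `timeout` seconds.
        return True
    else:
        return False
    finally:
        listener.close()


if __name__ == '__main__':
    print('timed out:', request_times_out())
```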

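5 Getting the HTTP status code

Calling getcode() on the response object returned by urlopen gives the HTTP status code. For error responses, urlopen raises HTTPError, and the code is available on the exception.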

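import urllib2
try:
    response = urllib2.urlopen('http://www.google.com')
    # getcode() returns the HTTP status code of a successful response
    print response.getcode()
except urllib2.HTTPError as e:
    # for error responses the status code is carried by the exception
    if hasattr(e, 'code'):
        print 'Error code:',e.code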
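6 Redirects

By default, urllib2 automatically follows redirects for HTTP 3XX status codes. To detect whether a redirect happened, check whether the URL of the response differs from the URL of the request: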

import urllib2
url = 'http://www.zhihu.cn'
response = urllib2.urlopen(url)
# a redirect happened if the final URL differs from the requested one
isRedirected = response.geturl() != url

To turn off automatic redirection, define a custom HTTPRedirectHandler subclass:

import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        # returning None means a 301 redirect is not followed
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        # delegate to the parent handler, then record the status and final URL
        result =urllib2.HTTPRedirectHandler.http_error_301(self,req,fp,code,msg,headers)
        result.status =code
        result.newurl = result.geturl()
        return result

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.zhihu.cn')
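A different way to get the same effect in Python 3 is to override redirect_request and return None, which makes the opener raise HTTPError for the 3XX response instead of following it. This is a swapped-in technique, not the handler shown above. The sketch runs both behaviours against a local server whose /old path redirects to /new; RedirectDemoHandler, NoRedirect, and probe are hypothetical names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler: '/old' redirects to '/new'."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
        else:
            self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse every redirect so the 3XX response surfaces as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(follow):
    """Return (status code, whether the final URL differs from the start URL)."""
    server = HTTPServer(('127.0.0.1', 0), RedirectDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    handlers = [urllib.request.ProxyHandler({})]  # ignore proxy env vars locally
    if not follow:
        handlers.append(NoRedirect())
    opener = urllib.request.build_opener(*handlers)
    start = 'http://127.0.0.1:%d/old' % server.server_port
    try:
        with opener.open(start, timeout=5) as response:
            return response.getcode(), response.geturl() != start
    except urllib.error.HTTPError as e:
        return e.code, False
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(probe(follow=True), probe(follow=False))
```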

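7 Proxy settings

Crawler development sometimes calls for a proxy. By default, urllib2 reads the http_proxy environment variable to set the HTTP proxy. In practice we usually avoid that approach and use ProxyHandler to set the proxy dynamically inside the program: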

import urllib2
# When running this, turn off the system proxy in socketsocks and use port 1080,
# or quit the socketsocks software entirely
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.zhihu.com/')
print response.read()

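One detail to note: urllib2.install_opener() sets urllib2's global opener, so every subsequent HTTP request goes through this proxy. That is convenient, but a program that needs two different proxies should not change the global setting with install_opener. Call the opener's own open() method instead: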

import urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib2.build_opener(proxy)
response = opener.open("http://www.google.com/")
print response.read()

When running this, turn off the system proxy in socketsocks.
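To make the two-proxy case concrete, here is a Python 3 sketch that builds one opener per proxy and leaves the global opener untouched. The second address, 127.0.0.1:8080, is an assumption for illustration, and opener_for and proxies_of are hypothetical helpers. Nothing is sent; proxies_of reads the opener's handlers list, an implementation detail of urllib.request, purely to show which proxy each opener carries.

```python
import urllib.request


def opener_for(proxy_address):
    """Build an opener that routes http:// requests through one proxy."""
    handler = urllib.request.ProxyHandler({'http': proxy_address})
    return urllib.request.build_opener(handler)


def proxies_of(opener):
    """Return the proxy mapping of an opener (inspection only, for the demo)."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.ProxyHandler):
            return handler.proxies
    return None


opener_a = opener_for('127.0.0.1:1080')
opener_b = opener_for('127.0.0.1:8080')  # hypothetical second proxy

# Each opener keeps its own proxy; install_opener is never called,
# so urllib.request.urlopen itself is unaffected.
# opener_a.open(url) and opener_b.open(url) would use different proxies.
```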
