Requests: Python HTTP Module Study Notes (Part 1) (Repost)

This article is a repost. Posted 2013-10-31 15:06.

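While learning to write web crawlers in Python, I used Requests, an HTTP library that is simple, easy to use, and powerful enough to replace Python's standard urllib2 module entirely. I am recording my notes as I learn. Most of the material is translated from the official Requests documentation; corrections for any errors or imprecise wording are welcome.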

1. Introduction
Requests: HTTP for Humans
In one sentence: an HTTP library built for human beings.

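Python's standard urllib2 module already provides most of the HTTP functionality you need, so why use Requests for the same job? Because the urllib2 API is fragmented: its pieces were written at different times and for a different web, and even simple tasks take a lot of work with it.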

2. Installation
Installation is simple. With easy_install:
easy_install requests

With pip:
pip install requests

The source code is hosted on GitHub.

3. Example
Using Requests:

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code  
200  
>>> r.headers['content-type']  
'application/json; charset=utf8'  
>>> r.encoding  
'utf-8'  
>>> r.text  
u'{"type":"User"...'  
>>> r.json()  
{u'private_gists': 419, u'total_private_repos': 77, ...}

Using urllib2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib2

gh_url = 'https://api.github.com'
req = urllib2.Request(gh_url)

password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')

auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager) 
urllib2.install_opener(opener)

handler = urllib2.urlopen(req)
print handler.getcode()
print handler.headers.getheader('content-type')

# ------
# 200
# 'application/json'
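
To see what the auth=('user', 'pass') tuple in the Requests version replaces, here is a small sketch (Python 3 syntax) that builds the request without sending it and inspects the resulting header. It assumes nothing beyond the URL and credentials already used above; no network access is needed.

```python
import requests

# Build the request but do not send it.
req = requests.Request("GET", "https://api.github.com/user", auth=("user", "pass"))
prepared = req.prepare()

# The (username, password) tuple becomes an HTTP Basic Authorization
# header: "Basic " followed by base64("user:pass").
auth_header = prepared.headers["Authorization"]
print(auth_header)
```

This is the work that the password manager, auth handler, and opener do by hand in the urllib2 version.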

For an introduction to writing crawlers, see the small crawler example on 42區.

4. Features

Requests supports all HTTP request types, for example:

>>> r = requests.post("http://httpbin.org/post")
>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")
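
The per-method helpers above all build the same kind of request object. As a rough sketch (Python 3 syntax), the loop below prepares one request per method without sending anything. The URL and the prepare_all helper are illustrative names of my own, not part of Requests.

```python
import requests

URL = "http://httpbin.org/anything"  # illustrative endpoint, never contacted here

def prepare_all(methods):
    """Prepare (but do not send) one request per HTTP method name."""
    prepared = {}
    for method in methods:
        # requests.request(method, URL) would send it; prepare() only builds it.
        prepared[method] = requests.Request(method, URL).prepare()
    return prepared

prepared = prepare_all(["get", "post", "put", "delete", "head", "options", "patch"])
for name, p in prepared.items():
    print(name, "->", p.method, p.url)
```

When the method name is only known at run time, requests.request(method, url) is the generic entry point that actually sends the request.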

Passing Parameters In URLs

To include parameters in the request URL, as in httpbin.org/get?key=val, Requests lets you supply them as a dictionary:

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)

You can see that the resulting URL has been correctly encoded:

>>> print r.url
http://httpbin.org/get?key2=value2&key1=value1
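
To check the encoding without touching the network, the sketch below (Python 3 syntax) prepares the request and parses the query string back. The extra keys q and tag are my own additions, chosen to show how spaces and list values are handled.

```python
from urllib.parse import urlsplit, parse_qs

import requests

# 'q' contains a space and 'tag' is a list; both are illustrative values.
payload = {"key1": "value1", "q": "hello world", "tag": ["a", "b"]}
prepared = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()
print(prepared.url)

# Parse the query string back into a dict to confirm nothing was lost.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

A list value is sent as a repeated key, which is how most servers expect multi-valued parameters.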

Response Content

After a request is sent, we can read the content returned by the server. For example:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.text
'[{"repository":{"open_issues":0,"url":"https://github.com/...

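Requests automatically decodes the content returned by the server; most Unicode charsets are decoded without trouble.

When you access r.text, Requests picks a text encoding based on the HTTP headers of the response. You can check which encoding Requests is using, and change it: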

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests uses the new value whenever you access r.text.
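
The effect of r.encoding can be seen offline with a hand-built response. This is only a sketch (Python 3 syntax): fake_response is a helper of my own, and it sets the private _content attribute purely to avoid a network call. A real response would come from requests.get.

```python
import requests

def fake_response(body: bytes) -> requests.Response:
    """Build a Response by hand (test-only shortcut, not how real code gets one)."""
    r = requests.Response()
    r._content = body  # private attribute, used here only to stay offline
    return r

r = fake_response("café".encode("iso-8859-1"))

r.encoding = "utf-8"        # wrong guess: the byte 0xE9 alone is not valid UTF-8
wrong = r.text
r.encoding = "iso-8859-1"   # the encoding these bytes were actually written in
right = r.text

print(repr(wrong), repr(right))
```

r.text is decoded again on each access, which is why changing r.encoding after the request still takes effect.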

Binary Response Content

For non-text responses, you can also read the response body as bytes:

>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...

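The gzip and deflate encodings are decoded for you automatically.
For example, to turn the binary data returned by a request into an image, you can write: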

>>> from PIL import Image
>>> from StringIO import StringIO
>>> i = Image.open(StringIO(r.content))
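
The StringIO module used above exists only in Python 2. On Python 3 the same role is played by io.BytesIO, so the call would become Image.open(io.BytesIO(r.content)). The sketch below shows the wrapping step without needing an imaging library; open_as_file, is_png, and the sample bytes are illustrative stand-ins of my own.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def open_as_file(content: bytes) -> io.BytesIO:
    """Wrap response bytes (for example r.content) in a file-like object."""
    return io.BytesIO(content)

def is_png(content: bytes) -> bool:
    """Check the 8-byte PNG signature by reading from the file-like wrapper."""
    return open_as_file(content).read(8) == PNG_SIGNATURE

# Stand-in for r.content; real bytes would come from requests.get(url).content
sample = PNG_SIGNATURE + b"...rest of file..."
print(is_png(sample), is_png(b"not an image"))
```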

JSON Response Content

Requests also provides a built-in JSON decoder, which makes JSON data easy to handle:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

If the response body is not valid JSON, r.json() fails to decode it and raises an exception.
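
One way to guard against that exception is sketched below (Python 3 syntax). As far as I know, the JSON errors raised by Requests are ValueError subclasses, so that is what the helper catches. json_or_none and fake are names of my own, and fake sets the private _content attribute only to keep the demo offline.

```python
import requests

def json_or_none(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # JSON decode errors are ValueError subclasses
        return None

def fake(body: bytes) -> requests.Response:
    """Hand-built response for the demo (private attribute, offline only)."""
    r = requests.Response()
    r._content = body
    return r

print(json_or_none(fake(b'{"ok": true}')))
print(json_or_none(fake(b"<html>not json</html>")))
```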

Raw Response Content

If you want the raw socket response from the server, you can access the r.raw attribute. Set stream=True in the request before doing so, for example:

>>> r = requests.get('https://github.com/timeline.json', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
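
In practice a streamed body is usually consumed in chunks with iter_content rather than by reading r.raw directly. The sketch below (Python 3 syntax) saves a stream to disk; save_stream is a helper of my own, and the demo response uses an in-memory byte stream as r.raw in place of a real socket.

```python
import io
import os
import tempfile

import requests

def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to disk chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fh.write(chunk)
            written += len(chunk)
    return written

# Offline stand-in for requests.get(url, stream=True): a Response whose
# raw attribute is an in-memory byte stream instead of a socket.
demo = requests.Response()
demo.raw = io.BytesIO(b"x" * 20000)

target = os.path.join(tempfile.mkdtemp(), "download.bin")
written = save_stream(demo, target)
print(written)
```

Reading in chunks keeps memory use bounded by chunk_size, however large the download is.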

Custom Headers

To add HTTP headers to a request, pass a dict to the headers parameter, for example:

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
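
To confirm which headers would actually go out, the request can be prepared without sending it. In this sketch (Python 3 syntax) the X-Example-Trace header and its value are made up for illustration.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}
# X-Example-Trace is a hypothetical custom header, not something the API requires.
headers = {"content-type": "application/json", "X-Example-Trace": "abc123"}

prepared = requests.Request("POST", url, data=json.dumps(payload), headers=headers).prepare()

for name, value in prepared.headers.items():
    print(name + ": " + value)
print(prepared.body)
```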

More Complicated POST Requests

Typically you want to send form-encoded data, much like an HTML form. To do this, simply pass the form data as a dictionary to the data argument of the request.

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print r.text
{
  ...
  "form": {
  "key2": "value2",
  "key1": "value1"
  },
   ...
}

Often the data you want to send is not form-encoded. If you pass a string instead of a dictionary, the data is sent as-is:

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
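
The difference between a dict and a string body shows up in the prepared request. The sketch below (Python 3 syntax) compares them offline and also includes the json= parameter, which newer Requests releases offer and which this article does not cover. prepare_post is a helper name of my own.

```python
import json

import requests

URL = "http://httpbin.org/post"
payload = {"key1": "value1", "key2": "value2"}

def prepare_post(**kwargs):
    """Build a POST request without sending it."""
    return requests.Request("POST", URL, **kwargs).prepare()

form = prepare_post(data=payload)              # dict: form-encoded
raw = prepare_post(data=json.dumps(payload))   # str: sent untouched
as_json = prepare_post(json=payload)           # json=: serialized by Requests

for label, p in [("dict", form), ("str", raw), ("json=", as_json)]:
    print(label, p.headers.get("Content-Type"), p.body)
```

With a plain string, Requests does not set a Content-Type, so set the header yourself when the server needs one.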

POST a Multipart-Encoded File

Requests provides a simple way to upload Multipart-encoded files:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
   ...
   "files": {
   "file": "<censored...binary...data>"
   },
   ...
}

You can also set the filename explicitly:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'))}

>>> r = requests.post(url, files=files)
>>> r.text
{
   ...
   "files": {
   "file": "<censored...binary...data>"
   },
   ...
}

If you want, you can also send a string to be received as a file:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
{
   ...
   "files": {
   "file": "some,data,to,send\\nanother,row,to,send\\n"
   },
   ...
}
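
To look at what a multipart upload puts on the wire, the request can be prepared without sending it. This sketch (Python 3 syntax) reuses the string-as-file example above and prints the encoded body.

```python
import requests

files = {"file": ("report.csv", "some,data,to,send\nanother,row,to,send\n")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

# Requests chooses the boundary and sets the Content-Type header itself.
content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)
print(body.decode("utf-8"))
```

Because the boundary is generated per request, it is best not to set the Content-Type header by hand when using files=.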

Response Status Codes

We can check the response status code:

>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200

For easy reference, Requests also comes with a built-in status code lookup object:

>>> r.status_code == requests.codes.ok
True

If the request failed (a 4XX client error or 5XX server error response), we can raise it as an exception with Response.raise_for_status():

>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404

>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

But if the status_code of our response is 200, calling raise_for_status() gives:

>>> r.raise_for_status()
None

The return value is None. All is well.
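
A common pattern is to wrap raise_for_status() in a try/except. The sketch below (Python 3 syntax) uses hand-built responses so that it runs offline; check is a helper name of my own, and only the public status_code attribute is set.

```python
import requests

def check(response):
    """Return 'ok' for a successful response, or the HTTPError message otherwise."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return "failed: " + str(exc)
    return "ok"

# Hand-built stand-ins for real responses.
good = requests.Response()
good.status_code = requests.codes.ok
bad = requests.Response()
bad.status_code = requests.codes.not_found

print(check(good))
print(check(bad))
```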

Response Headers

We can view the server's response headers as a Python dictionary:

>>> r.headers
{
    'status': '200 OK',
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json; charset=utf-8'
}

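Dictionary keys are normally case-sensitive, but this dictionary is special: it exists only to hold HTTP headers, and according to RFC 2616, HTTP header names are case-insensitive.
So we can access a header using any capitalization: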

>>> r.headers['Content-Type']
'application/json; charset=utf-8'

>>> r.headers.get('content-type')
'application/json; charset=utf-8'

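If a header is not present in the response, r.headers.get() returns None (in current Requests releases, indexing with r.headers['X-Random'] raises KeyError instead):

>>> print r.headers.get('X-Random')
None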
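
The dictionary type behind r.headers is requests.structures.CaseInsensitiveDict, and it can be tried on its own. A small sketch (Python 3 syntax), with a header value copied from the example above:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

# Lookups ignore case, whichever spelling was used when storing the key.
print(headers["content-type"])
print(headers.get("CONTENT-TYPE"))

# .get() gives a default for missing headers instead of raising.
print(headers.get("X-Random"))
print(headers.get("X-Random", "n/a"))
```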

Cookies

If a response contains cookies, you can access them quickly:

>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> r = requests.get(url)

>>> r.cookies['requests-is']
'awesome'

To send your own cookies to the server, use the cookies parameter:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
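
The cookies parameter ends up as a Cookie request header, which can be inspected offline. The sketch below (Python 3 syntax) also shows a Session, which keeps cookies across requests. Sessions are not covered in this article, so treat that part as a preview.

```python
import requests

url = "http://httpbin.org/cookies"

# One-off cookies: passed per request, turned into a Cookie header.
prepared = requests.Request("GET", url, cookies={"cookies_are": "working"}).prepare()
cookie_header = prepared.headers["Cookie"]
print(cookie_header)

# Persistent cookies: stored on the session and attached to every request.
session = requests.Session()
session.cookies.set("cookies_are", "working")
session_prepared = session.prepare_request(requests.Request("GET", url))
print(session_prepared.headers["Cookie"])
```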

Redirection and History

Requests automatically follows redirects for GET and OPTIONS requests.
GitHub redirects all HTTP requests to HTTPS. We can use the history attribute of the response to track the redirection:

>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

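response.history is a list of the Response objects that were created in order to complete the request, sorted from the oldest to the most recent.
For GET and OPTIONS requests, you can disable automatic redirection with the allow_redirects parameter: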

>>> r = requests.get('http://github.com', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you are using POST, PUT, PATCH, DELETE, or HEAD, you can enable redirection as well:

>>> r = requests.post('http://github.com', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
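
A small helper can flatten r.history into a readable redirect chain. In this sketch (Python 3 syntax), redirect_chain and make are names of my own, and the responses are built by hand from public attributes so the example runs offline.

```python
import requests

def redirect_chain(response):
    """Return [(status_code, url), ...] from the first hop to the final response."""
    hops = list(response.history) + [response]
    return [(r.status_code, r.url) for r in hops]

def make(status, url):
    """Hand-built stand-in for a real response."""
    r = requests.Response()
    r.status_code = status
    r.url = url
    return r

final = make(200, "https://github.com/")
final.history = [make(301, "http://github.com/")]

chain = redirect_chain(final)
for status, url in chain:
    print(status, url)
```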

Timeouts

You can set the timeout parameter to make a request stop waiting for a server response after the given number of seconds.

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note: timeout applies to the connection process; it is not a time limit on downloading the entire response body.
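
In real code the timeout is usually paired with an except clause. The sketch below (Python 3 syntax) passes a (connect, read) tuple, which newer Requests releases accept, and uses values I picked arbitrarily. get_or_none and the AlwaysTimesOut stub are hypothetical names of my own; the stub stands in for a session so the example runs offline.

```python
import requests

def get_or_none(session, url, timeout=(3.05, 10)):
    """GET url; return the response, or None if connecting or reading timed out."""
    try:
        return session.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return None

class AlwaysTimesOut:
    """Hypothetical stand-in for requests.Session whose get() always times out."""
    def get(self, url, timeout=None):
        raise requests.exceptions.ReadTimeout("simulated read timeout")

result = get_or_none(AlwaysTimesOut(), "http://github.com")
print(result)
```

With a real requests.Session() in place of the stub, the same function returns the response when the server answers in time.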

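Errors and Exceptions

In the event of a network problem (for example DNS failure or a refused connection), Requests raises a ConnectionError exception.
On receiving an invalid HTTP response, Requests raises an HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured maximum number of redirections, a TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.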
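
The hierarchy above suggests catching the specific exceptions first and RequestException last. The sketch below (Python 3 syntax) does that; fetch_status and raising are helper names of my own, and the exceptions are simulated rather than produced by real network failures.

```python
from requests import exceptions as rex

def fetch_status(fetch):
    """Call fetch() and describe the outcome; specific exceptions are checked first."""
    try:
        response = fetch()
        response.raise_for_status()
    except rex.Timeout:
        # Checked before ConnectionError: ConnectTimeout inherits from both.
        return "timeout"
    except rex.TooManyRedirects:
        return "too many redirects"
    except rex.ConnectionError:
        return "connection error"
    except rex.HTTPError:
        return "http error"
    except rex.RequestException:
        return "other requests error"
    return "ok"

def raising(exc):
    """Return a callable that raises exc, simulating a failed request."""
    def fetch():
        raise exc
    return fetch

names = ["ConnectTimeout", "ConnectionError", "TooManyRedirects", "HTTPError"]
results = {name: fetch_status(raising(getattr(rex, name)("simulated"))) for name in names}
print(results)
```

In a real program, fetch would be something like lambda: requests.get(url, timeout=5).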

That concludes this introduction to the Requests library. In the next post, I will continue with some of its advanced features.

(End)
