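Requests is a very practical Python HTTP client library that comes up often when writing crawlers and testing server responses. It is fair to say that Requests fully meets the needs of today's web.
Everything here is drawn from the official documentation: http://docs.python-requests.org/en/master/
The usual way to install it is $ pip install requests. For other installation methods, see the official documentation.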
HTTP - requests
import requests
GET requests
r = requests.get('http://httpbin.org/get')
Passing parameters
>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key2=value2&key1=value1
Note that any dictionary key whose value is None will not be added to the URL's query string.
Parameter values can also be lists
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
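How a params dictionary turns into a query string can be checked without sending anything. The following is a minimal sketch, assuming requests.Request(...).prepare() as a way to build the request locally (the text above does not use it); the payload combines the two examples above.

```python
import requests

# Keys whose value is None are dropped; list values repeat the key.
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}

# prepare() builds the final URL locally; no network traffic is involved.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(prepared.url)
```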
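r.text returns the body decoded with the encoding taken from the response headers; set r.encoding = 'gbk' to change how it is decoded
r.content returns the body as raw bytes
r.json() returns the body parsed as JSON and may raise an exception
r.status_code returns the HTTP status code
r.raw returns the raw socket response; the request must be made with stream=True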
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
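The effect of r.encoding on r.text, and the exception that r.json() may raise, can be tried out offline. This is only a sketch: the Response is built by hand and its private _content attribute is set directly, which is a testing shortcut rather than public API, and parse_json_or_none is a hypothetical helper name.

```python
import requests

def parse_json_or_none(response):
    """Return the decoded JSON body, or None when the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # the decoding errors raised by r.json() are ValueErrors
        return None

# Hand-built response standing in for a server reply (illustration only:
# _content is a private attribute, set here just to avoid a network call).
r = requests.Response()
r.status_code = 200
r._content = '{"city": "北京"}'.encode('gbk')

r.encoding = 'gbk'            # tell Requests how to decode r.content
text = r.text                 # str, decoded with r.encoding
data = parse_json_or_none(r)  # dict parsed from the decoded text
```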
To save the result to a file, use r.iter_content()
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
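The loop above can be wrapped in a small helper. This is a sketch under stated assumptions: save_body is a hypothetical name, and the Response is assembled by hand around an in-memory buffer so that the example runs without a network connection; in real use r would come from requests.get(url, stream=True).

```python
import io
import os
import tempfile
import requests

def save_body(response, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return the bytes written."""
    written = 0
    with open(path, 'wb') as fd:
        for chunk in response.iter_content(chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written

# Stand-in for requests.get(url, stream=True): a Response whose raw stream is
# an in-memory buffer (an assumption made so the sketch runs offline).
r = requests.Response()
r.raw = io.BytesIO(b'x' * 20000)

path = os.path.join(tempfile.mkdtemp(), 'body.bin')
size = save_body(r, path)
```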
Passing headers
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
Passing cookies
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=dict(cookies_are='working'))
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
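To see what the headers and cookies arguments actually put on the wire, the request can be prepared locally instead of sent. A minimal sketch, assuming requests.Request(...).prepare() for offline inspection; the header and cookie values are the ones used above.

```python
import requests

prepared = requests.Request(
    'GET',
    'http://httpbin.org/cookies',
    headers={'user-agent': 'my-app/0.0.1'},
    cookies={'cookies_are': 'working'},
).prepare()

# Header lookups are case-insensitive; the cookies dict becomes a Cookie header.
print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])
```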
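POST requests
Sending form data
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
Typically you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data argument. The dictionary is form-encoded automatically when the request is made: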
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
...
"form": {
"key2": "value2",
"key1": "value1"
},
...
}
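There are many times when the data you want to send is not form-encoded. If you pass a string instead of a dict, the data is posted directly.
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
Alternatively, pass the dict through the json parameter and Requests encodes it for you:
>>> r = requests.post(url, json=payload)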
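The three ways of passing a body described above differ in the body and Content-Type that Requests produces. The sketch below compares them offline, assuming requests.Request(...).prepare() for inspection; the variable names are illustrative.

```python
import json
import requests

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}

as_form = requests.Request('POST', url, data=payload).prepare()
as_string = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

# A dict is form-encoded, a string is sent untouched with no Content-Type,
# and json= serialises the dict and labels it application/json.
for name, req in [('dict', as_form), ('string', as_string), ('json', as_json)]:
    print(name, req.headers.get('Content-Type'), req.body)
```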
Uploading files
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
You can set the filename, content_type and headers explicitly in files
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
You can also send a string to be received as a file:
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
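What a files tuple becomes can be inspected without uploading anything. A minimal sketch, assuming requests.Request(...).prepare() for offline inspection and an illustrative 'text/csv' content type that does not appear in the text above.

```python
import requests

# (filename, content, content_type): the content here is a plain string, so no
# file needs to exist on disk.
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n', 'text/csv')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']  # multipart/form-data; boundary=...
body = prepared.body                             # bytes holding the encoded part
```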
Response
r.status_code
r.headers
r.cookies
Redirection
By default Requests will perform location redirection for all verbs except HEAD.
>>> r = requests.get('http://httpbin.org/cookies/set?k2=v2&k1=v1')
>>> r.url
'http://httpbin.org/cookies'
>>> r.status_code
200
>>> r.history
[<Response [302]>]
If you're using HEAD, you can enable redirection as well:
r = requests.head('http://httpbin.org/cookies/set?k2=v2&k1=v1', allow_redirects=True)
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
requests.get('http://github.com', timeout=0.001)
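Advanced features
Source: <http://docs.python-requests.org/en/master/user/advanced/#advanced>
A Session saves cookies automatically. You can also set request parameters on it, and later requests send them automatically.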
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
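A session can also supply default data. Data passed at the method level is merged with the session-level data; if a key appears in both, the method-level value overrides the session-level one. To drop a session-level value for one request, pass a dict with the same key and a value of None.
s = requests.Session()
s.auth = ('user', 'pass')  # authentication credentials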
s.headers.update({'x-test': 'true'})
# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
Data passed as method arguments is used only once and is not saved to the session.
For example, these cookies apply to this request only:
r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
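The merging rules for session-level and method-level data can be checked offline with Session.prepare_request. This is a sketch of the header case only; the URL and header names are the ones used above.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})
url = 'http://httpbin.org/headers'

# Session-level and request-level headers are merged for this request only.
merged = s.prepare_request(requests.Request('GET', url, headers={'x-test2': 'true'}))

# A value of None removes the session-level header from this one request.
dropped = s.prepare_request(requests.Request('GET', url, headers={'x-test': None}))

print(sorted(k for k in merged.headers if k.startswith('x-')))
print('x-test' in dropped.headers, 'x-test2' in s.headers)
```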
A session can also be closed automatically by using it as a context manager:
with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
The response object holds not only the full response but also the request that produced it
r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
r.headers
r.request.headers
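SSL certificate verification
Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify argument:
>>> requests.get('https://kennethreitz.com', verify=True)
requests.exceptions.SSLError: hostname 'kennethreitz.com' doesn't match either of '*.herokuapp.com', 'herokuapp.com'
I don't have SSL set up on this domain, so it fails. GitHub does, though:
>>> requests.get('https://github.com', verify=True)
<Response [200]>
For private certificates, you can also pass the path to a CA_BUNDLE file to verify, or set the REQUESTS_CA_BUNDLE environment variable.
>>> requests.get('https://github.com', verify='/path/to/certfile')
Requests can also skip SSL certificate verification if you set verify to False.
>>> requests.get('https://kennethreitz.com', verify=False)
<Response [200]>
By default, verify is set to True. The verify option applies only to host certificates.
You can also specify a local certificate to use as the client-side certificate, either as a single file (containing the private key and the certificate) or as a tuple of both files' paths: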
>>> requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
<Response [200]>
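Body content workflow
By default, when you make a request, the body of the response is downloaded immediately. With the stream parameter you can override this behaviour and defer downloading the body until you access the Response.content attribute: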
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)
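At this point only the response headers have been downloaded and the connection remains open, which lets us make retrieving the content conditional:
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...
When stream is set to True, the connection is not released until you read all the data or call Response.close.
You can use contextlib.closing to close the connection automatically: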
import requests
from contextlib import closing
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
file = r'D:\Documents\WorkSpace\Python\Test\Python34Test\test.tar.gz'
with closing(requests.get(tarball_url, stream=True)) as r:
    with open(file, 'wb') as f:
        for data in r.iter_content(1024):
            f.write(data)
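The conditional download described in this section can be written as one function that either reads the body or closes the connection. This is a sketch under stated assumptions: read_if_small and fake_streamed_response are hypothetical names, the value of TOO_LONG is made up, and the responses are built by hand so that the example runs offline; in real use the response would come from requests.get(url, stream=True).

```python
import io
import requests

TOO_LONG = 1024  # hypothetical size limit in bytes

def read_if_small(response, limit=TOO_LONG):
    """Return the body when Content-Length is under limit, else close and return None."""
    length = response.headers.get('content-length')
    if length is not None and int(length) < limit:
        return response.content   # downloads the body now
    response.close()              # give up without reading the body
    return None

def fake_streamed_response(body):
    # Stand-in for requests.get(url, stream=True), built by hand so the
    # sketch runs without a network connection.
    r = requests.Response()
    r.status_code = 200
    r.headers['content-length'] = str(len(body))
    r.raw = io.BytesIO(body)
    return r

small = read_if_small(fake_streamed_response(b'hello'))
large = read_if_small(fake_streamed_response(b'x' * 4096))
```

Treating a missing Content-Length header as "too long" is a design choice of this sketch; servers that use chunked responses do not send the header at all.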
Keep-Alive
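Any requests you make within one session automatically reuse the appropriate connection!
Note: a connection is released back to the pool only once all the body data has been read, so make sure you either set stream to False or read the content property of the Response object.
Streaming uploads
Requests supports streaming uploads, which let you send large streams or files without reading them into memory. To stream an upload, simply provide a file-like object for the request body:
Open the file in binary mode so that Requests can produce the correct Content-Length
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)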
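That a file object is streamed rather than copied, and that opening it in binary mode yields a Content-Length, can be confirmed offline. A minimal sketch, assuming requests.Request(...).prepare() for inspection and a throwaway temporary file standing in for 'massive-body'.

```python
import os
import tempfile
import requests

# Throwaway file standing in for 'massive-body'.
path = os.path.join(tempfile.mkdtemp(), 'massive-body')
with open(path, 'wb') as f:
    f.write(b'\0' * 5000)

with open(path, 'rb') as f:
    prepared = requests.Request('POST', 'http://some.url/streamed', data=f).prepare()
    # The body is the file object itself, not a copy of its contents.
    body_is_file = prepared.body is f
    content_length = prepared.headers['Content-Length']
```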
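Chunk-encoded requests
Requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator for the request body
Note that the generator should yield bytes
def gen():
    yield b'hi'
    yield b'there'
requests.post('http://some.url/chunked', data=gen())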
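Whether a generator body really switches the request to chunked encoding can be checked on the prepared request. A minimal sketch, again assuming requests.Request(...).prepare() so that nothing is sent.

```python
import requests

def gen():
    yield b'hi'
    yield b'there'

prepared = requests.Request('POST', 'http://some.url/chunked', data=gen()).prepare()

# A generator has no known length, so Requests falls back to chunked encoding
# instead of setting Content-Length.
print(prepared.headers.get('Transfer-Encoding'))
print('Content-Length' in prepared.headers)
```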
For chunked encoded responses, it's best to iterate over the data using Response.iter_content(). In an ideal situation you'll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk size parameter to any integer.
POST Multiple Multipart-Encoded Files
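You can send multiple files in one request. For example, suppose you want to upload image files to an HTML form with a multiple-file field named images:
<input type="file" name="images" multiple="true" required="true"/>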
To do that, just set files to a list of tuples of (form_field_name, file_info):
>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
...
'files': {'images': ' ....'}
'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
...
}
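The list-of-tuples form can be tried without image files on disk. In this sketch in-memory bytes stand in for the opened files (an assumption made for illustration), and the request is only prepared, not sent.

```python
import requests

# In-memory bytes stand in for the opened image files.
multiple_files = [
    ('images', ('foo.png', b'first image bytes', 'image/png')),
    ('images', ('bar.png', b'second image bytes', 'image/png')),
]
prepared = requests.Request('POST', 'http://httpbin.org/post', files=multiple_files).prepare()

# Both parts share the form field name "images".
parts_named_images = prepared.body.count(b'name="images"')
```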
Custom Authentication
Requests allows you to specify your own authentication mechanism.
Any callable which is passed as the auth argument to a request method will have the opportunity to modify the request before it is dispatched.
Authentication implementations are subclasses of requests.auth.AuthBase, and are easy to define. Requests provides two common authentication scheme implementations in requests.auth: HTTPBasicAuth and HTTPDigestAuth.
Let's pretend that we have a web service that will only respond if the X-Pizza header is set to a password value. Unlikely, but just go with it.
from requests.auth import AuthBase
class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""
    def __init__(self, username):
        # setup any auth-related data here
        self.username = username
    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r
Then, we can make a request using our Pizza Auth:
>>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
<Response [200]>
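Since an auth object is just a callable applied to the request, its effect can be observed on a prepared request. In this sketch HeaderTokenAuth, the X-Token header and the credentials are all hypothetical; HTTPBasicAuth is shown alongside for comparison.

```python
import requests
from requests.auth import AuthBase, HTTPBasicAuth

class HeaderTokenAuth(AuthBase):
    """Hypothetical scheme: send a token in a custom header."""
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Modify the outgoing request and hand it back.
        r.headers['X-Token'] = self.token
        return r

url = 'http://httpbin.org/headers'
custom = requests.Request('GET', url, auth=HeaderTokenAuth('s3cret')).prepare()
basic = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'pass')).prepare()

print(custom.headers['X-Token'])
print(basic.headers['Authorization'])
```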
Streaming requests
r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # skip keep-alive blank lines
        print(line)
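A line-oriented stream often carries one JSON document per line. The sketch below decodes such a stream, assuming a hypothetical helper iter_json_lines and a hand-built Response around an in-memory buffer so that the example runs offline; in real use the response would come from requests.get(..., stream=True).

```python
import io
import json
import requests

def iter_json_lines(response):
    """Yield one decoded JSON object per non-empty line of the body."""
    for line in response.iter_lines():
        if line:  # skip keep-alive blank lines
            yield json.loads(line.decode('utf-8'))

# Stand-in for requests.get('http://httpbin.org/stream/20', stream=True),
# built by hand so the sketch runs offline.
r = requests.Response()
r.raw = io.BytesIO(b'{"id": 0}\n\n{"id": 1}\n')

events = list(iter_json_lines(r))
```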
Proxies
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:
proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
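The keys of the proxies dictionary refer to the scheme of the URL being requested, not to the scheme of the proxy itself. The sketch below illustrates this with requests.utils.select_proxy, a helper from the library's utils module that the text above does not mention; setting the proxies on a Session is likewise an addition for illustration.

```python
import requests
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# Session-level proxies apply to every request made through the session.
s = requests.Session()
s.proxies.update(proxies)

# The dictionary key is matched against the scheme of the URL being requested.
for_http = select_proxy('http://example.org', s.proxies)
for_https = select_proxy('https://example.org', s.proxies)
```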
Timeouts
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
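When a timeout is set, an expired wait surfaces as an exception that the caller has to handle. A minimal sketch, assuming a hypothetical helper fetch_text that takes a Session; the default (3.05, 27) tuple is the connect/read pair from the example above.

```python
import requests

def fetch_text(session, url, timeout=(3.05, 27)):
    """Return the body text, or None if connecting or reading timed out."""
    try:
        response = session.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None
    return response.text
```

Called as fetch_text(requests.Session(), 'https://github.com'), it returns the page text, or None when either the connect or the read timeout expires.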