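The requests module
With requests you can simulate the requests a browser makes. Compared with urllib, which we used earlier, the requests API is much more convenient (under the hood it is built on urllib3).
Note: requests only downloads the page content; it does not execute any JavaScript. If a page builds its content with JS, we have to analyze the target site ourselves and then issue the necessary follow-up requests.
Official documentation: http://cn.python-requests.org/zh_CN/latest/
Installation: pip3 install requests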
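Request methods in the requests module
The API is structured as follows:
requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.delete(url, **kwargs)
requests.head(url, **kwargs)
requests.options(url, **kwargs)
# all of the methods above are built on top of this one
requests.request(method, url, **kwargs)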
The most frequently used methods are get and post. In essence, get and post are just wrappers that call request with the method filled in:
>>> r = requests.get('https://api.github.com/events')
is equivalent to requests.request(method='get', url='https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})
is equivalent to requests.request(method='post', url='http://httpbin.org/post', data={'key': 'value'})
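One way to confirm that get and post only delegate to request is to temporarily replace requests.api.request with a mock and look at what it receives. This is a sketch that uses a testing trick (unittest.mock), not something you would do in normal code, and nothing is sent over the network. The exact extra keyword arguments may differ between requests versions.

```python
from unittest import mock

import requests

# Patch the module-level function that requests.get/requests.post call internally.
with mock.patch("requests.api.request") as fake_request:
    requests.get("https://api.github.com/events")
    get_args, get_kwargs = fake_request.call_args  # arguments of the most recent call

    requests.post("http://httpbin.org/post", data={"key": "value"})
    post_args, post_kwargs = fake_request.call_args

print(get_args)             # method and URL forwarded by requests.get
print(post_args)            # method and URL forwarded by requests.post
print(post_kwargs["data"])  # the data argument is passed through unchanged
```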
requests.request in detail
Source of request():
# Source from requests/api.py (at the top of that module: from . import sessions)
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      >>> req
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
Below we go through the parameters of request() one by one.
method and url
Specify the HTTP method and the URL of the request.
import requests

requests.request(method='get', url='http://127.0.0.1:8000/test/')
requests.request(method='post', url='http://127.0.0.1:8000/test/')
params
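requests can carry parameters in three ways: params, data, and json.
params is appended to the URL as a query string and is typically used with GET requests; data and json go into the request body and are typically used with POST requests.
params accepts:
- a dict
- a string
- bytes (ASCII characters only)
Dicts and strings are automatically URL-encoded into the query string, for example: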
import requests

wd = 'egon老师'
pn = 1
response = requests.get('https://www.baidu.com/s',
                        params={
                            'wd': wd,
                            'pn': pn
                        },
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
                        })
print(response.url)
# Output: https://www.baidu.com/s?wd=egon%E8%80%81%E5%B8%88&pn=1
# (response.url is the final URL, so it will differ if the site redirects the request)
# The URL has been encoded automatically
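To see the encoded URL without contacting Baidu, you can build a requests.Request and call .prepare(), which runs the same URL-encoding step that requests.get performs before sending. The sketch below compares the dict form, the string form, and urllib.parse.urlencode; only the URL is prepared, nothing goes over the network.

```python
from urllib.parse import urlencode

import requests

base = "https://www.baidu.com/s"
query = {"wd": "egon老师", "pn": 1}

# prepare() builds the final URL exactly as a real request would, but sends nothing.
dict_url = requests.Request("GET", base, params=query).prepare().url
str_url = requests.Request("GET", base, params="wd=egon老师&pn=1").prepare().url

print(dict_url)                                    # percent-encoded query string
print(str_url == dict_url)                         # a raw string is percent-encoded as well
print(dict_url == base + "?" + urlencode(query))   # same result as urlencode on the dict
```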
The code above is equivalent to the following; the params encoding is essentially done with urlencode.
import requests
from urllib.parse import urlencode

wd = 'egon老师'
encode_res = urlencode({'k': wd}, encoding='utf-8')
keyword = encode_res.split('=')[1]
print(keyword)
# then build the URL by hand
url = 'https://www.baidu.com/s?wd=%s&pn=1' % keyword
response = requests.get(url,
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
                        })
print(response.url)
# Output: https://www.baidu.com/s?wd=egon%E8%80%81%E5%B8%88&pn=1
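One more thing to note: when params is given as bytes, it must not contain non-ASCII characters. The following is wrong:
import requests
# Raises UnicodeDecodeError: requests decodes bytes params as ASCII
# re = requests.request(method='get',
#                       url='http://127.0.0.1:8000/test/',
#                       params=bytes("k1=v1&k2=水電費&k3=v3&k3=vv3", encoding='utf8'))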
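The bytes rule can be checked locally with a prepared request: ASCII-only bytes are appended to the query string unchanged, while UTF-8 bytes containing Chinese characters fail while the URL is being prepared, before anything is sent. The URL below is the same local test address used throughout; no server needs to be running.

```python
import requests

url = "http://127.0.0.1:8000/test/"

# ASCII-only bytes are accepted and appended to the query string as-is.
ok = requests.Request("GET", url, params=b"k1=v1&k3=v3&k3=vv3").prepare()
print(ok.url)

# Non-ASCII bytes are rejected during URL preparation.
failure = None
try:
    requests.Request("GET", url, params="k2=水電費".encode("utf-8")).prepare()
except UnicodeDecodeError as exc:
    failure = exc
print(type(failure).__name__)
```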
data
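data accepts a dict, a string, bytes, or a file object. The difference between data and json is the body format: with data the body is form-encoded, like name=alex&age=18, while with json the body is a JSON string such as '{"k1": "v1", "k2": "水電費"}'.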
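import requests

# A dict: form-encoded as k1=v1&k2=..., with Content-Type application/x-www-form-urlencoded
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data={'k1': 'v1', 'k2': '水電費'})

# A string: sent as-is; requests does not set a Content-Type for it
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data="k1=v1&k2=v2&k3=v3&k3=v4")

# A string with an explicit Content-Type (form fields are separated by &)
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data="k1=v1&k2=v2&k3=v3&k3=v4",
                 headers={'Content-Type': 'application/x-www-form-urlencoded'})

# A file object: its content is streamed as the body
with open('data_file.py', mode='rb') as f:  # file content: k1=v1&k2=v2&k3=v3&k3=v4
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data=f,
                     headers={'Content-Type': 'application/x-www-form-urlencoded'})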
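The effect of the different data types can be inspected without a server by preparing the request and looking at the body and headers. This sketch shows that a dict is form-encoded (matching urllib.parse.urlencode) and gets a form Content-Type, while a plain string is passed through verbatim with no Content-Type added.

```python
from urllib.parse import urlencode

import requests

url = "http://127.0.0.1:8000/test/"

# A dict is form-encoded and gets the matching Content-Type automatically.
form = requests.Request("POST", url, data={"k1": "v1", "k2": "水電費"}).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)

# A plain string is sent verbatim and requests adds no Content-Type for it.
raw = requests.Request("POST", url, data="k1=v1&k2=v2").prepare()
print(raw.body)
print("Content-Type" in raw.headers)  # False
```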
json
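json serializes the given data into a string with json.dumps(...), puts it in the request body, and sets the Content-Type header to application/json.
How to recognize it: in the browser's developer tools, such a request body is shown as "Request Payload".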
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
json={'k1': 'v1', 'k2': '水電費'})
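A prepared request shows what json= actually produces: an application/json Content-Type and a body that is simply the JSON text of the dict. With the default json.dumps settings, non-ASCII characters appear as \uXXXX escapes, which the server decodes back to the original text.

```python
import json

import requests

payload = {"k1": "v1", "k2": "水電費"}
prepared = requests.Request("POST", "http://127.0.0.1:8000/test/", json=payload).prepare()

print(prepared.headers["Content-Type"])      # application/json
print(prepared.body)                         # the encoded JSON text
print(json.loads(prepared.body) == payload)  # True: decoding the body gives the dict back
```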
headers
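Send request headers to the server.
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 json={'k1': 'v1', 'k2': '水電費'},
                 # an explicit header overrides the application/json Content-Type that json= would set;
                 # the body itself is still JSON
                 headers={'Content-Type': 'application/x-www-form-urlencoded'}
                 )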
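The following sketch verifies the override behavior on a prepared request: headers you pass explicitly take precedence over the Content-Type that json= would add, but the body is still serialized as JSON, so a server that trusts the header may misparse it.

```python
import json

import requests

prepared = requests.Request(
    "POST",
    "http://127.0.0.1:8000/test/",
    json={"k1": "v1", "k2": "水電費"},
    headers={"Content-Type": "application/x-www-form-urlencoded"},
).prepare()

# The explicit header wins over the application/json default...
print(prepared.headers["Content-Type"])
# ...but the body is still JSON.
print(json.loads(prepared.body))
```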
cookies
# Send cookies to the server
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
data={'k1': 'v1', 'k2': 'v2'},
cookies={'cook1': 'value1'},
)
# You can also pass a CookieJar (a dict is converted into one internally)
import requests
from http.cookiejar import CookieJar
from http.cookiejar import Cookie

obj = CookieJar()
obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                      discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                      port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
               )
requests.request(method='POST',
                 url='http://127.0.0.1:8000/test/',
                 data={'k1': 'v1', 'k2': 'v2'},
                 cookies=obj)
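Both forms end up as a Cookie request header, which can be checked with a prepared request. This sketch swaps the long http.cookiejar.Cookie constructor for requests' own RequestsCookieJar (a CookieJar subclass) and its set() method, which is a shorter way to build the same kind of jar.

```python
import requests
from requests.cookies import RequestsCookieJar

url = "http://127.0.0.1:8000/test/"

# A plain dict is converted into a cookie jar internally.
from_dict = requests.Request("POST", url, cookies={"cook1": "value1"}).prepare()

# RequestsCookieJar.set() creates the Cookie object for us.
jar = RequestsCookieJar()
jar.set("c1", "v1")
from_jar = requests.Request("POST", url, cookies=jar).prepare()

print(from_dict.headers["Cookie"])  # cook1=value1
print(from_jar.headers["Cookie"])   # c1=v1
```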
files
Upload a file
file_dict = {
'f1': open('readme', 'rb')
}
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
files=file_dict)
Upload a file with a custom filename
file_dict = {
'f1': ('test.txt', open('readme', 'rb'))
}
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
files=file_dict)
Upload a string as the file content, with a custom filename
file_dict = {
'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
}
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
files=file_dict)
Upload with a custom filename, content type, and extra headers for the file part
file_dict = {
'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
}
requests.request(method='POST',
url='http://127.0.0.1:8000/test/',
files=file_dict)
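To see what the 4-tuple form produces, the sketch below prepares a multipart request from an in-memory file (io.BytesIO stands in for open('readme', 'rb')) and prints the encoded body. The filename, content type, and the extra k1 part header all appear in that file's part of the multipart body.

```python
import io

import requests

files = {
    # (filename, file object or content, content type, extra headers for this part)
    "f1": ("test.txt", io.BytesIO(b"hello requests"), "text/plain", {"k1": "0"}),
}
prepared = requests.Request("POST", "http://127.0.0.1:8000/test/", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body.decode("utf-8"))     # the encoded multipart body
```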
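auth (authentication)
Handles the HTTP authentication that is built into the browser.
With this kind of authentication, visiting the site makes the browser pop up a dialog (similar to a JavaScript alert) asking for a username and password, and the HTML cannot be fetched until you log in. Under the hood, though, the credentials are simply encoded into a request header and sent along, as requests' HTTPBasicAuth does: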
r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
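Many sites do not use this standard scheme and implement their own instead. In that case we have to follow the site's scheme and write our own function similar to _basic_auth_str,
then add the resulting string to the request header: r.headers['Authorization'] = func('.....')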
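requests lets you plug in such a custom scheme by subclassing requests.auth.AuthBase and setting the header in __call__. The signing rule here (a hex HMAC-SHA256 of the URL, sent under a made-up "Sign" scheme) is a hypothetical stand-in; a real site's rule has to be reverse-engineered from its JavaScript or traffic.

```python
import hashlib
import hmac

import requests
from requests.auth import AuthBase


def make_signature(secret: str, message: str) -> str:
    """Hypothetical signing rule: hex HMAC-SHA256 of the message."""
    return hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()


class SignatureAuth(AuthBase):
    """Adds an Authorization header computed by make_signature (illustrative scheme)."""

    def __init__(self, user: str, secret: str):
        self.user = user
        self.secret = secret

    def __call__(self, r):
        # r is the PreparedRequest; requests calls this while preparing the request.
        r.headers["Authorization"] = f"Sign {self.user}:{make_signature(self.secret, r.url)}"
        return r


prepared = requests.Request(
    "GET", "http://127.0.0.1:8000/test/", auth=SignatureAuth("alex", "s3cret")
).prepare()
print(prepared.headers["Authorization"])
```

The same object can be passed as auth= to requests.get or to a Session, and the header is then added to every request it is used with.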

HTTPBasicAuth simply sends the server a request carrying an Authorization:................. header.
HTTPBasicAuth
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth
ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
print(ret.text)
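The header that HTTPBasicAuth produces is just "Basic " followed by base64("username:password"), which is an encoding rather than encryption. The sketch below builds it locally with placeholder credentials and also shows that a plain 2-tuple passed as auth= is treated as Basic auth.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://api.github.com/user"

explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()
shorthand = requests.Request("GET", url, auth=("user", "pass")).prepare()  # 2-tuple means Basic auth

expected = "Basic " + base64.b64encode(b"user:pass").decode("ascii")
print(explicit.headers["Authorization"])
print(explicit.headers["Authorization"] == expected)
print(shorthand.headers["Authorization"] == expected)
```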
Other ways to use auth
# ret = requests.get('http://192.168.1.1',
# auth=HTTPBasicAuth('admin', 'admin'))
# ret.encoding = 'gbk'
# print(ret.text)
# ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
# print(ret)
timeout
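There are two forms of timeout: a float or a tuple.
timeout=0.1         # applied to both the connect timeout and the read timeout
timeout=(0.1, 0.2)  # 0.1 is the connect timeout, 0.2 is the read timeout
import requests
# a timeout this small is meant to trigger a timeout exception (typically requests.exceptions.ConnectTimeout)
response = requests.get('https://www.baidu.com',
                        timeout=0.0001)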
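A read timeout can be reproduced locally with a socket that accepts connections but never answers. The sketch below assumes loopback networking is available. The connection succeeds through the listen backlog, so the (connect, read) tuple's read value is what fires. Catching requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout.

```python
import socket

import requests

# A listening socket that never replies: the TCP handshake completes through the
# listen backlog, but no HTTP response is ever written, so the read times out.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

error = None
try:
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for this local test
        session.get(f"http://127.0.0.1:{port}/", timeout=(3, 0.5))  # (connect, read)
except requests.exceptions.Timeout as exc:  # base class of ConnectTimeout and ReadTimeout
    error = exc
finally:
    server.close()

print(type(error).__name__)  # ReadTimeout
```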
redirects
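import requests
# allow_redirects=False: a 3xx response is returned as-is instead of being followed
ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)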
print(ret.text)
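To see both behaviors side by side, this sketch starts a tiny local server (http.server, an assumption standing in for the 127.0.0.1:8000 test app) where /start answers with a 302 to /final. With the default settings requests follows the redirect and records the 302 in response.history; with allow_redirects=False the 302 itself is returned.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Tiny local server: /start redirects to /final, which returns 200."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
start_url = f"http://127.0.0.1:{server.server_port}/start"

with requests.Session() as session:
    session.trust_env = False  # don't route localhost traffic through proxy env vars
    followed = session.get(start_url)  # default: the 302 is followed
    not_followed = session.get(start_url, allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url, len(followed.history))
print(not_followed.status_code, not_followed.headers["Location"])
```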
proxies
Proxy settings
import requests

# Choose the proxy according to the protocol (scheme) of the request URL
proxies = {
    "http": "http://61.172.249.96:80",
    "https": "http://61.185.219.126:3128",
}
# Choose the proxy according to the host being requested (key format: scheme://host)
proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
print(ret.headers)
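How requests picks an entry from the proxies dict can be seen with requests.utils.select_proxy, the helper its transport adapter uses. Note that it lives in requests.utils and is not part of the documented public API, so its behavior could change between versions. A scheme://host key takes priority over a bare scheme key. If nothing matches, no proxy is used.

```python
from requests.utils import select_proxy

proxies = {
    "http://10.20.1.128": "http://10.10.1.10:5323",  # only for this host
    "http": "http://61.172.249.96:80",              # any other http:// URL
}

print(select_proxy("http://10.20.1.128/index", proxies))  # host-specific entry
print(select_proxy("http://example.com/", proxies))       # falls back to the scheme entry
print(select_proxy("https://example.com/", proxies))      # None: no https entry
```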
from requests.auth import HTTPProxyAuth
proxyDict = {
'http': '77.75.105.165',
'https': '77.75.105.165'
}
auth = HTTPProxyAuth('username', 'mypassword')
r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
print(r.text)
# SOCKS proxies are also supported; install the extra first: pip install requests[socks]
import requests
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
response = requests.get('https://www.12306.cn',
                        proxies=proxies)
print(response.status_code)
stream
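import requests
# stream=True: only the headers are downloaded up front; the body is fetched when you access it
ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
print(ret.content)
ret.close()
# from contextlib import closing
# with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
#     # process the response here
#     for i in r.iter_content():
#         print(i)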
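With stream=True the body stays unread in response.raw until you iterate over it. The sketch below simulates that without a server by giving a bare requests.Response an in-memory file as its raw stream (an assumption for illustration; in real use raw is the urllib3 connection). It then reads the body in fixed-size chunks with iter_content.

```python
import io

import requests

# Stand-in for a streamed response: an in-memory file plays the role of response.raw.
response = requests.Response()
response.raw = io.BytesIO(b"0123456789abcdef")

chunks = list(response.iter_content(chunk_size=4))  # read the body 4 bytes at a time
print(chunks)  # [b'0123', b'4567', b'89ab', b'cdef']
response.close()
```

For a real download, the usual pattern is to use the response as a context manager (with requests.get(url, stream=True) as r:) and write each chunk to a file as it arrives, so the whole body never has to fit in memory.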
session
import requests
session = requests.Session()
### 1. First visit any page to obtain a cookie
i1 = session.get(url="http://dig.chouti.com/help/service")
### 2. Log in, sending the cookie from the previous request; the backend authorizes the gpsd value in that cookie
i2 = session.post(
url="http://dig.chouti.com/login",
data={
'phone': "8615131255089",
'password': "xxxxxx",
'oneMonth': ""
}
)
### 3. Later requests on the same session carry the authorized cookie automatically
i3 = session.post(
url="http://dig.chouti.com/link/vote?linksId=8589623",
)
print(i3.text)
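What makes the example work is that a Session merges its stored headers and cookies into every request it prepares. Session.prepare_request shows this without sending anything. The User-Agent string and the gpsd value below are hypothetical placeholders; in real use the cookie would arrive through the server's Set-Cookie header.

```python
import requests

with requests.Session() as session:
    # Anything stored on the session is merged into every request it prepares.
    session.headers.update({"User-Agent": "my-crawler/0.1"})  # hypothetical UA string
    session.cookies.set("gpsd", "example-token")             # hypothetical cookie value

    prepared = session.prepare_request(
        requests.Request("POST", "http://dig.chouti.com/link/vote?linksId=8589623")
    )

print(prepared.headers["User-Agent"])  # my-crawler/0.1
print(prepared.headers["Cookie"])      # gpsd=example-token
```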
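Encoding issues
import requests
response = requests.get('http://www.autohome.com/news')
# response.encoding = 'gbk'  # Autohome serves its pages in gb2312; when the Content-Type header has no charset, requests falls back to ISO-8859-1 for text responses, so without setting gbk the Chinese text comes out garbled
print(response.text)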
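The garbling can be reproduced offline with a simulated response: GBK bytes, a text/html Content-Type without a charset, and the same header-based rule (requests.utils.get_encoding_from_headers) that requests applies to real responses. The page text here is an assumed sample. Decoding as ISO-8859-1 produces mojibake, and overriding response.encoding fixes it.

```python
import io

import requests
from requests.utils import get_encoding_from_headers

# Simulated response: a GBK-encoded page whose Content-Type carries no charset.
response = requests.Response()
response.raw = io.BytesIO("汽车之家".encode("gbk"))
response.headers["Content-Type"] = "text/html"

# text/* without a charset -> ISO-8859-1, the same rule requests uses for real responses.
guessed = get_encoding_from_headers(response.headers)
response.encoding = guessed
garbled = response.text  # mojibake

response.encoding = "gbk"  # override with the page's real encoding
print(guessed)
print(response.text)
```

Instead of hard-coding gbk, you can also try response.encoding = response.apparent_encoding, which guesses the charset from the body bytes; the guess is usually right for longer pages but is not guaranteed.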
