Also posted on V2EX: https://www.v2ex.com/t/500081
WeChat official account: python學習開發
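Reading source code written by expert developers is a good way to learn, and it has taught me many things I did not know before. This time I start with a library everyone knows, Kenneth Reitz's requests, and explore some of the techniques it uses.
Starting from the get method
We know that passing a URL to requests' get method fetches that site, but how does that actually happen? Today we follow that question through the source.
Open PyCharm and create demo.py.
Enter the following code.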
import requests
url="https://www.baidu.com"
req=requests.get(url)
In PyCharm, Ctrl (Command on macOS) + left click on a name jumps to its definition.
That first takes us to api.py, where the get method looks like this:
def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
The function body is only two statements.
Start with the first, kwargs.setdefault('allow_redirects', True), and what kwargs is doing here.
kwargs
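Inside the function, kwargs is a dict. setdefault('allow_redirects', True) sets the key allow_redirects only when it is missing, using the second argument, True, as the default; a value the caller passed explicitly is left untouched.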
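To make the setdefault behavior concrete, here is a minimal sketch. The two dicts are hypothetical stand-ins for the kwargs that get() might receive from two different callers; nothing here touches requests itself.

```python
# Hypothetical kwargs dicts, standing in for what a caller might pass to get().
caller_silent = {}
caller_explicit = {"allow_redirects": False}

# Key absent: setdefault inserts the default and returns it.
returned = caller_silent.setdefault("allow_redirects", True)

# Key present: the existing value is kept, and that value is what gets returned.
kept = caller_explicit.setdefault("allow_redirects", True)

print(caller_silent)    # {'allow_redirects': True}
print(caller_explicit)  # {'allow_redirects': False}
print(returned, kept)   # True False
```

So a caller who asks for allow_redirects=False keeps that choice, and everyone else gets True.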
**kwargs lets a function forward many arguments to another function without building a dict by hand each time.
Here is a simple example:
# -*- coding: utf-8 -*-
# @Time : 2018/10/16 10:07 PM
# @Author : cxa
# @File : kwargsDemo.py
# @Software: PyCharm
import requests


def print_text(r, *args, **kwargs):
    # Response hook: requests calls it with the Response object
    print(r.text)


# **kwargs saves us from declaring a long list of parameters
def foo(url, **kwargs):
    data = kwargs.pop('data', dict()) or kwargs.pop('params', dict())
    headers = kwargs.pop('headers', {})
    print("data", data)
    print('headers', headers)
    req = requests.get(url, headers=headers, data=data, hooks=dict(response=print_text))


if __name__ == '__main__':
    url = "https://www.baidu.com"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
    kwargs = {}
    kwargs.setdefault('headers', headers)
    foo(url, **kwargs)
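foo takes two parameters: a fixed url, and kwargs, which collects any keyword arguments as key-value pairs.
kwargs.pop(key[, default])
pop removes the given key and returns its value; if the key is missing it returns the default instead, and if no default was given it raises KeyError.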
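A small sketch of the pop pattern on its own, without any network call. The helper split_options and the option names timeout and verify are made up for illustration.

```python
def split_options(**kwargs):
    """Hypothetical helper: pull known options out, return what is left over."""
    headers = kwargs.pop("headers", {})  # present: removed from kwargs and returned
    timeout = kwargs.pop("timeout", 10)  # absent: the default is returned, no KeyError
    return headers, timeout, kwargs


headers, timeout, rest = split_options(headers={"Accept": "*/*"}, verify=False)
print(headers)  # {'Accept': '*/*'}
print(timeout)  # 10
print(rest)     # {'verify': False}
```

Because pop removes the key, whatever remains in kwargs can be forwarded to another function without passing the same option twice.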
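Now the second statement. It returns the result of calling request, so we follow request next. Here is the request function in api.py; I will only quote the part I consider important: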
    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
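The comment above relies on how the with statement works: the cleanup runs even though the function returns from inside the block. Below is a toy sketch of that idea. ToySession, toy_request, and sessions_seen are hypothetical names, and the class only mimics the shape of a session; it is not how requests implements Session.

```python
sessions_seen = []  # kept only so the session can be inspected after the call


class ToySession:
    """Hypothetical stand-in for an object that owns a closable resource."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit, on return, and when an exception propagates.
        self.close()
        return False  # do not swallow exceptions

    def close(self):
        self.closed = True

    def request(self, method, url):
        return f"{method.upper()} {url}"


def toy_request(method, url):
    with ToySession() as session:
        sessions_seen.append(session)
        return session.request(method, url)  # __exit__ still runs after this


result = toy_request("get", "https://example.invalid/")
print(result)                   # GET https://example.invalid/
print(sessions_seen[0].closed)  # True
```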
Following Session() leads to sessions.py. Here is the first part of __init__; from there we follow default_headers().
    def __init__(self):

        #: A case-insensitive dictionary of headers to be sent on each
        #: :class:`Request <Request>` sent from this
        #: :class:`Session <Session>`.
        self.headers = default_headers()
Following default_headers() leads to utils.py:
def default_headers():
    """
    :rtype: requests.structures.CaseInsensitiveDict
    """
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })
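This function returns an instance of a class called CaseInsensitiveDict. Following it leads to structures.py.
Here comes the interesting part. Let's analyze the first line of that file: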
from .compat import OrderedDict, Mapping, MutableMapping
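As we know, this line imports OrderedDict, Mapping, and MutableMapping from the compat module (they are classes, not modules). Following further into compat.py shows where they come from: Mapping and MutableMapping come from Python's collections library, and in the lines quoted below OrderedDict comes from the copy bundled with urllib3.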
from collections import Callable, Mapping, MutableMapping
from urllib3.packages.ordered_dict import OrderedDict
OK, let's start the analysis.
OrderedDict
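Many people think of Python dicts as unordered because they are stored by hash; OrderedDict is a dict subclass that remembers the order in which keys were inserted. But while reading the official documentation I found this sentence: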
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was implementation detail of CPython from 3.6.
Well... OK, so I feel OrderedDict is rarely needed anymore. Let's just look at one method that a plain dict does not have:
from collections import OrderedDict
from collections.abc import MutableMapping
# move_to_end(key): move the given key-value pair to the end
dic = OrderedDict()
dic['k1'] = 'v1'
dic['k2'] = 'v2'
dic['k3'] = 'v3'
dic.move_to_end('k1')
print(dic)
print(isinstance(dic, MutableMapping))  # True: OrderedDict is a mutable mapping type
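Even with ordered plain dicts, a couple of behaviors stay specific to OrderedDict. A short sketch, using the standard library only, shows two of them: order-sensitive equality and move_to_end with last=False.

```python
from collections import OrderedDict

a = OrderedDict([("k1", "v1"), ("k2", "v2")])
b = OrderedDict([("k2", "v2"), ("k1", "v1")])

print(a == b)              # False: two OrderedDicts compare order-sensitively
print(dict(a) == dict(b))  # True: plain dicts ignore order when comparing

a.move_to_end("k2", last=False)  # last=False moves the key to the front
print(list(a))             # ['k2', 'k1']
print(a == b)              # True
```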
Now let's analyze the structure:
# -*- coding: utf-8 -*-
# @Time : 2018/10/16 10:34
# @Author : cxa
# @File : dictMethod.py
# @Software: PyCharm
"""
requests.structures
~~~~~~~~~~~~~~~~~~~
Data structures that power Requests.
"""
from collections import OrderedDict
from collections.abc import Iterable, Mapping, MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """A case-insensitive ``dict``-like object.
    Implements all methods and operations of
    ``MutableMapping`` as well as dict's ``copy``. Also
    provides ``lower_items``.
    All keys are expected to be strings. The structure remembers the
    case of the last key to be set, and ``iter(instance)``,
    ``keys()``, ``items()``, ``iterkeys()``, and ``iteritems()``
    will contain case-sensitive keys. However, querying and contains
    testing is case insensitive::
        cid = CaseInsensitiveDict()
        cid['Accept'] = 'application/json'
        cid['aCCEPT'] == 'application/json'  # True
        list(cid) == ['Accept']  # True
    For example, ``headers['content-encoding']`` will return the
    value of a ``'Content-Encoding'`` response header, regardless
    of how the header name was originally stored.
    If the constructor, ``.update``, or equality comparison
    operations are given keys that have equal ``.lower()``s, the
    behavior is undefined.
    """

    def __init__(self, data=None, **kwargs):
        # Runs on construction: create an empty OrderedDict as the backing store
        self._store = OrderedDict()
        if data is None:
            data = {}
        # update() is inherited from MutableMapping; it calls __setitem__
        # for each key, so the data ends up in self._store
        self.update(data, **kwargs)

    def __setitem__(self, key, value):
        # Called on item assignment (cid[key] = value).
        # The lowercased key is used for lookup, which makes assignment
        # case-insensitive; the original-cased key is stored with the value
        self._store[key.lower()] = (key, value)
        # setattr(self,key.lower(),(key, value))

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Generator expression yielding the keys in their original case
        return (casedkey for casedkey, mappedvalue in self._store.values())

    def __len__(self):
        return len(self._store)

    def lower_items(self):
        """Like iteritems(), but with all lowercase keys."""
        return (
            (lowerkey, keyval[1])
            for (lowerkey, keyval)
            in self._store.items()
        )

    def __eq__(self, other):
        if isinstance(other, Mapping):
            other = CaseInsensitiveDict(other)
        else:
            return NotImplemented
        # Compare insensitively
        return dict(self.lower_items()) == dict(other.lower_items())

    # Copy is required
    def copy(self):
        return CaseInsensitiveDict(self._store.values())

    def __repr__(self):
        # Called when the object is printed (print falls back to __repr__)
        print(isinstance(self.items(), Iterable))  # items() is iterable, so dict() can consume it
        # What dict(iterable) does internally, roughly:
        # d = {}
        # for k, v in iterable:  # iterating calls __iter__
        #     d[k] = v
        return str(dict(self.items()))


if __name__ == '__main__':
    dic = CaseInsensitiveDict()
    dic["name"] = "lisa"
    print(dic)
For the magic methods used here, see the article "python進階之魔法函數" (Advanced Python: magic methods); the comments cover the rest.
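One point worth seeing in isolation is how much MutableMapping provides for free: a subclass supplies five methods, and update, get, items, and the in operator are built on top of them. The class below, LowerKeyDict, is a hypothetical, cut-down mapping written for this illustration; unlike CaseInsensitiveDict it does not remember the original case of keys.

```python
from collections.abc import MutableMapping


class LowerKeyDict(MutableMapping):
    """Hypothetical minimal mapping that lowercases every key."""

    def __init__(self, data=None, **kwargs):
        self._store = {}
        self.update(data or {}, **kwargs)  # inherited; calls __setitem__ per key

    def __setitem__(self, key, value):
        self._store[key.lower()] = value

    def __getitem__(self, key):
        return self._store[key.lower()]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


d = LowerKeyDict({"Accept": "*/*"}, Connection="keep-alive")
print("ACCEPT" in d)            # True: the inherited __contains__ uses __getitem__
print(d.get("connection"))      # keep-alive
print(d.get("missing", "n/a"))  # n/a
print(sorted(d.items()))        # [('accept', '*/*'), ('connection', 'keep-alive')]
```

None of update, get, items, or __contains__ is defined in the class; they all come from the abstract base class.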
Back in sessions.py, go to line 361, the line that sets self.hooks.
hooks
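requests has a hook mechanism, hooks, which works like a callback: the registered function runs after the request has completed and a response has come back. We already used it in the kwargs example above. Let's trace how hooks implements this callback behavior.
First, self.hooks = default_hooks(); following it shows there is a default set of hooks,
defined at line 17 of hooks.py: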
HOOKS = ['response']


def default_hooks():
    return dict((event, []) for event in HOOKS)
So the initial value is self.hooks = {"response": []}.
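A detail in this pattern is easy to miss: the generator expression builds a new empty list for each event. The sketch below contrasts that with dict.fromkeys, which reuses a single list object. The second event name, "other", is made up for illustration, and I am not claiming this is the reason requests wrote it this way.

```python
HOOKS = ["response", "other"]  # "other" is a made-up second event

fresh = dict((event, []) for event in HOOKS)  # the pattern used above
shared = dict.fromkeys(HOOKS, [])             # one list object reused for every key

fresh["response"].append("hook_a")
shared["response"].append("hook_a")

print(fresh)   # {'response': ['hook_a'], 'other': []}
print(shared)  # {'response': ['hook_a'], 'other': ['hook_a']}
```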
Next, at line 233 of models.py we find:
        self.hooks = default_hooks()
        for (k, v) in list(hooks.items()):
            self.register_hook(event=k, hook=v)
The first two lines can be summarized in a small example:
HOOKS = ["res"]


def default_hooks():
    return dict((event, []) for event in HOOKS)


hooks = default_hooks()
for (k, v) in list(hooks.items()):
    print(k, v)
Output:
res []
Let's continue with self.register_hook(event=k, hook=v) inside the for loop above.
Here is the register_hook method:
    def register_hook(self, event, hook):
        """Properly register a hook."""
        if event not in self.hooks:  # the event name is not a key of the dict
            raise ValueError('Unsupported event specified, with event name "%s"' % (event))  # raise an exception
        if isinstance(hook, Callable):  # hook is callable: a single function takes this branch
            self.hooks[event].append(hook)
        elif hasattr(hook, '__iter__'):  # hook is an iterable: keep only its callable items
            self.hooks[event].extend(h for h in hook if isinstance(h, Callable))
At this point self.hooks["response"] = [<function print_text>].
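The same registration logic can be tried outside requests. The sketch below is a stripped-down stand-in: HookRegistry and shout are hypothetical names, and it uses the built-in callable() instead of the Callable class from the quoted code.

```python
class HookRegistry:
    """Hypothetical, stripped-down holder for hooks; not the requests class."""

    def __init__(self):
        self.hooks = {"response": []}

    def register_hook(self, event, hook):
        if event not in self.hooks:
            raise ValueError(f'Unsupported event specified, with event name "{event}"')
        if callable(hook):
            self.hooks[event].append(hook)
        elif hasattr(hook, "__iter__"):
            self.hooks[event].extend(h for h in hook if callable(h))


def print_text(r, *args, **kwargs):
    print(r)


def shout(r, *args, **kwargs):
    print(str(r).upper())


reg = HookRegistry()
reg.register_hook("response", print_text)                # a single callable
reg.register_hook("response", [shout, "not a function"]) # iterable: non-callables are dropped
print([h.__name__ for h in reg.hooks["response"]])       # ['print_text', 'shout']
```

Registering under an event name that is not in the dict, for example "request", raises ValueError.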
Next:
def merge_hooks(request_hooks, session_hooks, dict_class=OrderedDict):
    """request_hooks = the self.hooks built above
    session_hooks = {'response': []}
    """
    if session_hooks is None or session_hooks.get('response') == []:
        return request_hooks  # the session has no hooks, so use the request's
    if request_hooks is None or request_hooks.get('response') == []:
        return session_hooks
    return merge_setting(request_hooks, session_hooks, dict_class)
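To see the two early returns in action, here is a simplified sketch. merge_hooks_sketch is a hypothetical name, and since merge_setting is not shown in this article, the last step is an assumption: a plain dict merge where the request's values override the session's.

```python
def merge_hooks_sketch(request_hooks, session_hooks):
    """Simplified illustration; the final merge step is an assumption."""
    if session_hooks is None or session_hooks.get("response") == []:
        return request_hooks
    if request_hooks is None or request_hooks.get("response") == []:
        return session_hooks
    # Assumed stand-in for merge_setting: request values override session values.
    merged = dict(session_hooks)
    merged.update(request_hooks)
    return merged


def from_request(r, *args, **kwargs):
    return r


def from_session(r, *args, **kwargs):
    return r


req_hooks = {"response": [from_request]}
ses_hooks = {"response": [from_session]}
empty = {"response": []}

print(merge_hooks_sketch(req_hooks, empty) is req_hooks)  # True
print(merge_hooks_sketch(empty, ses_hooks) is ses_hooks)  # True
both = merge_hooks_sketch(req_hooks, ses_hooks)
print([h.__name__ for h in both["response"]])             # ['from_request']
```

In the kwargs example earlier, only the request carried a hook, so the first branch applies and the request's hooks are returned unchanged.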
