Using the grequests module


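Use cases: 1) checking which IPs are still valid when building a proxy pool for a crawler; 2) sending requests in bulk during load testing, and similar scenarios.
grequests is a thin wrapper around the requests and gevent libraries that sends requests concurrently, and it is very convenient to use.
Its central function is map(), with the following signature: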
grequests.map(requests, stream=False, size=None, exception_handler=None, gtimeout=None)
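As a rough sketch of the proxy-pool use case, the snippet below checks a list of proxies in one batch. The helper names (working_proxies, check_proxies), the test URL, and the size/gtimeout values are assumptions made for illustration; they are not part of grequests.

```python
def working_proxies(proxies, responses):
    """Keep the proxies whose request came back with HTTP 200.

    grequests.map() returns results in request order and puts None
    in the slot of every request that failed.
    """
    return [
        proxy
        for proxy, resp in zip(proxies, responses)
        if resp is not None and resp.status_code == 200
    ]


def check_proxies(proxies, test_url="http://httpbin.org/ip", timeout=5):
    """Send one GET per proxy concurrently and return the usable proxies.

    test_url and timeout are illustrative defaults, not recommendations.
    """
    # Imported lazily because importing grequests applies gevent monkey-patching.
    import grequests

    reqs = [
        grequests.get(test_url, proxies={"http": p, "https": p}, timeout=timeout)
        for p in proxies
    ]
    # size caps how many requests run at once; gtimeout bounds the total wait.
    responses = grequests.map(reqs, size=50, gtimeout=30)
    return working_proxies(proxies, responses)
```

Calling check_proxies(["http://10.0.0.1:8080", "http://10.0.0.2:3128"]) (placeholder addresses) would return the subset that answered successfully.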
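In addition, because grequests uses requests under the hood, it supports the HTTP methods GET, OPTIONS, HEAD, POST, PUT, DELETE, and so on.
The following request tasks are therefore all supported:
grequests.post(url, json={"name": "zhangsan"})
grequests.delete(url)
The code is as follows: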

import grequests

urls = [
    'http://www.baidu.com',
    'http://www.qq.com',
    'http://www.163.com',
    'http://www.zhihu.com',
    'http://www.toutiao.com',
    'http://www.douban.com'
]
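# Build the unsent requests lazily; map() sends them all concurrently.
rs = (grequests.get(u) for u in urls)
# Requests that fail show up as None in the result list.
print(grequests.map(rs))   # [<Response [200]>, None, <Response [200]>, None, None, <Response [418]>]


def exception_handler(request, exception):
    # Called once for each request that raised an exception.
    print("Request failed")


reqs = [
    grequests.get('http://httpbin.org/delay/1', timeout=0.001),  # times out
    grequests.get('http://fakedomain/'),                         # cannot be resolved
    grequests.get('http://httpbin.org/status/500')               # completes with HTTP 500
]
print(grequests.map(reqs, exception_handler=exception_handler))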
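A handler can do more than print a message. The sketch below records why each request failed; it assumes the installed grequests places the handler's return value in the failed request's slot of the result list, as the map() source shown later in this post does. The names failures and record_failure are made up for this example.

```python
failures = {}  # url -> exception class name, filled in by the handler


def record_failure(request, exception):
    """Exception handler for grequests.map() that remembers each failure.

    The returned dict takes the place of the missing Response in the
    list that map() returns (an assumption about the installed version).
    """
    failures[request.url] = type(exception).__name__
    return {"url": request.url, "error": str(exception)}
```

It would be passed in the same way as the handler above: grequests.map(reqs, exception_handler=record_failure).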

In practice you can also customize what is returned by modifying the grequests source file.
For example, add an extract_item() function and modify the map() function:

def extract_item(request):
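    """Extract the URL, body text, and status code from a finished request.

    :param request: a request object whose response has already been set
    :return: dict with the keys "url", "text", and "status_code"
    """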
    item = dict()
    item["url"] = request.url
    item["text"] = request.response.text or ""
    item["status_code"] = request.response.status_code or 0
    return item

def map(requests, stream=False, size=None, exception_handler=None, gtimeout=None):
    """Concurrently converts a list of Requests to Responses.

    :param requests: a collection of Request objects.
    :param stream: If True, the content will not be downloaded immediately.
    :param size: Specifies the number of requests to make at a time. If None, no throttling occurs.
    :param exception_handler: Callback function, called when exception occurred. Params: Request, Exception
    :param gtimeout: Gevent joinall timeout in seconds. (Note: unrelated to requests timeout)
    """
    requests = list(requests)
    pool = Pool(size) if size else None
    jobs = [send(r, pool, stream=stream) for r in requests]
    gevent.joinall(jobs, timeout=gtimeout)
    ret = []
    for request in requests:

        if request.response is not None:
            ret.append(extract_item(request))
        elif exception_handler and hasattr(request, 'exception'):
            ret.append(exception_handler(request, request.exception))
        else:
            ret.append(None)

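    # Note: `yield` turns this modified map() into a generator that yields the
    # whole result list exactly once, so callers must fetch it with next().
    yield ret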

It can then be called directly:

import grequests

urls = [
    'http://www.baidu.com',
    'http://www.qq.com',
    'http://www.163.com',
    'http://www.zhihu.com',
    'http://www.toutiao.com',
    'http://www.douban.com'
]
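rs = (grequests.get(u) for u in urls)
# The modified map() is a generator; next() retrieves its single list of results.
response_list = grequests.map(rs, gtimeout=10)
for response in next(response_list):
    print(response)  # a dict built by extract_item(), or None if the request failed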
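Patching an installed package is overwritten by the next upgrade, so a less invasive option is to leave grequests untouched and convert the results in your own code. The following is a minimal sketch under that assumption; to_item and fetch_items are hypothetical names, not grequests functions.

```python
def to_item(response):
    """Convert a requests.Response (or None) into a plain dict.

    Uses response.url, which is the final URL after any redirects,
    whereas extract_item() above reads the URL of the original request.
    """
    if response is None:
        return None
    return {
        "url": response.url,
        "text": response.text or "",
        "status_code": response.status_code or 0,
    }


def fetch_items(urls, gtimeout=10):
    """Fetch all urls concurrently with the stock map() and return dicts."""
    # Imported lazily because importing grequests applies gevent monkey-patching.
    import grequests

    pending = (grequests.get(u) for u in urls)
    return [to_item(r) for r in grequests.map(pending, gtimeout=gtimeout)]
```

With this approach the stock map() still returns a plain list, so no next() call is needed.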

Event hooks are supported:
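import grequests
import requests


def print_url(r, *args, **kwargs):
    # Response hook: receives the Response object once it arrives.
    print(r.url)


url = "http://www.baidu.com"

# Plain requests: register the hook under the "response" event.
res = requests.get(url, hooks={"response": print_url})

# grequests: pass the same function as the callback argument.
tasks = []
req = grequests.get(url, callback=print_url)
tasks.append(req)
ress = grequests.map(tasks)
print(ress)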
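A hook does not have to print. As a small sketch, the factory below builds a hook that collects (url, status_code) pairs into a list supplied by the caller; make_recorder is a name invented for this example.

```python
def make_recorder(log):
    """Return a response hook that appends (url, status_code) to log."""

    def record(response, *args, **kwargs):
        log.append((response.url, response.status_code))
        # Returning None leaves the response object unchanged.

    return record
```

It could be used as seen = [] followed by grequests.get(url, callback=make_recorder(seen)); after map() returns, seen holds one entry per response that arrived.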
