Asynchronous Python with aiohttp

Reposted article, 2020-03-23 17:20

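What is aiohttp? It is an asynchronous HTTP client/server framework built on top of asyncio. It can be used to write asynchronous crawlers, which are usually much faster than synchronous crawlers written with requests.

aiohttp and requests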

The requests crawler

Fetch http://httpbin.org 30 times in a row, synchronously, with requests:

import requests
from datetime import datetime


def fetch(url):
    r = requests.get(url)
    print(r.text)

start = datetime.now()

for i in range(30):
    fetch('http://httpbin.org/get')

end = datetime.now()

print("Time taken by the requests crawler:")
print(end - start)

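As the result shows, fetching the site 30 times synchronously takes about 43 seconds, which is very slow.

The aiohttp crawler

Fetch the same site 30 times asynchronously, with aiohttp and asyncio: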

import aiohttp
import asyncio
from datetime import datetime


async def fetch(client):
    async with client.get('http://httpbin.org/get') as resp:
        assert resp.status == 200
        return await resp.text()


async def main():
    async with aiohttp.ClientSession() as client:
        html = await fetch(client)
        print(html)

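async def run_all():
    # Schedule 30 independent main() coroutines and wait for every one of them.
    # Waiting on a single extra main() call instead would stop the event loop
    # after one request and under-report the elapsed time.
    tasks = [asyncio.create_task(main()) for _ in range(30)]
    await asyncio.gather(*tasks)

start = datetime.now()

asyncio.run(run_all())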

end = datetime.now()

print("Time taken by the aiohttp crawler:")
print(end - start)

In the author's run, the aiohttp version took only about 0.5 seconds, roughly 80 times faster than the synchronous requests version.
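
The speed-up comes from overlapping the waiting time of the requests, not from making any single request faster. The sketch below illustrates this without touching the network. fake_fetch, sequential, concurrent and timed are hypothetical helpers, and asyncio.sleep stands in for network latency, so the numbers it prints say nothing about httpbin.org itself.

```python
import asyncio
import time


async def fake_fetch(delay):
    # Stand-in for one HTTP request: asyncio.sleep yields control to the
    # event loop, much like waiting on a socket would.
    await asyncio.sleep(delay)
    return delay


async def sequential(n, delay):
    # Await each "request" before starting the next one.
    return [await fake_fetch(delay) for _ in range(n)]


async def concurrent(n, delay):
    # Start all "requests" first, then wait for all of them together.
    return await asyncio.gather(*(fake_fetch(delay) for _ in range(n)))


def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


_, seq_time = timed(sequential(5, 0.1))
_, con_time = timed(concurrent(5, 0.1))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential version pays the delay once per request, while the concurrent version pays it roughly once in total.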
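Sharing one session

aiohttp.ClientSession() wraps a connection pool and supports keep-alive by default. The official documentation recommends using a single ClientSession object for the whole program, rather than creating a new one for every request as in the example above, unless the program talks to a large number of unrelated services.
The example above can be changed to: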

import aiohttp
import asyncio
from datetime import datetime


async def fetch(client):
    print("Printing the ClientSession object")
    print(client)
    async with client.get('http://httpbin.org/get') as resp:
        assert resp.status == 200
        return await resp.text()


async def main():
    # Share one ClientSession (and its connection pool) across all 30 requests.
    async with aiohttp.ClientSession() as client:
        tasks = []
        for i in range(30):
            tasks.append(asyncio.create_task(fetch(client)))
        await asyncio.wait(tasks)

start = datetime.now()

asyncio.run(main())

end = datetime.now()
print("Time taken by the aiohttp crawler:")
print(end - start)

Sample output

# repeated 30 times
Printing the ClientSession object
<aiohttp.client.ClientSession object at 0x1094aff98>
Time taken by the aiohttp crawler:
0:00:01.778045

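In this run, the single shared ClientSession took about 1.8 seconds, roughly 3 times longer than the 0.5 seconds reported for the version that creates one ClientSession per request. Single runs against a public site vary widely, so this gap alone is weak evidence about which approach is faster. The output also confirms that every request used the same ClientSession object, 0x1094aff98.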
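
asyncio.wait returns two sets of tasks (done and pending), so the main() above never looks at what fetch returned. If the response bodies are needed, asyncio.gather is one option, since it returns the results in the order the coroutines were passed in. A minimal offline sketch, where fake_fetch is a hypothetical stand-in for fetch(client) and the client is a placeholder object rather than a real ClientSession:

```python
import asyncio


async def fake_fetch(client, i):
    # Hypothetical stand-in for fetch(client): no network, just returns a label.
    await asyncio.sleep(0)
    return f"response {i}"


async def main():
    client = object()  # placeholder for the shared aiohttp.ClientSession
    # gather() returns results in the order the coroutines were passed in.
    return await asyncio.gather(*(fake_fetch(client, i) for i in range(3)))


results = asyncio.run(main())
print(results)
```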

Return values

JSON

The examples above use resp.text() to return the fetched body as a string. When the response is JSON, aiohttp can decode the body directly with resp.json().

async def fetch(client):
    async with client.get('http://httpbin.org/get') as resp:
        return await resp.json()

Sample output

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'Python/3.7 aiohttp/3.6.2'}, 'origin': '49.80.42.33, 49.80.42.33', 'url': 'https://httpbin.org/get'}

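When the body is not standard JSON, resp.json() accepts a custom decoder through its loads keyword argument, for example resp.json(loads=my_loads), where my_loads is a function that preprocesses the text (say, replacing a with b) before parsing it.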
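
A custom decoder of this kind can be sketched as a plain function wrapped around json.loads. The example below is only an illustration: lenient_loads is a hypothetical name, and the single-quote replacement is a deliberately naive preprocessing step chosen for the demo.

```python
import json


def lenient_loads(text):
    # Hypothetical preprocessing step: turn single-quoted pseudo-JSON into
    # standard JSON before handing it to json.loads. Naive on purpose; it
    # would break on values that contain apostrophes.
    return json.loads(text.replace("'", '"'))


print(lenient_loads("{'a': 1, 'b': [2, 3]}"))
```

With aiohttp it would be passed as await resp.json(loads=lenient_loads). If the server also sends a non-JSON Content-Type, resp.json() additionally needs content_type=None to skip its content-type check.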

Byte streams

aiohttp returns the raw bytes of a response with resp.read(); they can then be saved as a file or image using with open():

async def fetch(client):
    async with client.get('http://httpbin.org/image/png') as resp:
        return await resp.read()


async def main():
    async with aiohttp.ClientSession() as client:
        image = await fetch(client)
        with open("/Users/xxx/Desktop/image.png", 'wb') as f:
            f.write(image)

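resp.read() always returns the whole body. To read only part of it, use the streaming reader resp.content: for example, await resp.content.read(3) reads up to the first 3 bytes.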
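
For large downloads, the same content stream can be consumed chunk by chunk instead of loading the whole body into memory. The sketch below assumes the response object exposes content.iter_chunked(n), as aiohttp responses do; save_body and its parameters are hypothetical names, and the function takes the response as an argument so it can be tried without a network connection.

```python
async def save_body(resp, path, chunk_size=1024):
    # resp is expected to behave like an aiohttp response: resp.content is a
    # stream whose iter_chunked(n) yields bytes objects of at most n bytes.
    written = 0
    with open(path, 'wb') as f:
        async for chunk in resp.content.iter_chunked(chunk_size):
            f.write(chunk)
            written += len(chunk)
    # Return the number of bytes written so the caller can check the size.
    return written
```

Inside async with client.get(url) as resp: it would be called as await save_body(resp, path).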
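
Query parameters

aiohttp accepts URL query parameters in three forms: a list of (key, value) tuples, a dict, or an already-encoded string.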

async def fetch(client):
    params = [('a', 1), ('b', 2)]
    async with client.get('http://httpbin.org/get', params=params) as resp:
        return await resp.text()

Resulting URL

http://httpbin.org/get?a=1&b=2

async def fetch(client):
    params = {"a": 1, "b": 2}
    async with client.get('http://httpbin.org/get', params=params) as resp:
        return await resp.text()

Resulting URL

http://httpbin.org/get?a=1&b=2

async def fetch(client):
    async with client.get('http://httpbin.org/get', params='q=aiohttp+python&a=1') as resp:
        return await resp.text()
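
The first two forms produce the same query string. The practical difference is that a list of tuples can repeat a key, which a dict cannot. The sketch below shows this with the standard library's urlencode as a stand-in; aiohttp builds its URLs with its own machinery, so this only illustrates the encoding, and the tag key is a made-up example.

```python
from urllib.parse import urlencode

pairs = [('a', 1), ('b', 2)]             # list of (key, value) tuples
mapping = {'a': 1, 'b': 2}               # dict
repeated = [('tag', 'x'), ('tag', 'y')]  # same key twice: needs the list form

print(urlencode(pairs))     # a=1&b=2
print(urlencode(mapping))   # a=1&b=2
print(urlencode(repeated))  # tag=x&tag=y
```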

Request headers

Custom request headers are passed much like query parameters, through the headers argument:

async def fetch(client):
    headers = {'content-type': 'application/json', 'User-Agent': 'Python/3.7 aiohttp/3.7.2'}
    async with client.get('http://httpbin.org/get', headers=headers) as resp:
        return await resp.text()

Cookies

Cookies are shared across the whole session, so they should be passed when the ClientSession object is created:

async def fetch(client):
    async with client.get('http://httpbin.org/get') as resp:
        return await resp.text()


async def main():
    cookies = {'cookies': 'this is cookies'}
    async with aiohttp.ClientSession(cookies=cookies) as client:
        html = await fetch(client)
        print(html)

POST requests

All the examples so far send GET requests. The next one sends a POST request:

async def fetch(client):
    data = {'a': '1', 'b': '2'}
    async with client.post('http://httpbin.org/post', data=data) as resp:
        return await resp.text()

Sample output

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "a": "1",
    "b": "2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "7",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python/3.7 aiohttp/3.6.2"
  },
  "json": null,
  "origin": "49.80.42.33, 49.80.42.33",
  "url": "https://httpbin.org/post"
}

Time taken by the aiohttp crawler:
0:00:00.514402
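
The Content-Length of 7 in the sample output is simply the length of the form-encoded body a=1&b=2. The standard-library sketch below reproduces that encoding and, for comparison, a JSON encoding of the same dict. It only illustrates the two body formats and is not how aiohttp builds the request internally.

```python
import json
from urllib.parse import urlencode

data = {'a': '1', 'b': '2'}

# application/x-www-form-urlencoded body, as reported under "form" above
form_body = urlencode(data).encode()

# JSON body for the same dict, shown only for comparison
json_body = json.dumps(data).encode()

print(form_body, len(form_body))
print(json_body, len(json_body))
```

With aiohttp, passing json=data instead of data=data to client.post sends the payload as JSON, and httpbin would then report it under "json" rather than "form".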

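Timeouts

Requests to a site sometimes hang or time out. aiohttp controls this through the timeout parameter, with limits expressed in seconds. The default total timeout is 5 minutes.

async def fetch(client):
    # Limit the whole request (connect, send, read) to 60 seconds.
    timeout = aiohttp.ClientTimeout(total=60)
    async with client.get('http://httpbin.org/get', timeout=timeout) as resp:
        return await resp.text()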
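
When the limit is exceeded, aiohttp raises asyncio.TimeoutError, so a crawler usually wants to catch it rather than let one slow request abort the whole run. A minimal sketch under that assumption: fetch_or_none is a hypothetical helper, and returning None on timeout is just one possible policy.

```python
import asyncio


async def fetch_or_none(client, url, timeout):
    # client is expected to behave like an aiohttp.ClientSession and timeout
    # like an aiohttp.ClientTimeout; both are supplied by the caller.
    try:
        async with client.get(url, timeout=timeout) as resp:
            return await resp.text()
    except asyncio.TimeoutError:
        # The request did not finish within the limit.
        return None
```

A caller can then filter out the None results, or retry those URLs later.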
