aiohttp


aiohttp is an asynchronous module for Python 3, split into a server side and a client side. Liao Xuefeng's Python 3 tutorial covers the server side; this article focuses on the client side, which is what you use to write crawlers. Writing crawlers with async coroutines can noticeably improve a program's throughput.

1. Installation

pip install aiohttp

2. A single request

import aiohttp
import asyncio


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main(url):
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, url)
        print(html)


url = 'http://junyiseo.com'
loop = asyncio.get_event_loop()
loop.run_until_complete(main(url))

3. Requesting multiple URLs

import aiohttp
import asyncio


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main(url):
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, url)
        print(html)


loop = asyncio.get_event_loop()

# build multiple request coroutines
url = "http://junyiseo.com"
tasks = [main(url), main(url)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

4. Other request methods

In the code above, we create a ClientSession object named session, then call its get method to obtain a ClientResponse object (named response above). The get method takes one required parameter, url: the HTTP URL whose content we want. With that, a single asynchronous GET request is complete.
aiohttp supports the other HTTP methods as well:

session.post('http://httpbin.org/post', data=b'data')
session.put('http://httpbin.org/put', data=b'data')
session.delete('http://httpbin.org/delete')
session.head('http://httpbin.org/get')
session.options('http://httpbin.org/get')
session.patch('http://httpbin.org/patch', data=b'data')

5. Passing parameters with a request

GET with parameters

params = {'key1': 'value1', 'key2': 'value2'}
async with session.get('http://httpbin.org/get', params=params) as resp:
    expect = 'http://httpbin.org/get?key2=value2&key1=value1'
    assert str(resp.url) == expect

POST with parameters

payload = {'key1': 'value1', 'key2': 'value2'}
async with session.post('http://httpbin.org/post', data=payload) as resp:
    print(await resp.text())

6. Reading the response

resp.status is the HTTP status code;
resp.text() returns the page content.

async with session.get('https://api.github.com/events') as resp:
    print(resp.status)
    print(await resp.text())

gzip and deflate transfer encodings are automatically decoded for you.

7. Working with JSON

async with aiohttp.ClientSession() as session:
    async with session.post(url, json={'test': 'object'}) as resp:
        print(await resp.text())

Handling a JSON response:

async with session.get('https://api.github.com/events') as resp:
    print(await resp.json())

8. Reading the response as a byte stream (useful for downloads)

async with session.get('https://api.github.com/events') as resp:
    await resp.content.read(10)  # read the first 10 bytes

Downloading and saving to a file:

with open(filename, 'wb') as fd:
    while True:
        chunk = await resp.content.read(chunk_size)
        if not chunk:
            break
        fd.write(chunk)
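The read loop above assumes resp, filename and chunk_size are already defined. The same chunked-read pattern can be exercised offline with a plain asyncio.StreamReader standing in for resp.content:

```python
import asyncio

async def read_all(reader, chunk_size=4):
    # same loop as above: read fixed-size chunks until EOF
    chunks = []
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)

async def main():
    reader = asyncio.StreamReader()  # stand-in for resp.content
    reader.feed_data(b'hello world')
    reader.feed_eof()
    return await read_all(reader)

data = asyncio.run(main())
print(data)
```

With a real response, each chunk would be written to the file instead of collected in memory.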

9. Uploading files

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

await session.post(url, data=files)

You can also set the filename and content-type explicitly:

url = 'http://httpbin.org/post'
data = aiohttp.FormData()
data.add_field('file',
               open('report.xls', 'rb'),
               filename='report.xls',
               content_type='application/vnd.ms-excel')

await session.post(url, data=data)

10. Timeouts

By default, IO operations have a 5-minute timeout. You can override this with the timeout parameter; timeout=None or timeout=0 disables the timeout check entirely, i.e. no time limit at all.

async with session.get('https://github.com', timeout=60) as r:
    ...
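Under the hood this behaves like asyncio's own timeout machinery: the pending operation is cancelled and a TimeoutError is raised once the deadline passes. A stdlib-only sketch of those semantics using asyncio.wait_for (slow_request is a hypothetical stand-in for the HTTP call):

```python
import asyncio

async def slow_request():
    # stand-in for a request that takes far too long
    await asyncio.sleep(10)

async def main():
    try:
        await asyncio.wait_for(slow_request(), timeout=0.01)
        return "ok"
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)
```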

11. Custom request headers

url = 'http://example.com/image'
payload = (b'GIF89a\x01\x00\x01\x00\x00\xff\x00,\x00\x00'
           b'\x00\x00\x01\x00\x01\x00\x00\x02\x00;')
headers = {'content-type': 'image/gif'}

await session.post(url, data=payload, headers=headers)

Setting headers on the session:

headers = {"Authorization": "Basic bG9naW46cGFzcw=="}
async with aiohttp.ClientSession(headers=headers) as session:
    async with session.get("http://httpbin.org/headers") as r:
        json_body = await r.json()
        assert json_body['headers']['Authorization'] == \
            'Basic bG9naW46cGFzcw=='

12. Custom cookies

url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}
async with aiohttp.ClientSession(cookies=cookies) as session:
    async with session.get(url) as resp:
        assert await resp.json() == {
            "cookies": {"cookies_are": "working"}}

Sharing cookies across multiple requests:

async with aiohttp.ClientSession() as session:
    await session.get(
        'http://httpbin.org/cookies/set?my_cookie=my_value')
    filtered = session.cookie_jar.filter_cookies(
        'http://httpbin.org')
    assert filtered['my_cookie'].value == 'my_value'
    async with session.get('http://httpbin.org/cookies') as r:
        json_body = await r.json()
        assert json_body['cookies']['my_cookie'] == 'my_value'

13. Limiting concurrent connections

limit defaults to 100; limit=0 means no limit at all.

conn = aiohttp.TCPConnector(limit=30)
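The connector is then handed to the session, e.g. aiohttp.ClientSession(connector=conn). A related, stdlib-only way to cap concurrency is an asyncio.Semaphore wrapped around each request; the sketch below uses a dummy task (a sleep) standing in for the HTTP call and records the peak number of tasks running at once:

```python
import asyncio

LIMIT = 3  # at most 3 tasks may run concurrently

async def limited_task(sem, i, counter):
    async with sem:
        counter['active'] += 1
        counter['peak'] = max(counter['peak'], counter['active'])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request
        counter['active'] -= 1
        return i

async def main():
    sem = asyncio.Semaphore(LIMIT)
    counter = {'active': 0, 'peak': 0}
    results = await asyncio.gather(
        *(limited_task(sem, i, counter) for i in range(10)))
    return results, counter['peak']

results, peak = asyncio.run(main())
print(peak)  # stays at or below LIMIT
```

TCPConnector limits connections at the transport level; a semaphore limits in-flight coroutines, which is useful when you also want to throttle work beyond the HTTP layer.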

14. SSL requests

Some requests require certificate verification; you can pass ssl=False to skip it:

r = await session.get('https://example.com', ssl=False)

Supplying a certificate:

import ssl

sslcontext = ssl.create_default_context(
    cafile='/path/to/ca-bundle.crt')
r = await session.get('https://example.com', ssl=sslcontext)

15. Proxy requests

async with aiohttp.ClientSession() as session:
    async with session.get("http://python.org",
                           proxy="http://proxy.com") as resp:
        print(resp.status)


Proxy authentication:

async with aiohttp.ClientSession() as session:
    proxy_auth = aiohttp.BasicAuth('user', 'pass')
    async with session.get("http://python.org",
                           proxy="http://proxy.com",
                           proxy_auth=proxy_auth) as resp:
        print(resp.status)

Or pass the credentials in the proxy URL:

session.get("http://python.org", proxy="http://user:pass@some.proxy.com")

16. Shutting down gracefully

For plain (non-SSL) connections, add a zero-length sleep, await asyncio.sleep(0), before closing the loop:

import aiohttp
import asyncio


async def read_website():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://example.org/') as resp:
            await resp.read()


loop = asyncio.get_event_loop()
loop.run_until_complete(read_website())
# Zero-sleep to allow underlying connections to close
loop.run_until_complete(asyncio.sleep(0))
loop.close()

For SSL requests, wait a short moment before closing so the transport can shut down:

loop.run_until_complete(asyncio.sleep(0.250))
loop.close()

17. Summary

This article is translated from the official documentation; leave a comment if you spot a problem.

Official documentation:
http://aiohttp.readthedocs.io/en/stable/

