Notes on Using ClientSession in aiohttp

Reposted article, 2020-10-12 21:35. Tags: python, aiohttp

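I have been looking into coroutines lately and wanted to write a coroutine-based crawler. I picked aiohttp, but the way ClientSession is meant to be used puzzled me. Chinese-language material is scarce and mostly stops at showing basic usage without much detail, so I went straight to the official English documentation.

aiohttp can act as both a client and a server. A crawler only needs the client, so this article covers only the client side of aiohttp (sending requests). Some knowledge of coroutines is needed to follow along.

If you want to dig into aiohttp, I recommend reading the official English documentation directly. It is written plainly, and even with limited English a machine translation will get you 70 to 80 percent of the way.

What follows is based on https://docs.aiohttp.org/en/stable/. Corrections are welcome if I got anything wrong.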

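Simple requests

For a simple request (for example a single request that needs no cookies, SSL settings, and so on), the following approach works.

In practice it is rarely used, though. Crawlers that use coroutines usually fetch a large number of pages, and this approach may lead aiohttp to report an Unclosed client session error. For that situation the official docs recommend ClientSession (a connection pool, see below), which also gives some performance gain.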

import aiohttp
import asyncio

async def fetch():
    async with aiohttp.request('GET',
            'http://python.org/') as resp:
        assert resp.status == 200
        print(await resp.text())

# Run the coroutine on the event loop
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch())

Requests through a connection pool

The typical usage looks like the following example, taken from the official docs.

import aiohttp
import asyncio

# The session (client) is passed in as an argument
async def fetch(client, url):
    async with client.get(url) as resp:
        assert resp.status == 200
        return await resp.text()

async def main():
    url = 'http://python.org/'
    async with aiohttp.ClientSession() as client:
        html = await fetch(client, url)
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Does that feel a bit roundabout? In everyday use you do not have to factor fetch out like this; it can be written as the shorter example below.

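import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as client:
        # Send the request through the session so its connection pool is used
        async with client.get('http://python.org/') as resp:
            assert resp.status == 200
            print(await resp.text())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())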

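Notice the difference? Once the official example factors fetch out, it passes a ClientSession instance in as an argument. So as long as the ClientSession instance is used inside the with block, the two versions are equivalent (in my view, since both use the instance created by the with block).

Reusing the connection pool

The official snippet above is actually taken from the ClientSession reference, so I think the docs write it that way simply as a reminder to pay attention to how ClientSession is used. So what is there to watch out for with ClientSession?

Session encapsulates a connection pool (a connector instance) and supports keepalive by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested that you use a single session for the lifetime of the application to benefit from connection pooling.

Do not create a session per request. Most likely you need one session per application, which performs all requests together.

More complex cases may require one session per site, for example one session for Github and another for the Facebook API. Either way, creating a session for every request is a very bad idea.

A session contains a connection pool inside. Connection reuse and keep-alive (both enabled by default) may improve overall performance.

The paragraphs above are translated from the official docs. They all say the same thing: unless you have a reason not to, use a single ClientSession instance.

Yet in a lot of material I have seen it used like this: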

# Anti-pattern: a new ClientSession is created for every request
async def fetch(url):
    async with aiohttp.ClientSession() as client:
        async with client.get(url) as resp:
            assert resp.status == 200
            print(await resp.text())

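This clearly instantiates a ClientSession on every request and never reuses it. So what should be done instead? The official docs give no example of reusing a ClientSession (I give up: after stressing so heavily that one session is enough, they could at least show an example).

So I had to keep digging. There is little material in Chinese, so I turned to GitHub and Stack Overflow. After a long read there was still no consensus, but two approaches stand out.

Completing all requests with one session inside a with block

Below is an example I wrote.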

import aiohttp
import asyncio

async def fetch(client, url):
    async with client.get(url) as resp:
        assert resp.status == 200
        text = await resp.text()
        return len(text)

# urls is a list containing several URLs
async def fetch_all(urls):
    # One session serves every request issued inside this with block
    async with aiohttp.ClientSession() as client:
        return await asyncio.gather(*[fetch(client, url) for url in urls])

urls = ['http://python.org/' for i in range(3)]
loop = asyncio.get_event_loop()
results = loop.run_until_complete(fetch_all(urls))
print(results)
print(type(results))

Creating the session manually, without with

This approach gives you a session instance that is not confined to a with block, so later code can keep using the same session.

import aiohttp
import asyncio

async def fetch(client, url):
    async with client.get(url) as resp:
        assert resp.status == 200
        text = await resp.text()
        return len(text)

async def fetch_all_manual(urls, client):
    return await asyncio.gather(*[fetch(client, url) for url in urls])

urls = ['http://python.org/' for i in range(3)]
loop = asyncio.get_event_loop()
client = aiohttp.ClientSession()
results = loop.run_until_complete(fetch_all_manual(urls, client))
# A manually created ClientSession must be closed manually; client.close() is a
# coroutine, so it has to be run on the event loop
loop.run_until_complete(client.close())
# Give aiohttp a little time to finish closing the ClientSession before closing the loop
loop.run_until_complete(asyncio.sleep(3))
loop.close()
print(results)
print(type(results))

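A few points about this approach deserve emphasis.

A manually created ClientSession must be closed manually, and since client.close() is a coroutine it has to be run on the event loop.
Before closing the loop, give aiohttp a little time to finish closing the ClientSession.

Skipping these steps produces an Unclosed client session error, meaning the ClientSession was never closed.

Even if you follow both points, running the program this way produces the following warning, although it does not stop the program from working.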

DeprecationWarning: The object should be created from async function
  client = aiohttp.ClientSession()

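This says that the line client = aiohttp.ClientSession() should run inside an async function. If the warning bothers you, define an async function whose only job is to create the session.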

async def create_session():
    return aiohttp.ClientSession()

session = asyncio.get_event_loop().run_until_complete(create_session())
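
As a possible variation on the manual approach, the session can be both created and closed inside a single coroutine, so that creation happens in an async function and close() is always awaited. This is only a minimal sketch under the assumption that the whole program can run inside one coroutine; main and the two flag names are hypothetical, and no request is actually sent.

```python
import asyncio
import aiohttp

async def main():
    # Created inside a coroutine, so a running event loop is available.
    session = aiohttp.ClientSession()
    try:
        # ... pass `session` to as many request coroutines as needed ...
        open_during_use = not session.closed
    finally:
        # close() is a coroutine and must be awaited.
        await session.close()
    return open_during_use, session.closed

open_during_use, closed_after = asyncio.run(main())
print(open_during_use, closed_after)
```

The try/finally block plays the same role as the with block in the earlier approach, but leaves room to hand the session to other parts of the program before it is closed.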

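Some important ClientSession parameters

Among ClientSession's parameters, the most commonly used are connector, headers, and cookies. Anyone who has written a crawler probably knows headers and cookies already, so only connector is discussed here.

The connector is the transport for the aiohttp client API. Concurrency limits and SSL certificate verification can both be configured on a connector, which is then passed to ClientSession.

There are two standard connectors:

TCPConnector for regular TCP sockets (supports both HTTP and HTTPS schemes); this is the one used in the vast majority of cases.
UnixConnector for connecting over UNIX sockets (mainly useful for testing).

All connector classes should derive from BaseConnector.

It can be used as in the following example.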

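import aiohttp

async def main():
    # Create a TCPConnector; ssl=False skips certificate verification
    # (verify_ssl=False is the older, deprecated spelling)
    conn = aiohttp.TCPConnector(ssl=False)
    # Pass it to ClientSession as the connector argument
    async with aiohttp.ClientSession(connector=conn) as session:
        ...  # send requests through session here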

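The more important TCPConnector parameters are:

verify_ssl (bool): perform SSL certificate verification for HTTPS requests (enabled by default). Set it to False to skip verification for sites with invalid certificates. Newer aiohttp releases deprecate this parameter in favor of ssl=False.
limit (int): total number of simultaneous connections. If limit is None or 0, the connector has no limit (default: 100).
limit_per_host (int): limit on simultaneous connections to the same endpoint. Two endpoints are the same if their (host, port, is_ssl) triples are equal. If limit_per_host is 0, the connector has no per-host limit (default: 0).

A coroutine-based crawler sends requests very quickly and can easily amount to a denial-of-service attack on someone else's server, so unless you have a specific need, it is best not to set limit to 0.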
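
To make the two limit parameters concrete, here is a small sketch that builds a connector with explicit limits and reads them back through the connector's limit and limit_per_host properties. The values 10 and 2 and the name show_limits are arbitrary choices for illustration, and no request is sent.

```python
import asyncio
import aiohttp

async def show_limits():
    # At most 10 connections overall and 2 per (host, port, is_ssl) endpoint.
    # 10 and 2 are arbitrary illustration values.
    conn = aiohttp.TCPConnector(limit=10, limit_per_host=2)
    async with aiohttp.ClientSession(connector=conn) as session:
        # Requests made through `session` here would share the limited pool.
        return conn.limit, conn.limit_per_host

limits = asyncio.run(show_limits())
print(limits)
```

With these settings, requests beyond the limit wait for a free connection instead of opening a new one, which is gentler on the target server.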

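Another way to limit concurrency (using Semaphore)

A Semaphore can directly limit how many requests are in flight. Only the usage is shown here, as a starting point. It is easy to apply: create a Semaphore in fetch_all_manual and have every fetch call acquire it before sending its request.

import aiohttp
import asyncio

async def fetch(client, url, sem):
    # Every request acquires the semaphore, so at most 5 run at the same time
    async with sem:
        async with client.get(url) as resp:
            assert resp.status == 200
            text = await resp.text()
            return len(text)

async def fetch_all_manual(urls, client):
    sem = asyncio.Semaphore(5)
    return await asyncio.gather(*[fetch(client, url, sem) for url in urls])

urls = ['http://python.org/' for i in range(3)]
loop = asyncio.get_event_loop()
client = aiohttp.ClientSession()
results = loop.run_until_complete(fetch_all_manual(urls, client))
# A manually created ClientSession must be closed manually; client.close() is a
# coroutine, so it has to be run on the event loop
loop.run_until_complete(client.close())
# Give aiohttp a little time to finish closing the ClientSession before closing the loop
loop.run_until_complete(asyncio.sleep(3))
loop.close()
print(results)
print(type(results))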
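
To check that a semaphore really caps concurrency, the following sketch replaces the network call with asyncio.sleep and records how many tasks are inside the semaphore at once. It uses only the standard library; fake_fetch, run_demo, and the state dictionary are hypothetical names for this illustration, not part of aiohttp.

```python
import asyncio

async def fake_fetch(sem, state):
    # Stand-in for a real request: asyncio.sleep replaces the network call.
    async with sem:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)
        state["running"] -= 1

async def run_demo(n_tasks=20, max_concurrency=5):
    sem = asyncio.Semaphore(max_concurrency)
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*[fake_fetch(sem, state) for _ in range(n_tasks)])
    return state["peak"]

peak = asyncio.run(run_demo())
print(peak)  # 5: never more than max_concurrency tasks inside the semaphore
```

The key point is that each task acquires the semaphore itself. Wrapping the whole gather call in a single async with acquires it only once and limits nothing.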

References

https://www.cnblogs.com/wukai66/p/12632680.html

https://stackoverflow.com/questions/46991562/how-to-reuse-aiohttp-clientsession-pool

https://stackoverflow.com/questions/35196974/aiohttp-set-maximum-number-of-requests-per-second/43857526#43857526

https://github.com/aio-libs/aiohttp/issues/4932

https://www.cnblogs.com/c-x-a/p/9248906.html
