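Python's HTTP libraries include requests, aiohttp, and httpx.
requests sends only synchronous requests, aiohttp sends only asynchronous requests, and httpx can send both.
Of the three, aiohttp is often the most efficient for asynchronous requests, so let's walk through it together: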
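Introduction
At its core, aiohttp is about asynchronous concurrency: it is built on asyncio and the async/await syntax, so a single thread can run many I/O operations concurrently.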
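To make the single-thread concurrency idea concrete, here is a minimal sketch that uses only the standard library. The names fake_io and run_concurrently are made up for illustration, and asyncio.sleep stands in for a network round trip.

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a network round trip (an assumption for
    # illustration); while it waits, the event loop runs the other coroutines.
    await asyncio.sleep(delay)
    return name


async def run_concurrently() -> tuple:
    start = time.perf_counter()
    # gather schedules all three coroutines on the same thread and event loop
    results = await asyncio.gather(
        fake_io("a", 0.2), fake_io("b", 0.2), fake_io("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to one delay rather than the sum of all three.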
Installation
pip install aiohttp
Usage
Client usage
import asyncio
import aiohttp

async def my_request():
    async with aiohttp.ClientSession() as session:
        # ssl=False skips certificate verification to avoid SSL errors
        # (it replaces the deprecated verify_ssl=False)
        async with session.get('http://www.csdn.net/', ssl=False) as response:
            print('status:', response.status)
            print('content-type', response.headers['content-type'])
            html = await response.text()
            print(f'body:{html[:15]}')

# Create an event loop and run the coroutine on it
loop = asyncio.new_event_loop()
loop.run_until_complete(my_request())
loop.close()
*On Python 3.7 and later, asyncio.run(my_request()) can replace the explicit event-loop code.
Server usage
from aiohttp import web

async def hello(request):
    # Read the {name} path segment, falling back to 'jack' for the root route
    name = request.match_info.get('name', 'jack')
    text = 'hello ' + name
    return web.Response(text=text)

app = web.Application()
app.add_routes([
    web.get('/', hello),
    web.get('/{name}', hello),
])
web.run_app(app, host='127.0.0.1')
A simple application of the aiohttp client
import asyncio
import aiohttp

async def get_html(session, url):
    # Send a GET request
    async with session.get(url, ssl=False) as response:
        print('status:', response.status)
        return await response.text()

async def main():
    # Open one client session and reuse it for both requests
    async with aiohttp.ClientSession() as session:
        html1 = await get_html(session, 'http://www.csdn.net/')
        html2 = await get_html(session, 'http://python.org')
        print(html1)
        print(html2)

asyncio.run(main())
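The same pattern works for POST, DELETE, and PUT through session.post, session.delete, and session.put; requests also accept parameters such as headers, params, and data.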
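As a rough sketch of how headers, params, and data fit together in a POST, the example below starts a throwaway local server and posts to it, so it does not depend on any external site. The echo handler, the /echo route, the X-Token header, and all values are hypothetical. TestServer comes from aiohttp's testing utilities and is used here only to obtain a free local port.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def echo(request: web.Request) -> web.Response:
    # Hypothetical handler: report back what the client sent
    form = await request.post()
    return web.json_response({
        "method": request.method,
        "query": dict(request.query),
        "token": request.headers.get("X-Token"),
        "form": dict(form),
    })


async def demo() -> dict:
    app = web.Application()
    app.add_routes([web.post("/echo", echo)])
    # TestServer binds the app to a free local port for the duration of the block
    async with TestServer(app) as server:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                server.make_url("/echo"),
                params={"page": "1"},           # appended to the URL as ?page=1
                headers={"X-Token": "secret"},  # extra request header
                data={"title": "hello"},        # form-encoded request body
            ) as response:
                return await response.json()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

session.put and session.delete take the same keyword arguments, so switching methods is mostly a matter of changing the call name.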
An asynchronous crawler with aiohttp
import asyncio
import time
import aiohttp

async def get_html(session, url):
    print('sending request:', url)
    async with session.get(url, ssl=False) as response:
        content = await response.content.read()
        print('got result', url, len(content))
        # Use the last path segment of the URL as the file name
        filename = url.rsplit('/', 1)[-1]
        print('downloading', filename)
        with open(filename, 'wb') as file_object:
            file_object.write(content)
        print(filename, 'downloaded')

async def main():
    async with aiohttp.ClientSession() as session:
        start_time = time.time()
        url_list = [
            'https://images.cnblogs.com/cnblogs_com/blueberry-mint/1877253/o_201106093544wallpaper1.jpg',
            'https://images.cnblogs.com/cnblogs_com/blueberry-mint/1877253/o_201106093557wallpaper2.jpg',
            'https://images.cnblogs.com/cnblogs_com/blueberry-mint/1877253/o_201106093613wallpaper3.jpg',
        ]
        # Schedule one task per URL so the downloads run concurrently
        tasks = [asyncio.create_task(get_html(session, url)) for url in url_list]
        await asyncio.wait(tasks)
        end_time = time.time()
        print('it cost', round(end_time - start_time), 's')

asyncio.run(main())

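Important ClientSession parameters: the connector
1. TCPConnector: for regular TCP sockets (supports both the HTTP and HTTPS schemes); used in the vast majority of cases.
2. UnixConnector: for connections over UNIX sockets (mainly used for testing).
All connectors should inherit from BaseConnector.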
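async def main():
    # Create a TCPConnector; ssl=False replaces the deprecated verify_ssl=False
    conn = aiohttp.TCPConnector(ssl=False)
    # Pass it to ClientSession as the connector argument
    async with aiohttp.ClientSession(connector=conn) as session:
        ...  # send requests through session here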
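The more important TCPConnector parameters are:
verify_ssl (bool): whether to perform SSL certificate verification for HTTPS requests (enabled by default); set it to False to skip verification for sites with invalid certificates. It has been deprecated since aiohttp 3.0 in favor of ssl=False.
limit (int): the total number of simultaneous connections. If limit is 0, the connector has no limit (default: 100).
limit_per_host (int): the number of simultaneous connections to the same endpoint. Two endpoints are the same if their (host, port, is_ssl) triples are equal. If limit_per_host is 0, there is no per-host limit (default: 0).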
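A small sketch of setting both limits on one connector follows. The numbers 10 and 2 and the function name make_session_limits are arbitrary choices for illustration; the connector is created inside a coroutine so that it binds to the running event loop.

```python
import asyncio

import aiohttp


async def make_session_limits() -> tuple:
    # Values chosen for illustration: at most 10 connections overall,
    # and at most 2 to any single (host, port, is_ssl) endpoint.
    connector = aiohttp.TCPConnector(limit=10, limit_per_host=2)
    async with aiohttp.ClientSession(connector=connector) as session:
        # Requests sent through `session` would now share these limits.
        return session.connector.limit, session.connector.limit_per_host


if __name__ == "__main__":
    print(asyncio.run(make_session_limits()))
```

Every request made through that session draws from the same connection pool, so the limits apply across all tasks that share it.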
Another way to limit concurrency (using Semaphore)
A Semaphore caps the number of requests in flight directly.
import asyncio
import logging
import time

import aiohttp
import backoff

# logging.basicConfig(level=logging.DEBUG)
my_logger = logging.getLogger(__name__)
my_handler = logging.FileHandler('log.txt')
my_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s %(levelname)s %(pathname)s %(filename)s %(funcName)s %(lineno)s"
    " -%(message)s", "%Y-%m-%d %H:%M:%S")
my_handler.setFormatter(formatter)
my_logger.addHandler(my_handler)
my_logger.setLevel(logging.DEBUG)

now = time.time

# Retry up to 3 times with exponential backoff on aiohttp client errors
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=3, logger=my_logger)
async def get_html(session, i, url):
    start = now()
    async with session.get(url, ssl=False) as response:
        # return await response.text()
        r = await response.read()
        end_time = now()
        cost = end_time - start
        msg = 'request {}, start time: {}, time taken: {}, response: {}\n'.format(i, start, cost, r.decode('utf-8'))
        print('running %d' % i, now(), msg)

# Use a semaphore to cap the maximum concurrency
async def bound_register(sem, session, i, url):
    async with sem:
        await get_html(session, i, url)

async def run(num, url):
    tasks = []
    sem = asyncio.Semaphore(100)
    # limit=0 removes the connector's own cap, so the semaphore is the only limit
    connector = aiohttp.TCPConnector(limit=0, ssl=False)
    async with aiohttp.ClientSession(connector=connector) as session:
        for i in range(num):
            task = asyncio.ensure_future(
                bound_register(sem=sem, session=session, i=i, url=url)
            )
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

start = now()
number = 200
# url2 = 'http://www.baidu.com'
url = 'http://127.0.0.1:8000/rest/href/single_href/?title=6'
asyncio.run(run(number, url))
print('total time: %0.3f' % (now() - start))
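To see the Semaphore cap in isolation, without a server or third-party packages, the following sketch counts how many jobs are inside the semaphore at once. The names bounded_job and measure_peak are made up for illustration, and asyncio.sleep stands in for the HTTP request.

```python
import asyncio


async def bounded_job(sem: asyncio.Semaphore, stats: dict) -> None:
    async with sem:  # waits here while the limit is reached
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for an HTTP request
        stats["active"] -= 1


async def measure_peak(jobs: int, limit: int) -> int:
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(bounded_job(sem, stats) for _ in range(jobs)))
    return stats["peak"]


if __name__ == "__main__":
    print(asyncio.run(measure_peak(jobs=20, limit=5)))
```

With 20 jobs and a limit of 5, the recorded peak never exceeds 5, which is the same guarantee Semaphore(100) gives the 200 requests in the example above.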
Reference: https://www.cnblogs.com/blueberry-mint/p/13937205.html
