Python learning: implementing a simple high-concurrency crawler to fetch web pages

This article is a repost. Posted 2020-03-04 21:02.

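# gevent cannot see urllib's blocking I/O on its own, so it would never switch
# between greenlets. Monkey-patching the standard library fixes this. Do it
# first, before anything else imports socket or ssl.
from gevent import monkey
monkey.patch_all()  # swap blocking stdlib I/O (socket, ssl, time, ...) for cooperative versions

import time
from urllib import request

import gevent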

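def f(url):
    """Download url and report how many bytes came back."""
    print('GET: %s' % url)
    with request.urlopen(url) as resp:  # closes the connection when done
        data = resp.read()
    # Uncomment to save the page to disk:
    # with open("url.html", "wb") as out:
    #     out.write(data)
    print('%d bytes received from %s.' % (len(data), url))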

# Serial: fetch the pages one after another
urls = [
    'https://www.python.org/',
    'https://www.nginx.org/',
    'https://github.com/',
]
time_start = time.time()
for url in urls:
    f(url)
print("sync cost", time.time() - time_start)

# Concurrent with coroutines: one greenlet per URL. Use the same list as the
# serial run so that the two timings are comparable.
async_time_start = time.time()
gevent.joinall([gevent.spawn(f, url) for url in urls])
print("async cost", time.time() - async_time_start)
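The speed-up above comes from overlapping the waits. Run serially, three requests take roughly the sum of their latencies. Run as greenlets, they take roughly the longest single latency, but only if the I/O actually yields control, which is what monkey.patch_all() arranges. The minimal sketch below illustrates this offline. It uses the standard library's asyncio coroutines as an analogue of gevent greenlets (it is not the author's gevent code), and the per-URL latencies in FAKE_LATENCY are made-up assumptions standing in for real network requests. fetch_blocking plays the role of unpatched urllib: it blocks the whole thread, so running it "concurrently" gains nothing.

```python
import asyncio
import time

# Hypothetical latencies (seconds) standing in for real network I/O,
# so the sketch runs without network access.
FAKE_LATENCY = {
    "https://www.python.org/": 0.3,
    "https://www.nginx.org/": 0.2,
    "https://github.com/": 0.1,
}


async def fetch_cooperative(url: str) -> str:
    # asyncio.sleep yields to the event loop, like a monkey-patched socket read
    await asyncio.sleep(FAKE_LATENCY[url])
    return url


async def fetch_blocking(url: str) -> str:
    # time.sleep blocks the thread, so no other coroutine runs meanwhile,
    # like urllib I/O under gevent without monkey.patch_all()
    time.sleep(FAKE_LATENCY[url])
    return url


async def run_serial() -> float:
    # Await each fetch before starting the next one
    start = time.perf_counter()
    for url in FAKE_LATENCY:
        await fetch_cooperative(url)
    return time.perf_counter() - start


async def run_concurrent(fetch) -> float:
    # Start every fetch at once and wait for all of them (like gevent.joinall)
    start = time.perf_counter()
    await asyncio.gather(*(fetch(url) for url in FAKE_LATENCY))
    return time.perf_counter() - start


def main() -> dict:
    return {
        "serial": asyncio.run(run_serial()),
        "concurrent_blocking": asyncio.run(run_concurrent(fetch_blocking)),
        "concurrent_cooperative": asyncio.run(run_concurrent(fetch_cooperative)),
    }


if __name__ == "__main__":
    for name, seconds in main().items():
        print(f"{name}: {seconds:.2f}s")
```

With these assumed latencies, the first two timings should land near the sum (about 0.6 s) and the third near the largest latency (about 0.3 s); exact numbers vary by machine.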

