6.23 Self-Summary

Fast, Efficient Multithreaded Image Crawling

Building on the earlier crawling code, we wrap the logic in functions and add multithreading.

The earlier code: https://www.cnblogs.com/pythonywy/p/11066842.html

The module to import: from concurrent import futures

ex = futures.ThreadPoolExecutor(max_workers=22)  # set the number of worker threads

ex.submit(function, arguments the function needs)
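
As a minimal sketch of how these two calls fit together (the `square` function and its inputs are made up for illustration and are not part of the crawler): `submit` schedules a call and returns a `Future` right away, and calling `result()` on that `Future` blocks until the value is ready.

```python
from concurrent import futures


def square(n):
    # Hypothetical stand-in for real work such as downloading one image
    return n * n


# Using the executor as a context manager waits for pending tasks on exit
with futures.ThreadPoolExecutor(max_workers=4) as ex:
    # submit(fn, *args) schedules fn(*args) and returns a Future right away
    pending = [ex.submit(square, n) for n in range(5)]
    # result() blocks until that task finishes and returns its value
    results = [f.result() for f in pending]

print(results)
```

Reading the results in submission order keeps them aligned with the inputs, even though the tasks themselves may finish in any order.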

import os
import requests
from lxml import etree
from concurrent import futures  # thread pool for multithreading

url = 'http://www.doutula.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',}
def img_url_lis(url):
    response = requests.get(url,headers = headers)
    response.encoding = 'utf8'
    response_html = etree.HTML(response.text)
    img_url_lis = response_html.xpath('.//img/@data-original')
    return img_url_lis


# Create the image folder
img_file_path = os.path.join(os.path.dirname(__file__),'img')
if not os.path.exists(img_file_path):  # create the folder if it does not exist yet
    os.mkdir(img_file_path)
print(img_file_path)

def dump_one_img(url):
    name = str(url).split('/')[-1]
    response = requests.get(url, headers=headers)
    img_path = os.path.join(img_file_path, name)
    with open(img_path, 'wb') as fw:
        fw.write(response.content)


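def dump_imgs(urls:list):
    # Create one pool for the whole page instead of a new pool per URL;
    # leaving the with-block waits until every download has finished
    with futures.ThreadPoolExecutor(max_workers=22) as ex:
        for url in urls:
            ex.submit(dump_one_img, url)  # function, then its argument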


def run():
    count = 1
    while True:
        if count == 10:
            count += 1
            continue
        lis = img_url_lis(f'http://www.doutula.com/article/list/?page={count}')
        if len(lis) == 0:
            print(count)
            break
        dump_imgs(lis)
        print(f'Page {count} done')
        count +=1

if __name__ == '__main__':
    run()
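
One thing to keep in mind with `submit`: an exception raised inside a worker is stored on the returned `Future` and stays silent unless the result is read. A possible way to surface failed downloads is sketched below. It uses a hypothetical `fake_download` function in place of `dump_one_img` so that it runs without network access; `download_all` is likewise an illustrative name, not part of the original code.

```python
from concurrent import futures


def fake_download(url):
    # Hypothetical stand-in for dump_one_img; fails for one URL on purpose
    if url.endswith('bad.jpg'):
        raise ValueError(f'cannot fetch {url}')
    return url


def download_all(urls, max_workers=4):
    ok, failed = [], []
    with futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        # Map each Future back to its URL so failures can be reported
        future_to_url = {ex.submit(fake_download, u): u for u in urls}
        for fut in futures.as_completed(future_to_url):
            url = future_to_url[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                ok.append(url)
            except Exception as exc:
                failed.append((url, str(exc)))
    return ok, failed


ok, failed = download_all(['a.jpg', 'bad.jpg', 'c.jpg'])
print(sorted(ok), failed)
```

`as_completed` yields each `Future` as soon as it finishes, so one slow or broken download does not hold up the reporting of the others.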

With a thread pool, many items can be crawled much faster than one at a time.
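
To get a rough feel for the difference, here is a small timing sketch. It assumes each request spends most of its time waiting on the network, and simulates that with `time.sleep`; `fake_fetch` and the 0.1 second delay are invented for illustration, so the numbers say nothing about the real site.

```python
import time
from concurrent import futures


def fake_fetch(i):
    # Hypothetical stand-in for one network request: just wait 0.1 s
    time.sleep(0.1)
    return i


items = list(range(8))

# Sequential: each call waits for the previous one to finish
start = time.perf_counter()
sequential = [fake_fetch(i) for i in items]
sequential_time = time.perf_counter() - start

# Threaded: the waits overlap because each call runs in its own worker
start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=8) as ex:
    # map() runs the calls concurrently and yields results in input order
    threaded = list(ex.map(fake_fetch, items))
threaded_time = time.perf_counter() - start

print(f'sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s')
```

The speedup comes from overlapping the waiting time, which is why threads help with I/O-bound work like downloading images but much less with CPU-bound work.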
