Python Web Crawler (Image Collection Script)

Reposted article. 2016-09-29 05:17. Tags: Demo / Python

===============How the crawler works==================

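The script uses Python to request a web page and fetch its HTML source, then applies a regular expression to extract the image URL from the src attribute of a specific img tag.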
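The extraction step can be tried offline. The sketch below applies the same pattern the script uses to a small HTML fragment. The fragment and its example.com URL are made up for illustration; the real page markup is not shown in this article.

```python
import re

# Hypothetical HTML fragment; the real page layout is an assumption.
sample_html = '''<div class="content-img">
    <img alt="demo" src="http://example.com/images/42.jpg">
</div>'''

# Same pattern as the script, written as a raw string so that \s and \.
# are passed to the regex engine unchanged.
pattern = r'class="content-img".+\s+.+src="(.+\.jpg)"'

# findall returns the text captured by the single group, one item per match.
found = re.findall(pattern, sample_html)
print(found)  # ['http://example.com/images/42.jpg']
```

The pattern depends on the img tag sitting on the line right after the element carrying class="content-img", so it is tied to one specific page layout.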

It then requests the image URL and writes the response to a local file through ordinary file I/O.
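A minimal sketch of the saving step, kept offline on purpose: the bytes are a placeholder for what `urllib.request.urlopen(url).read()` would return, and `save_bytes` is a hypothetical helper name, not part of the script.

```python
import os
import tempfile


def save_bytes(data, url, directory):
    """Write data into directory, named after the last path segment of url."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, os.path.basename(url))
    # Binary mode: image data must be written byte for byte.
    with open(target, 'wb') as f:
        f.write(data)
    return target


# Placeholder bytes stand in for a downloaded image (assumption for the demo).
demo_dir = tempfile.mkdtemp()
saved = save_bytes(b'\xff\xd8\xff', 'http://example.com/images/42.jpg', demo_dir)
print(os.path.basename(saved))  # 42.jpg
```

Using a `with` block closes the file even if the write fails, which avoids manual cleanup in a `finally` clause.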

===============Script code==================

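import urllib.request  # network access
import random  # random number generation
import re  # regular expressions
import os  # directory and path handling

# Initial configuration
number = 10  # number of images to collect
path = 'img/'  # directory where images are stored

# Create the output directory if it does not exist yet
if not os.path.exists(path):
    os.makedirs(path)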


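# Save an image
def save_img(url, path):
    """Download url into path; return the file name, or the error text on failure."""
    name = os.path.basename(url)
    try:
        # Download first so a failed request does not leave an empty file behind
        with urllib.request.urlopen(url) as response:
            data = response.read()
        # The with block closes the file even if the write fails
        with open(path + name, 'wb') as file:
            file.write(data)
    except Exception as e:
        return str(e)
    return name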


# Network access
http = 'http://zerospace.asika.tw/photo/'  # target site
# Pick a random starting page id, then visit `number` consecutive pages
position = 290 + int((1000 - number) * random.random())
ids = range(position, position + number)
# Raw string so that \s and \. reach the regex engine unchanged
pattern = r'class="content-img".+\s+.+src="(.+\.jpg)"'
for page_id in ids:
    url = "%s%d.html" % (http, page_id)  # build the page URL
    try:
        request = urllib.request.urlopen(url)
    except Exception as e:
        print(e)
        continue
    buffer = request.read().decode('utf8')
    imgurl = re.findall(pattern, buffer)  # extract the image URL
    if len(imgurl) != 0:
        print(save_img(imgurl[0], path))
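
The page selection in the script can be isolated into a small function, which makes it easy to check without touching the network. This is a sketch only: `page_urls`, `first`, `span` and `rng` are hypothetical names, and reading 290 as the lowest page id and 1000 as the size of the id range is an assumption based on the constants in the script.

```python
import random


def page_urls(base, number, first=290, span=1000, rng=random):
    """Return `number` consecutive page URLs starting at a random id."""
    # rng.random() is in [0, 1), so the offset is in [0, span - number - 1].
    position = first + int((span - number) * rng.random())
    return ["%s%d.html" % (base, page_id)
            for page_id in range(position, position + number)]


# A seeded generator makes the selection repeatable between runs.
urls = page_urls('http://zerospace.asika.tw/photo/', 10, rng=random.Random(0))
print(len(urls))  # 10
```

Passing a seeded `random.Random` instance is optional; with the default `rng=random` the function behaves like the script and picks a different starting page on each run.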

===============Run results==================
