1 Introduction

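After Scrapy's ImagesPipeline downloads the images requested in get_media_requests, each image is named by default after the SHA1 hash of its download URL. For example, if the download URL is http://img.ivsky.com/img/bizhi/pre/201101/10/harry_potter5-017.jpg and its hash is 7710759a8e3444c8d28ba81a4421ed, the image is saved under the configured directory as full/7710759a8e3444c8d28ba81a4421ed.jpg. To give the image a custom name instead, we can rename the file in the ImagesPipeline's item_completed() method.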
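The default name can be reproduced with the standard library alone. This is a minimal sketch, assuming the behavior of Scrapy's ImagesPipeline.file_path: take the SHA1 hex digest of the UTF-8 encoded request URL and store the file as full/<digest>.jpg relative to IMAGES_STORE. The helper name default_image_path is hypothetical, not a Scrapy API.

```python
import hashlib


def default_image_path(url: str) -> str:
    """Approximate Scrapy's default image path (assumption: SHA1 of the URL)."""
    # 40-character hex SHA1 digest of the UTF-8 encoded URL
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Images are saved as JPEG under full/, relative to IMAGES_STORE
    return f"full/{digest}.jpg"


if __name__ == "__main__":
    url = "http://img.ivsky.com/img/bizhi/pre/201101/10/harry_potter5-017.jpg"
    print(default_image_path(url))
```

Because the name depends only on the URL, the same image URL always maps to the same file, which is how the pipeline avoids storing duplicates.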

2 Crawling Process

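I won't repeat the crawling process here; see https://www.cnblogs.com/mrtop/p/10180072.html. This article focuses on how to customize image names. After the spider runs, the downloaded images look like the figure below: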

3 How to Customize Image Names

3.1 Code for Customizing Image Names

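import os
from harry.settings import IMAGES_STORE as IMGS
from scrapy.pipelines.images import ImagesPipeline
from scrapy import Request


class HarryPipeline(object):
    def process_item(self, item, spider):
        return item


class HarryDownLoadPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # One download request per image URL in the item
        for imgurl in item['img_url']:
            yield Request(imgurl)

    def item_completed(self, results, item, info):
        print('******the results is********:', results)
        # results[0] is (success, info_dict) for the first URL; rename only if it succeeded
        if results and results[0][0]:
            os.rename(IMGS + '/' + results[0][1]['path'], IMGS + '/' + item['img_name'])
        # Return the item so that later pipelines still receive it
        return item

    def __del__(self):
        # When done, remove the (now empty) full directory; ignore the error if files remain
        try:
            os.rmdir(IMGS + '/' + 'full')
        except OSError:
            pass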

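Note: the __del__(self) method is optional. Because the rename moves each file together with its path (out of full/ and into IMAGES_STORE), the default full folder ends up empty, and this method simply removes that empty folder as a convenience (it cannot be deleted if any files remain in it).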
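For the pipeline to run at all, it has to be enabled in the project's settings.py, and IMAGES_STORE has to be defined there, since the code imports it from harry.settings. A minimal sketch, assuming the project package is harry and the class lives in harry/pipelines.py; the storage directory and the priority number are placeholders. ImagesPipeline also requires Pillow to be installed.

```python
# settings.py (sketch): module path, storage directory and priority are assumptions

ITEM_PIPELINES = {
    # Pipelines with lower numbers run first; 300 is an arbitrary placeholder
    "harry.pipelines.HarryDownLoadPipeline": 300,
}

# Root directory for downloaded images; by default files land in IMAGES_STORE/full/
IMAGES_STORE = "./images"
```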

3.2 Detailed Explanation of the Code

3.2.1 The get_media_requests function

In the Scrapy 1.x ImagesPipeline, the prototype of get_media_requests is:

def get_media_requests(self, item, info):
    return [Request(x) for x in item.get(self.images_urls_field, [])]

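As you can see, get_media_requests takes three parameters:

The first is self, which needs no explanation;

The second is item, the item passed in from the spider;

The third is info. Despite its name, it is not a list of image names and download links: it is a MediaPipeline.SpiderInfo object that holds the pipeline's per-spider download state, namely the spider itself (info.spider) plus the requests that are downloading, already downloaded, or waiting.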

3.2.2 The item_completed function

The prototype of item_completed is as follows:

def item_completed(self, results, item, info):
    if isinstance(item, dict) or self.images_result_field in item.fields:
        item[self.images_result_field] = [x for ok, x in results if ok]
    return item

Note the results parameter of item_completed: it holds information about the image downloads. Printing it gives:

[(True, {'url': 'http://img.ivsky.com/img/bizhi/pre/201101/10/harry_potter5-015.jpg', 'path': 'full/539c5914730497b094e5c98bfdfe19b65f5.jpg', 'checksum': '37d23ffb0ab983ac2da9a9d'})]

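Its real structure is a list of 2-tuples (success, info), one per requested URL, where success is True or False. For a successful download, info is a dict with three keys: 1. 'url': the image's download URL 2. 'path': where the image was saved, relative to IMAGES_STORE 3. 'checksum': the MD5 checksum of the image content. For a failed download, info is an error object instead of a dict.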
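To make this structure concrete, here is a stdlib-only sketch that separates a results list into successful downloads and failures, using the same filter as the default item_completed ([x for ok, x in results if ok]). The helper split_results and the sample data are made up for illustration; in a real Scrapy run the second element of a failed tuple is a Twisted Failure, not the plain string used here.

```python
from typing import Any, Dict, List, Tuple

# One entry per requested URL: (success_flag, info)
Result = Tuple[bool, Any]


def split_results(results: List[Result]) -> Tuple[List[Dict[str, str]], List[Any]]:
    """Separate the info dicts of successful downloads from the failure objects."""
    # Same filter the default item_completed applies
    succeeded = [info for ok, info in results if ok]
    failed = [info for ok, info in results if not ok]
    return succeeded, failed


if __name__ == "__main__":
    # Hypothetical sample data; real failures carry a Twisted Failure
    sample = [
        (True, {"url": "http://example.com/a.jpg", "path": "full/aaa.jpg", "checksum": "0" * 32}),
        (False, "stand-in for a download failure"),
    ]
    succeeded, failed = split_results(sample)
    print([info["path"] for info in succeeded])  # paths are relative to IMAGES_STORE
    print(len(failed))
```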

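Because 'path' records where the image was saved, down to the file name, we can use it to find the downloaded file and rename it on disk, which gives the image a custom name.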

The key code is:

os.rename(IMGS + '/' + results[0][1]['path'], IMGS + '/' + item['img_name'])

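Explanation: results[0][1]['path'] takes the first tuple in results, then its info dict, then the saved path, which contains the image's current hash-based file name under the full folder. This path is relative to IMAGES_STORE rather than absolute, so IMGS has to be prepended to build the complete old and new paths passed to os.rename. Note that this renames only the first image of each item.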
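Since get_media_requests yields one request per URL in item['img_url'], an item can produce several results. A more defensive sketch of the renaming step, assuming one desired name per URL in the same order (the helper rename_downloads and the names list are hypothetical, not part of Scrapy), skips failed downloads, joins paths with os.path.join and keeps the original extension:

```python
import os
import tempfile
from typing import Any, List, Tuple


def rename_downloads(store: str, results: List[Tuple[bool, Any]], names: List[str]) -> List[str]:
    """Rename every successfully downloaded image under `store` to its matching name.

    Assumes `names` follows the order of the URLs given to get_media_requests,
    which is also the order of `results`. Returns the new file paths.
    """
    renamed = []
    for (ok, info), name in zip(results, names):
        if not ok:
            continue  # failed download: no file on disk to rename
        src = os.path.join(store, info["path"])  # 'path' is relative to IMAGES_STORE
        ext = os.path.splitext(info["path"])[1]  # keep the original extension, e.g. '.jpg'
        dst = os.path.join(store, name + ext)
        os.replace(src, dst)  # unlike os.rename, also overwrites an existing file on Windows
        renamed.append(dst)
    return renamed


if __name__ == "__main__":
    # Demo in a throwaway directory that mimics IMAGES_STORE/full/<hash>.jpg
    with tempfile.TemporaryDirectory() as store:
        os.makedirs(os.path.join(store, "full"))
        open(os.path.join(store, "full", "abc.jpg"), "wb").close()
        results = [
            (True, {"url": "http://example.com/1.jpg", "path": "full/abc.jpg", "checksum": "x"}),
            (False, "stand-in for a download failure"),
        ]
        print(rename_downloads(store, results, ["harry_1", "harry_2"]))
```

Inside item_completed this could be called as rename_downloads(IMGS, results, item['img_names']) followed by return item, where img_names is a hypothetical item field holding one name per URL. Another option is to override ImagesPipeline.file_path, which decides the stored path before the file is written, so no renaming is needed afterwards.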

4 Results After Updating pipelines.py

If you have any questions, feel free to leave a comment. Please credit the source when reposting.
