Using Python to batch-download images from a website

This article is a repost. Posted 2016-02-28 23:11, tagged python.

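Today I "happened" to be browsing for pretty girls and "happened" to wander onto a site with plenty of them. Annoyingly, each page shows only one or two pictures, and the Wi-Fi at home is barely usable. So I brought out my inner "code monkey" and wrote a small script to download the images and view them offline. Plenty of experts have built similar tools already, but in the spirit of learning and practice I worked through it myself and picked up a few things along the way!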

First, here is the result:

Python code:

# -*- coding:utf8 -*-
# Runs on Python 2.7 (the version used in this post) and on Python 3.
import os
import re

from lxml import etree

try:
    from urllib2 import urlopen  # Python 2.7
    from urlparse import urljoin
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.parse import urljoin

# Relative links found on the site are resolved against this root.
SITE_ROOT = "http://www.xgyw.cc/"


def check_save_path(save_path):
    # Create the target folder, including any missing parent folders.
    if not os.path.isdir(save_path):
        os.makedirs(save_path)


def get_image_name(image_link):
    # Last segment of the URL, without any query string.
    file_name = os.path.basename(image_link.split("?")[0])
    return file_name


def save_image(image_link, save_path):
    file_name = get_image_name(image_link)
    file_path = os.path.join(save_path, file_name)
    print("Downloading %s" % image_link)
    try:
        # Download first, so a failed request does not leave an empty file behind.
        image_data = urlopen(image_link, timeout=5).read()
        # "with" closes the file automatically, even if the write fails.
        with open(file_path, "wb") as file_handler:
            file_handler.write(image_data)
    except Exception as ex:
        print(ex)


def get_image_link_from_web_page(web_page_link):
    image_link_list = []
    print(web_page_link)
    try:
        html_content = urlopen(web_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        link_list = html_tree.xpath('//p/img/@src')
        for link in link_list:
            # Keep only uploaded photos. str.find() returns -1 (truthy) when the
            # substring is missing, so test membership with "in" instead.
            if "uploadfile" in link:
                image_link_list.append(urljoin(SITE_ROOT, link))
    except Exception as ex:
        print(ex)
    return image_link_list


def get_page_link_list_from_index_page(base_page_link):
    try:
        html_content = urlopen(base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        link_tmp_list = html_tree.xpath('//div[@class="page"]/a/@href')
        page_link_list = []
        for link_tmp in link_tmp_list:
            page_link = urljoin(SITE_ROOT, link_tmp)
            # The pager may link to the same page more than once; skip repeats.
            if page_link not in page_link_list:
                page_link_list.append(page_link)
        return page_link_list
    except Exception as ex:
        print(ex)
        return []


def get_page_title_from_index_page(base_page_link):
    try:
        html_content = urlopen(base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        page_title_list = html_tree.xpath('//td/div[@class="title"]')
        # .text is None when the div has no direct text.
        page_title_tmp = (page_title_list[0].text or "").strip()
        print(page_title_tmp)
        return page_title_tmp
    except Exception as ex:
        print(ex)
        return ""


def get_image_from_web(base_page_link, save_path):
    check_save_path(save_path)
    page_link_list = get_page_link_list_from_index_page(base_page_link)
    for page_link in page_link_list:
        image_link_list = get_image_link_from_web_page(page_link)
        for image_link in image_link_list:
            save_image(image_link, save_path)


if __name__ == "__main__":
    base_page_link = "http://www.xgyw.cc/tuigirl/tuigirl1346.html"
    page_title = get_page_title_from_index_page(base_page_link)
    if page_title != "":
        # Windows does not allow \ / : * ? " < > | in folder names.
        save_path = os.path.join("N:\\PIC", re.sub(r'[\\/:*?"<>|]', "_", page_title))
    else:
        save_path = "N:\\PIC\\other"

    get_image_from_web(base_page_link, save_path)

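The script's save step (`save_image` plus the `save_path` built from the page title) can be hardened a little further. The following is a Python 3-only sketch, not the author's code, and the helper names `safe_folder_name`, `file_name_from_url` and `download_to` are hypothetical. It replaces characters Windows rejects in folder names. It takes the file name from the URL path only, so a query string never ends up in the name. It also streams the response into a file opened with `with`, so both handles are always closed.

```python
import os
import posixpath
import re
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen

# Characters that Windows does not allow in file or folder names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_folder_name(title, fallback="other"):
    """Hypothetical helper: turn a page title into a usable folder name."""
    cleaned = _INVALID_CHARS.sub("_", title).strip()
    return cleaned or fallback


def file_name_from_url(image_url):
    """Hypothetical helper: last segment of the URL path, query string dropped."""
    return posixpath.basename(urlsplit(image_url).path)


def download_to(image_url, dest_dir, timeout=5):
    """Hypothetical helper: stream one image into dest_dir and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    file_path = os.path.join(dest_dir, file_name_from_url(image_url))
    # The response and the output file are both closed when the block exits.
    with urlopen(image_url, timeout=timeout) as response, open(file_path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return file_path
```

For example, `file_name_from_url("http://example.com/uploadfile/a.jpg?v=2")` gives `"a.jpg"`, whereas `os.path.basename` on the full URL would keep `?v=2`. `shutil.copyfileobj` copies in chunks, so large images are never held in memory all at once.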

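How the code works:

Use urllib2.urlopen(url).read() to fetch each page's HTML, then parse it with etree.HTML() into an element tree, so that XPath expressions can select the nodes and attribute values we need. Starting from the index page, the script collects the links of all pagination pages, extracts the image URLs from each page, and saves every image locally.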
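The XPath query only returns raw `src` strings, which still have to be filtered and turned into absolute URLs. The Python 3 sketch below isolates that step. The function name `clean_image_links` is hypothetical; the `uploadfile` marker and the site root come from the script above. It also shows why a membership test is the right filter: `str.find()` returns -1 on a miss, and -1 is truthy.

```python
from urllib.parse import urljoin


def clean_image_links(raw_srcs, site_root, marker="uploadfile"):
    """Hypothetical helper: keep srcs containing `marker`, make them absolute,
    and drop duplicates while preserving their original order."""
    seen = set()
    result = []
    for src in raw_srcs:
        # "in" is a real substring test; find() would return -1 (truthy) on a miss.
        if marker not in src:
            continue
        absolute = urljoin(site_root, src)  # handles "a.jpg", "/a.jpg" and full URLs
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# The find() pitfall: a miss gives -1 (truthy), a match at index 0 gives 0 (falsy).
print(bool("logo.png".find("uploadfile")))          # True, although nothing matched
print(bool("uploadfile/a.jpg".find("uploadfile")))  # False, although it matched
```

Take made-up src values `["uploadfile/2016/a.jpg", "/uploadfile/2016/b.jpg", "images/logo.png"]` and the root `http://www.xgyw.cc/`. The function returns `["http://www.xgyw.cc/uploadfile/2016/a.jpg", "http://www.xgyw.cc/uploadfile/2016/b.jpg"]` and drops the logo.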

--=========================================================

Installing Python packages:

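Many Python packages have no Windows installer, or no x64 build, which makes it hard for beginners to get going quickly. Instead, you can install the packages you need with pip or easy_install; installation instructions are at https://pypi.python.org/pypi/setuptools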

I went with easy_install. My machine runs Python 2.7, installed at C:\Python27\python.exe. Download ez_setup.py, save it to the C: drive, then open cmd and run:

C:\Python27\python.exe "c:\ez_setup.py"

This installs easy_install. When it finishes, you will find easy_install-2.7.exe under C:\Python27\Scripts. To install the requests package locally, for example, run:

"C:\Python27\Scripts\easy_install-2.7.exe" requests
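Before running the scraper, it can help to confirm that its third-party packages are actually importable by the interpreter you are using. The following is a small Python 3 sketch built on the standard `importlib.util.find_spec`. The package list (`lxml` for the scraper, `requests` from the install example) and the suggested `python -m pip install` command are assumptions about your setup; pip is the other installer mentioned above.

```python
import importlib.util
import sys

# Third-party packages used in this post: lxml (the scraper) and
# requests (the easy_install example). Adjust the list for your own script.
REQUIRED = ["lxml", "requests"]


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing: %s" % ", ".join(missing))
        # "python -m pip" installs into the same interpreter that ran this check.
        print("Try: %s -m pip install %s" % (sys.executable, " ".join(missing)))
    else:
        print("All required packages are installed.")
```

Running `pip` through `sys.executable -m pip` avoids the common mistake of installing into a different Python than the one that runs the script.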

--==========================================================

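As usual, I'll close the post with a picture of a pretty girl: Tuigirl issue 68. If you want the pictures, look them up on Baidu yourself.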

