[Python Practice] Image Scraping with BeautifulSoup4

Reposted from the original post, 2016-04-17 01:07. Series: Python practice

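Scrapy doesn't work on Python 3!

[Important things get said three times. Word is the maintainers are still trying to port Scrapy to Python 3, which cost me half an hour of pip install scrapy = - =]

[Update: Scrapy now runs on Python 3, thanks to the maintainers =w=]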

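Matching the right <img> tags with regular expressions, as I did before, was a real pain: get the pattern slightly wrong and the whole thing falls apart. bs4 feels a lot more convenient.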

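bs4 parses the whole HTML document into a tree of tag objects. A tag's attributes are read like dictionary entries and search results come back as list-like collections, so the data is fairly easy to work with.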
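To make the "dictionaries and lists" idea concrete, here is a minimal sketch on a made-up HTML fragment. The markup and the example.com URL are assumptions for illustration only, and the built-in "html.parser" is used here so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for this illustration.
sample_html = '<div><a class="a" href="/p/1"><img src="http://example.com/1.jpg"></a></div>'
soup = BeautifulSoup(sample_html, "html.parser")

link = soup.find("a")           # first <a> tag in the parsed tree
print(link["href"])             # a single attribute, read like a dictionary key
print(link.attrs)               # all attributes of the tag as a dictionary

images = soup.find_all("img")   # search results behave like a list
print(len(images), images[0]["src"])
```

Note that the class attribute comes back as a list of class names, because an HTML tag can carry several classes at once.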

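Take this page as the example (Duitang is my favorite, after all): http://www.duitang.com/search/?kw=%E6%96%87%E8%B1%AA%E9%87%8E%E7%8A%AC&type=feed#!s-p1

To download the images I want, what I ultimately need is each image's URL.

First, look at the page source:

1. Read the page source: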

# m is the start page and x the loop offset, so m + x is the current page number
html_doc = urllib.request.urlopen(url + "#!s-p" + str(m + x)).read().decode('utf-8')
soup = BeautifulSoup(html_doc, "lxml")
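One caveat about the page suffix: everything after "#" in a URL is a fragment, which the browser handles locally and an HTTP client does not send to the server. The sketch below only splits the article's URL with the standard library to show which part is which. Whether Duitang offers a server-side page parameter is not covered in this post, so none is assumed here.

```python
from urllib.parse import urlsplit, urldefrag

# The article's search URL with the "#!s-p2" page suffix appended.
page_url = "http://www.duitang.com/search/?kw=%E6%96%87%E8%B1%AA%E9%87%8E%E7%8A%AC&type=feed#!s-p2"

parts = urlsplit(page_url)
print(parts.query)       # the query string, which is part of the request
print(parts.fragment)    # the fragment, which stays on the client side

# urldefrag returns the URL without its fragment, plus the fragment itself.
requested_url, fragment = urldefrag(page_url)
print(requested_url)
```

If every requested page returns the same images, this is the likely reason, and the pagination would have to be done some other way.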

2. As the page source shows, the images I want all sit inside <a> tags that have class="a". This line uses bs4 to find those <a> tags:

soup.find_all('a', class_='a')
# soup.find_all('<tag name>', <attribute filters>)
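A small sketch of how that filter behaves, again on invented markup (the hrefs and example.com URLs are assumptions). It also shows two other ways of writing the same search, the attrs dictionary and a CSS selector, which should select the same tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links carry class "a", one carries class "b".
sample_html = """
<a class="a" href="/p/1"><img src="http://example.com/1.jpg"></a>
<a class="a big" href="/p/2"><img src="http://example.com/2.jpg"></a>
<a class="b" href="/p/3"><img src="http://example.com/3.jpg"></a>
"""
soup = BeautifulSoup(sample_html, "html.parser")

by_keyword = soup.find_all("a", class_="a")         # the form used in this post
by_attrs = soup.find_all("a", attrs={"class": "a"}) # same filter as a dictionary
by_css = soup.select("a.a")                         # same filter as a CSS selector

hrefs = [tag["href"] for tag in by_keyword]
print(hrefs)
```

The second link is matched too: class_='a' matches any tag that has "a" among its classes, not only tags whose class attribute is exactly "a".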

3. Inside each of them, find the <img> tag and read its link URL into img_src:

for myimg in soup.find_all('a', class_='a'):
    img_src = myimg.find('img').get('src')
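One thing to watch in that loop: find('img') returns None when a link has no image, and calling .get on None raises an AttributeError. A slightly more defensive sketch, on made-up markup, could look like this:

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the awkward cases: no <img>, and an <img> without src.
sample_html = """
<a class="a"><img src="http://example.com/1.jpg"></a>
<a class="a">text only, no image</a>
<a class="a"><img alt="missing src"></a>
"""
soup = BeautifulSoup(sample_html, "html.parser")

img_srcs = []
for myimg in soup.find_all("a", class_="a"):
    img = myimg.find("img")
    if img is None:
        continue                # this link contains no <img> tag
    img_src = img.get("src")    # .get returns None when the attribute is absent
    if img_src:
        img_srcs.append(img_src)

print(img_srcs)
```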

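Judging by step 2, this really does take less time and effort than doing everything with regular expressions.

The full code is below. Only the small regex part actually changed: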

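from bs4 import BeautifulSoup
import urllib.request
import os

def downloadimg(url, m, n):
    """Download the images on result pages m through n into ./photos."""
    os.makedirs('photos', exist_ok=True)  # create the target folder if it is missing
    os.chdir(os.path.join(os.getcwd(), 'photos'))
    t = 1  # running count of downloaded images, used as the file name

    for x in range(n - m + 1):
        # m + x is the current page number. The "#!s-p" part is a URL fragment,
        # which is not sent to the server, so the site may return the same HTML for every page.
        html_doc = urllib.request.urlopen(url + "#!s-p" + str(m + x)).read().decode('utf-8')
        soup = BeautifulSoup(html_doc, "lxml")

        for myimg in soup.find_all('a', class_='a'):
            img = myimg.find('img')
            if img is None or not img.get('src'):
                continue  # skip links that contain no usable image
            pic_name = str(t) + '.jpg'
            img_src = img.get('src')
            urllib.request.urlretrieve(img_src, pic_name)
            print("Success!" + img_src)
            t += 1
        print("Next page!")

downloadimg("http://www.duitang.com/search/?kw=%E6%96%87%E8%B1%AA%E9%87%8E%E7%8A%AC&type=feed", 1, 3)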
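The script saves every file as .jpg and assumes each src is already a full URL. As an optional refinement, here is a sketch of two helpers that keep the extension found in the URL and resolve relative or protocol-relative src values. The names build_pic_name and absolute_src, the shortened page URL, and the example.com image URLs are all hypothetical and not part of the original script.

```python
import os
from urllib.parse import urljoin, urlsplit

def build_pic_name(index, img_src, default_ext=".jpg"):
    """Hypothetical helper: name a file by its running index, keeping the URL's extension."""
    # Look only at the path so query strings such as "?size=200" are ignored.
    ext = os.path.splitext(urlsplit(img_src).path)[1].lower()
    return str(index) + (ext or default_ext)

def absolute_src(page_url, img_src):
    """Hypothetical helper: resolve a relative or protocol-relative src against the page URL."""
    return urljoin(page_url, img_src)

# Illustrative values only.
page_url = "http://www.duitang.com/search/?kw=test&type=feed"
print(absolute_src(page_url, "//example.com/pics/cat.PNG"))
print(build_pic_name(1, "http://example.com/pics/cat.PNG?size=200"))
print(build_pic_name(2, "http://example.com/pics/no-extension"))
```

Inside the download loop, these could stand in for the fixed str(t) + '.jpg' name and the raw src value.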

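As in the previous post, I added two parameters: the start page and the end page.

The folder after downloading:

P.S. Dazai-san is just too cute (●'◡'●)ﾉ♥ Enough talk, I'm off to watch it again~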
