Python Web Scraping: beautifulsoup4 Series, Part 3

Reposted article, 2017-06-03 11:08. Tag: beautifulsoup4

Preface

This post walks through, step by step, how to scrape the images from a website and save them to your local machine.

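I. Target website

1. Open any landscape-photo site, for example: http://699pic.com/sousuo-218808-13-1.html

2. Inspect the page with Firebug: open FirePath and use a CSS selector to locate the target images.

3. As the screenshot in the original post shows, every picture is an img tag whose class attribute is lazy.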
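The selector found in FirePath can also be tried directly in BeautifulSoup through select(). The following is a minimal sketch on a made-up HTML fragment: the sample markup, URLs, and titles are assumptions for illustration, not copied from the real page. It only shows that img.lazy matches img tags whose class list contains lazy.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure described above;
# the real page's markup may differ.
sample_html = """
<div>
  <img class="lazy" data-original="http://example.com/a.jpg" title="lake">
  <img class="lazy" data-original="http://example.com/b.jpg" title="hill">
  <img class="logo" src="http://example.com/logo.png">
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "img.lazy" selects <img> tags whose class list contains "lazy".
lazy_images = soup.select("img.lazy")
print(len(lazy_images))
```

The img with class logo is not matched, which is why selecting by class is enough to isolate the pictures on a page structured this way.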

II. Finding all the tags with find_all

1. find_all(class_="lazy") returns all the image tag objects.

2. Extract the JPG URL and the title from each tag.

# coding:utf-8
from bs4 import BeautifulSoup
import requests
import os

r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# Find all tags whose class is "lazy"
images = soup.find_all(class_="lazy")
# print(images)  # returns a list-like ResultSet

for i in images:
    jpg_rl = i["data-original"]  # get the image URL
    title = i["title"]  # get the title
    print(title)
    print(jpg_rl)
    print("")
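Indexing a tag with i["data-original"] raises KeyError when the attribute is missing. As a hedged alternative, the sketch below uses tag.get(), which returns None instead. The helper name extract_images and the sample markup are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second tag lacks a title, the third lacks a URL.
sample_html = """
<img class="lazy" data-original="http://example.com/a.jpg" title="lake">
<img class="lazy" data-original="http://example.com/b.jpg">
<img class="lazy" title="no url here">
"""

def extract_images(html):
    """Return (title, url) pairs for lazy images that carry both attributes."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for tag in soup.find_all(class_="lazy"):
        url = tag.get("data-original")  # None instead of KeyError when absent
        title = tag.get("title")
        if url and title:
            pairs.append((title, url))
    return pairs

pairs = extract_images(sample_html)
print(pairs)
```

Tags missing either attribute are skipped rather than stopping the loop.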

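III. Saving the images

1. Create a subfolder named jpg in the folder that contains the script.

2. Import the os module. os.getcwd() returns the current working directory, which is the script's folder when the script is run from there.

3. Use open to open the local file path for writing, named os.getcwd()+"\\jpg\\"+title+'.jpg' (a backslash-separated Windows path; a file with a duplicate name is overwritten).

4. requests.get() fetches the image URL, and the content attribute of the response holds the raw bytes, which can be written straight to a local file.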
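The saving steps above can be sketched as a small helper. This is one possible variation, not the original author's code: the function name save_image_bytes is hypothetical, os.path.join replaces the hard-coded backslashes so the path also works outside Windows, and the title is sanitized on the assumption that it may contain characters that are illegal in file names.

```python
import os
import re
import tempfile

def save_image_bytes(data, title, folder):
    """Write raw image bytes to <folder>/<title>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    # Replace characters that are not allowed in Windows file names.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"
    path = os.path.join(folder, safe_title + ".jpg")
    with open(path, "wb") as f:  # "wb": binary write mode
        f.write(data)
    return path

# Demo with placeholder bytes in a temporary folder, so nothing is downloaded.
demo_folder = os.path.join(tempfile.mkdtemp(), "jpg")
saved = save_image_bytes(b"fake image bytes", "lake/sunset", demo_folder)
print(saved)
```

In the real script, data would be requests.get(jpg_rl).content and folder would be os.path.join(os.getcwd(), "jpg").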

IV. Reference code

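from bs4 import BeautifulSoup
import requests
import os

r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# Find all tags whose class is "lazy"
images = soup.find_all(class_="lazy")
# print(images)  # returns a list-like ResultSet

for i in images:
    try:
        jpg_rl = i["data-original"]  # image URL
        title = i["title"]  # image title, used as the file name
        print(title)
        print(jpg_rl)
        print("")
        # The jpg subfolder must already exist; "wb" writes raw bytes
        with open(os.getcwd()+"\\jpg\\"+title+'.jpg', "wb") as f:
            f.write(requests.get(jpg_rl).content)
    except Exception as e:
        # Skip images that lack an attribute or fail to download
        print("skipped:", e)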
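Because the reference code names each file after the image title, two images with the same title overwrite each other. One possible way around this, shown only as a sketch, is to append a counter when the name is already taken. The helper unique_path is hypothetical and not part of the original script.

```python
import os
import tempfile

def unique_path(folder, title, ext=".jpg"):
    """Return a path in folder that does not exist yet, adding _1, _2, ... if needed."""
    candidate = os.path.join(folder, title + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, "{}_{}{}".format(title, counter, ext))
        counter += 1
    return candidate

# Demo in a temporary folder: the second "lake" gets a numeric suffix.
demo_folder = tempfile.mkdtemp()
first = unique_path(demo_folder, "lake")
open(first, "wb").close()  # simulate an image that was already saved
second = unique_path(demo_folder, "lake")
print(os.path.basename(first), os.path.basename(second))
```

In the loop, the path passed to open could then come from unique_path(folder, title) instead of the fixed concatenation.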

