Web Crawler: 169tp Images

Reposted article, 2020-03-29 16:01

1. Goal

Scrape the images from the first 20 pages of the gallery at https://www.169tp.com/gaogensiwa/

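2. Preparation

Install the third-party libraries requests, PyQuery and fake_useragent with pip (the package is named requests, not request):

pip install requests

pip install pyquery

pip install fake_useragent

Create an image directory in the project root; the script saves the downloaded files there.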
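The image directory can also be created from Python, so the script does not fail with FileNotFoundError when the folder is missing. The following is only a minimal sketch; the helper name ensure_image_dir is hypothetical and not part of the original script.

```python
from pathlib import Path


def ensure_image_dir(path="image"):
    """Create the output directory if it is missing and return it as a Path.

    ensure_image_dir is a hypothetical helper name, not part of the original script.
    """
    directory = Path(path)
    # exist_ok=True makes the call safe to repeat on later runs
    directory.mkdir(parents=True, exist_ok=True)
    return directory


if __name__ == "__main__":
    ensure_image_dir()
```

Calling ensure_image_dir() once before the crawl starts should be enough; repeated calls leave an existing directory untouched.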

3. Code
import requests
from pyquery import PyQuery as pq
# Generates browser User-Agent strings automatically
from fake_useragent import UserAgent
# Request headers that imitate a browser
headers = {
    # Content types the client accepts
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    # Browser identity; some servers check it as an anti-scraping measure, so pick a random one
    'User-Agent': UserAgent().random
}

# Collect the detail-page URL of every gallery on one list page

def index_data(page):
    url = 'https://www.169tp.com/gaogensiwa/list_3_{}.html'.format(page)
    # Fetch the list page and decode the response body as GBK
    response = requests.get(url, headers=headers).content.decode('gbk')
    # Parse the HTML
    doc = pq(response)
    # Select the <a> elements inside the gallery list

    data = doc('.product01 li a').items()
    # Follow the href of each <a> to its detail page
    for i in data:
        detail_url = i.attr('href')
        detail_data(detail_url)

# Extract the image URLs from a detail page

def detail_data(urls):
    response = requests.get(urls, headers=headers).content.decode('gbk')
    doc = pq(response)
    img_url = doc('.big_img p img').items()
    for i in img_url:
        image_url = i.attr('src')
        # Download inside the loop so every image is saved, not only the last one
        download_img(image_url)

count = 0

# Save one image to disk
def download_img(image_url):
    global count
    response = requests.get(image_url, headers=headers).content
    # 'wb' writes in binary mode and overwrites; 'ab' would append to files left by an earlier run and corrupt them
    with open('image/{}.jpg'.format(count), 'wb') as f:
        f.write(response)
    count += 1

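# Fetch the first 20 pages; only the page number in the list URL changes between pages

# range() excludes its stop value, so range(1, 21) covers pages 1 to 20
for i in range(1, 21):
    index_data(i)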
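The pagination pattern can be checked without touching the network by building the list-page URLs first. This is a rough sketch: LIST_URL copies the pattern used in index_data, while the names LIST_URL and page_urls are hypothetical and do not appear in the original script.

```python
# URL pattern taken from index_data; only the page number changes
LIST_URL = 'https://www.169tp.com/gaogensiwa/list_3_{}.html'


def page_urls(first=1, last=20):
    """Return the list-page URLs for pages first..last, both inclusive.

    page_urls is a hypothetical helper, not part of the original script.
    """
    # range() excludes its stop value, so add 1 to include the last page
    return [LIST_URL.format(page) for page in range(first, last + 1)]


if __name__ == "__main__":
    urls = page_urls()
    print(len(urls))
    print(urls[0])
    print(urls[-1])
```

Printing the first and last URL before starting a crawl is a cheap way to confirm that the loop bounds match the intended number of pages.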
