Downloading images with urlretrieve -- web crawler


from lxml import etree
import requests
from urllib import request

url = 'http://www.haoduanzi.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
}
# Fetch the page HTML
url_content = requests.get(url, headers=headers).text

# Parse the HTML into an element tree
tree = etree.HTML(url_content)

# Select the content divs, skipping the first two and the last one
div_list = tree.xpath('//div[@id="main"]/div')[2:-1]

for i, div in enumerate(div_list):
    # Extract the image URL from each div
    img_url = div.xpath('./div/img/@src')[0]
    # urlretrieve downloads the image straight to disk, so a separate
    # requests.get for the image bytes is not needed here
    request.urlretrieve(url=img_url, filename='img' + str(i) + '.jpg')
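As a variation (not from the original post), the saved filename can be derived from the image URL itself instead of a counter. A minimal sketch, assuming the same img_url values as above; save_image is a hypothetical helper name:

import os
from urllib import request
from urllib.parse import urlparse

def save_image(img_url, out_dir='.'):
    # Use the last path segment of the URL as the filename,
    # falling back to a fixed name if the URL has no basename
    name = os.path.basename(urlparse(img_url).path) or 'image.jpg'
    path = os.path.join(out_dir, name)
    # urlretrieve writes the response body straight to `path`
    # and returns (local_filename, http_headers)
    local_path, _headers = request.urlretrieve(img_url, path)
    return local_path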

Don't do the file I/O by hand here; it is easy to get wrong, and the plain urlretrieve loop also runs faster than writing the files with with open. The buggy code is shown below: note that the uuid4 filename is generated once, outside the loop, so every image is written to the same file and overwrites the previous one.

from lxml import etree
import requests
from uuid import uuid4
import time
from urllib import request

url = 'http://www.haoduanzi.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
}
url_content = requests.get(url, headers=headers).text

tree = etree.HTML(url_content)

div_list = tree.xpath('//div[@id="main"]/div')[2:-1]
# Bug: the random filename is generated only once, outside the loop,
# so every image below is written to the same file and overwritten
filename = uuid4()
# i = 0
for div in div_list:
    img_url = div.xpath('./div/img/@src')[0]
    img_content = requests.get(url=img_url, headers=headers).content
    # request.urlretrieve(url=img_url, filename='img' + str(i) + '.jpg')
    # i += 1
    time.sleep(2)
    with open(r'C:\jupyter\day02\%s.jpg' % filename, 'wb') as f:
        f.write(img_content)
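If you do want to write the bytes manually, a minimal fix (a sketch, not the post's recommended approach) is to generate a fresh filename inside the loop so each image gets its own file. This reuses div_list, headers and the imports from the block above:

for div in div_list:
    img_url = div.xpath('./div/img/@src')[0]
    img_content = requests.get(url=img_url, headers=headers).content
    # Generate a new name per image so files are not overwritten
    filename = uuid4()
    with open(r'C:\jupyter\day02\%s.jpg' % filename, 'wb') as f:
        f.write(img_content)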

 

