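Preface
pixiv is a well-known illustration site. Suppose a crawler has already collected the URL of an image on pixiv. How do we download that image to local disk from its URL?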
Installing the module
pip install requests
Test example
Open the following page:
https://www.pixiv.net/artworks/77926406
Copy the image address:
https://i.pximg.net/img-original/img/2019/11/22/00/00/13/77926406_p0.jpg
Downloading the image
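import requests
import urllib3

# Silence the InsecureRequestWarning that verify=False would otherwise trigger
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# pixiv rejects image requests that do not carry a pixiv Referer
headers = {'Referer': 'https://www.pixiv.net/'}
url = 'https://i.pximg.net/img-original/img/2019/11/22/00/00/13/77926406_p0.jpg'

# Skip certificate verification, then write the raw bytes to disk
res = requests.get(url, headers=headers, verify=False)
with open('test.jpg', 'wb') as f:
    f.write(res.content)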
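The script above always writes to test.jpg and saves whatever the server returns, including an error body. A minimal sketch of a reusable variant follows, assuming we want the local file name taken from the URL and a failure raised on a non-2xx status. The names filename_from_url and download_image are hypothetical helpers, not part of requests.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit

import requests

PIXIV_REFERER = "https://www.pixiv.net/"


def filename_from_url(url: str) -> str:
    """Return the last path segment of the URL (hypothetical helper)."""
    return PurePosixPath(urlsplit(url).path).name


def download_image(url: str, dest_dir: str = ".", verify: bool = False) -> Path:
    """Download one image and return the path it was saved to."""
    headers = {"Referer": PIXIV_REFERER}
    res = requests.get(url, headers=headers, verify=verify, timeout=30)
    # Raise on 403/404 and similar instead of saving an error page as a .jpg
    res.raise_for_status()
    target = Path(dest_dir) / filename_from_url(url)
    target.write_bytes(res.content)
    return target
```

Calling download_image with the test image address would save the file as 77926406_p0.jpg in the current directory. As in the original script, verify=False still triggers InsecureRequestWarning unless it is silenced separately.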
Notes
Add Referer to the request headers:
headers = {'Referer': 'https://www.pixiv.net/'}
Disable SSL certificate verification:
verify = False
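When several images are downloaded in a row, both settings can be configured once on a requests.Session instead of being repeated on every call. This is a sketch under that assumption; make_pixiv_session is a hypothetical helper name.

```python
import requests

PIXIV_REFERER = "https://www.pixiv.net/"


def make_pixiv_session() -> requests.Session:
    """Return a Session preconfigured with the two settings from the notes."""
    session = requests.Session()
    # Header is merged into every request sent through this session
    session.headers.update({"Referer": PIXIV_REFERER})
    # Session-wide default for certificate verification
    session.verify = False
    return session
```

A call such as make_pixiv_session().get(url, timeout=30) would then send the Referer automatically. One caveat: requests may let a CA bundle set through environment variables take precedence over the session-level verify value, so passing verify=False per call is the more explicit option.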
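Tips
Referer
pixiv protects its images against hotlinking, so the request must carry a Referer header.
The Referer tells the image server which page the request came from. Sending the pixiv home page as the Referer signals that the request originates from pixiv itself, so the server hands over the data.
For example:
- Visiting the test image address directly returns a 403 error.
- Use the ModHeader browser extension to modify the request headers: set Referer to https://www.pixiv.net/
- With the Referer added, the image displays normally.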
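The browser experiment above can be repeated from Python by sending the same request with and without the header and comparing status codes. This is a minimal sketch; build_headers and fetch_status are hypothetical names, and the status codes actually returned depend on pixiv's current behaviour.

```python
import requests

IMAGE_URL = "https://i.pximg.net/img-original/img/2019/11/22/00/00/13/77926406_p0.jpg"


def build_headers(referer=None):
    """Return request headers, including Referer only when one is given."""
    return {"Referer": referer} if referer else {}


def fetch_status(url, referer=None, verify=False):
    """Send a GET request and return only the HTTP status code."""
    res = requests.get(url, headers=build_headers(referer), verify=verify, timeout=30)
    return res.status_code

# Example calls (they need network access, so they are left commented out):
# fetch_status(IMAGE_URL)                             # no Referer
# fetch_status(IMAGE_URL, "https://www.pixiv.net/")   # with Referer
```

Based on the behaviour described above, the first call should correspond to the 403 case and the second to the case where the image loads.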
verify=False
pixiv reportedly uses a private certificate. With verify=True (the default in requests), the download fails with:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='i.pximg.net', port=443): Max retries exceeded with url: /img-original/img/2019/11/22/00/00/13/77926406_p0.jpg
(Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027A06DCC3D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
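The traceback above reports a refused connection (WinError 10061), which requests raises as ConnectionError, whereas a certificate verification failure is normally raised as requests.exceptions.SSLError. A small sketch for telling the two apart follows. describe_failure and probe are hypothetical helpers, and the category labels are chosen only for illustration.

```python
import requests


def describe_failure(exc: Exception) -> str:
    """Map a requests exception to a coarse category label."""
    # SSLError subclasses ConnectionError, so it has to be checked first
    if isinstance(exc, requests.exceptions.SSLError):
        return "tls"
    if isinstance(exc, requests.exceptions.Timeout):
        return "timeout"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "connection"
    return "other"


def probe(url: str, referer: str = "https://www.pixiv.net/") -> str:
    """Try the request with verification enabled and report what happened."""
    try:
        requests.get(url, headers={"Referer": referer}, verify=True, timeout=30)
    except requests.exceptions.RequestException as exc:
        return describe_failure(exc)
    return "ok"
```

If probe returns "connection" rather than "tls", the failure happens before any certificate is checked, and the network path to i.pximg.net is worth examining as well.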
Because the image request is sent with verify=False, a warning is emitted:
InsecureRequestWarning:
Unverified HTTPS request is being made to host 'i.pximg.net'.
Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
To keep the warning from appearing while the program runs, add the following two lines:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
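urllib3.disable_warnings silences the warning for the rest of the process. A narrower alternative, sketched here with the standard-library warnings module rather than the approach used above, suppresses it only around the calls that need it. The context manager name insecure_warnings_suppressed is hypothetical.

```python
import warnings
from contextlib import contextmanager

import urllib3


@contextmanager
def insecure_warnings_suppressed():
    """Ignore InsecureRequestWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        yield
    # On exit, catch_warnings restores the previous warning filters
```

A request made inside "with insecure_warnings_suppressed():" stays quiet, while unverified requests elsewhere in the program still produce the warning.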
References
https://blog.csdn.net/python_neophyte/article/details/82562330
https://requests.readthedocs.io/zh_CN/latest/user/advanced.html#ssl
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings