Python: HTTP Error 418 returned when scraping a web page

Reposted article, 2020-01-06 18:37, python

Problem: urllib.error.HTTPError: HTTP Error 418:

Problem description: When I used the request module from Python's urllib to fetch a web page, the server returned HTTP status code 418.

Error description: Searching online showed that this 418 was sent by the site's anti-scraping protection. The status code itself is documented as "418 I'm a teapot":
The HTTP 418 I'm a teapot client error response code indicates that the server refuses to brew coffee because it is a teapot. This error is a reference to Hyper Text Coffee Pot Control Protocol which was an April Fools' joke in 1998.
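To see which status code a server actually sent instead of letting the script crash, the exception can be caught and inspected. The following is a minimal sketch, not the original author's code: the function name fetch_status and the opener parameter (added only so the function can be exercised without a network connection) are assumptions made for illustration.

```python
from urllib import error, request


def fetch_status(url, opener=request.urlopen):
    """Return (status_code, body); body is None when the server sends an HTTP error.

    `opener` is a hypothetical injection point so the function can be
    tried offline; by default it is urllib's own urlopen.
    """
    try:
        with opener(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except error.HTTPError as exc:
        # exc.code holds the numeric status, e.g. 418 for "I'm a teapot"
        return exc.code, None
```

A result of (418, None) would suggest that the request was rejected by the site rather than that the code itself is broken.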

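Solution: At the time I was using the request module from urllib. I suspected the library was somewhat dated, so I switched to the requests library and sent the request again with a headers dict added, and that worked. Without the headers the program returns an empty result, although it runs without raising an error.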

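Using urllib's request (this is the version that raised HTTP Error 418):

from urllib import request

url = "https://example.com"  # placeholder: the original post does not give the target URL
r = request.urlopen(url)  # sent with urllib's default Python-urllib User-Agent
html = r.read().decode("utf-8")
print(html)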

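Using requests with headers added:

import requests

# Browser-like User-Agent so the site does not reject the request as a bot
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
url = "https://example.com"  # placeholder: the original post does not give the target URL
r = requests.get(url, headers=headers)
html = r.text
print(html)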
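The fix above changes two things at once: the library and the User-Agent header. Since requests without the header still returned nothing, the header is most likely the part that matters, and urllib can send the same header through a Request object. The following is a minimal sketch under that assumption; the names BROWSER_UA, build_request and fetch are made up for this example, and whether a given site accepts the request still depends on its own anti-scraping rules.

```python
from urllib import request

# Same browser-like User-Agent string as in the requests example
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Create a urllib Request that carries a browser-like User-Agent."""
    return request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a page as text, sending the custom User-Agent header."""
    with request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8")


# Usage (needs network access; the URL is a placeholder):
# print(fetch("https://example.com"))
```

Keeping the request construction in its own function makes it easy to check which headers will be sent before any network traffic happens.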
