Fixing a Python crawler error: urllib.error.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.

Reposted article, 2021-11-19 07:54. Category: Crawlers, Python

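The original method that raised the error:

1) Fetching the page with request.Request and request.urlopen raises the error above, so the HTML cannot be retrieved.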

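import random
from urllib import request

def get_html(self, url):
    print(url)
    # ua_list (a list of User-Agent strings) is defined elsewhere in the crawler
    req = request.Request(url=url, headers={'User-Agent': random.choice(ua_list)})
    res = request.urlopen(req)  # the HTTP Error 301 is raised here
    # Read from the response object (res), not the request object (req)
    html = res.read().decode("gbk", 'ignore')
    filename = '123456.html'
    with open(filename, 'w') as f:
        f.write(html)
    self.parse_html(html)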
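Before switching libraries, it can help to see which URLs the server is bouncing between. The following is a minimal sketch using only the standard library; LoggingRedirectHandler and trace_redirects are hypothetical names introduced here for illustration, not part of the original crawler. It records every redirect hop that urllib is asked to follow.

```python
import urllib.error
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each redirect hop before urllib follows it."""

    def __init__(self):
        # Each entry: (status code, from URL, to URL, first Set-Cookie header or None)
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, req.full_url, newurl, headers.get("Set-Cookie")))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url, user_agent="Mozilla/5.0", timeout=10):
    """Open url and return (hops, error); error is None on success."""
    handler = LoggingRedirectHandler()
    # Passing an instance replaces urllib's default redirect handler
    opener = urllib.request.build_opener(handler)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(req, timeout=timeout):
            return handler.hops, None
    except urllib.error.HTTPError as exc:
        return handler.hops, exc
```

If the same URL keeps reappearing in the hops and the responses carry a Set-Cookie header, the server is probably waiting for a cookie that the default opener never sends back.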

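Solution:

1) Replace urllib.request with the requests library. requests is a third-party package, so it has to be installed separately (for example with pip install requests).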

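2) I am not sure of the exact cause. One likely explanation: urllib raises this error when the same URL keeps repeating in a redirect chain (or when there are too many redirects in total). That can happen when the server sets a cookie during the redirect and expects it back, because request.urlopen does not store cookies by default, whereas requests carries cookies across redirects automatically.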
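If the loop really is caused by a cookie that is set during the redirect and never sent back (an assumption, not something the original post confirms), urllib can be kept by adding a cookie jar to the opener. A minimal sketch follows; build_cookie_opener and fetch_html are hypothetical helper names, and the ua_list values are placeholders because the original list is not shown.

```python
import random
import urllib.request
from http.cookiejar import CookieJar

# Placeholder User-Agent pool; the crawler's real ua_list is not shown in the post.
ua_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies across redirects."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_html(url, encoding="utf-8", timeout=10):
    """Fetch url with cookie support and return the decoded body."""
    opener, _ = build_cookie_opener()
    req = urllib.request.Request(url, headers={"User-Agent": random.choice(ua_list)})
    with opener.open(req, timeout=timeout) as res:
        return res.read().decode(encoding, "ignore")
```

Inside the crawler this would be called as html = fetch_html(url) in place of the request.urlopen call. If the error persists even with cookies enabled, the cause lies elsewhere and this sketch will not help.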

import random
import requests

def get_html(self, url):
    print(url)
    # ua_list (a list of User-Agent strings) is defined elsewhere in the crawler
    res = requests.get(url=url, headers={'User-Agent': random.choice(ua_list)})
    res.encoding = 'utf-8'
    # Pass the decoded HTML straight to the parsing function
    self.parse_html(res.text)
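The requests version can be made a little more defensive. The sketch below is one possible variant, not the original author's code: it adds a timeout, caps the number of redirects, and exposes the redirect chain through response.history. The names redirect_chain and get_html_safe are hypothetical, and the ua_list values are placeholders.

```python
import random

import requests

# Placeholder User-Agent pool; the crawler's real ua_list is not shown in the post.
ua_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def get_html_safe(url, timeout=10, max_redirects=10):
    """Return the page text, or None if the server redirects too many times."""
    with requests.Session() as session:
        session.max_redirects = max_redirects
        headers = {"User-Agent": random.choice(ua_list)}
        try:
            res = session.get(url, headers=headers, timeout=timeout)
        except requests.exceptions.TooManyRedirects:
            return None
        res.raise_for_status()   # raise on 4xx/5xx instead of parsing an error page
        res.encoding = "utf-8"   # same encoding choice as in the article
        return res.text
```

Printing redirect_chain(res) for a successful response shows which intermediate URLs the server sent the client through, which is useful for confirming that the 301 loop seen with urllib was a redirect issue rather than a blocked request.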
