Solving a Python crawler problem: urllib.error.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.


Original code that raised the error:

1) Using request.Request produced the error above; the HTML could not be fetched.

import random

from urllib import request

def get_html(self, url):
    print(url)
    # ua_list is a list of User-Agent strings defined elsewhere in the crawler
    req = request.Request(url=url, headers={'User-Agent': random.choice(ua_list)})
    res = request.urlopen(req)  # this call raises HTTPError 301 on a redirect loop
    # bug in the original post: it called req.read(); the response object is res
    html = res.read().decode("gbk", 'ignore')
    with open(filename, 'w') as f:  # filename is also defined elsewhere
        f.write(html)
    self.parse_html(html)
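For context, urllib's redirect handler tracks the URLs it has visited and raises exactly this HTTPError once a 301 chain repeats too many times, so the message means the server keeps redirecting, not that the request itself is malformed. A minimal reproduction against a local toy server (the server and URL here are illustrative only, not the site from the original post):

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class LoopHandler(BaseHTTPRequestHandler):
    """Toy server that answers every GET with a 301 back to the same path."""
    def do_GET(self):
        self.send_response(301)
        self.send_header('Location', self.path)  # redirect to itself
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(('127.0.0.1', 0), LoopHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

try:
    urllib.request.urlopen(url)
    error_msg = None
except urllib.error.HTTPError as exc:
    # "HTTP Error 301: The HTTP server returned a redirect error
    #  that would lead to an infinite loop. ..."
    error_msg = str(exc)

print(error_msg)
server.shutdown()
```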

 

Solution:

1) Replace urllib.request with the requests library (a third-party package; install it first, e.g. pip install requests).

2) I am not sure of the exact root cause.

import random

import requests

def get_html(self, url):
    print(url)
    req = requests.get(url=url, headers={'User-Agent': random.choice(ua_list)})
    req.encoding = 'utf-8'
    # call the parsing function directly on the response text
    self.parse_html(req.text)
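A likely reason the switch helps with diagnosis: requests follows redirects with its own machinery and raises a clear requests.TooManyRedirects when a chain never terminates, and passing allow_redirects=False lets you inspect the raw 301 and its Location header directly. A minimal sketch against a local toy server (the server and URL are illustrative, not the site from the original post):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests

class LoopHandler(BaseHTTPRequestHandler):
    """Toy server that answers every GET with a 301 back to the same path."""
    def do_GET(self):
        self.send_response(301)
        self.send_header('Location', self.path)  # redirect to itself
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(('127.0.0.1', 0), LoopHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

# Disable redirect-following to inspect the raw 301 response directly.
resp = requests.get(url, allow_redirects=False)
print(resp.status_code, resp.headers['Location'])  # → 301 /

# With redirects enabled, requests gives up after its limit (30 by default)
# and raises TooManyRedirects instead of urllib's opaque HTTPError.
try:
    requests.get(url)
    loop_detected = False
except requests.TooManyRedirects:
    loop_detected = True

server.shutdown()
```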

 

