Work problem: crawler hits requests.exceptions.ConnectionError: HTTPSConnectionPool Max retries exceeded

Reposted article. 2020-02-18 15:33. Tags: crawler / everyday problems / work

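Problem: while crawling the JD (京東) website, the crawler ran for a while and then raised this error.

After some searching, the error turned out to have the following possible causes:

The number of HTTP connections exceeds the maximum limit. Connections are keep-alive by default, so the server ends up holding too many open connections and cannot accept new ones.
The client IP has been blocked.
The program is sending requests too fast.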

The fixes are as follows:

Method 1: catch the exception

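import requests

try:
    page1 = requests.get(ap)  # ap: the URL being crawled
except requests.exceptions.ConnectionError:
    page1 = None  # no response object is available after a failed connection
    print("Connection refused")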

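Method 2: disable persistent connections

This applies when too many open connections from requests lead to "Max retries exceeded".

Turn off persistent connections by sending this request header:

'Connection': 'close'

Another commonly cited option is requests.adapters.DEFAULT_RETRIES = 5, which is meant to raise the retry count. Be aware that HTTPAdapter reads this default when the module is imported, so reassigning it afterwards may have no effect; passing max_retries to an HTTPAdapter mounted on a Session is the documented way to configure retries.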
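
As a rough illustration of Method 2, the sketch below builds a Session that sends 'Connection': 'close' on every request and retries transient failures through a mounted HTTPAdapter. The helper name make_session, the retry count, the backoff factor, and the list of status codes are all assumptions chosen for the example, not values from the original post.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff_factor=1.0):
    """Hypothetical helper: a Session without keep-alive and with retries."""
    session = requests.Session()
    # Ask the server to close the connection after each response.
    session.headers["Connection"] = "close"
    # Retry a few times, pausing longer between successive attempts.
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,
        status_forcelist=(429, 500, 502, 503, 504),  # assumed set of retryable codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    # Use the same adapter for both plain and TLS connections.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A crawler would then call something like session = make_session() followed by session.get(url, timeout=10), where url is whatever page is being fetched. If the IP has been blocked, retries alone are unlikely to help.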

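Method 3: slow down and retry

This applies when the error is caused by sending requests too fast.

The following example shows one way to handle it: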

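import time

import requests

# url is assumed to be defined earlier as the page to crawl
while True:
    try:
        page = requests.get(url)
        break  # success: leave the retry loop
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")
        continue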
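
A fixed 5-second pause that retries forever can hang the crawler if the server never recovers. One possible variation, sketched here under assumptions, caps the number of attempts and doubles the pause each time. The name fetch_with_backoff and its parameters are hypothetical; fetch stands for any callable such as requests.get. The sketch catches OSError, which requests.exceptions.ConnectionError derives from.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(url), retrying with growing pauses."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:
            # Give up after the last allowed attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With requests installed, a call might look like page = fetch_with_backoff(requests.get, url). The sleep parameter exists only so the pause can be replaced in tests.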

Original article: http://www.chenxm.cc/post/536.html
