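When requesting data through temporary proxy IPs, which expire very quickly (usually after about 1 to 5 minutes), Scrapy reports an error like the following: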
2020-01-17 17:00:48 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://xxxx/co
s): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it.
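How can we switch to a new IP automatically and then resend the request?
First, look at Scrapy's overall architecture diagram. This error is reported by the RetryMiddleware middleware, which corresponds to step 5 in the figure below.
So one approach is to create a new middleware that inherits from RetryMiddleware and overrides process_exception so that it resets the request's proxy before retrying: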
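from scrapy.downloadermiddlewares.retry import RetryMiddleware

class ProxyRetryMiddleware(RetryMiddleware):  # class name is illustrative
    def process_exception(self, request, exception, spider):
        # On a timeout (10060) or a refused connection (10061), get a new IP,
        # set it on the request, and then send the request again.
        if '10061' in str(exception) or '10060' in str(exception):
            # fetch_proxy_ip() is the author's own helper (not shown here).
            self.proxy_ip = fetch_proxy_ip()
            if self.proxy_ip:
                current_proxy = f'http://{self.proxy_ip}'
                request.meta['proxy'] = current_proxy
        # Same retry check as the parent class. Newer Scrapy releases may expose
        # the retryable exceptions under a different name than EXCEPTIONS_TO_RETRY.
        if isinstance(exception, self.EXCEPTIONS_TO_RETRY) and not request.meta.get('dont_retry', False):
            return self._retry(request, exception, spider)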
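
The post does not show fetch_proxy_ip, so here is a minimal sketch of what such a helper might look like. Everything in it is an assumption: the endpoint URL is a placeholder, and the provider is assumed to return a single plain-text "host:port" line. The download step is passed in as a parameter so the parsing logic can be tried without a real proxy provider.

```python
import re
from typing import Callable, Optional
from urllib.request import urlopen

# Hypothetical endpoint: replace with your proxy provider's real API.
PROXY_API_URL = "http://proxy-provider.example/api/get_ip"

# Loose shape check for an IPv4 "host:port" string (does not validate ranges).
_HOST_PORT = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}")


def parse_proxy(text: str) -> Optional[str]:
    """Return 'host:port' if the text has that shape, otherwise None."""
    candidate = text.strip()
    return candidate if _HOST_PORT.fullmatch(candidate) else None


def _download(url: str) -> str:
    """Fetch the response body as text, with a short timeout."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_proxy_ip(download: Callable[[str], str] = _download) -> Optional[str]:
    """Ask the provider for a fresh proxy; return None if anything goes wrong."""
    try:
        return parse_proxy(download(PROXY_API_URL))
    except OSError:  # covers URLError, timeouts, and refused connections
        return None
```

Returning None on failure matches the "if self.proxy_ip:" check in the middleware, so the request is still retried with its old proxy when no new one is available. For the subclass to take effect it also has to be registered in DOWNLOADER_MIDDLEWARES in settings.py, typically with the built-in 'scrapy.downloadermiddlewares.retry.RetryMiddleware' set to None and the subclass given a priority such as 550 (the module path of the subclass depends on your project layout). Keep in mind that a blocking HTTP call inside a downloader middleware pauses Scrapy's event loop while it waits.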