While crawling an https site with PySpider, I ran into the error HTTP 599: SSL certificate problem: self signed certificate in certificate chain.
After some troubleshooting, here is a summary of the fix.
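Cause of the error
This error occurs when requesting a URL that starts with https. SSL verification fails because the certificate chain presented by the server contains a self-signed certificate that the client does not trust.
The error output is as follows: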
[E 180823 09:18:21 base_handler:203] HTTP 599: SSL certificate problem: self signed certificate in certificate chain
    Traceback (most recent call last):
      File "f:\python\python36\lib\site-packages\pyspider\libs\base_handler.py", line 196, in run_task
        result = self._run_task(task, response)
      File "f:\python\python36\lib\site-packages\pyspider\libs\base_handler.py", line 175, in _run_task
        response.raise_for_status()
      File "f:\python\python36\lib\site-packages\pyspider\libs\response.py", line 172, in raise_for_status
        six.reraise(Exception, Exception(self.error), Traceback.from_string(self.traceback).as_traceback())
      File "f:\python\python36\lib\site-packages\six.py", line 692, in reraise
        raise value.with_traceback(tb)
      File "f:\python\python36\lib\site-packages\pyspider\fetcher\tornado_fetcher.py", line 378, in http_fetch
        response = yield gen.maybe_future(self.http_client.fetch(request))
      File "f:\python\python36\lib\site-packages\tornado\httpclient.py", line 102, in fetch
        self._async_client.fetch, request, **kwargs))
      File "f:\python\python36\lib\site-packages\tornado\ioloop.py", line 458, in run_sync
        return future_cell[0].result()
      File "f:\python\python36\lib\site-packages\tornado\concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
    Exception: HTTP 599: SSL certificate problem: self signed certificate in certificate chain
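To make "certificate verification" a little more concrete, here is a minimal sketch using only Python's standard ssl module. It is an analogy, not what PySpider does internally (the traceback shows the request going through PySpider's tornado-based fetcher). The variable names strict and relaxed are my own. The first context verifies the server certificate and hostname, which is the kind of check that fails in the log above. The second has both checks switched off.

```python
import ssl

# Default client-side context: the server certificate chain must validate
# against the trusted CA store, and the hostname must match.
strict = ssl.create_default_context()

# Same context with verification switched off. check_hostname has to be
# disabled first, otherwise setting CERT_NONE raises ValueError.
relaxed = ssl.create_default_context()
relaxed.check_hostname = False
relaxed.verify_mode = ssl.CERT_NONE

print("strict :", strict.verify_mode, strict.check_hostname)
print("relaxed:", relaxed.verify_mode, relaxed.check_hostname)
```

With verification off, the connection is still encrypted but the identity of the server is no longer checked, which is why a self-signed chain stops being an error.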
The simplest fix is to tell the crawl method to skip certificate verification by passing validate_cert=False:
self.crawl(url, callback=method_name, validate_cert=False)
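If a project calls self.crawl in many places, repeating the flag gets tedious. One possible approach, sketched below, is a small wrapper that adds validate_cert=False unless the caller sets it explicitly. The helper name crawl_skip_cert_check is hypothetical and not part of PySpider. It only assumes that the object passed in has a crawl method that accepts keyword arguments, as the call above does.

```python
def crawl_skip_cert_check(handler, url, callback, **kwargs):
    """Hypothetical helper: forward to handler.crawl with validate_cert=False.

    `handler` is assumed to be a PySpider handler instance (anything with a
    crawl(url, **kwargs) method works). An explicit validate_cert passed by
    the caller is kept, so verification can be re-enabled per request.
    """
    kwargs.setdefault("validate_cert", False)
    return handler.crawl(url, callback=callback, **kwargs)
```

Inside a handler method this would be called as crawl_skip_cert_check(self, url, self.index_page), where index_page stands for whatever callback the project defines.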