Python 3 source code:
import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen("http://php.net/")
html = response.read()
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
print(text)
The code is simple: it fetches the text content of the http://php.net/ page and then uses the BeautifulSoup module to strip out the extra HTML tags. It seemed to run successfully the first time, but after that it kept failing with this error:
File "C:\Python36\lib\urllib\request.py", line 504, in _call_chain result = func(*args) File "C:\Python36\lib\urllib\request.py", line 1361, in https_open context=self._context, check_hostname=self._check_hostname) File "C:\Python36\lib\urllib\request.py", line 1320, in do_open raise URLError(err) urllib.error.URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:841)>
The page actually opens fine in Google Chrome.
The problem is likely that SSLv2 is disabled on the web server, while older Python ssl libraries (from the Python 2.x era) try to negotiate the connection with PROTOCOL_SSLv23 by default. In that case you need to explicitly choose the SSL/TLS version used for the request.
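As a quick sanity check, you can also pin the protocol version for plain urllib by passing an explicit SSLContext to urlopen. This is a minimal sketch, assuming the server still accepts TLSv1 (PROTOCOL_TLSv1 here is only an example; pick whichever version the server supports):

import ssl
import urllib.request

# Build an SSLContext locked to a specific protocol version (assumption: server supports TLSv1)
ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

# urlopen accepts the context for HTTPS connections (php.net redirects http -> https)
response = urllib.request.urlopen("http://php.net/", context=ctx)
print(response.getcode())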
To change the SSL version used for HTTPS, you need to subclass the HTTPAdapter class and mount the subclass onto a Session object. For example, to force TLSv1, the new transport adapter looks like this:
import ssl
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)
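Note that newer releases of requests may call init_poolmanager with extra keyword arguments. If you hit a signature mismatch, a more tolerant variant (a sketch, assuming your installed requests forwards pool keyword arguments to urllib3) simply passes everything through to the parent class:

import ssl
from requests.adapters import HTTPAdapter

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, *args, **kwargs):
        # Force TLSv1 while keeping whatever other pool options requests passes in
        kwargs['ssl_version'] = ssl.PROTOCOL_TLSv1
        return super(MyAdapter, self).init_poolmanager(*args, **kwargs)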
Then mount it onto a requests Session object:
s = requests.Session()
s.mount('https://', MyAdapter())
response = s.get("http://php.net/")
Writing a generic transport adapter is also straightforward: it accepts an arbitrary SSL version from the ssl package in its constructor and uses it.
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class SSLAdapter(HTTPAdapter):
    '''An HTTPS Transport Adapter that uses an arbitrary SSL version.'''

    def __init__(self, ssl_version=None, **kwargs):
        self.ssl_version = ssl_version
        super(SSLAdapter, self).__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=self.ssl_version)
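A short usage sketch, using the SSLAdapter class defined above and the same php.net target, mounts the generic adapter with whichever protocol constant you want to force:

import ssl
import requests

s = requests.Session()
# Pick the protocol version at mount time instead of hard-coding it in the adapter
s.mount('https://', SSLAdapter(ssl.PROTOCOL_TLSv1))
response = s.get("http://php.net/")
print(response.status_code)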
The failing code above, rewritten with this fix:
from bs4 import BeautifulSoup
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

s = requests.Session()
s.mount('https://', MyAdapter())
response = s.get("http://php.net/")
html = response.content
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
print(text)
With this change, the page text is fetched normally.