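When writing a web crawler, https URLs need separate handling. Otherwise urllib.request.urlopen fails with an error, typically an SSL certificate verification failure.
Solution: import the ssl module and replace the default HTTPS context with an unverified one. Note that this turns off certificate checking, so it is only suitable for quick experiments.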
Core code:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
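The line above changes the default for every HTTPS request in the process. As a rough sketch of a narrower alternative, the same effect can be limited to a single request by passing an explicit context to urlopen. The helper names make_unverified_context and fetch_html and the 10-second timeout are assumptions for illustration, not part of the original code.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSLContext that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_html(url, context=None, timeout=10):
    """Fetch a page and decode it as UTF-8 (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as page:
        return page.read().decode("utf-8")
```

Calling fetch_html("https://www.cnblogs.com", context=make_unverified_context()) should skip verification for that one request only, while other urlopen calls keep the normal checks.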
The full code is as follows:
# coding=utf-8
import re
import ssl
import urllib.request


# Fetch the HTML content of a page
def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')
    return html


# Extract the contents of the <title> tag
def get_title(html):
    reg = r'<title>(.*)</title>'
    content_title = re.compile(reg)
    result = re.findall(content_title, html)
    return result


# Disable HTTPS certificate verification for every urlopen call in this process
ssl._create_default_https_context = ssl._create_unverified_context

url = "https://www.cnblogs.com"
html = getHtml(url)
title = get_title(html)
print(title)
Result: