Source: https://www.cnblogs.com/xiaoaiyiwan/p/10776493.html (lightly edited)
1. Step one. The code is as follows:
from requests_html import HTMLSession
url="https://www.baidu.com/"
headers={
"Host": "www.baidu.com",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
}
session=HTMLSession()
req=session.get(url,headers=headers)
req.encoding="utf-8"
req.html.render()
result=req.html.find("a.mnav",first=True)
print(req.status_code)
print(result.text)
print(result.attrs.get('href'))
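2. Because this is the first call to render(), Chromium has to be downloaded. The download was painfully slow: after several minutes it had reached only 2%.
The download can also fail outright, for reasons that are hard to pin down. My error was: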
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/575458/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
3. The fix is as follows:
3.1 Download Chromium manually
https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win_x64/650583/
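After downloading, unzip the archive.
3.2 Where does requests_html expect to find Chromium?
3.2.1 Go to the \Lib\site-packages\pyppeteer directory under the Python installation directory
On my machine this is: C:\Users\Ray\AppData\Local\Programs\Python\Python37\Lib\site-packages\pyppeteer
3.2.2 Open the file chromium_downloader.py
Find this code: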
chromiumExecutable = {
'linux': DOWNLOADS_FOLDER / REVISION / 'chrome-linux' / 'chrome',
'mac': (DOWNLOADS_FOLDER / REVISION / 'chrome-mac' / 'Chromium.app' /
'Contents' / 'MacOS' / 'Chromium'),
'win32': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
'win64': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
}
From this we can see that on win64 (my Windows 10 system is 64-bit) the Chromium path is:
DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
So what exactly are DOWNLOADS_FOLDER and REVISION?
Looking further up in the same file, we find:
DOWNLOADS_FOLDER = Path(pyppeteer_home) / 'local-chromium'
REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', chromium_revision)
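You can print both values with print(). The code is:
import os
from pathlib import Path
from pyppeteer import chromium_revision, pyppeteer_home
DOWNLOADS_FOLDER = Path(pyppeteer_home) / 'local-chromium'
REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', chromium_revision)
print(DOWNLOADS_FOLDER)
print(REVISION)
Either add the print calls to chromium_downloader.py and run it directly, or copy the relevant lines into your own .py file and run that, to see both values.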
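The same lookup can be reproduced without importing pyppeteer, which may help when working out the expected location for another machine. This is only a minimal sketch: `expected_chromium_path` is a hypothetical helper, not part of pyppeteer, and the home directory and default revision are passed in by hand instead of being read from the library. It mirrors the `win64` entry and the environment-variable override shown in chromium_downloader.py.

```python
import os
from pathlib import PureWindowsPath


def expected_chromium_path(pyppeteer_home, default_revision, env=None):
    """Build the win64 chrome.exe path the way chromium_downloader.py does.

    Hypothetical helper for illustration; pyppeteer does not expose it.
    """
    env = os.environ if env is None else env
    # The environment variable, when set, wins over the bundled default.
    revision = env.get('PYPPETEER_CHROMIUM_REVISION', default_revision)
    downloads_folder = PureWindowsPath(pyppeteer_home) / 'local-chromium'
    return downloads_folder / revision / 'chrome-win32' / 'chrome.exe'


# Assumed example values (the author's machine); adjust to your own.
home = r'C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer'
print(expected_chromium_path(home, '575458', env={}))
print(expected_chromium_path(
    home, '575458', env={'PYPPETEER_CHROMIUM_REVISION': '650583'}))
```

Passing `env` explicitly keeps the helper easy to try with different revisions without touching the real environment.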
My code is as follows:
import os
from pathlib import Path
from pyppeteer import chromium_revision, pyppeteer_home
DOWNLOADS_FOLDER = Path(pyppeteer_home) / 'local-chromium'
REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', chromium_revision)
chromiumExecutable = {
'linux': DOWNLOADS_FOLDER / REVISION / 'chrome-linux' / 'chrome',
'mac': (DOWNLOADS_FOLDER / REVISION / 'chrome-mac' / 'Chromium.app' /
'Contents' / 'MacOS' / 'Chromium'),
'win32': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
'win64': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
}
print(chromiumExecutable['win64'])
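This prints the expected install path directly.
From the above, the Chromium path is: C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer\local-chromium\575458\chrome-win32\chrome.exe
So create the folders yourself, all the way down to chrome-win32, and copy the Chromium files downloaded earlier into it. The download is a zip archive: unzip it and copy all of the extracted files into that directory. The revision folder has to be named after the REVISION value printed above (575458 here), because that is the path pyppeteer checks, even though the mirror link in step 3.1 points to build 650583.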
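The unzip-and-copy step can also be scripted. The following is a rough sketch using only the standard library; `extract_chromium` is a hypothetical helper, and it assumes the archive wraps everything in a single top-level folder, which it drops so that chrome.exe ends up directly inside the target directory.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_chromium(zip_path, target_dir):
    """Unpack a Chromium zip so its files sit directly in target_dir.

    Hypothetical helper: the archive's top-level folder (whatever it is
    called) is stripped from every member path.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            parts = PurePosixPath(member.filename).parts
            # Skip directory entries, files outside the top-level folder,
            # and any path that tries to climb out of the target.
            if member.is_dir() or len(parts) < 2 or '..' in parts:
                continue
            destination = target.joinpath(*parts[1:])
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(member))
    return target / 'chrome.exe'


# Usage (placeholder archive name; the target is the path found above):
# exe = extract_chromium(
#     'chromium.zip',
#     r'C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer'
#     r'\local-chromium\575458\chrome-win32')
# print(exe.exists())
```

The function returns the expected chrome.exe location, so checking `exists()` on the result is a quick way to confirm the files landed where pyppeteer will look.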
4. Run the code from step 1 again; it now prints the expected output.
Inspiration for this fix: https://github.com/GoogleChrome/puppeteer/issues/1597