1. Step one: the code is as follows:
from requests_html import HTMLSession

url = "https://www.baidu.com/"
headers = {
    "Host": "www.baidu.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
}
session = HTMLSession()
req = session.get(url, headers=headers)
req.encoding = "utf-8"
# render() executes the page's JavaScript in Chromium; the first call triggers the Chromium download
req.html.render()
# first=True returns a single element (None if nothing matches the selector)
result = req.html.find("a.mnav", first=True)
print(req.status_code)
print(result.text)
print(result.attrs.get('href'))
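2. Because this is the first call to the render function, Chromium has to be installed. Unfortunately the download is extremely slow: after waiting several minutes it had reached only 2%.
3. The fix is as follows:
3.1 Download Chromium manually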
https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win_x64/650583/
After the download finishes, unzip the archive.
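The unzip step can also be scripted. Below is a minimal sketch using only the standard library; the function name `extract_snapshot` and the example file names are hypothetical and not part of the original steps.

```python
import zipfile
from pathlib import Path


def extract_snapshot(zip_path, dest_dir):
    """Extract a downloaded Chromium snapshot archive into dest_dir.

    Returns the sorted names of the top-level entries now present in dest_dir.
    """
    dest = Path(dest_dir)
    # Create the destination folder (and any missing parents) if needed.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())
```

For example, `extract_snapshot("chromium.zip", "chromium_unpacked")` (both names are placeholders) unpacks the archive and reports which top-level folder it contained, which is worth checking before copying anything later on.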
3.2 Where exactly does requests_html look for Chromium?
3.2.1 Go to the \Lib\site-packages\pyppeteer directory under the Python installation directory
On the author's machine this is: C:\Users\Ray\AppData\Local\Programs\Python\Python37\Lib\site-packages\pyppeteer
3.2.2 Open the chromium_downloader.py file
Find this code:
chromiumExecutable = {
'linux': DOWNLOADS_FOLDER / REVISION / 'chrome-linux' / 'chrome',
'mac': (DOWNLOADS_FOLDER / REVISION / 'chrome-mac' / 'Chromium.app' /
'Contents' / 'MacOS' / 'Chromium'),
'win32': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
'win64': DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
}
As the code above shows, the Chromium path for win64 (the author's Win10 system is 64-bit) is:
DOWNLOADS_FOLDER / REVISION / 'chrome-win32' / 'chrome.exe',
So what exactly are DOWNLOADS_FOLDER and REVISION?
Looking further up in the same file, you can find the following code:
DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'
REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)
You can print both values with the print function. The code is as follows:
import os
from pathlib import Path

from pyppeteer import __chromium_revision__, __pyppeteer_home__

DOWNLOADS_FOLDER = Path(__pyppeteer_home__) / 'local-chromium'
REVISION = os.environ.get('PYPPETEER_CHROMIUM_REVISION', __chromium_revision__)
print(DOWNLOADS_FOLDER)
print(REVISION)
Run the .py file to see the values of the two variables.
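To see how those two values combine into the full executable path, here is a small sketch that mirrors the chromiumExecutable dictionary quoted earlier. It deliberately does not import pyppeteer: the home directory and default revision are passed in as arguments, and the function name `expected_chromium_path` is hypothetical.

```python
import os
from pathlib import Path


def expected_chromium_path(pyppeteer_home, default_revision, platform_key="win64"):
    """Rebuild the executable path the way the quoted pyppeteer code does."""
    # The environment variable, if set, takes precedence over the default revision.
    revision = os.environ.get("PYPPETEER_CHROMIUM_REVISION", default_revision)
    downloads_folder = Path(pyppeteer_home) / "local-chromium"
    executables = {
        "linux": downloads_folder / revision / "chrome-linux" / "chrome",
        "mac": (downloads_folder / revision / "chrome-mac" / "Chromium.app"
                / "Contents" / "MacOS" / "Chromium"),
        "win32": downloads_folder / revision / "chrome-win32" / "chrome.exe",
        "win64": downloads_folder / revision / "chrome-win32" / "chrome.exe",
    }
    return executables[platform_key]
```

Passing the two printed values to this function on Windows should give the same location that pyppeteer checks before it decides to download Chromium.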
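From the above, the Chromium path is: C:\Users\Ray\AppData\Local\pyppeteer\pyppeteer\local-chromium\575458\chrome-win32\chrome.exe
So create the folders yourself, all the way down to chrome-win32, and copy the Chromium files downloaded above into that directory. Note that the revision folder must be named after the printed REVISION value (575458 here), because that is the path pyppeteer checks, regardless of which snapshot was downloaded.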
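The folder creation and copy can be sketched in a few lines as well. This is only an illustration under stated assumptions: `place_chromium` is a hypothetical helper, `extracted_dir` is whatever folder the unzip step produced, and `target_dir` is the chrome-win32 directory worked out above.

```python
import shutil
from pathlib import Path


def place_chromium(extracted_dir, target_dir, executable_name="chrome.exe"):
    """Copy an extracted Chromium folder to target_dir and return the executable path."""
    source = Path(extracted_dir)
    target = Path(target_dir)
    # Fail early if the extracted folder does not actually contain the browser.
    if not (source / executable_name).is_file():
        raise FileNotFoundError(f"{executable_name} not found in {source}")
    # Refuse to overwrite an existing installation.
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Create local-chromium\<revision> and any other missing parent folders.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source, target)
    return target / executable_name
```

The function stops rather than overwriting when the target folder already exists, so a partially downloaded copy left behind by the slow automatic install would need to be deleted by hand first.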
4. Run the code from step 1 again; it now prints the expected output.
Source of inspiration: https://github.com/GoogleChrome/puppeteer/issues/1597