Installing Puppeteer and the pitfalls I ran into
1. Environment and installation
Puppeteer requires at least Node v6.4.0. To use async/await you need Node v7.6.0 or later. Node download: https://nodejs.org/zh-cn/
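A small Node script can show which of the two minimums above your installed Node meets. This is a hypothetical helper: parseVersion, isAtLeast and checkNode are not part of Node or Puppeteer, and the sketch assumes plain x.y.z version strings like the one process.version returns.

```javascript
// Hypothetical helpers, not part of Node or Puppeteer.
// Assumes plain "vX.Y.Z" / "X.Y.Z" version strings (no pre-release tags).

function parseVersion(version) {
  // "v7.6.0" -> [7, 6, 0]
  return version.replace(/^v/, '').split('.').map(Number);
}

function isAtLeast(version, minimum) {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    // The first differing component decides the comparison.
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // versions are equal
}

function checkNode(version = process.version) {
  return {
    puppeteer: isAtLeast(version, '6.4.0'),  // minimum for Puppeteer
    asyncAwait: isAtLeast(version, '7.6.0'), // minimum for native async/await
  };
}

console.log(process.version, checkNode());
```

Run it with node; both flags should be true before you continue.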
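2. Create the project
2.1 Create a test directory, cd into it and run npm init to generate the project's package.json.
2.2 Install puppeteer
yarn add puppeteer or npm i puppeteer
During installation I hit the following error: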
weifandeMacBook-Pro:example weifan$ npm i puppeteer --save

> puppeteer@1.6.0 install /Users/weifan/Desktop/example/node_modules/puppeteer
> node install.js

ERROR: Failed to download Chromium r571375! Set "PUPPETEER_SKIP_CHROMIUM_DOWNLOAD" env variable to skip download.
{ Error: connect ETIMEDOUT 172.217.25.16:443
    at Object._errnoException (util.js:999:13)
    at _exceptionWithHostPort (util.js:1020:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1207:14)
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '172.217.25.16',
  port: 443 }
npm WARN example@1.0.0 No description
npm WARN example@1.0.0 No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! puppeteer@1.6.0 install: `node install.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the puppeteer@1.6.0 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/weifan/.npm/_logs/2018-07-16T09_49_23_441Z-debug.log
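The cause: during installation npm runs puppeteer's install.js, which downloads Chromium, and that download timed out (ETIMEDOUT connecting to 172.217.25.16:443). For now we skip the download.
To do that, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable. There are several ways to set it; here is one: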
env PUPPETEER_SKIP_CHROMIUM_DOWNLOAD="true" npm i --save puppeteer
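Another way to set the variable, as a sketch assuming you would rather drive the install from a Node script than from the shell: copy the current environment, add PUPPETEER_SKIP_CHROMIUM_DOWNLOAD, and pass it to npm through child_process.spawnSync. installEnv and installPuppeteerWithoutChromium are hypothetical names, not part of npm or Puppeteer.

```javascript
// Hypothetical helpers: run the same install as
//   env PUPPETEER_SKIP_CHROMIUM_DOWNLOAD="true" npm i --save puppeteer
// but from Node instead of the shell.
const { spawnSync } = require('child_process');

function installEnv(baseEnv = process.env) {
  // Copy so the caller's environment object is not mutated.
  return Object.assign({}, baseEnv, { PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: 'true' });
}

function installPuppeteerWithoutChromium(cwd) {
  return spawnSync('npm', ['i', '--save', 'puppeteer'], {
    cwd,
    env: installEnv(),
    stdio: 'inherit',                      // show npm's output in this terminal
    shell: process.platform === 'win32',   // npm is npm.cmd on Windows, so use a shell there
  });
}

// Usage (not executed here): installPuppeteerWithoutChromium(process.cwd());
```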
This time the installation succeeds.
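2.3 Download Chromium manually
Download URL: https://download-chromium.appspot.com/
Unzip the downloaded archive into a chromium folder in the project root. Inside chromium you will see a chrome-mac folder; open it to look at its contents.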
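To confirm the archive ended up where the later steps expect it, a check like the following can help. It assumes the layout chromium/chrome-mac/Chromium.app under the project root; chromiumAppPath and hasChromiumApp are hypothetical helpers.

```javascript
// Hypothetical helpers; assumes <project>/chromium/chrome-mac/Chromium.app.
const fs = require('fs');
const path = require('path');

function chromiumAppPath(projectRoot) {
  return path.join(projectRoot, 'chromium', 'chrome-mac', 'Chromium.app');
}

function hasChromiumApp(projectRoot) {
  const appPath = chromiumAppPath(projectRoot);
  // A macOS .app bundle is a directory, so check for a directory.
  return fs.existsSync(appPath) && fs.statSync(appPath).isDirectory();
}

// Usage from the project root: console.log(hasChromiumApp(process.cwd()));
```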
2.4 In the project root, create a src folder and add index.js to it (it takes a screenshot), with the following code:
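const puppeteer = require('puppeteer');

// Open https://google.com and save a screenshot as google.png
async function getPic() {
  // By default, launches the Chromium that npm install would have downloaded
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});  // written to the current directory
  await browser.close();
}

getPic();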
Running it with node index.js (from the src folder) produced this error:
(node:38213) UnhandledPromiseRejectionWarning: Error: Chromium revision is not downloaded. Run "npm install" or "yarn install"
    at assert (/Users/weifan/Desktop/example/node_modules/puppeteer/lib/helper.js:282:11)
    at Function.launch (/Users/weifan/Desktop/example/node_modules/puppeteer/lib/Launcher.js:106:7)
    at <anonymous>
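The message says Chromium has not been downloaded. By default Puppeteer looks for it under node_modules/puppeteer/.local-chromium/, but our Chromium is in the chromium folder at the project root, so we have to tell Puppeteer where it is. Change index.js as follows: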
const puppeteer = require('puppeteer');

async function getPic() {
  const browser = await puppeteer.launch({
    // Use the manually downloaded Chromium instead of the default location
    executablePath: '../chromium/chrome-mac/Chromium.app',
    headless: false  // show the browser window
  });
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});
  await browser.close();
}

getPic();
Running index.js again produced another error:
(node:38246) UnhandledPromiseRejectionWarning: Error: spawn EACCES
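Puppeteer's GitHub issues have the fix (https://github.com/GoogleChrome/puppeteer/issues/1649): executablePath must point to the Chromium executable inside the .app bundle, not to the bundle itself. Change executablePath to: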
executablePath: '../chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium',
Run node index.js again and the script now works.
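Both failures above surfaced only as UnhandledPromiseRejectionWarning because getPic() is called without handling the promise it returns, and a failure after launch would also leave the browser open. Below is a sketch of a more defensive version, not the original author's code: takeScreenshot receives the launch function as a parameter (an assumption made so it can be exercised without a browser), closes the browser in finally, and runMain, a hypothetical helper, reports the error and sets a non-zero exit code.

```javascript
// Sketch, not the original code. `launch` is injected; in index.js you would
// pass (opts) => puppeteer.launch(opts).
async function takeScreenshot(launch, url, file, launchOptions = {}) {
  const browser = await launch(launchOptions);
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path: file });
  } finally {
    // Close the browser even if goto() or screenshot() throws.
    await browser.close();
  }
}

// Report a rejected promise instead of leaving it unhandled.
function runMain(promise) {
  return promise.catch((err) => {
    console.error('Screenshot failed:', err.message);
    process.exitCode = 1; // visible to shells and CI
  });
}

// Usage in index.js (assuming puppeteer is installed):
//   runMain(takeScreenshot((o) => puppeteer.launch(o), 'https://google.com', 'google.png', {
//     executablePath: '../chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium',
//     headless: false,
//   }));
```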
References:
1. https://www.jianshu.com/p/a89d8d6c007b
2. https://blog.fundebug.com/2017/11/01/guide-to-automating-scraping-the-web-with-js/
3. https://github.com/GoogleChrome/puppeteer/issues/1649