An Overview of the Puppeteer Ecosystem

Reposted article, 2020-11-16 16:07

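Puppeteer is a Node.js library from the Chrome team that drives Chrome or Chromium over the DevTools Protocol. It is most often used with the browser running without a visible window, which is commonly called a headless browser.

A headless browser has the full feature set of a regular browser, yet it can run on a server with no display, such as a remote Linux machine, which makes it a good choice for UI automation testing.

Today we take a look at the Puppeteer ecosystem to see what the tool can do besides automated testing.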

Puppetron

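https://github.com/cheeaun/puppetron: this project does one simple thing. It uses Puppeteer to render a page, take a screenshot of it, or save it as a PDF file.

All three functions (render, screenshot, and PDF) are available through API calls and accept width and height settings, so simulating screens of different sizes is easy.

For example, by varying the width and height you can capture how a site looks on a small phone, a large phone, a tablet, and a desktop computer.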
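The multi-screen idea can also be sketched directly against Puppeteer. The following is a minimal sketch, not Puppetron's own code: the viewport names and sizes, the output file names, and the helper captureAtViewports are all illustrative assumptions. The browser object is passed in, so the helper does not care how Chrome was started.

```javascript
// Hypothetical viewport presets; the sizes are illustrative, not taken from Puppetron.
const VIEWPORTS = [
  { name: 'small-phone', width: 320, height: 568 },
  { name: 'large-phone', width: 414, height: 896 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'desktop', width: 1440, height: 900 },
];

// Takes one screenshot per viewport and returns the list of file paths written.
// `browser` is expected to be a Puppeteer Browser, e.g. from puppeteer.launch().
async function captureAtViewports(browser, url, viewports = VIEWPORTS) {
  const files = [];
  for (const { name, width, height } of viewports) {
    const page = await browser.newPage();
    try {
      await page.setViewport({ width, height });
      await page.goto(url, { waitUntil: 'networkidle2' });
      const path = `${name}.png`;
      await page.screenshot({ path });
      files.push(path);
    } finally {
      // Always release the tab, even if navigation or the screenshot fails.
      await page.close();
    }
  }
  return files;
}
```

With Puppeteer installed, this could be driven by obtaining a browser from require('puppeteer').launch(), calling captureAtViewports(browser, 'https://example.com/'), and then closing the browser.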

pupperender

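https://github.com/LasaleFamine/pupperender: pupperender is an Express middleware. When it detects that a request comes from a search-engine bot or a crawler, it renders the page with Puppeteer and returns the complete HTML.

This addresses the poor search-engine friendliness of some sites with a decoupled front end and back end. When the front end is built purely in JavaScript, a search bot that does not execute scripts receives only a skeleton HTML document, and the actual page content is never rendered and returned.

The library is very simple to use; a few lines of code are enough.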

const express = require('express');
const pupperender = require('pupperender');

const app = express();

app.use(pupperender.makeMiddleware({}));

app.use(express.static('files'));
app.listen(8080);
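
To make the idea concrete, here is a minimal sketch of what a middleware of this kind does internally. It is not pupperender's implementation: the BOT_PATTERN list, isBot, makePrerenderMiddleware, and the injected render function are all assumptions made for illustration.

```javascript
// Illustrative, incomplete list of crawler user-agent fragments (an assumption).
const BOT_PATTERN = /googlebot|bingbot|baiduspider|yandex|duckduckbot/i;

// Returns true when the User-Agent header looks like a known crawler.
function isBot(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}

// Returns an Express-style middleware. `render(url)` must resolve to the fully
// rendered HTML, for example by opening the URL in Puppeteer and returning
// the result of page.content().
function makePrerenderMiddleware(render) {
  return async (req, res, next) => {
    if (!isBot(req.headers['user-agent'])) {
      return next(); // regular visitors get the normal client-side app
    }
    try {
      const url = `${req.protocol}://${req.headers.host}${req.originalUrl}`;
      res.send(await render(url));
    } catch (err) {
      next(err); // hand rendering failures to the error handler
    }
  };
}
```

Passing the renderer in as a function keeps the bot check separate from the browser handling, so the check can be exercised without starting Chrome.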

If any of this is unfamiliar, searching for PWA is a good starting point for further study.

headless-chrome-crawler

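https://github.com/yujiosaka/headless-chrome-crawler is a crawler built on Puppeteer.

Some of its features are quite distinctive, for example:

Distributed crawling
Depth-first and breadth-first search algorithms
Export to CSV and JSON Lines
Pluggable result storage, for example Redis
Automatic jQuery injection, so results can be processed with jQuery syntax
Screenshots as crawl evidence
Emulation of different devices

Here is a simple example: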

const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    // Function to be evaluated in browsers
    evaluatePage: (() => ({
      title: $('title').text(),
    })),
    // Function to be called with evaluated results from browsers
    onSuccess: (result => {
      console.log(result);
    }),
  });
  // Queue a request
  await crawler.queue('https://example.com/');
  // Queue multiple requests
  await crawler.queue(['https://example.net/', 'https://example.org/']);
  // Queue a request with custom options
  await crawler.queue({
    url: 'https://example.com/',
    // Emulate a tablet device
    device: 'Nexus 7',
    // Enable screenshot by passing options
    screenshot: {
      path: './tmp/example-com.png'
    },
  });
  await crawler.onIdle(); // Resolved when no queue is left
  await crawler.close(); // Close the crawler
})();

The example above uses jQuery syntax, title: $('title').text(), to extract page information, and it emulates a specific device, Nexus 7, while crawling and taking a screenshot.
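
The difference between the breadth-first and depth-first strategies in the feature list comes down to which end of the pending list the next URL is taken from. A minimal sketch over an in-memory link graph follows; LINKS and crawlOrder are hypothetical names, and this is not how headless-chrome-crawler is implemented internally.

```javascript
// Hypothetical site: each page maps to the pages it links to.
const LINKS = {
  '/': ['/a', '/b'],
  '/a': ['/a1', '/a2'],
  '/b': ['/b1'],
  '/a1': [],
  '/a2': [],
  '/b1': [],
};

// Returns the order in which pages are visited for strategy 'bfs' or 'dfs'.
function crawlOrder(start, links, strategy) {
  const pending = [start];
  const visited = new Set();
  const order = [];
  while (pending.length > 0) {
    // BFS takes the oldest pending URL (queue); DFS takes the newest (stack).
    const url = strategy === 'bfs' ? pending.shift() : pending.pop();
    if (visited.has(url)) continue;
    visited.add(url);
    order.push(url);
    const children = links[url] || [];
    // For DFS, push in reverse so that the first link on the page is visited first.
    const next = strategy === 'bfs' ? children : [...children].reverse();
    for (const child of next) {
      if (!visited.has(child)) pending.push(child);
    }
  }
  return order;
}

console.log(crawlOrder('/', LINKS, 'bfs')); // visits /, /a, /b, /a1, /a2, /b1
console.log(crawlOrder('/', LINKS, 'dfs')); // visits /, /a, /a1, /a2, /b, /b1
```

Breadth-first finishes every page at one link distance before going deeper, which suits shallow site surveys; depth-first follows one branch to the end before backtracking.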

browserless

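https://github.com/browserless/chrome: browserless is a cloud service that lets remote clients connect to and control headless browsers running on a server, all of it inside Docker.

You can use browserless to deploy this kind of service inside your own company and get a private headless-browser cloud. Anyone in the organization can then drive a browser from a script, which unifies the automation environment and improves resource utilization.

How browserless works is described below; interested readers can study it in detail.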

browserless listens for both incoming websocket requests, generally issued by most libraries, as well as pre-build REST APIs to do common functions (PDF generation, images and so on). When a websocket connects to browserless it invokes Chrome and proxies your request into it. Once the session is done then it closes and awaits for more connections. Some libraries use Chrome's HTTP endpoints, like /json to inspect debug-able targets, which browserless also supports.

Your application still runs the script itself (much like a database interaction), which gives you total control over what library you want to choose and when to do upgrades. This is preferable over other solutions as Chrome is still breaking their debugging protocol quite frequently.
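
From the client side, the websocket flow described above amounts to calling puppeteer.connect with a browserWSEndpoint instead of launching a local Chrome. A minimal sketch, assuming a browserless container is reachable at an address such as ws://localhost:3000 (the host and port depend on your deployment); the helper titleViaRemoteBrowser is hypothetical and receives the Puppeteer module as a parameter.

```javascript
// Connects to a remote browser over websocket, loads `url`, and returns its title.
// `puppeteer` is the Puppeteer module; `wsEndpoint` is e.g. 'ws://localhost:3000' (assumed).
async function titleViaRemoteBrowser(puppeteer, wsEndpoint, url) {
  const browser = await puppeteer.connect({ browserWSEndpoint: wsEndpoint });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // End the remote session so the server can free its resources.
    await browser.close();
  }
}
```

The script itself still runs in your own process; only the browser lives on the server, which is the division of labor the quoted description compares to a database interaction.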

puppeteersandbox

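https://puppeteersandbox.com: this site is a sandbox environment for Puppeteer that can easily be embedded in other sites or in Markdown files.

Perhaps because of network issues, the site never seemed to work when I tried it. Hopefully it recovers soon.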

jest-puppeteer

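https://github.com/smooth-code/jest-puppeteer is a UI automation testing framework based on Puppeteer and Jest that needs almost zero configuration.

A quick look at how test cases are written should make the purpose of the framework clear.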

import 'expect-puppeteer'

describe('Google', () => {
  beforeAll(async () => {
    await page.goto('https://google.com')
  })

  it('should display "google" text on page', async () => {
    await expect(page).toMatch('google')
  })
})

// The assertions below are fragments; each one belongs inside an async test body.

// Assert that current page contains 'Text in the page'
await expect(page).toMatch('Text in the page')

// Assert that a button containing text "Home" will be clicked
await expect(page).toClick('button', { text: 'Home' })

// Assert that a form will be filled
await expect(page).toFillForm('form[name="myForm"]', {
  firstName: 'James',
  lastName: 'Bond',
})

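Asserting on page text, clicking elements, and filling in forms are all straightforward, and combined with Jest's customization options the framework promises to be quite flexible.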

Puppetry

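https://puppetry.app is a highly polished tool for recording test cases and replaying them. It is, of course, based on Puppeteer.

Puppetry is open source and cross-platform, and it has too many core features to list here. Interested readers can look at the project's GitHub page.

Worth highlighting is the range of testing scenarios that Puppetry officially claims to support:

Functional testing
Testing dynamic content
Detailed (exhaustive) testing
Performance testing
Visual regression testing
API mocking
Testing REST APIs
Testing Google Analytics code
Testing Chrome extensions
Testing shadow DOM
Testing emails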

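Summary

Overall, developer enthusiasm for Puppeteer is high, Puppeteer itself is updated frequently, and the ecosystem as a whole is healthy.

For most readers, it is enough to keep two capabilities in mind. Puppeteer can power:

crawlers
UI automation tests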
