Using Playwright


1. Basic usage

  1. Synchronous mode
from playwright.sync_api import sync_playwright

url = 'https://www.baidu.com'

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=f'sync-{browser_type.name}.png')
        print(page.title())
        browser.close()
  2. Asynchronous mode
import asyncio
from playwright.async_api import async_playwright

url = 'https://www.baidu.com'

async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            await page.goto(url)
            await page.screenshot(path=f'async-{browser_type.name}.png')
            print(await page.title())
            await browser.close()
asyncio.run(main())
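
The asynchronous example above still visits the three browsers one after another. As a minimal sketch of what async mode makes possible, the browsers can also be driven concurrently with asyncio.gather. The helper names fetch_title and fetch_titles are hypothetical and not part of Playwright.

```python
import asyncio


async def fetch_title(browser_type, url: str) -> str:
    """Launch one browser, open url, and return the page title."""
    browser = await browser_type.launch()
    try:
        page = await browser.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        # Close the browser even if navigation fails
        await browser.close()


async def fetch_titles(url: str) -> dict:
    """Return a mapping of browser name to page title for all three engines."""
    # Imported here so fetch_title stays usable without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser_types = [p.chromium, p.firefox, p.webkit]
        # gather() runs the coroutines concurrently and preserves input order
        titles = await asyncio.gather(
            *(fetch_title(bt, url) for bt in browser_types)
        )
        return {bt.name: title for bt, title in zip(browser_types, titles)}
```

It could be run with print(asyncio.run(fetch_titles('https://www.baidu.com'))), assuming all three browsers have been installed.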

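2. Code generation

Playwright can record the actions you perform in the browser and automatically generate the corresponding code. This is done with the codegen command.

# Show the options of the codegen command
playwright codegen --help

# Example: launch Firefox and write the generated code to script.py
playwright codegen -o script.py -b firefox https://www.baidu.com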

3. Selectors

  1. Text selectors
page.click("text=Log in")
  2. CSS selectors
page.click("button")
page.click("#nav-bar .contact-us-item")
page.click("[data-test=login-button]")
page.click("[aria-label='Sign in']")
  3. XPath
# Prefix the expression with "xpath="; selectors that start with // are also treated as XPath
page.click("xpath=//button")
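
When the selector value comes from a variable, small helpers can keep the three selector styles consistent. The following is only a sketch: the function names are hypothetical, and it assumes the values contain no quote characters that would need escaping.

```python
def text_selector(text: str, exact: bool = False) -> str:
    """Build a text selector; quoting the text requests an exact match."""
    return f'text="{text}"' if exact else f"text={text}"


def attr_selector(name: str, value: str) -> str:
    """Build a CSS attribute selector such as [data-test='login-button']."""
    return f"[{name}='{value}']"


def xpath_selector(expression: str) -> str:
    """Prefix an XPath expression so that it is read as XPath."""
    return f"xpath={expression}"
```

The results are plain strings, so they can be passed directly to page.click(), for example page.click(attr_selector("aria-label", "Sign in")).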

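4. Event listening

The page object provides an on method for listening to events that occur on the page, such as close, console, load, request, and response.

For data loaded via Ajax, encrypted parameters in the request are not a concern, because what we capture is the final response.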

from playwright.sync_api import Playwright, sync_playwright


# def on_response(response):
#     """
#     Print the status and URL of every response shown in the browser's Network panel.
#     """
#     print(f'Status {response.status}: {response.url}')


def on_response(response):
    """
    Intercept Ajax responses in on_response and read the response body directly.
    """
    if "api/movie/" in response.url and response.status == 200:
        print(response.json())


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Listen for the response event and register on_response as the callback
    page.on('response', on_response)
    page.goto("https://spa6.scrape.center/")
    page.wait_for_load_state("networkidle")
    page.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
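
The handler above only prints each matching response. As a hedged variation, the same idea can collect the JSON bodies into a list for later processing. The names make_json_collector and collect_movies are hypothetical. The URL and the "api/movie/" fragment are taken from the example above.

```python
from typing import Any, Callable, List


def make_json_collector(url_fragment: str, sink: List[Any]) -> Callable[[Any], None]:
    """Build a response handler that appends matching JSON bodies to sink."""

    def on_response(response) -> None:
        # Keep only successful responses whose URL contains the fragment
        if url_fragment in response.url and response.status == 200:
            sink.append(response.json())

    return on_response


def collect_movies(url: str = "https://spa6.scrape.center/") -> List[Any]:
    """Open url and return the JSON bodies of the matching Ajax responses."""
    # Imported lazily so the handler factory can be used on its own
    from playwright.sync_api import sync_playwright

    results: List[Any] = []
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        # Register the handler before navigating so no response is missed
        page.on("response", make_json_collector("api/movie/", results))
        page.goto(url)
        page.wait_for_load_state("networkidle")
        browser.close()
    return results
```

Calling collect_movies() would return the captured payloads as a list. Their exact shape depends on the site and is not assumed here.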

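5. Common methods

  1. Get the page source: page.content()

  2. Click an element: page.click(selector, **kwargs); see the official documentation for the available keyword arguments

  3. Enter text: page.fill(selector, value, **kwargs)

  4. Get a node attribute: page.get_attribute(selector, name, **kwargs)

    # Returns the attribute of a single node only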
    href = page.get_attribute("a.name", "href")
    
  5. Get multiple nodes: query_selector_all()

    1. Node attribute: element.get_attribute(name)
    2. Node text: element.text_content()
    elements = page.query_selector_all("a.name")
    for element in elements:
      href = element.get_attribute("href")
      text = element.text_content()
    
  6. Get a single node: query_selector()

    element = page.query_selector("a.name")
    # query_selector() returns None when nothing matches
    if element is not None:
        href = element.get_attribute("href")
        text = element.text_content()
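
The node-level methods above can be combined into one small helper. This is a minimal sketch: extract_links is a hypothetical name, and the default selector "a.name" is simply the one used in the snippets above.

```python
from typing import List, Optional, Tuple


def extract_links(page, selector: str = "a.name") -> List[Tuple[Optional[str], str]]:
    """Return (href, text) pairs for every node matching selector."""
    links = []
    for element in page.query_selector_all(selector):
        # get_attribute() returns None when the attribute is missing
        href = element.get_attribute("href")
        # text_content() may also return None, so fall back to an empty string
        text = (element.text_content() or "").strip()
        links.append((href, text))
    return links
```

With a page that has already been navigated, links = extract_links(page) gives a list that can be iterated or written to a file.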
    

