1. Basic usage
- Synchronous mode
from playwright.sync_api import sync_playwright
url = 'https://www.baidu.com'
with sync_playwright() as p:
    # Run the same steps in each of the three supported browsers
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=f'sync-{browser_type.name}.png')
        print(page.title())
        browser.close()
- Asynchronous mode
import asyncio
from playwright.async_api import async_playwright
url = 'https://www.baidu.com'
async def main():
    async with async_playwright() as p:
        # Same steps as the synchronous version, with every Playwright call awaited
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            await page.goto(url)
            await page.screenshot(path=f'async-{browser_type.name}.png')
            print(await page.title())
            await browser.close()
asyncio.run(main())
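The asynchronous loop above still handles the browsers one after another. As a rough sketch of what asynchronous mode makes possible, the helpers below start all browsers concurrently with asyncio.gather. The names fetch_title, fetch_titles and main are illustrative and not part of Playwright; the Playwright import is kept inside main so the helpers can be reused or tested without it.

```python
import asyncio


async def fetch_title(browser_type, url):
    """Launch one browser, open url, and return (browser name, page title)."""
    browser = await browser_type.launch()
    try:
        page = await browser.new_page()
        await page.goto(url)
        return browser_type.name, await page.title()
    finally:
        # Close the browser even if navigation fails
        await browser.close()


async def fetch_titles(browser_types, url):
    """Run fetch_title for every browser type concurrently."""
    pairs = await asyncio.gather(*(fetch_title(bt, url) for bt in browser_types))
    return dict(pairs)


async def main(url="https://www.baidu.com"):
    # Imported here so the helpers above do not require Playwright at import time
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        return await fetch_titles([p.chromium, p.firefox, p.webkit], url)
```

Calling print(asyncio.run(main())) would print a dict that maps each browser name to the title it saw, assuming all three browsers are installed.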
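2. Code generation
Playwright can record the actions you perform in the browser and generate the corresponding code automatically, using the codegen command.
# Show the options of the codegen command
playwright codegen --help
# Example: launch Firefox and write the code generated from the recorded actions to script.py
playwright codegen -o script.py -b firefox https://www.baidu.com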
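If the recorder needs to be started from a Python script rather than typed by hand, one possible sketch is to assemble the same command line and hand it to subprocess. build_codegen_command and run_codegen are hypothetical helper names; only the -o and -b options shown above are used, and the list of accepted browser names is an assumption taken from the three browser types in section 1.

```python
import shlex
import subprocess

# Assumption: browser names mirror the three browser types used in section 1
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def build_codegen_command(url, output="script.py", browser="firefox"):
    """Return the argv list for: playwright codegen -o OUTPUT -b BROWSER URL."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    return ["playwright", "codegen", "-o", output, "-b", browser, url]


def run_codegen(url, output="script.py", browser="firefox"):
    """Start the recorder and return its exit code once the process ends."""
    command = build_codegen_command(url, output, browser)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as & or ?.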
3. Selectors
- Text selectors
page.click("text=Log in")
- CSS selectors
page.click("button")
page.click("#nav-bar .contact-us-item")
page.click("[data-test=login-button]")
page.click("[aria-label='Sign in']")
- XPath
# Prefix the expression with "xpath=" (Playwright also treats selectors that start with // or .. as XPath)
page.click("xpath=//button")
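To try the three selector styles without depending on a live site, the sketch below loads a small hypothetical page with page.set_content and clicks the same button through each style. The helper names (text_selector, attribute_selector, xpath_selector, count_clicks) and the demo HTML are illustrative assumptions, not part of Playwright.

```python
def text_selector(text):
    """Build a text selector such as text=Log in."""
    return f"text={text}"


def attribute_selector(name, value):
    """Build a CSS attribute selector such as [aria-label='Sign in'].

    Assumption: value contains no single quotes, so no escaping is done.
    """
    return f"[{name}='{value}']"


def xpath_selector(expression):
    """Build an XPath selector with the explicit xpath= prefix."""
    return f"xpath={expression}"


# Hypothetical page: a single button that counts its own clicks
DEMO_HTML = """
<button data-test="login-button" aria-label="Sign in"
        onclick="window.clicks = (window.clicks || 0) + 1">Log in</button>
"""


def count_clicks():
    """Click the demo button once per selector and return the click count."""
    # Imported here so the selector helpers work without Playwright installed
    from playwright.sync_api import sync_playwright

    selectors = [
        text_selector("Log in"),
        "button",
        attribute_selector("data-test", "login-button"),
        attribute_selector("aria-label", "Sign in"),
        xpath_selector("//button"),
    ]
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(DEMO_HTML)
        for selector in selectors:
            page.click(selector)
        clicks = page.evaluate("window.clicks")
        browser.close()
    return clicks
```

If every selector resolves to the button, count_clicks() should return the number of selectors in the list.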
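4. Event listening
The page object provides an on method for listening to events that occur on the page, such as close, console, load, request, and response.
This also covers data loaded via Ajax: even if the Ajax request carries encrypted parameters, there is nothing to worry about, because what we capture is the final response.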
from playwright.sync_api import Playwright, sync_playwright
# def on_response(response):
#     """
#     Print the status and URL of every response, as listed in the browser's Network panel.
#     """
#     print(f'Status {response.status}: {response.url}')
def on_response(response):
    """
    Intercept Ajax responses via on_response and read the response body directly.
    """
    if "api/movie/" in response.url and response.status == 200:
        print(response.json())
def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Listen for the response event and register on_response as the callback
    page.on('response', on_response)
    page.goto("https://spa6.scrape.center/")
    # Wait until network activity has settled so the Ajax responses have arrived
    page.wait_for_load_state("networkidle")
    page.close()
    browser.close()
with sync_playwright() as playwright:
    run(playwright)
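When only one specific response is needed rather than a stream of events, page.expect_response offers an alternative to page.on. The sketch below reuses the same URL and status filter as on_response above; is_movie_api_response and fetch_first_movie_payload are illustrative names, and the code assumes the site still issues a request whose URL contains "api/movie/".

```python
def is_movie_api_response(response):
    """Return True for successful responses from the movie API."""
    return "api/movie/" in response.url and response.status == 200


def fetch_first_movie_payload(url="https://spa6.scrape.center/"):
    """Open url and return the JSON body of the first matching Ajax response."""
    # Imported here so the predicate above can be tested without Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # The block exits once a response satisfying the predicate has arrived
        with page.expect_response(is_movie_api_response) as response_info:
            page.goto(url)
        payload = response_info.value.json()
        browser.close()
    return payload
```

Compared with page.on, this form returns the data to the caller instead of handling it inside a callback, and it raises a timeout error if no matching response arrives.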
5. Common methods
- Get the page source: page.content()
- Click an element: page.click(selector, **kwargs) (see the official documentation)
- Enter text: page.fill(selector, value, **kwargs)
- Get a node attribute: page.get_attribute(selector, name, **kwargs)
# Returns the attribute of a single node only (the first match)
href = page.get_attribute("a.name", "href")
- Get multiple nodes: query_selector_all()
  - Node attribute: element.get_attribute(name)
  - Node text: element.text_content()
elements = page.query_selector_all("a.name")
for element in elements:
    href = element.get_attribute("href")
    text = element.text_content()
- Get a single node: query_selector()
# Returns the first matching node, or None if nothing matches
element = page.query_selector("a.name")
if element is not None:
    href = element.get_attribute("href")
    text = element.text_content()
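The methods listed above can be combined into one small, hedged example. It runs against a hypothetical in-memory page loaded with page.set_content, so the HTML, the element ids, and the helper names extract_links and demo are assumptions made for illustration only.

```python
def extract_links(elements):
    """Return (href, text) pairs for nodes from query_selector_all."""
    links = []
    for element in elements:
        href = element.get_attribute("href")
        # text_content() may return None, so fall back to an empty string
        text = (element.text_content() or "").strip()
        links.append((href, text))
    return links


# Hypothetical page with a search box, a button, and two links
DEMO_HTML = """
<input id="q"><button id="go" onclick="document.body.dataset.clicked = 'yes'">Go</button>
<a class="name" href="/movie/1"> First </a>
<a class="name" href="/movie/2"> Second </a>
"""


def demo():
    """Exercise fill, click, content, get_attribute and the query methods."""
    # Imported here so extract_links can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(DEMO_HTML)
        page.fill("#q", "playwright")   # text input
        page.click("#go")               # click
        result = {
            "typed": page.input_value("#q"),
            "html_length": len(page.content()),
            "first_href": page.get_attribute("a.name", "href"),
            "links": extract_links(page.query_selector_all("a.name")),
            # query_selector returns None when no node matches
            "missing": page.query_selector("a.missing"),
        }
        browser.close()
    return result
```

Keeping the extraction logic in extract_links, separate from the browser session, makes it easy to check with simple stand-in objects.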