HTMLSession from requests_html in Python

Reposted from the original article, 2020-07-29 01:35, Python

1. Create a session:

from requests_html import HTMLSession
session = HTMLSession()

2. Open the URL and check the status code

import sys

mainPage = session.get("https://www.cnblogs.com/chengguo/")
# Stop if the page was not found
if mainPage.status_code == 404:
    print("url open failed: {}".format(mainPage.url))
    sys.exit(1)
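The check above only catches 404, so other failures such as 403 or 500 would pass through unnoticed. Below is a minimal sketch of a broader check, assuming that any non-2xx status should count as a failure. `is_success` is a hypothetical helper written for illustration, not part of requests_html.

```python
def is_success(status_code: int) -> bool:
    """Return True for any 2xx HTTP status code."""
    return 200 <= status_code < 300


# Hypothetical usage with the response from step 2:
#     if not is_success(mainPage.status_code):
#         print("url open failed: {}".format(mainPage.url))
#         sys.exit(1)
for code in (200, 404, 500):
    print(code, is_success(code))
```

Since the response returned by `session.get` is built on a `requests` response, `mainPage.ok` or `mainPage.raise_for_status()` should serve a similar purpose without a custom helper.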

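3. Find content and check the result

# With first=True, find() returns a single Element, or None when nothing matches
articleElement = mainPage.html.find("#mainBox > main > div.article-list", first=True)
if articleElement is None:
    print("article empty")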

4. Get information from the Element (text / links)

print(articleElement.text)
# links is a set of the hrefs found inside the element
for url in articleElement.links:
    print(url)
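`links` returns the hrefs as they appear in the page, so some of them may be relative. The sketch below resolves them against the page URL using only the standard library. `resolve_links` and the sample hrefs are hypothetical, made up for illustration.

```python
from urllib.parse import urljoin


def resolve_links(base_url, links):
    """Resolve each href against base_url and return a sorted, de-duplicated list."""
    return sorted({urljoin(base_url, link) for link in links})


# Sample hrefs standing in for articleElement.links (made up for illustration)
sample = {"/chengguo/p/1.html", "https://www.cnblogs.com/chengguo/p/2.html"}
for url in resolve_links("https://www.cnblogs.com/chengguo/", sample):
    print(url)
```

requests_html also exposes an `absolute_links` property on elements, which is likely the simpler choice when the element comes from a real response.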

5. Save the element's text

# The with block closes the file even if the write fails
with open("output.text", "w", encoding="utf-8") as file:
    file.write(articleElement.text)

6. Save the page content as binary

# raw_html is bytes, so the file is opened in binary mode
with open("output.html", "wb") as file:
    file.write(mainPage.html.raw_html)
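Steps 5 and 6 can also be combined with `pathlib`, which opens and closes the files for you. The following is a sketch under assumptions: `save_page` is a hypothetical helper, and the string and bytes passed in stand in for `articleElement.text` and `mainPage.html.raw_html`.

```python
import tempfile
from pathlib import Path


def save_page(directory, text, raw_html):
    """Write the element text (UTF-8) and the raw page bytes into directory."""
    directory = Path(directory)
    text_path = directory / "output.text"
    html_path = directory / "output.html"
    text_path.write_text(text, encoding="utf-8")
    html_path.write_bytes(raw_html)
    return text_path, html_path


# Stand-ins for articleElement.text and mainPage.html.raw_html
with tempfile.TemporaryDirectory() as tmp:
    text_path, html_path = save_page(tmp, "article text", b"<html></html>")
    print(text_path.read_text(encoding="utf-8"))
    print(html_path.read_bytes())
```

Writing the raw bytes keeps the page exactly as the server sent it, while the text file is explicitly encoded as UTF-8 so non-ASCII content survives on any platform.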
