Using HTMLSession from requests_html in Python

This article is a repost. 2020-07-29 01:35, Python

1. Create a session:

from requests_html import HTMLSession
session = HTMLSession()

2. Open the URL and check the status code

import sys

# Fetch the page; stop early if the server reports "not found".
mainPage = session.get("https://www.cnblogs.com/chengguo/")
if mainPage.status_code == 404:
    print("url open failed: {}".format(mainPage.url))
    sys.exit()
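
The check above only catches 404, so other failures (for example 403 or 500) would pass through unnoticed. A minimal sketch of a broader check follows. `ensure_ok` is a hypothetical helper, not part of requests_html, and it assumes only that the response exposes `status_code` and `url`, as the object returned by `session.get` does. The `SimpleNamespace` objects are stand-ins so the sketch runs without network access.

```python
import sys
from types import SimpleNamespace


def ensure_ok(response):
    """Return True for a 2xx status code; otherwise report the failure and return False.

    Hypothetical helper for illustration, not part of requests_html.
    """
    if 200 <= response.status_code < 300:
        return True
    print(
        "url open failed: {} (status {})".format(response.url, response.status_code),
        file=sys.stderr,
    )
    return False


# Stand-in responses; in real use, pass the result of session.get(...) instead.
ok_page = SimpleNamespace(status_code=200, url="https://www.cnblogs.com/chengguo/")
bad_page = SimpleNamespace(status_code=500, url="https://www.cnblogs.com/chengguo/")

print(ensure_ok(ok_page))
print(ensure_ok(bad_page))
```

In a script, `if not ensure_ok(mainPage): sys.exit(1)` could replace the 404 test. The response object also provides `raise_for_status()`, which raises an exception on 4xx and 5xx codes.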

3. Find content and check the result

# With first=True, find() returns a single element, or None when nothing matches.
articleElement = mainPage.html.find("#mainBox > main > div.article-list", first=True)
if articleElement is None:
    print("article empty")

4. Get information from the Element (text / links)

print(articleElement.text)
# links is a set of the URLs found inside the element.
for url in articleElement.links:
    print(url)
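
Because `links` is a set, iteration order is not stable, and entries may be relative paths exactly as written in the page. A small sketch of normalizing them with the standard library follows; `to_absolute_sorted` is a hypothetical helper and the sample links are made up for illustration.

```python
from urllib.parse import urljoin


def to_absolute_sorted(links, base_url):
    """Resolve each link against base_url and return a sorted, de-duplicated list."""
    return sorted({urljoin(base_url, link) for link in links})


# Made-up input standing in for articleElement.links.
sample_links = {
    "/chengguo/p/1.html",
    "https://www.cnblogs.com/chengguo/p/2.html",
}

for url in to_absolute_sorted(sample_links, "https://www.cnblogs.com/chengguo/"):
    print(url)
```

requests_html elements also expose an `absolute_links` attribute, which returns the links already resolved to absolute form.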

5. Save the element's text

# The with block closes the file even if write() raises.
with open("output.text", "w", encoding="utf-8") as file:
    file.write(articleElement.text)

6. Save the page content as binary

# raw_html is bytes, so the file is opened in binary mode.
with open("output.html", "wb") as file:
    file.write(mainPage.html.raw_html)
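
Steps 5 and 6 can also be combined in one place. The sketch below uses `pathlib` from the standard library; `save_outputs` is a hypothetical helper, and the sample text and bytes stand in for `articleElement.text` and `mainPage.html.raw_html`. It writes into a temporary directory so the example leaves no files behind.

```python
import tempfile
from pathlib import Path


def save_outputs(directory, text, raw_html):
    """Write text as UTF-8 and raw_html as bytes; return both paths."""
    directory = Path(directory)
    text_path = directory / "output.text"
    html_path = directory / "output.html"
    text_path.write_text(text, encoding="utf-8")  # text mode, explicit encoding
    html_path.write_bytes(raw_html)               # binary mode, bytes written as-is
    return text_path, html_path


# Sample data for illustration; in real use pass articleElement.text
# and mainPage.html.raw_html.
with tempfile.TemporaryDirectory() as tmp:
    text_path, html_path = save_outputs(tmp, "sample text", b"<p>sample</p>")
    print(text_path.read_text(encoding="utf-8"))
    print(html_path.read_bytes())
```

Writing the text with an explicit `encoding="utf-8"` matters for pages containing non-ASCII characters, since the platform default encoding may differ.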
