1. Create a session:
from requests_html import HTMLSession
session = HTMLSession()
2. Open the URL and check the status code:
import sys
mainPage = session.get("https://www.cnblogs.com/chengguo/")
# Treat anything other than 200 as a failure, not only 404
if mainPage.status_code != 200:
    print("url open failed: {}".format(mainPage.url))
    sys.exit()
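A check that only looks for 404 lets other failures through, such as 403, 500 or 503. The sketch below uses only the standard library and shows a broader check. It assumes that any 2xx code counts as success. The helpers `is_success` and `check_or_exit` are hypothetical and are not part of requests_html.

```python
import sys


def is_success(status_code: int) -> bool:
    """Return True for any 2xx HTTP status code (hypothetical helper)."""
    return 200 <= status_code < 300


def check_or_exit(status_code: int, url: str) -> None:
    """Print an error and exit with code 1 when the status is not 2xx."""
    if not is_success(status_code):
        print("url open failed: {} (status {})".format(url, status_code))
        sys.exit(1)


# Usage with plain integers; in the script you would pass
# mainPage.status_code and mainPage.url instead.
for code in (200, 204, 404, 500):
    print(code, is_success(code))
```

If you want a looser cut-off, the `Response.ok` property in requests is True for any status below 400.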
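3. Find the element and check that it was found:
articleElement = mainPage.html.find("#mainBox > main > div.article-list", first=True)
# find(..., first=True) returns None when nothing matches the selector
if articleElement is None:
    print("article empty")
    sys.exit()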
4. Get information from the element (text and links):
print(articleElement.text)
for url in articleElement.links:
    print(url)
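In requests_html, `.links` returns hrefs roughly as they are written in the page, and `.absolute_links` resolves them against the page URL. The sketch below shows the same idea without a network call. It swaps in the standard library's `html.parser.HTMLParser` and `urllib.parse.urljoin` for requests_html. The class `LinkCollector` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags and non-blank text (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.links = set()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical page fragment standing in for the real article list
SAMPLE_HTML = """
<div class="article-list">
  <a href="p/123.html">First post</a>
  <a href="/about">About</a>
  <a href="https://example.com/x">External</a>
</div>
"""

BASE_URL = "https://www.cnblogs.com/chengguo/"

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
parser.close()

# Links as written in the page (similar in spirit to .links)
relative_links = parser.links
# Links resolved against the page URL (similar in spirit to .absolute_links)
absolute = {urljoin(BASE_URL, link) for link in relative_links}

print("\n".join(parser.text_parts))
for url in sorted(absolute):
    print(url)
```

With the real library, use `articleElement.absolute_links` in place of `articleElement.links` if you need URLs you can pass straight to `session.get`.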
5. Save the element's text:
# Text mode with an explicit encoding; "with" closes the file even on errors
with open("output.text", "w", encoding="utf-8") as file:
    file.write(articleElement.text)
6. Save the raw page content (bytes):
# raw_html is bytes, so the file is opened in binary mode
with open("output.html", "wb") as file:
    file.write(mainPage.html.raw_html)
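Step 5 writes a `str` in text mode, so Python encodes it with the given `encoding`. Step 6 writes `bytes` in binary mode, so the bytes are stored unchanged. The stdlib sketch below shows the difference. It uses a temporary directory and made-up content instead of a real page.

```python
import os
import tempfile

text = "文章列表: requests_html"        # a str, standing in for articleElement.text
raw = "<p>文章</p>".encode("utf-8")   # bytes, standing in for mainPage.html.raw_html

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "output.text")
    html_path = os.path.join(tmp, "output.html")

    # Text mode: the str is encoded to UTF-8 on write
    # (no newlines here, so no platform newline translation applies)
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: the bytes are written as-is, with no encoding step
    with open(html_path, "wb") as f:
        f.write(raw)

    # Read both files back as raw bytes to compare
    with open(text_path, "rb") as f:
        text_bytes = f.read()
    with open(html_path, "rb") as f:
        html_bytes = f.read()

print(text_bytes == text.encode("utf-8"))  # True
print(html_bytes == raw)                   # True
```

Passing a `str` to a file opened with `"wb"` raises `TypeError`. That is why step 6 writes the bytes from `raw_html` rather than text.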