Python3 + Requests-HTML + Requests-File: Parsing Local HTML Files

Reposted article, 2019-01-07 15:40, Python

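1. Overview

For parsing HTML I prefer XPath to BeautifulSoup, and since the author of Requests released Requests-HTML I have mostly used Requests-HTML.

However, Requests-HTML was designed around Requests fetching pages over the network, so its session cannot load a local HTML file out of the box.

To parse a local HTML file with Requests-HTML, we can bring in the Requests-File library.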

2. Parsing a local HTML file

2.1 Installing Requests-File

pip install requests-file

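2.2 Implementation

The mount method looks a little like mounting a filesystem. In Requests it registers a transport adapter for a URL prefix, so after the call every URL that starts with file:// is handled by FileAdapter rather than by the default HTTP adapters. In my tests a path relative to the current working directory could not be found, while an absolute path worked, so I use an absolute path here and did not dig further.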
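To make the prefix routing concrete, here is a toy sketch of how a session might choose an adapter for a URL. It is a simplified stand-in, not the library's code: the `ToySession` class is hypothetical, and plain strings stand in for the adapter objects a real session would hold.

```python
# Simplified stand-in for how a Requests session picks a transport adapter.
# Illustrative only: ToySession is hypothetical and strings replace adapters.

class ToySession:
    def __init__(self):
        # prefix -> adapter name; a real session stores adapter objects
        self.adapters = {"https://": "HTTPAdapter", "http://": "HTTPAdapter"}

    def mount(self, prefix, adapter):
        """Register an adapter for every URL that starts with prefix."""
        self.adapters[prefix] = adapter

    def get_adapter(self, url):
        """Return the adapter whose prefix matches url, longest prefix first."""
        for prefix in sorted(self.adapters, key=len, reverse=True):
            if url.lower().startswith(prefix.lower()):
                return self.adapters[prefix]
        raise ValueError(f"No adapter mounted for {url!r}")


session = ToySession()
session.mount("file://", "FileAdapter")
print(session.get_adapter("file:///C:/pages/want_to_parse.html"))
print(session.get_adapter("https://www.baidu.com"))
```

Seen this way, nothing is mounted "onto" a location on disk. The call only tells the session which handler is responsible for a URL scheme.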

import os
from requests_html import HTMLSession
from requests_file import FileAdapter

session = HTMLSession()

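# For a page on the network, the session can request it directly:
# session.get("https://www.baidu.com")

# For a local file, the following extra steps are needed.
# Mount the file adapter for file:// URLs
session.mount('file://', FileAdapter())
# Windows paths use backslashes, but the URL passed to get() needs forward slashes, so replace them first
pwd = os.getcwd().replace("\\", "/")
# In testing, a relative path could not be read, so an absolute path is used
html_obj = session.get(f'file:///{pwd}/want_to_parse.html')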
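As an alternative to replacing backslashes by hand, the file URL could be built with the standard library's pathlib, which resolves a relative path against the current working directory and produces forward slashes on any platform. The sketch below is one possible arrangement, not the original author's method. The helper names `to_file_url` and `parse_local_html` are hypothetical, and `want_to_parse.html` is the file name used above.

```python
from pathlib import Path


def to_file_url(path):
    """Turn a relative or absolute filesystem path into a file:// URL."""
    # resolve() anchors a relative path at the current working directory;
    # as_uri() then emits forward slashes and percent-encodes special characters.
    return Path(path).resolve().as_uri()


def parse_local_html(path):
    """Load a local HTML file through Requests-HTML and return its HTML object."""
    # Third-party imports are kept inside the function so that to_file_url()
    # stays usable even where these packages are not installed.
    from requests_html import HTMLSession
    from requests_file import FileAdapter

    session = HTMLSession()
    session.mount("file://", FileAdapter())
    response = session.get(to_file_url(path))
    return response.html


url = to_file_url("want_to_parse.html")
print(url)  # an absolute file:// URL ending in /want_to_parse.html
```

With both packages installed and the file present, `parse_local_html("want_to_parse.html").xpath("//title/text()")` should return the page title as a list of strings, which is where the XPath preference mentioned in the overview comes into play.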

Reference:

https://github.com/dashea/requests-file#requests-file
