Fixing garbled Chinese text with XPath

Reposted from the original article, 2019-08-19 01:04

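Building an element tree with lxml and querying it with XPath makes element matching more efficient. However, etree.tostring serializes to ASCII by default and writes Chinese characters as numeric character references (for example &#20013;), so a plain tostring call produces output that looks garbled.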
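To make the problem concrete, here is a minimal sketch that reproduces it without any network access. The HTML snippet and the `//p` expression are made up for illustration and do not come from the original article.

```python
from lxml import etree

# Illustrative input: a tiny document containing two Chinese characters.
tree = etree.HTML('<html><body><p>中文</p></body></html>')
p = tree.xpath('//p')[0]

# With no encoding argument, tostring returns ASCII bytes and escapes
# every non-ASCII character as a numeric character reference.
default_bytes = etree.tostring(p)
print(default_bytes)
```

The printed bytes contain `&#20013;&#25991;` instead of the original characters, which is the "garbled" output described above.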

Solution:

import requests
from requests.exceptions import RequestException
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15',
}


def get_one_page(url, headers):
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            response.encoding = response.apparent_encoding
            return response.text
        return None
    except RequestException:
        return None


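url = 'https://example.com/'  # placeholder: the original does not specify a URL
exp = '//p'                   # placeholder: the original does not specify an XPath expression

html = get_one_page(url, headers)
if html is not None:
    tree = etree.HTML(html)
    aim = tree.xpath(exp)
    for i in aim:
        # encoding='utf-8' returns UTF-8 bytes instead of ASCII with &#...; escapes;
        # decoding them yields a str with the Chinese text intact.
        content = etree.tostring(i, encoding='utf-8', pretty_print=True, method="html").decode('utf-8')
        print(content)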
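As a possible variation on the snippet above, lxml also accepts `encoding='unicode'`, which returns a `str` directly and makes the `.decode('utf-8')` step unnecessary. The sketch below compares both forms on a made-up HTML snippet; the input and the `//p` expression are assumptions for illustration only.

```python
from lxml import etree

# Illustrative input, not taken from the original article.
tree = etree.HTML('<html><body><p>中文</p></body></html>')
p = tree.xpath('//p')[0]

# The article's approach: serialize to UTF-8 bytes, then decode to str.
as_utf8 = etree.tostring(p, encoding='utf-8', method='html').decode('utf-8')

# Alternative: ask lxml for a str directly.
as_str = etree.tostring(p, encoding='unicode', method='html')

print(as_utf8)
print(as_str)
```

Both calls should give the same readable markup. Which one to use is mostly a matter of taste: the bytes form is convenient when writing straight to a file opened in binary mode, while the `str` form is simpler for further text processing.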
