Traceback (most recent call last):
  File "/Users/*******.py", line 37, in <module>
    BtcSpider().run()
  File "/Users/******.py", line 34, in run
    self.parse_data(data)
  File "/Users/******.py", line 21, in parse_data
    xpath_data = etree.HTML(data)
  File "src/lxml/etree.pyx", line 3161, in lxml.etree.HTML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
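I was scraping a forum whose pages declare <meta http-equiv="Content-Type" content="text/html; charset=gb2312">. On my Mac, though, the fetched page only decodes correctly as utf-8, and when I then parsed the decoded string with xpath I got the error above.
Encoding the string back to bytes before the xpath parse is enough, since lxml accepts bytes input (as the error message itself suggests). Here is the code: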
xpath_data = etree.HTML(data.encode('utf-8'))
Problem solved.
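The same fix can be wrapped in a small helper so that the parser always receives bytes, whether the fetch step returned a str or bytes. This is only a rough sketch: `to_lxml_input` and the `sample` document are hypothetical names made up for illustration, not part of the original spider. Passing `etree.HTMLParser(encoding="utf-8")` is an extra precaution I am assuming is useful here, so that the parser reads the bytes as utf-8 instead of trusting the gb2312 declaration inside the page.

```python
def to_lxml_input(data, encoding="utf-8"):
    """Return bytes for lxml: encode str input, pass bytes through unchanged.

    Hypothetical helper for illustration; not part of the original spider.
    """
    if isinstance(data, str):
        return data.encode(encoding)
    return data


# Made-up sample: a decoded str that still carries an encoding declaration.
sample = (
    '<?xml version="1.0" encoding="gb2312"?>'
    "<html><body><p>hello</p></body></html>"
)
payload = to_lxml_input(sample)

try:
    from lxml import etree  # third-party; skipped if not installed
except ImportError:
    etree = None

if etree is not None:
    # Tell the parser explicitly how the bytes are encoded.
    parser = etree.HTMLParser(encoding="utf-8")
    tree = etree.HTML(payload, parser=parser)
    print(tree.xpath("//p/text()"))
```

If the page is fetched with a library that exposes the raw response bytes, passing those bytes straight to etree.HTML should avoid the decode and re-encode round trip altogether.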