pandas read_html error: No tables found

Reposted article, 2021-02-19 19:58, tagged pandas

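pandas is a great tool, and I expect plenty of people have used it. I picked up the basics about a year ago when a teacher covered it. Its fast data loading and its wide range of easy-to-use processing functions make it hard to put down. Only in the past month did I discover that pandas can fetch the tables on a web page directly (a pleasant surprise). Until then I had habitually used something like requests + xpath + lxml to locate and extract the data I was interested in. pd.read_html trims the code down and makes the processing convenient, which is a real pleasure.

Enough preamble. Here is a record of the problem I ran into today.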

1. Problem

The page I was interested in contained tables (it is a static page), so I called pd.read_html() and unexpectedly got the error: No tables found

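2. Solutions tried

1.1 Adding attributes to locate the table

pd.read_html(url, attrs={'': ''})

At this point I spotted a problem: the table tag has no name, class, id, or other common attributes. So I targeted its parent container, a div, instead:

pd.read_html(url, attrs={'class': 'table_xxx'})

Unfortunately the table still was not found. (The attrs filter is applied to the table element itself, so a class that belongs to the parent div cannot match.)

I then switched to the layout attributes on the table tag, with the same result.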
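To make the behaviour of attrs concrete, here is a minimal sketch. The HTML fragment and the read_tables helper are hypothetical stand-ins for the real page, and flavor="lxml" is fixed only to keep the example deterministic (it assumes lxml is installed). It shows that a class on the parent div does not select the table, while the match parameter, which filters tables by the text they contain, can select a table that has no attributes at all.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment: the class sits on the parent <div>, not on the <table>.
HTML = """
<div class="table_xxx">
  <table>
    <tr><th>name</th><th>score</th></tr>
    <tr><td>alice</td><td>90</td></tr>
  </table>
</div>
"""


def read_tables(**kwargs):
    """Return the list of parsed tables, or the error message if none were found."""
    try:
        return pd.read_html(StringIO(HTML), flavor="lxml", **kwargs)
    except ValueError as exc:
        return str(exc)


# The class belongs to the <div>, so no <table> matches and pandas raises ValueError.
by_div_class = read_tables(attrs={"class": "table_xxx"})

# Filtering by text inside the table needs no attribute on the <table> tag.
by_text = read_tables(match="score")

print(by_div_class)
print(by_text[0])
```

If the table really has no usable attributes, match (or simply indexing into the returned list) may be an easier way to pick it out than attrs.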

1.2

Going back to see what the source code says, the first thing I noticed was the first paragraph of the docstring:

io : str or file-like

A URL, a file-like object, or a raw string containing HTML. Note that lxml only accepts the http, ftp and file url protocols. If you have a URL that starts with 'https' you might try removing the 's'.

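So io accepts a URL, a file-like object, or a string. According to the note, lxml does not accept https URLs, so I tried dropping the 's'.

Result: failed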
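For reference, the scheme swap suggested by the docstring can be sketched as below. The helper name to_http and the example URLs are hypothetical. It only rewrites the URL, so the server must still serve the page over plain HTTP for the result to be usable.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url: str) -> str:
    """Return url with an https scheme replaced by http; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


print(to_http("https://example.com/data?page=1"))
print(to_http("http://example.com/data"))
```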

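1.3 Specifying header and adding an encoding

Note: the header here is not headers. It refers to the row that holds the column titles, not to HTTP request headers.

The struggle here was futile, so I will not type out the code.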
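Although it did not help with this error, the meaning of header is easy to show with a small sketch. The HTML fragment is hypothetical and uses only td cells, so pandas has no th row to infer column names from. flavor="lxml" is fixed to keep the example deterministic and assumes lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table whose title row is made of ordinary <td> cells.
HTML = """
<table>
  <tr><td>name</td><td>score</td></tr>
  <tr><td>alice</td><td>90</td></tr>
</table>
"""

# Without header, the title row is treated as data and columns are numbered 0, 1.
without_header = pd.read_html(StringIO(HTML), flavor="lxml")[0]

# header=0 tells pandas that row 0 holds the column titles.
with_header = pd.read_html(StringIO(HTML), flavor="lxml", header=0)[0]

print(without_header)
print(with_header)
```

In other words, header changes how an already found table is labelled. It has no influence on whether a table is found in the first place.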

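1.4 A roundabout workaround

Since the io parameter also accepts a string, I figured there was no need to let pandas fetch the page itself: get the page source first, then parse it. That works too.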

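Using selenium
    from io import StringIO

    import pandas as pd
    from selenium import webdriver

    url = 'xxx'
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    # options.add_argument('--disable-gpu')
    # Browser window size; options must be set before the driver is created
    options.add_argument('--window-size=1334,750')

    # Initialise the driver
    driver = webdriver.Chrome(options=options)
    # driver.maximize_window()

    driver.get(url=url)
    html = driver.page_source
    driver.quit()
    # read_html returns a list of DataFrames, one per table found.
    # Wrapping the string in StringIO avoids the literal-string deprecation in newer pandas.
    df = pd.read_html(StringIO(html), header=0)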

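Using requests
    from io import StringIO

    import pandas as pd
    import requests

    url = 'xxxx'
    # Placeholder request headers (not given in the original); use your own
    headers = {'User-Agent': 'Mozilla/5.0'}

    response = requests.get(url=url, headers=headers)
    # decode() must be called; it defaults to UTF-8
    res = response.content.decode()
    # read_html returns a list of DataFrames, one per table found
    df = pd.read_html(StringIO(res), header=0)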

And with that I got the table 😼

19:59:07

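3. Cause

My knowledge is limited (I wrote the problem down here right after finishing the code, and at this moment I still do not know the cause 😂). If any expert passes by, I would be grateful for a few pointers.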
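The cause is not identified here, but one way to narrow it down is to look at the HTML each fetch method actually returned and check whether it contains a table element at all. The sketch below uses only the standard library. TableScanner, scan_tables, and the two sample strings are hypothetical, and this is a diagnostic aid, not an explanation of what went wrong on the original page.

```python
from html.parser import HTMLParser


class TableScanner(HTMLParser):
    """Collect the attributes of every <table> start tag in a document."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append(dict(attrs))


def scan_tables(html: str):
    """Return one attribute dict per <table> found in html."""
    scanner = TableScanner()
    scanner.feed(html)
    return scanner.tables


# Hypothetical responses: one with a table, one without.
with_table = '<div class="table_xxx"><table border="1"><tr><td>1</td></tr></table></div>'
without_table = "<html><body><p>Access denied</p></body></html>"

print(scan_tables(with_table))
print(scan_tables(without_table))
```

Running scan_tables on the string passed to pd.read_html shows how many tables are present and which attributes they carry. An empty list would suggest the response itself differs from what the browser displays.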
