Working with PDFs in Python: Extracting Text Content from a PDF

Reposted article, 2020-09-04 19:11

# Install with: pip install pdfplumber
import pdfplumber

# Extract the text of the first page with pdfplumber
with pdfplumber.open('基於python的網頁爬蟲.pdf') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
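
The snippet above reads only the first page. A minimal sketch for collecting the text of every page might look like the following. `collect_text` and `extract_all_text` are hypothetical helper names, not part of pdfplumber, and the `or ''` guard assumes that `extract_text()` can return `None` or an empty string for a page without text, depending on the pdfplumber version.

```python
def collect_text(pages):
    """Join the text of every page, skipping pages that yield no text."""
    parts = []
    for page in pages:
        # Assumption: a page without text may yield None or ''
        text = page.extract_text() or ''
        if text:
            parts.append(text)
    return '\n'.join(parts)


def extract_all_text(path):
    """Open the PDF at `path` and return the text of all its pages."""
    import pdfplumber  # imported here so collect_text can be used without it

    with pdfplumber.open(path) as pdf:
        return collect_text(pdf.pages)
```

Calling `extract_all_text('基於python的網頁爬蟲.pdf')` would then return one string with the pages separated by newlines.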


# Extract a single table from the page with pdfplumber
with pdfplumber.open('基於python的網頁爬蟲.pdf') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_table())


# Extract every table on the page with pdfplumber
with pdfplumber.open('基於python的網頁爬蟲.pdf') as pdf:
    first_page = pdf.pages[0]
    for table in first_page.extract_tables():
        print(table)
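
Printing a table shows a nested list, which is hard to reuse elsewhere. As a rough sketch, each extracted table could be serialized to CSV with the standard library. `table_to_csv` is a hypothetical helper name, and the code assumes that empty cells arrive as `None`, which is how pdfplumber usually reports them.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one extracted table (a list of rows) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for row in table:
        # Assumption: empty cells arrive as None; write them as empty strings
        writer.writerow(['' if cell is None else cell for cell in row])
    return buffer.getvalue()
```

Inside the loop above, `print(table_to_csv(table))` would emit one CSV block per table; writing the returned string to a file works the same way.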


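# Extract a single financial-statement table with pdfplumber.
# table_settings controls how the table is detected; the 'text' strategies
# infer cell boundaries from text alignment instead of ruling lines.
with pdfplumber.open('基於python的網頁爬蟲.pdf') as pdf:
    first_page = pdf.pages[0]
    # extract_table (singular) returns one table as a list of rows,
    # or None when no table is found
    table = first_page.extract_table(
        table_settings={
            'vertical_strategy': 'text',
            'horizontal_strategy': 'text'
        }
    ) or []
    new_table = []
    for row in table:
        # Skip blank rows (empty cells may be None or '')
        if any(row):
            # Merge the first three cells into one to rejoin split words
            new_row = [''.join(item or '' for item in row[:3])]
            new_row += row[3:]
            new_table.append(new_row)
    print(new_table)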
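
The row cleanup can be tried without a PDF. The sketch below applies the same idea to made-up sample data: `merge_leading_cells` is a hypothetical helper name, the rows in `sample` are invented for illustration, and unlike the loop above it also converts `None` cells in the remaining columns to empty strings.

```python
def merge_leading_cells(table, n=3):
    """Drop blank rows and join the first `n` cells of each remaining row."""
    cleaned = []
    for row in table:
        cells = [cell or '' for cell in row]  # treat None as an empty cell
        if not ''.join(cells):
            continue  # skip rows with no content at all
        cleaned.append([''.join(cells[:n])] + cells[n:])
    return cleaned


# Invented rows imitating a label that was split across three columns
sample = [
    ['Oper', 'ating', ' income', '100', '200'],
    [None, '', '', '', ''],
    ['Net', ' profit', None, '30', '45'],
]
print(merge_leading_cells(sample))
# [['Operating income', '100', '200'], ['Net profit', '30', '45']]
```

How many leading cells to merge depends on the layout of the statement, so `n` would need adjusting for a different document.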
