A quick guide to pdfplumber, a PDF parsing package for Python

Reposted article, 2021-11-19 19:15. Tag: python

pdfplumber can extract the text of a PDF file, and it can also extract the tables it contains.

1. Installation

pip3 install pdfplumber
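
To confirm that the install landed in the interpreter you intend to use, a small check like the one below can help. This is only a sketch: `is_installed` is a hypothetical helper name, not part of pdfplumber, and it relies solely on the standard library, so it runs whether or not the package is present.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pdfplumber"):
        print("pdfplumber is available")
    else:
        print("pdfplumber not found; run: pip3 install pdfplumber")
```

If the check fails right after a successful install, `pip3` and `python3` probably point at different environments.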

2. Usage

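# coding:utf-8

import pdfplumber

with pdfplumber.open('./test.pdf') as pdf:
    # Iterate over every page
    for page in pdf.pages:
        # Get all text on the current page, including text inside tables.
        # For a page with no text this prints None or an empty string,
        # depending on the pdfplumber version.
        print(page.extract_text())
        # Extract all tables on the current page (done once and reused below).
        # No tables: returns []. Otherwise: [[[row1], [row2], ...], [[row1], [row2], ...], ...]
        tables = page.extract_tables()
        print(tables)
        # Iterate over each extracted table
        for table in tables:
            print(table)  # [[row1], [row2], ...]
            # Iterate over each row
            for row in table:
                print(row)  # ['xxx', 'xxx', ...]
        # Print a separator line after each page
        print('---------- separator ----------')

# test.pdf is the PDF file to parse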
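
Each table comes back as a list of rows, which is often easier to work with once the first row is treated as a header. The sketch below shows one way to do that. It assumes the first row of each table really is a header row, which will not hold for every PDF, and the names `table_to_records` and `records_from_pdf` are hypothetical helpers, not part of pdfplumber. Empty cells, which pdfplumber reports as None, are converted to empty strings.

```python
def table_to_records(table):
    """Convert [[header...], [row...], ...] into a list of dicts keyed by header.

    Assumption: the first row is a header row. None cells become ''.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        # zip stops at the shorter sequence, so ragged rows do not raise
        records.append(dict(zip(header, cells)))
    return records


def records_from_pdf(path):
    """Collect records from every table on every page of the PDF at `path`."""
    # Imported here so table_to_records can be used without pdfplumber installed
    import pdfplumber

    records = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return records


if __name__ == "__main__":
    sample = [["name", "qty"], ["apple", "3"], [None, " 5 "]]
    for record in table_to_records(sample):
        print(record)
```

With a real file, `records_from_pdf('./test.pdf')` would return one dict per data row across all pages. Tables with different headers end up in the same list, so group them per table if that matters.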
