Reading and writing Word documents in Python with python-docx

Reposted article; original dated 2019-08-25 12:49. Category: Python

The python-docx library can be used to create and edit Microsoft Word (.docx) files.
Official documentation: https://python-docx.readthedocs.io/en/latest/index.html

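Notes:
.doc is Microsoft's proprietary file format. .docx is the format used by Microsoft Office 2007 and later; it is a compressed format based on the Office Open XML standard and generally takes up less space than the equivalent .doc file. A .docx file is essentially a ZIP archive, so you can rename it to .zip and extract it. Inside, word/document.xml holds most of the document's content, and image files are stored under word/media.
python-docx does not support .doc files. A workaround is to convert the .doc file to .docx in code first.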
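
Because a .docx file is a ZIP archive, its parts can be inspected with the standard library alone. The following is a minimal sketch; the helper names `list_docx_parts` and `read_main_xml` are hypothetical, and the sketch assumes the main part is stored at word/document.xml and encoded as UTF-8, which is the usual case for files written by Word or python-docx.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of all parts stored inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_main_xml(path):
    """Return the raw XML of the main document part as a string.

    Assumes the part lives at word/document.xml and is UTF-8 encoded.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

Calling `list_docx_parts('demo.docx')` on a file produced later in this article should list word/document.xml among the parts, plus an entry under word/media for the inserted picture.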

1. Installing the package

pip3 install python-docx
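
Note that the package is installed as python-docx but imported as docx. To confirm which version is installed, something like the following sketch may help. It assumes Python 3.8 or later (for `importlib.metadata`), and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The PyPI distribution is "python-docx"; the importable module is "docx".
print(installed_version("python-docx"))
```

If this prints None, the install step above has not taken effect in the interpreter you are running.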

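2. Creating a Word document

The code below is based on the example in the official documentation, with a few small changes and added comments explaining each call.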

from docx import Document
from docx.shared import Inches

document = Document()

# Add a heading and set its level (range 0 to 9, default 1)
document.add_heading('Document Title', 0)

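# Add a paragraph; the text may contain tabs (\t), newlines (\n), or carriage returns (\r)
p = document.add_paragraph('A plain paragraph having some ')
# Append text (runs) to the end of the paragraph; each run can have its own formatting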
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add a bulleted list (each item is preceded by a small dot)
document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add a numbered list (each item is preceded by a number)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table with one row and three columns
# Available table styles include:
# Normal Table
# Table Grid
# Light Shading, and Light Shading Accent 1 to Light Shading Accent 6
# Light List, and Light List Accent 1 to Light List Accent 6
# Light Grid, and Light Grid Accent 1 to Light Grid Accent 6
# (many others omitted)
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')
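# Get the list of cells in the first row
hdr_cells = table.rows[0].cells
# Set the text of the three cells in that first (header) row
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
# item_id is used instead of id to avoid shadowing the built-in id()
for qty, item_id, desc in records:
    # Add a row to the table and get the list of cells in that row
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc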

document.add_page_break()

# Save the .docx document
document.save('demo.docx')

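The resulting demo.docx contains the title, the formatted paragraph, the level 1 heading, the quote, both lists, the picture, and the table, followed by a page break.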

3. Reading a Word document

from docx import Document

doc = Document('demo.docx')

# Text of each paragraph
for para in doc.paragraphs:
    print(para.text)

# Index and text of each paragraph
for i, para in enumerate(doc.paragraphs):
    print(i, para.text)

# Tables
for table in doc.tables:
    # Rows
    for row in table.rows:
        # Cells in the row (one per column)
        for cell in row.cells:
            print(cell.text)
            # Alternatively, concatenate the text of the cell's paragraphs:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)

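Output (the empty entries at indices 8 and 9 are the paragraphs that hold the picture and the page break; table text is not part of doc.paragraphs):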

Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list



0 Document Title
1 A plain paragraph having some bold and some italic.
2 Heading, level 1
3 Intense quote
4 first item in unordered list
5 second item in unordered list
6 first item in ordered list
7 second item in ordered list
8 
9 

Qty
Id
Desc
3
101
Spam
7
422
Eggs
4
631
Spam, spam, eggs, and spam
[Finished in 0.2s]
