Automatically writing docx reports with Python

This article is a repost. 2022-03-10 10:24, tagged python

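Author: 謝小玲
Link: https://zhuanlan.zhihu.com/p/260205113
Source: Zhihu
Copyright belongs to the author. For commercial reposting, contact the author for permission; for non-commercial reposting, credit the source.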

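I have been doing some data work lately. Excel handles the data itself well, but to avoid writing the weekly report by hand, the report can be automated.
For example, pull data from Jira, or compute statistics from an Excel sheet, generate charts, build the doc and send it out automatically.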

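I then heard that Python's docx package (python-docx) is good for working with Word documents, so I tried it. It is quite capable: in short, it can create and modify .docx documents and work with their headings, paragraphs, tables and pictures. It reads and writes the file format directly, so Word itself does not need to be installed. After a quick look it covered basically everything I needed; what remained was writing the Python code.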

First, install it with pip:

pip install python-docx

The official python-docx documentation is at https://python-docx.readthedocs.io/en/latest/index.html; skim it to get a first impression.

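A Word .docx file is fairly complex: it is a zip package of XML parts rather than plain text, so the usual ways of reading a text file do not work on it. The docx package therefore represents the document with several levels of objects:

The top level is the Document object, which represents the whole document.
block-level: the paragraph is the most common block-level object, and text ending in a paragraph mark counts as one paragraph. Tables are also block-level objects, and a heading is a paragraph with a heading style. Common block-level properties include alignment, indentation and line spacing.
inline-level: inline objects live inside block-level objects. The run is the most common one; a paragraph can consist of several runs, and each run can carry its own formatting. A run can hold a single word, a sentence or the whole text of a paragraph, and an inline picture also sits inside a run. Common inline properties include font name, size, bold/italic and color.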

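Writing a Word document from scratch in Python takes fairly solid knowledge of the docx package, which has a lot of functions and features (for a simple document it is still easy). So I took a shortcut: first build a Word template that contains the paragraphs, headings and table of contents that never change, with fonts, sizes and table headers (including table styles) already set. Then I only need to fill the dynamic text, pictures and table content into the right places.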

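The relevant operations are as follows:

First import what is needed from the docx package (there is quite a lot) and open the template file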

from docx import Document
from docx.shared import Inches
from docx.shared import Cm
from docx.shared import Pt
from docx.shared import RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.table import WD_ALIGN_VERTICAL
from docx.oxml.ns import qn

document = Document("./模板.docx")

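Read all paragraphs of the docx file. paragraphs is a list holding every paragraph. To see the content of a paragraph, use its text property; it returns a str, so any Python string method can be applied to it.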

paragraphs = document.paragraphs
print(paragraphs[10].text)
type(paragraphs[10].text)

There are two ways to locate where text is inserted. One is by paragraph index, for example to append text to the end of paragraphs[10]:

paragraphs[10].add_run("XXX")

The other is to loop over the paragraphs list and check whether a paragraph contains your marker (location tag); if it does, append the text to that paragraph

for par in document.paragraphs:
    if "[sign]" in par.text:
        par.add_run("XXX")

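To insert a picture, calling it on the document object, e.g. document.add_picture(), puts the picture at the end of the document by default. To place it at a specific position, use a run, as with the text insertion above. The width and height parameters set the picture size, using docx.shared.Inches() or docx.shared.Cm() for the dimensions. If only one of them is given, the other is scaled to keep the aspect ratio.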

run = paragraphs[10].add_run()
run.add_picture("xxx.png", width = Inches(4.5))

Setting text properties (font, size, color and so on) takes a bit more work. For inserted text, set them on the run returned by add_run(), for example:

pa = paragraphs[10].add_run("XXX")
pa.font.size = Pt(10)
pa.font.bold = True
pa.font.color.rgb = RGBColor(255, 0, 0)
pa.font.name = "Times New Roman"

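For some Chinese fonts, font.name alone has no effect on the CJK characters; the East Asian font must also be set through the set() method of _element.rPr.rFonts: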

pa.font.name = u'微軟雅黑'
pa._element.rPr.rFonts.set(qn('w:eastAsia'), u'微軟雅黑')

If text properties need to be set many times, it is best to wrap the above in a function:

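def paragraph_attribute(pa, size, family, r=0x00, g=0x00, b=0x00, bold=None):
    # pa is a run, e.g. the object returned by paragraph.add_run()
    pa.font.size = Pt(size)
    pa.font.name = family
    if bold:
        pa.font.bold = True
    pa.font.color.rgb = RGBColor(r, g, b)
    # also set the East Asian font so CJK characters use the same family
    pa._element.rPr.rFonts.set(qn('w:eastAsia'), family)

pa = paragraphs[10].add_run("XXX")
# size and family have no defaults, so they must be passed
paragraph_attribute(pa, 10, "Times New Roman")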

For tables, one option is to create the table with the docx package, set its style, and then write content into each cell

table = document.add_table(rows = 2, cols = 2, style = "Normal Table")
table.cell(0,0).text = "XXX"

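What I do instead is define a custom table style in the template first (so I can use my own style rather than being limited to Word's built-in ones), write the header row there (it can be changed later in code, or left alone), and then add rows to the table with the add_row() method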

tables = document.tables
row_line = tables[0].add_row()
row_line.cells[0].text = "XXX"

Tables tend to have more formatting requirements than text, such as row height, column width and centering. All of these can be set, as follows:

tables[0].cell(0,0).width = Cm(3)    # cells in one column should share a width; if they differ, the largest is used
tables[0].rows[0].height = Cm(0.7)
tables[0].cell(0,0).paragraphs[0].paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER   # horizontal alignment
tables[0].cell(0,0).vertical_alignment = WD_ALIGN_VERTICAL.CENTER     # vertical alignment

After the Word document is modified, save it to the target docx file

document.save("TEST.docx")

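The docx package does not seem to have a method for working with the table of contents, though. For example, after generating the report I want the page numbers in the TOC updated automatically. For this the win32com.client package (part of pywin32, which needs Windows with Word installed) can be used. I have not studied it in depth, but updating the TOC looks like this: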

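import os
import win32com.client

word = win32com.client.DispatchEx("Word.Application")
# Word resolves relative paths against its own working directory, so pass an absolute path
doc = word.Documents.Open(os.path.abspath("./TEST.docx"))
doc.TablesOfContents(1).Update()
doc.Close(SaveChanges=True)
word.Quit()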

2. Creating a Word document

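The code below is the example from the official documentation with a few small changes and added comments on how each function is used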

from docx import Document
from docx.shared import Inches

document = Document()

# Add a heading and set its level; range 0 to 9, default 1
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tabs (\t), newlines (\n) or carriage returns (\r)
p = document.add_paragraph('A plain paragraph having some ')
# Append text to the paragraph; formatting can be set on each run
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add a bulleted list (a small dot in front of each item)
document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add a numbered list (a number in front of each item)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table with one row and three columns
# Some of the available table styles:
#   Normal Table
#   Table Grid
#   Light Shading, Light Shading Accent 1 to Light Shading Accent 6
#   Light List, Light List Accent 1 to Light List Accent 6
#   Light Grid, Light Grid Accent 1 to Light Grid Accent 6
#   (many more omitted)
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')
# Get the cells of the first row
hdr_cells = table.rows[0].cells
# Set the text of the three header cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    # Add a row to the table and get its cells
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()
# Save the .docx document
document.save('demo.docx')

3. Reading a Word document

from docx import Document

doc = Document('demo.docx')

# Text of every paragraph
for para in doc.paragraphs:
    print(para.text)

# Index and text of every paragraph
for i, para in enumerate(doc.paragraphs):
    print(i, para.text)

# Tables
for tb in doc.tables:
    # rows
    for row in tb.rows:
        # columns
        for cell in row.cells:
            print(cell.text)
            # Alternatively, join the text of the cell's paragraphs:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)

Writing a pandas DataFrame to docx

import docx
import pandas as pd

# how the data is obtained is up to you; the small dict below is only a
# placeholder so the example runs, replace it with your own data
data = {"Qty": [3, 7], "Desc": ["Spam", "Eggs"]}
df = pd.DataFrame(data)

# open an existing document
doc = docx.Document('./test.docx')

# add a table to the end and create a reference variable
# extra row is so we can add the header row
t = doc.add_table(df.shape[0]+1, df.shape[1])

# add the header row; str() is needed because column labels may not be strings
for j in range(df.shape[-1]):
    t.cell(0,j).text = str(df.columns[j])

# add the rest of the data frame
for i in range(df.shape[0]):
    for j in range(df.shape[-1]):
        t.cell(i+1,j).text = str(df.values[i,j])

# save the doc
doc.save('./test.docx')

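With this you can build a template, run the statistics and analysis, fill the results into the right places, and send the report out by email automatically.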
