Using Open XML to Manipulate Document Templates and Generate Reports Automatically

Reposted article. Originally published 2012-04-18 18:32. Tags: open xml, C# development

羅朝輝 (http://kesalin.cnblogs.com/)

This article is released under the Creative Commons Attribution-NonCommercial-ShareAlike license.

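The Open XML SDK is a class library from Microsoft for editing and manipulating MS Office documents. With it we can create and edit Office documents programmatically. It does place a requirement on the Office version: only Office 2007 and later are supported.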

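Open XML SDK download: (linked from the original post)

Developer blog: http://openxmldeveloper.org/

Microsoft documentation: http://msdn.microsoft.com/zh-cn/library/bb448854.aspx

Source code for this article: (download linked from the original post)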

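Starting with Office 2007, Microsoft moved the Office suite to a new, XML-based file format. If we add a .zip extension to a Word 2007 or Word 2010 document and open it with an archive tool, we can see that the document is a package containing a set of XML files, as the figure below shows:

The figure above shows what a Word document is made of. The word directory holds the key content: word/media contains the multimedia resources the document uses, such as pictures and sounds; word/theme contains the document's theme definitions, such as fonts, a bit like a website's CSS file; and word/document.xml contains the actual content, such as the text, layout, and image references, and is the file we focus on. Below is the word/document.xml of a document that contains only the single line “羅朝輝的blog”: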

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="00111330" w:rsidRDefault="000D4700">
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>羅朝輝的</w:t>
      </w:r>
      <w:proofErr w:type="spellStart" />
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>blog</w:t>
      </w:r>
      <w:r w:rsidR="00984A94">
        <w:rPr>
          <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
          <w:color w:val="000000" />
          <w:szCs w:val="21" />
          <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
        </w:rPr>
        <w:t>:</w:t>
      </w:r>
      <w:hyperlink r:id="rId5" w:history="1">
        <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
          <w:rPr>
            <w:rStyle w:val="a3" />
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>http</w:t>
        </w:r>
        <w:proofErr w:type="spellEnd" />
        <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
          <w:rPr>
            <w:rStyle w:val="a3" />
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>://kesalin.cnblogs.com</w:t>
        </w:r>
      </w:hyperlink>
      <w:bookmarkStart w:id="0" w:name="_GoBack" />
      <w:bookmarkEnd w:id="0" />
    </w:p>
    <w:sectPr w:rsidR="00111330">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0" />
      <w:cols w:space="425" />
      <w:docGrid w:type="lines" w:linePitch="312" />
    </w:sectPr>
  </w:body>
</w:document>

The XML above looks messy, but it becomes clear at a glance when viewed through the Open XML SDK tool:

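From the view above, the structure of a Word document is easy to see. A Word document contains one main document element, which contains a body element. The body contains paragraph elements or table elements. A paragraph contains run elements, and a run contains text elements. A table contains tableRow elements, a tableRow contains tableCell elements, and a tableCell is a container that holds paragraphs (which in turn hold runs) or nested tables. For the full hierarchy, see the MSDN article "控制 Open XML WordprocessingML 文檔中文本" (on controlling text in Open XML WordprocessingML documents).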
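The body -> paragraph -> run -> text nesting described above can be checked without the SDK. The following is a minimal sketch that uses LINQ to XML from the .NET base class library; the class name WordXmlOutline and the sample XML are made up for illustration and are not part of the article's source code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordXmlOutline
{
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one paragraph whose text is split across two runs.
    public const string SampleXml =
        "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:body><w:p>" +
        "<w:r><w:t>Hello, </w:t></w:r>" +
        "<w:r><w:t>world</w:t></w:r>" +
        "</w:p></w:body></w:document>";

    // Returns the text of each top-level paragraph by walking body -> p -> r -> t.
    public static List<string> ParagraphTexts(string documentXml)
    {
        XElement body = XDocument.Parse(documentXml).Root.Element(W + "body");
        return body.Elements(W + "p")
            .Select(p => string.Concat(
                p.Elements(W + "r").Elements(W + "t").Select(t => t.Value)))
            .ToList();
    }
}
```

Calling WordXmlOutline.ParagraphTexts(WordXmlOutline.SampleXml) joins the two runs back into one string per paragraph, which mirrors how Word splits visible text across several run elements.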

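With this background in place, we get to the main topic: how to create a document template and modify its content programmatically. Only text and picture replacement are covered here.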

I. First, create the document template.

Open Word 2010 or 2007, go to File -> Options -> Customize Ribbon, and check Developer so that the Developer tab appears on the Word ribbon.

Then add text and picture content controls to the document, as the figure below shows:

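How to add one: choose a content control and insert it, give it default content (text or a picture), then select the control and click Developer -> Properties to set its title and tag (tagID). This step matters: the tagID is what identifies the content control, so keep it unique within the template. The code locates each control by its tagID.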

Final result (see the Template.docx file in the download):

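The figure above shows that we added rich text, plain text, and picture content controls. Next we replace the content of these placeholder controls in code. This is the key technique behind generating report documents automatically.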

If we open document.xml and look at the part for a text content control, the layout of the control is easy to see:

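In the figure above, the text content control is wrapped in an sdt (Structured Document Tag) element. From the earlier introduction we know that text ultimately lives in a Run -> Text element, so the replacement only needs to find the sdt element by the content control's tagID and replace the content of its Text element. Picture replacement works the same way, with a few extra details to take care of. Every content control is wrapped in some sdt element, which may be an SdtBlock, SdtCell, SdtRun, and so on; these are all subclasses of SdtElement.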
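To make the sdt layout concrete, here is a minimal sketch of the same lookup done directly on the XML with LINQ to XML rather than the SDK. The class name SdtTextReplacer and the sample fragment are assumptions made for illustration; a real content control carries more properties than the single w:tag shown here.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SdtTextReplacer
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical fragment: one plain-text content control tagged "PH_Name".
    public const string SampleXml =
        "<w:body xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:p><w:sdt>" +
        "<w:sdtPr><w:tag w:val=\"PH_Name\"/></w:sdtPr>" +
        "<w:sdtContent><w:r><w:t>placeholder</w:t></w:r></w:sdtContent>" +
        "</w:sdt></w:p></w:body>";

    // Finds the first sdt whose sdtPr/tag value equals tagId and overwrites
    // the first text element inside it. Returns the XML unchanged if no sdt matches.
    public static string ReplaceText(string xml, string tagId, string newText)
    {
        XElement root = XElement.Parse(xml);
        XElement sdt = root.Descendants(W + "sdt").FirstOrDefault(s =>
            (string)s.Elements(W + "sdtPr").Elements(W + "tag")
                     .Attributes(W + "val").FirstOrDefault() == tagId);

        if (sdt != null)
        {
            XElement t = sdt.Descendants(W + "t").FirstOrDefault();
            if (t != null)
            {
                t.Value = newText;
            }
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, SdtTextReplacer.ReplaceText(SdtTextReplacer.SampleXml, "PH_Name", "Zhang San") swaps the placeholder text and leaves the sdtPr properties untouched, so the control can still be found by its tag afterwards.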

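II. Opening and closing a Word document with Open XML.

1. In Open XML, the class for working with Word documents is WordprocessingDocument. Its interface makes it easy to open and close a Word document. The WordprocessingDocument.Open overload used here takes two parameters: the document path, and a flag indicating whether to open the document for editing.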

        // Requires: using DocumentFormat.OpenXml.Packaging;

        /// <summary>
        /// Contains the word processing document
        /// </summary>
        private WordprocessingDocument _wordProcessingDocument;

        /// <summary>
        /// Contains the main document part
        /// </summary>
        private MainDocumentPart _mainDocPart;

        /// <summary>
        /// Open a Word document
        /// </summary>
        /// <param name="docname">path of the document to be opened</param>
        public void OpenDocuemnt(string docname)
        {
            // open the word docx; "true" opens it for editing
            _wordProcessingDocument = WordprocessingDocument.Open(docname, true);

            // get the Main Document part
            _mainDocPart = _wordProcessingDocument.MainDocumentPart;
        }

        /// <summary>
        /// Close the document
        /// </summary>
        public void CloseDocument()
        {
            // Dispose closes the package and releases the file.
            // Close() does the same in SDK 2.x but is not available in newer SDK versions.
            _wordProcessingDocument.Dispose();
        }

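After opening the document, we get the main document part (the word/document.xml part).

2. Next we replace the text content controls in the document. Let's try a TDD-style flow: we start from what we know, the tagID of each content control and the text to put in it. These two are our input: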
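At the package level, the main document part is just one entry inside the zip archive. The sketch below shows this with System.IO.Compression only, under the assumption that the main part is stored at word/document.xml, which is where Word conventionally writes it. The class PackagePeek and the in-memory sample package are hypothetical, and the sample is not a complete .docx file.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PackagePeek
{
    // Builds a tiny stand-in package in memory that holds only word/document.xml.
    public static byte[] BuildSamplePackage(string documentXml)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the archive is disposed.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                ZipArchiveEntry entry = zip.CreateEntry("word/document.xml");
                using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
                {
                    writer.Write(documentXml);
                }
            }

            return buffer.ToArray();
        }
    }

    // Reads the XML of the word/document.xml entry, or returns null if it is absent.
    public static string ReadMainDocumentXml(byte[] package)
    {
        using (var zip = new ZipArchive(new MemoryStream(package), ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The SDK's MainDocumentPart saves this bookkeeping: it resolves the part through the package relationships and exposes the XML as a typed object tree instead of a raw string.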

var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替換文本"},
                                   {"PH_Name", "張三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替換"},
                               };

Then we want to call a method that replaces the text of every text content control in the template whose tagID matches:

        /// <summary>
        /// Updates text placeholders with the given texts.
        /// </summary>
        /// <param name="tagValueDict">Pairs of placeholder tagID and replacement text.</param>
        public void UpdateText(Dictionary<string, string> tagValueDict)
        {
            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var value = pair.Value;

                foreach (var sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    // A content control may have no properties or no tag, so use
                    // null-conditional access (C# 6) instead of dereferencing directly
                    if (sdtElement.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == tagID)
                    {
                        // Block and cell controls hold a Paragraph; run controls hold an SdtContentRun.
                        // Note: SingleOrDefault throws if the control contains more than one match.
                        OpenXmlElement parentElement = sdtElement.Descendants<Paragraph>().SingleOrDefault();
                        if (null == parentElement)
                        {
                            SdtContentRun cr = sdtElement.Descendants<SdtContentRun>().SingleOrDefault();
                            parentElement = cr;
                        }

                        if (null != parentElement)
                        {
                            Run r = parentElement.Descendants<Run>().SingleOrDefault();
                            if (null != r)
                            {
                                Text t = r.Descendants<Text>().SingleOrDefault();
                                if (null != t)
                                {
                                    // Replace the old Text child with a new one that carries the new value
                                    r.AppendChild(new Text(value));
                                    r.RemoveChild(t);
                                }
                            }

                            break;
                        }
                    }
                }
            }
        }

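The code above walks all sdt elements in the body element. If an sdt's tagID equals the tagID we are looking for, we have found the matching content control. We then find the Run element under that sdt and replace its Text child with a new Text that carries the new content.

3. Now let's see how to replace pictures, again TDD-style: we start with the tagIDs of the picture content controls and the image resources.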

var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };
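The GetStreamFromImage helper used above is not listed in the article. A minimal sketch of what it might look like, assuming it only needs to load the file's bytes into memory (the class name ImageLoader and the method body are guesses, not the author's code):

```csharp
using System.IO;

public static class ImageLoader
{
    // Hypothetical stand-in for the article's GetStreamFromImage helper:
    // reads the whole file and wraps the bytes in a MemoryStream.
    public static MemoryStream GetStreamFromImage(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // A MemoryStream built from a byte array starts at position 0,
        // so it can be handed straight to a reader.
        return new MemoryStream(bytes);
    }
}
```

Reading the bytes up front means the image file is not kept open while the document is being edited.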

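Then we want to call a method that replaces the picture of every picture content control in the template whose tagID matches. As mentioned earlier, image resources live in the media directory. Open XML manages these resources by assigning each one a relationship id (rid), and other places refer to the resource through that rid. So we need to find the picture content control, find the resource id it references, use that resource id to get information about the placeholder such as the image size, and then replace the resource that the id points to. Here is the code: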

        // Requires: using System; using System.Linq;
        internal static string GetImageRelID<TSdtType>(TSdtType sdt, string imageTag) where TSdtType : SdtElement
        {
            // Loop through all tags within the sdt element
            foreach (Tag t in sdt.Descendants<Tag>())
            {
                // Do we have the correct tag? (case-insensitive comparison)
                if (string.Equals(t.Val?.Value, imageTag, StringComparison.OrdinalIgnoreCase))
                {
                    // Get the Blip for the image; there is only one image per placeholder, so no loop is needed
                    Blip b = sdt.Descendants<Blip>().FirstOrDefault();
                    if (null != b && null != b.Embed?.Value)
                    {
                        // Return the relationship id of the embedded image
                        return b.Embed.Value;
                    }
                }
            }

            return string.Empty;
        }

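The code above looks under a given sdt element for the image resource id used by the content control with the matching tag. Next we use that resource id to get the size of the placeholder image: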

        internal static void GetPlaceholderImageSize(IEnumerable<Drawing> drawingList, string relID, out int width, out int height)
        {
            width = -1;
            height = -1;

            // Loop through all Drawing elements that were passed in
            foreach (Drawing d in drawingList)
            {
                // Does this drawing contain a picture (Blip) that references relID?
                if (d.Descendants<Blip>().Any(b => b.Embed?.Value == relID))
                {
                    // Sizes are stored in EMU. 1 pixel = 9525 EMU at 96 DPI

                    // The size of the image placeholder is located in the EXTENT element
                    Extent e = d.Descendants<Extent>().FirstOrDefault();
                    if (null != e)
                    {
                        width = (int)(e.Cx / 9525);
                        height = (int)(e.Cy / 9525);
                    }

                    if (width == -1)
                    {
                        // Otherwise fall back to the size in the EXTENTS element
                        Extents e2 = d.Descendants<Extents>().FirstOrDefault();
                        if (null != e2)
                        {
                            width = (int)(e2.Cx / 9525);
                            height = (int)(e2.Cy / 9525);
                        }
                    }
                }
            }
        }
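The constant 9525 in the code above follows from the definition of the EMU (English Metric Unit): there are 914400 EMU per inch, so at 96 pixels per inch one pixel is 914400 / 96 = 9525 EMU. A small sketch of the conversion, with the helper class EmuUnits introduced here only for illustration:

```csharp
using System;

public static class EmuUnits
{
    // Definition of the unit: 914400 EMU per inch.
    public const long EmuPerInch = 914400;

    // EMU covered by one pixel at the given resolution (integer division).
    public static long EmuPerPixel(int dpi)
    {
        if (dpi <= 0)
        {
            throw new ArgumentOutOfRangeException("dpi", "dpi must be positive");
        }

        return EmuPerInch / dpi;
    }

    // Converts a length in EMU to whole pixels, truncating any remainder.
    public static int ToPixels(long emu, int dpi = 96)
    {
        return (int)(emu / EmuPerPixel(dpi));
    }
}
```

For example, an extent of 952500 EMU corresponds to 100 pixels at 96 DPI. At another resolution the factor changes (12700 EMU per pixel at 72 DPI), which is worth keeping in mind if the resized image looks larger or smaller than expected.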

With the size in hand, we can use the resource id and the size information to replace the placeholder image with the new one.

        // Requires: using System.Drawing; using System.IO; using DocumentFormat.OpenXml.Packaging;
        private void UpdateImagePart(string relID, MemoryStream imageStream, int width, int height)
        {
            // Read from the start of the stream in case the caller left it elsewhere
            imageStream.Position = 0;

            using (var originalBitmap = Image.FromStream(imageStream))
            using (var stream = new MemoryStream())
            {
                if (width != -1)
                {
                    // Resize the image to the placeholder size
                    using (var resized = new Bitmap(originalBitmap, width, height))
                    {
                        resized.Save(stream, originalBitmap.RawFormat);
                    }
                }
                else
                {
                    originalBitmap.Save(stream, originalBitmap.RawFormat);
                }

                // Get the ImagePart
                var imagePart = (ImagePart)_mainDocPart.GetPartById(relID);

                // FileMode.Create truncates the part first, so no bytes of the
                // old image are left behind when the new image is smaller
                using (var partStream = imagePart.GetStream(FileMode.Create, FileAccess.Write))
                {
                    // Overwrite the current image in the docx file package
                    stream.WriteTo(partStream);
                }
            }
        }
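One detail worth checking when overwriting a part's stream: if the stream is opened without truncation and the new image is shorter than the old one, writing from position 0 leaves the tail of the old data in place. The sketch below demonstrates the effect with a plain MemoryStream standing in for the part stream; OverwriteDemo is a made-up name, and whether a given SDK version truncates by default is an assumption to verify against its documentation.

```csharp
using System.IO;

public static class OverwriteDemo
{
    // Writes newData over oldData from position 0 without truncating first.
    public static byte[] OverwriteWithoutTruncate(byte[] oldData, byte[] newData)
    {
        var stream = new MemoryStream();
        stream.Write(oldData, 0, oldData.Length);

        stream.Position = 0;
        stream.Write(newData, 0, newData.Length);

        // Any old bytes beyond newData.Length are still part of the stream.
        return stream.ToArray();
    }

    // Truncates the stream to zero length before writing newData.
    public static byte[] OverwriteWithTruncate(byte[] oldData, byte[] newData)
    {
        var stream = new MemoryStream();
        stream.Write(oldData, 0, oldData.Length);

        stream.SetLength(0);
        stream.Position = 0;
        stream.Write(newData, 0, newData.Length);

        return stream.ToArray();
    }
}
```

With five old bytes and two new bytes, the first method returns five bytes (two new followed by three stale ones) and the second returns exactly the two new bytes. For image parts, opening the part stream in a truncating mode avoids the stale tail.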

Finally, we arrive at the interface for updating pictures:

        public void UpdateImage(Dictionary<string, MemoryStream> tagValueDict)
        {
            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var imageStream = pair.Value;

                foreach (SdtElement sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    string relID = GetImageRelID(sdtElement, tagID);
                    if (!string.IsNullOrEmpty(relID))
                    {
                        // Get size of image
                        int imageWidth;
                        int imageHeight;
                        GetPlaceholderImageSize(_mainDocPart.Document.Body.Descendants<Drawing>(), relID, out imageWidth, out imageHeight);

                        UpdateImagePart(relID, imageStream, imageWidth, imageHeight);

                        break;
                    }
                }
            }
        }

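III. Testing

Write a console test program that copies the template document to an output document and then replaces the text and pictures in the output document: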

        static void Main()
        {
            const string templateDocx = @"..\..\Template.docx";
            const string outputDocx = @"..\..\Output.docx";

            // copy the word doc so you can see the difference between the two
            File.Delete(outputDocx);
            File.Copy(templateDocx, outputDocx);

            var contentControlManager = new ContentControlManager();
            contentControlManager.OpenDocuemnt(outputDocx);

            var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替換文本"},
                                   {"PH_Name", "張三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替換"},
                               };

            contentControlManager.UpdateText(textDict);

            var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };

            contentControlManager.UpdateImage(imageDict);

            contentControlManager.CloseDocument();
        }

Open the generated Output.docx and you can see that the content has been replaced:

Source code download: (linked from the original post)
