Exporting Data from Word Tables to Excel


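  A recent project required importing the table contents of multiple Word files into Excel for further processing. All of the tables share the same format. After going through a lot of material online, I finally got it working with the OpenXML SDK, and I am sharing the source code here.

  Main reference: http://blog.darkthread.net/blogs/darkthreadtw/archive/2010/06/01/6454.aspx

  Key code:

  1. Converting DOC files to DOCX:

  The OpenXML SDK supports only the DOCX format, so DOC files must first be converted to DOCX. The conversion below automates Word itself through Microsoft.Office.Interop.Word, so Word must be installed on the machine that runs it.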

        // Requires a reference to Microsoft.Office.Interop.Word and the alias:
        // using Word = Microsoft.Office.Interop.Word;

        /// <summary>
        /// Converts a DOC file to DOCX.
        /// </summary>
        /// <param name="pathSource">Path of the source .doc file.</param>
        /// <param name="pathTarget">Path of the .docx file to write.</param>
        public static void DocToDocx(string pathSource, string pathTarget)
        {
            object missing = System.Reflection.Missing.Value;
            Word.Application wordApp = new Word.Application();
            wordApp.Visible = false;
            Word.Document doc = null;

            try
            {
                object path1 = pathSource;

                doc = wordApp.Documents.Open(ref path1,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing);

                object path2 = pathTarget;
                object fileType = Word.WdSaveFormat.wdFormatDocumentDefault;
                object compatibilityMode = Word.WdCompatibilityMode.wdWord2010;

                // Only documents stored in the legacy binary format are converted.
                if (doc.SaveFormat == (int)Word.WdSaveFormat.wdFormatDocument)
                {
                    doc.SaveAs2(ref path2, ref fileType,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref compatibilityMode);
                }
            }
            finally
            {
                // Close the document and quit Word even when an exception is thrown,
                // so that no hidden Word process is left running.
                if (doc != null) doc.Close(ref missing, ref missing, ref missing);
                wordApp.Quit(ref missing, ref missing, ref missing);
            }
        }
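
  Since the requirement covers multiple Word files, the conversion usually runs over a whole folder. The following is a minimal sketch of how the source and target paths might be paired up. The class name `BatchConvert` and the method `PlanConversions` are hypothetical and not part of the original source. The extension is compared explicitly so that files that are already .docx are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not from the original article): pairs each legacy
// .doc file with the .docx path it should be converted to.
public static class BatchConvert
{
    public static List<KeyValuePair<string, string>> PlanConversions(
        IEnumerable<string> files, string targetDir)
    {
        return files
            // Keep only files whose extension is exactly ".doc" (case-insensitive).
            .Where(f => string.Equals(Path.GetExtension(f), ".doc",
                                      StringComparison.OrdinalIgnoreCase))
            // Same file name, ".docx" extension, placed in the target folder.
            .Select(f => new KeyValuePair<string, string>(
                f,
                Path.Combine(targetDir, Path.GetFileNameWithoutExtension(f) + ".docx")))
            .ToList();
    }
}
```

  A caller could then loop over `BatchConvert.PlanConversions(Directory.GetFiles(sourceDir), targetDir)` and pass each pair's `Key` and `Value` to `DocToDocx`. Passing absolute paths is the safer choice here, because Word runs as a separate process whose working directory may differ from the caller's.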
 

  2. Extracting tables, rows, cells, and their content from the DOCX file

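    // Requires the OpenXML SDK and:
    // using System.Linq;
    // using DocumentFormat.OpenXml.Wordprocessing;
    public static class DocxTableExt
    {
        // Returns the tables that are direct children of the body (nested tables are not included).
        public static Table[] GetTables(this Body body)
        {
            return body.Elements<Table>().ToArray();
        }

        // Returns the rows of a table.
        public static TableRow[] GetTableRows(this Table tbl)
        {
            return tbl.Elements<TableRow>().ToArray();
        }

        // Returns the cells of a row.
        public static TableCell[] GetTableCells(this TableRow tr)
        {
            return tr.Elements<TableCell>().ToArray();
        }

        // Returns the text of a cell, one line per paragraph.
        public static string GetTableCellContent(this TableCell td)
        {
            return string.Join("\n", td.Elements<Paragraph>().Select(o => o.InnerText).ToArray());
        }
    }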
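
  When preparing the configuration file, it helps to see which table, row, and column index each cell has. The sketch below prints every cell with its 1-based coordinates. The class name `TableDump` is hypothetical, and the code uses the SDK's `Elements<T>()` calls directly so that it compiles on its own. It assumes the OpenXML SDK package is referenced and that the document has a main document part.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical diagnostic helper (not from the original article).
public static class TableDump
{
    public static void PrintCellCoordinates(string docxPath)
    {
        // Open the document read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            Table[] tables = body.Elements<Table>().ToArray();

            for (int t = 0; t < tables.Length; t++)
            {
                TableRow[] rows = tables[t].Elements<TableRow>().ToArray();
                for (int r = 0; r < rows.Length; r++)
                {
                    TableCell[] cells = rows[r].Elements<TableCell>().ToArray();
                    for (int c = 0; c < cells.Length; c++)
                    {
                        // Join the paragraphs of the cell, one per line.
                        string text = string.Join("\n",
                            cells[c].Elements<Paragraph>().Select(p => p.InnerText));

                        // Print 1-based indices, matching the configuration file.
                        Console.WriteLine("Table={0} Row={1} Column={2}: {3}",
                            t + 1, r + 1, c + 1, text);
                    }
                }
            }
        }
    }
}
```

  Note that the column index counts the cell elements actually present in a row, so in forms with horizontally merged cells it may not match the visual column position.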

 

  3. Extracting the cells listed in the configuration file into a DataRow

        /// <summary>
        /// Extracts the configured cells from the Word tables into a data row.
        /// </summary>
        /// <param name="dt">Table whose columns are named after the configured titles.</param>
        /// <param name="pathSource">Path of the DOCX file to read.</param>
        /// <param name="xmlConfig">Configuration listing the cells to extract.</param>
        /// <returns>A new row holding the extracted values; it is not yet added to the table.</returns>
        public DataRow CreatRow(DataTable dt, string pathSource, XmlConfig xmlConfig)
        {
            DataRow dr = dt.NewRow();

            using (WordprocessingDocument doc = WordprocessingDocument.Open(pathSource, false))
            {
                var tables = doc.MainDocumentPart.Document.Body.GetTables();
                for (int tableIndex = 0; tableIndex < tables.Length; tableIndex++)
                {
                    // Reuse the array fetched above instead of querying the body again.
                    Table table = tables[tableIndex];
                    var rows = table.GetTableRows();
                    for (int rowIndex = 0; rowIndex < rows.Length; rowIndex++)
                    {
                        var cells = rows[rowIndex].GetTableCells();
                        for (int columnIndex = 0; columnIndex < cells.Length; columnIndex++)
                        {
                            foreach (CellClass cell in xmlConfig.Import)
                            {
                                // Indices in the configuration file are 1-based.
                                if ((tableIndex == cell.TableIndex - 1) && (rowIndex == cell.RowIndex - 1) && (columnIndex == cell.ColumnIndex - 1))
                                {
                                    dr[cell.Title] = cells[columnIndex].GetTableCellContent();
                                }
                            }
                        }
                    }
                }
            }

            return dr;
        }
    }
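
  `CreatRow` writes to `dr[cell.Title]`, so the `DataTable` passed in must already contain a column for every configured title, otherwise the indexer throws. The sketch below shows one way the table might be prepared. The class name `ImportTableBuilder` is hypothetical and not part of the original source.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper (not from the original article): builds a DataTable
// with one string column per configured title.
public static class ImportTableBuilder
{
    public static DataTable CreateTable(IEnumerable<string> titles)
    {
        var dt = new DataTable("Import");
        foreach (string title in titles)
        {
            // Skip duplicates; adding the same column name twice would throw.
            if (!dt.Columns.Contains(title))
            {
                dt.Columns.Add(title, typeof(string));
            }
        }
        return dt;
    }
}
```

  The titles would come from the configuration, for example `xmlConfig.Import.Select(c => c.Title)`. Because `dt.NewRow()` does not attach the row to the table, the caller still needs `dt.Rows.Add(row)` for each row returned by `CreatRow`.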

 

  4. Sample configuration file

<?xml version="1.0" encoding="utf-8"?>
<XmlConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Import>
        <Item Title="姓名" Table="1" Row="1" Column="2" />
        <Item Title="性別" Table="1" Row="2" Column="2" />
        <Item Title="單位" Table="1" Row="8" Column="4" />
        <Item Title="工作簡歷" Table="1" Row="9" Column="2" />
    </Import>
    <Export RowStart="1" ColumnStart="1" />
</XmlConfig>

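  Notes:

  <Item Title="姓名" Table="1" Row="1" Column="2" /> means that the data in row 1, column 2 of the first table in the Word document is extracted into the Excel column titled 姓名 (Name).

  <Export RowStart="1" ColumnStart="1" /> means that the exported data starts at row 1, column 1 of the Excel sheet.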
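
  The article does not show the classes behind this file. The `xsi` and `xsd` namespace declarations on the root element suggest it was written by `XmlSerializer`, so the following is a hedged reconstruction rather than the original code. The attribute mappings (`Table` to `TableIndex` and so on), the class name `ExportClass`, and the `Parse` method are assumptions; only `XmlConfig`, `CellClass`, `Import`, `Title`, `TableIndex`, `RowIndex`, and `ColumnIndex` appear in the source.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Assumed shape of one <Item> element.
public class CellClass
{
    [XmlAttribute("Title")]  public string Title { get; set; }
    [XmlAttribute("Table")]  public int TableIndex { get; set; }   // 1-based
    [XmlAttribute("Row")]    public int RowIndex { get; set; }     // 1-based
    [XmlAttribute("Column")] public int ColumnIndex { get; set; }  // 1-based
}

// Assumed shape of the <Export> element (hypothetical class name).
public class ExportClass
{
    [XmlAttribute("RowStart")]    public int RowStart { get; set; }
    [XmlAttribute("ColumnStart")] public int ColumnStart { get; set; }
}

// Root element; the class name matches the <XmlConfig> tag.
public class XmlConfig
{
    [XmlArray("Import")]
    [XmlArrayItem("Item")]
    public List<CellClass> Import { get; set; }

    [XmlElement("Export")]
    public ExportClass Export { get; set; }

    // Hypothetical convenience method: deserializes a configuration from XML text.
    public static XmlConfig Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(XmlConfig));
        using (var reader = new StringReader(xml))
        {
            return (XmlConfig)serializer.Deserialize(reader);
        }
    }
}
```

  With classes like these, `XmlConfig.Parse(File.ReadAllText(configPath))` would yield four `Import` entries for the sample above, the first one having `Title` 姓名, `TableIndex` 1, `RowIndex` 1, and `ColumnIndex` 2.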

 

  5. Download

  Program  Source code

