Overview

Origins of POI

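POI is an open-source Apache project. It was created to handle documents in the various file formats based on the Office Open XML standard (OOXML) and on Microsoft's OLE 2 Compound Document format (OLE2), and it supports both reading and writing them. It is arguably the go-to tool for processing Office documents in Java.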

HWPF and XWPF

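The two main POI modules for working with Word documents are HWPF and XWPF.
HWPF is the standard API entry point for Microsoft Word 97(-2007) files, i.e. the binary .doc format. It also offers limited, read-only support for the older Word 6 and Word 95 formats.
XWPF is the standard API entry point for Microsoft Word 2007 files, i.e. the OOXML .docx format.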
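Because HWPF and XWPF target different containers (an OLE2 compound file versus a ZIP-based OOXML package), one option is to choose the module from a file's first bytes instead of trusting its extension. The sketch below is a minimal illustration, assuming the well-known signatures: OLE2 files begin with D0 CF 11 E0 A1 B1 1A E1, and ZIP packages begin with 50 4B 03 04 ("PK\3\4"). WordFormatSniffer, Format, detect and readHeader are hypothetical names, not part of POI. It is only a coarse check: an .xls file also carries the OLE2 header and an .xlsx is also a ZIP package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: guesses which POI Word module fits a file from its leading bytes. */
final class WordFormatSniffer {

    enum Format { OLE2_HWPF, OOXML_XWPF, UNKNOWN }

    // OLE2 compound document signature (binary .doc files use this container)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4" (a .docx is a ZIP package)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private WordFormatSniffer() {
    }

    /** Classifies a header; anything shorter than a signature is UNKNOWN. */
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2_HWPF;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML_XWPF;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes; the caller owns the stream and must close it. */
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[8];
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream before 8 bytes
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In use, readHeader consumes bytes from the stream, so the file would need to be reopened (or read through a stream that supports mark/reset) before it is handed to HWPFDocument or XWPFDocument.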

Reading Word documents

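POI actually provides many APIs for reading and writing Word documents. This post only gives the simplest demo, which reads the text content paragraph by paragraph; reading images and tables will be covered in a later update.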

Maven dependencies

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>24.0-jre</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.17</version>
        </dependency>

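import com.google.common.base.CharMatcher;
import com.google.common.collect.Lists;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Locale;

public class WordReader {

    // Assumes SLF4J is on the classpath (the {} placeholders are SLF4J syntax)
    private static final Logger LOGGER = LoggerFactory.getLogger(WordReader.class);

    /** Reads a .doc/.docx file and returns each paragraph's text with all whitespace removed. */
    public static List<String> readWordFile(String path) {
        List<String> contextList = Lists.newArrayList();
        String lowerPath = path.toLowerCase(Locale.ROOT); // also accept .DOC / .DOCX
        // try-with-resources closes the stream even when parsing fails
        try (InputStream stream = new FileInputStream(path)) {
            if (lowerPath.endsWith(".doc")) {
                // Binary Word 97(-2007) format: HWPF plus WordExtractor
                try (HWPFDocument document = new HWPFDocument(stream);
                     WordExtractor extractor = new WordExtractor(document)) {
                    for (String context : extractor.getParagraphText()) {
                        contextList.add(CharMatcher.whitespace().removeFrom(context));
                    }
                }
            } else if (lowerPath.endsWith(".docx")) {
                // OOXML Word 2007 format: the XWPFDocument constructor already yields the document
                try (XWPFDocument document = new XWPFDocument(stream)) {
                    for (XWPFParagraph paragraph : document.getParagraphs()) {
                        contextList.add(CharMatcher.whitespace().removeFrom(paragraph.getParagraphText()));
                    }
                }
            } else {
                LOGGER.debug("File {} is not a Word file", path);
            }
        } catch (IOException e) {
            // Report the failure once, with the path and the stack trace
            LOGGER.error("Failed to read Word file {}", path, e);
        }
        return contextList;
    }
}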
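One side effect worth knowing: CharMatcher.whitespace().removeFrom(...) deletes every whitespace character, so an English paragraph such as "Hello World" comes back as "HelloWorld". That is harmless for most Chinese text but lossy for English or mixed content. If word boundaries matter, a gentler option is to collapse each whitespace run into a single space and trim the ends. The sketch below assumes only the JDK regex API; ParagraphText, normalize and normalizeAll are hypothetical names, and dropping paragraphs that end up empty is an extra choice the original code does not make.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical alternative to removing all whitespace: keeps one space between words. */
final class ParagraphText {

    // UNICODE_CHARACTER_CLASS makes \s match Unicode white space,
    // including the full-width space U+3000 and the no-break space U+00A0
    private static final Pattern WHITESPACE_RUN =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    private ParagraphText() {
    }

    /** Collapses every whitespace run (including paragraph marks) to one space, then trims. */
    static String normalize(String raw) {
        return WHITESPACE_RUN.matcher(raw).replaceAll(" ").trim();
    }

    /** Normalizes each paragraph and skips those that become empty. */
    static List<String> normalizeAll(List<String> paragraphs) {
        List<String> result = new ArrayList<>();
        for (String paragraph : paragraphs) {
            String cleaned = normalize(paragraph);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

For example, the .docx branch could call ParagraphText.normalize(paragraph.getParagraphText()) in place of the CharMatcher call, and the .doc branch could do the same for each entry of extractor.getParagraphText().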
