Reading a Word document paragraph by paragraph with POI and stripping field-tag content such as hyperlinks


import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Reads a .doc file paragraph by paragraph and returns the text with field codes removed.
public String readDoc(File file) {
    StringBuilder buffer = new StringBuilder();
    // try-with-resources closes the stream even if extraction fails
    try (InputStream input = new FileInputStream(file)) {
        WordExtractor extractor = new WordExtractor(input);
        // One array entry per paragraph of the document
        String[] paragraphs = extractor.getParagraphText();
        for (String paragraph : paragraphs) {
            // stripFields is static: it drops field codes such as HYPERLINK and keeps the visible text
            buffer.append(WordExtractor.stripFields(paragraph)).append("\r\n");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return buffer.toString();
}
    

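The stripping is done by WordExtractor.stripFields(paragraph). It is a static method that removes field codes (for example the HYPERLINK instruction) from the paragraph text and keeps the displayed text.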
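
To make the effect of field stripping concrete, here is a minimal, POI-independent sketch. It assumes the usual layout of a field in text extracted from a binary .doc file: a begin mark (0x13), the field instruction, an optional separator (0x14) followed by the displayed text, and an end mark (0x15). The class FieldStripSketch and the method stripSimpleFields are hypothetical names used only for illustration. This is not POI's implementation, and it does not attempt to handle nested fields.

```java
public class FieldStripSketch {
    // Control characters that delimit a field in extracted .doc text
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    // Hypothetical helper: drops the instruction part of each (non-nested) field
    // and keeps the displayed text, if any.
    static String stripSimpleFields(String text) {
        StringBuilder out = new StringBuilder();
        boolean inFieldCode = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                inFieldCode = true;   // start skipping the field instruction
            } else if (c == FIELD_SEPARATOR) {
                inFieldCode = false;  // displayed text follows the separator
            } else if (c == FIELD_END) {
                inFieldCode = false;  // a field without displayed text ends here
            } else if (!inFieldCode) {
                out.append(c);        // ordinary text or displayed field text
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "See " + FIELD_BEGIN + " HYPERLINK \"http://example.com\" "
                + FIELD_SEPARATOR + "the site" + FIELD_END + " now";
        System.out.println(stripSimpleFields(raw));
    }
}
```

In real code, prefer WordExtractor.stripFields, which works on the same markers and also copes with nesting; the sketch only shows why the hyperlink target disappears while the link text remains.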

An article on extracting content from documents (Excel, PDF, Word, ...):

http://blog.sina.com.cn/s/blog_67b9ad8d01010bwa.html

A thread describing the problem that came up:

http://bbs.csdn.net/topics/320055955

