Reading a Word document paragraph by paragraph with POI and stripping field markup (e.g. hyperlinks)


import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;

public String readDoc(File file) {
    StringBuilder buffer = new StringBuilder();
    // try-with-resources closes the stream even if extraction fails
    try (InputStream input = new FileInputStream(file)) {
        WordExtractor extractor = new WordExtractor(input);
        // Each array element holds the text of one paragraph of the .doc file
        for (String paragraph : extractor.getParagraphText()) {
            // stripFields is a static method; it removes field markup such as hyperlink codes
            buffer.append(WordExtractor.stripFields(paragraph)).append("\r\n");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return buffer.toString();
}
    

The stripping is done by WordExtractor.stripFields(paragraph), a static method that removes the field markup from a paragraph's text.
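To make the idea of "field markup" concrete, here is a minimal standalone sketch. It assumes the Word binary convention that a field is delimited by the control characters 0x13 (field begin), 0x14 (separator between the hidden instruction and the visible result) and 0x15 (field end). The class FieldStripSketch and the method stripSimpleFields are hypothetical names used only for illustration. This is not POI's implementation, and it handles non-nested fields only.

```java
public class FieldStripSketch {
    // Assumed field delimiters in text extracted from a .doc file
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEP = '\u0014';
    private static final char FIELD_END = '\u0015';

    /** Drops the hidden instruction of each non-nested field and keeps its visible text. */
    public static String stripSimpleFields(String text) {
        StringBuilder out = new StringBuilder();
        boolean inInstruction = false; // true while inside the hidden part of a field
        for (char c : text.toCharArray()) {
            if (c == FIELD_BEGIN) {
                inInstruction = true;
            } else if (c == FIELD_SEP || c == FIELD_END) {
                inInstruction = false;
            } else if (!inInstruction) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "See \u0013 HYPERLINK \"http://example.com\" \u0014our site\u0015 for details.";
        System.out.println(stripSimpleFields(raw)); // prints "See our site for details."
    }
}
```

In real code, prefer WordExtractor.stripFields, which is the method the post relies on; the sketch only shows why a hyperlink's target disappears while its display text remains.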

An article on extracting content from documents (Excel, PDF, Word, ...):

http://blog.sina.com.cn/s/blog_67b9ad8d01010bwa.html

A forum thread about problems encountered:

http://bbs.csdn.net/topics/320055955

