Converting Word documents to HTML with Apache POI


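There are currently four ways to parse a Word document and display it as HTML:

POI (fairly laborious)

Third-party plugins (paid, or capped at about 500 calls per day, which makes them unsuitable for large enterprises)

Converting the Word file to PDF and showing it in the page with Flash (not great: cumbersome and offers little control)

Using HTML5 (I am not very familiar with it)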

 

Since I chose POI, let's get started.

Step 1: add the jars with Maven.

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<!-- ooxml-schemas is the full schema jar and a superset of poi-ooxml-schemas;
     usually only one of the two is needed on the classpath -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>

 

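The Word 2003 (.doc) and Word 2007 (.docx) formats are handled by different parts of POI, so objects from one cannot be used for the other: .doc files are read by HWPF (HWPFDocument, shipped in poi-scratchpad) and .docx files by XWPF (XWPFDocument, in poi-ooxml). Each format therefore needs its own conversion method.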
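Because the two formats need different classes, it can help to choose the conversion path from the file contents rather than the extension (a .doc file renamed to .docx would otherwise fail inside POI). The sketch below is a hypothetical helper, not part of POI: it checks for the OLE2 signature used by binary .doc files and the ZIP signature used by .docx packages. A ZIP header only means "some ZIP-based file", and password-protected .docx files are stored inside an OLE2 container, so treat the result as a hint.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical helper (not a POI API): guesses whether a Word file is the
 * binary .doc (OLE2) format or the .docx (OOXML, i.e. ZIP) format.
 */
final class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound document signature used by Word 97-2003 .doc files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature "PK\3\4" used by .docx packages
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) return Format.DOC;   // read with HWPFDocument
        if (startsWith(header, ZIP)) return Format.DOCX;   // read with XWPFDocument
        return Format.UNKNOWN;
    }

    /** Consumes up to 8 bytes from the stream; open a fresh stream afterwards. */
    static Format detect(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 8 bytes or EOF
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) break;
            n += r;
        }
        return detect(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Since detect(InputStream) consumes the first bytes, open a new FileInputStream (or use mark/reset) before passing the file to HWPFDocument or XWPFDocument.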

First, converting a .docx (Word 2007) file:

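import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.junit.Test;

    @Test
    public void word2007ToHtml() throws Exception {
        String filepath = "e:/files/";
        String sourceFileName = filepath + "前言.docx";
        String targetFileName = filepath + "1496717486420.html";
        // Folder the extracted images are written to
        String imagePathStr = filepath + "image/";
        // try-with-resources closes both the input file and the HTML writer
        try (InputStream in = new FileInputStream(sourceFileName);
             Writer writer = new OutputStreamWriter(
                     new FileOutputStream(targetFileName), StandardCharsets.UTF_8)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create();
            // Save embedded images into imagePathStr
            options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
            // Prefix for <img src> in the HTML, relative to the HTML file's folder
            options.URIResolver(new BasicURIResolver("image"));
            XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
            xhtmlConverter.convert(document, writer, options);
        }
    }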
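The folder given to FileImageExtractor and the prefix given to BasicURIResolver have to agree: the HTML refers to each image through the resolver's prefix, and the browser resolves that prefix relative to the folder that contains the HTML file. In the test above the HTML sits in e:/files/ and the images go to e:/files/image/, hence "image". A hypothetical helper (not part of xdocreport) that derives the prefix from the two paths, so they cannot drift apart, might look like this:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: computes the prefix to pass to BasicURIResolver so that
 * the <img src> values point at the folder FileImageExtractor writes to.
 */
final class ImageUriPrefix {

    static String relativePrefix(String htmlFile, String imageDir) {
        // Browsers resolve a relative src against the folder containing the HTML file
        Path htmlDir = Paths.get(htmlFile).toAbsolutePath().normalize().getParent();
        Path images = Paths.get(imageDir).toAbsolutePath().normalize();
        // URIs use '/', even on Windows where Path.toString() uses '\'
        return htmlDir.relativize(images).toString().replace('\\', '/');
    }
}
```

Usage in the test method would then be `options.URIResolver(new BasicURIResolver(ImageUriPrefix.relativePrefix(targetFileName, imagePathStr)));`.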

Then the .doc (Word 2003) version, which I have not tested:

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.junit.Test;
import org.w3c.dom.Document;

    @Test
    public void word2003ToHtml() {
        docToHtml("E:/files/1496635038432.doc", "E:/files/1496635038432.html");
    }

    public static void docToHtml(String fileAllName, String outPutFile) {
        // Pictures are saved next to the HTML file so the relative <img src> resolves
        final File imageDir = new File(outPutFile).getAbsoluteFile().getParentFile();
        try (InputStream in = new FileInputStream(fileAllName)) {
            // Read the binary .doc file into an HWPF document model
            HWPFDocument wordDocument = new HWPFDocument(in);
            // Empty DOM document that the converter fills with HTML elements
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(dom);
            // Called once per embedded picture; the returned name becomes the <img src>
            wordToHtmlConverter.setPicturesManager(new PicturesManager() {
                @Override
                public String savePicture(byte[] content, PictureType pictureType,
                        String suggestedName, float widthInches, float heightInches) {
                    try (OutputStream os = new FileOutputStream(new File(imageDir, suggestedName))) {
                        os.write(content);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    return suggestedName;
                }
            });
            // Walk the Word document and build the HTML DOM
            wordToHtmlConverter.processDocument(wordDocument);
            Document htmlDocument = wordToHtmlConverter.getDocument();

            // Serialize the DOM as HTML into an in-memory buffer
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
            // Decode with the same charset the serializer used, then write the file
            writeFile(out.toString("UTF-8"), outPutFile);
        } catch (IOException | ParserConfigurationException | TransformerException e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String content, String path) {
        // Write with the same charset declared in the generated <meta> tag
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            bw.write(content);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
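The string handed to writeFile has to be decoded from the serializer's bytes with the same charset that was set in OutputKeys.ENCODING, and writeFile has to encode with that charset again. Decoding with any other charset (for example the platform default used by new String(byte[])) silently garbles Chinese text. The small illustration below, with a hypothetical class name, shows the effect with two standard charsets:

```java
import java.nio.charset.Charset;

/**
 * Illustration only: bytes must be decoded with the charset they were encoded with.
 */
final class CharsetRoundTrip {

    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);   // what the serializer writes
        return new String(bytes, decodeWith);       // what the caller reads back
    }
}
```

Encoding "前言" as UTF-8 and decoding it as ISO-8859-1 yields six unrelated Latin-1 characters instead of the original two, while a matching charset returns the text unchanged.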

 

These two methods convert Word documents to HTML. Note that in IE8 the table borders are not displayed.
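If the missing borders come from IE8 ignoring the border styles the converters generate (an assumption; the cause is not diagnosed here), one workaround to try is to inject an explicit CSS rule into the HTML before it is written. The class and method names below are hypothetical, and the result has not been verified in IE8:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical post-processing step: adds an explicit table-border rule
 * to generated HTML. Not verified against IE8.
 */
final class TableBorderPatch {

    // Explicit border on every table cell
    static final String BORDER_CSS =
            "<style type=\"text/css\">"
            + "table{border-collapse:collapse;}"
            + "td,th{border:1px solid #000;}"
            + "</style>";

    // Matches </head> regardless of case
    private static final Pattern HEAD_END = Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    static String injectBorderCss(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            // No </head> found: prepend the rule so it still precedes the tables
            return BORDER_CSS + html;
        }
        // Insert the rule just before </head>
        return html.substring(0, m.start()) + BORDER_CSS + html.substring(m.start());
    }
}
```

For the .doc path this could be applied as `writeFile(TableBorderPatch.injectBorderCss(content), outPutFile)`. The trade-off is that every table gets a border, including tables that were borderless in the original document.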

I will keep improving these methods.

