java實現word轉pdf在線預覽（前端使用PDF.js；后端使用openoffice、aspose）

本文轉載自查看原文 2018-12-21 21:14 7226

背景

　　之前一直是用戶點擊下載word文件到本地，然后使用office或者wps打開。需求優化，要實現可以直接在線預覽，無需下載到本地然后再打開。

　　隨后開始上網找資料，網上資料一大堆，方案也各有不同，大概有這么幾種方案：

　　1.word轉html然后轉pdf

　　2.Openoffice + swftools + Flexmapper + jodconverter

　　3.kkFileView

　　分析之后最后決定使用Openoffice+PDF.js方式實現

環境搭建

　　1.安裝Openoffice，下載地址：http://www.openoffice.org/download/index.html

　　安裝完成之后，cmd進入安裝目錄執行命令：soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard

　　2.PDF.js，下載地址：http://mozilla.github.io/pdf.js/

　　下載之后解壓，目錄結構如下：

代碼實現

　　編碼方面，分前端后：

　　后端：java后端使用openoffice把word文檔轉換成pdf文件，返回流

　　前端：把PDF.js解壓后的文件加到項目中，修改對應路徑，PDF.js拿到后端返回的流直接展示

后端

　　項目使用springboot，pom文件添加依賴

<!-- openoffice word轉pdf -->
        <dependency>
            <groupId>com.artofsolving</groupId>
            <artifactId>jodconverter</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>jurt</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>ridl</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>juh</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>unoil</artifactId>
            <version>3.0.1</version>
        </dependency>

　　application.properties配置openoffice服務地址與端口

openoffice.host=127.0.0.1
openoffice.port=8100

　　doc文件轉pdf文件

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ConnectException;

import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import com.xxx.utils.Doc2PdfUtil;

@Controller
@RequestMapping("/doc2PdfController")
public class Doc2PdfController {
    @Value("${openoffice.host}")
    private String OpenOfficeHost;
    @Value("${openoffice.port}")
    private Integer OpenOfficePort;
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfController.class);
    
    @RequestMapping("/doc2pdf")
    public void doc2pdf(String fileName,HttpServletResponse response){
        File pdfFile = null;
        OutputStream outputStream = null;
        BufferedInputStream bufferedInputStream = null;
        
        Doc2PdfUtil doc2PdfUtil = new Doc2PdfUtil(OpenOfficeHost, OpenOfficePort);
        
        try {
            //doc轉pdf，返回pdf文件
            pdfFile = doc2PdfUtil.doc2Pdf(fileName);
            outputStream = response.getOutputStream();
            response.setContentType("application/pdf;charset=UTF-8");  
            bufferedInputStream = new BufferedInputStream(new FileInputStream(pdfFile));  
            byte buffBytes[] = new byte[1024];  
            outputStream = response.getOutputStream();  
            int read = 0;    
            while ((read = bufferedInputStream.read(buffBytes)) != -1) {    
                outputStream.write(buffBytes, 0, read);    
            }
        } catch (ConnectException e) {
            logger.info("****調用Doc2PdfUtil doc轉pdf失敗****");
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }  finally {
            if(outputStream != null){
                try {
                    outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }    
            }
            if(bufferedInputStream != null){
                try {
                    bufferedInputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

import java.io.File;
import java.net.ConnectException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.StreamOpenOfficeDocumentConverter;

public class Doc2PdfUtil {
    private String OpenOfficeHost; //openOffice服務地址
    private Integer OpenOfficePort; //openOffice服務端口
    
    public Doc2PdfUtil(){
    }

    public Doc2PdfUtil(String OpenOfficeHost, Integer OpenOfficePort){
        this.OpenOfficeHost = OpenOfficeHost;
        this.OpenOfficePort = OpenOfficePort;
    }
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfUtil.class);
    
    /**
     * doc轉pdf
     * @return pdf文件路徑
     * @throws ConnectException
     */
    public File doc2Pdf(String fileName) throws ConnectException{
        File docFile = new File(fileName + ".doc");
        File pdfFile = new File(fileName + ".pdf");
        if (docFile.exists()) {
            if (!pdfFile.exists()) {
                OpenOfficeConnection connection = new SocketOpenOfficeConnection(OpenOfficeHost, OpenOfficePort);
                try {
                    connection.connect();
                    DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
                    //最核心的操作，doc轉pdf
                    converter.convert(docFile, pdfFile);
                    connection.disconnect();
                    logger.info("****pdf轉換成功，PDF輸出：" + pdfFile.getPath() + "****");
                } catch (java.net.ConnectException e) {
                    logger.info("****pdf轉換異常，openoffice服務未啟動！****");
                    e.printStackTrace();
                    throw e;
                } catch (com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException e) {
                    System.out.println("****pdf轉換器異常，讀取轉換文件失敗****");
                    e.printStackTrace();
                    throw e;
                } catch (Exception e) {
                    e.printStackTrace();
                    throw e;
                }
            }
        } else {
            logger.info("****pdf轉換異常，需要轉換的doc文檔不存在，無法轉換****");
        }
        return pdfFile;
    }
}

前端

　　把pdfjs-2.0.943-dist下的兩個文件夾build、web整體加到項目中，然后把viewer.html改成viewer.jsp，並調整了位置，去掉了默認的pdf文件compressed.tracemonkey-pldi-09.pdf，將來使用我們生成的文件

　　viewer.jsp、viewer.js注意點：

　　1.引用的js、css路徑要修改過來

　　2.viewer.jsp中調用pdf/web/viewer.js，viewer.js中配置了默認的pdf文件路徑，我們要動態生成pdf，因此需要修改，在jsp中定義一個參數DEFAULT_URL，然后在js中使用它

　　3.jsp中寫了一個ajax獲取pdf流，之后賦值給DEFAULT_URL，然后再讓viewer.js去加載，因此需要把/pdf/web/viewer.js放到ajax方法后面

　　4.viewer.js中把compressed.tracemonkey-pldi-09.pdf改成我們定義的變量DEFAULT_URL；pdf.worker.js的路徑修改成對應路徑

<%@ page language="java" contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"%>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<!DOCTYPE html>
<!--
Copyright 2012 Mozilla Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Adobe CMap resources are covered by their own copyright but the same license:

    Copyright 1990-2015 Adobe Systems Incorporated.

See https://github.com/adobe-type-tools/cmap-resources
-->
<html dir="ltr" mozdisallowselectionprint>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
    <meta name="google" content="notranslate">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <c:set var="qtpath" value="${pageContext.request.contextPath}"/>
    <script>
        var qtpath = '${qtpath}';
        var fileName = '${fileName}';
    </script>
    
    <title>PDF.js viewer</title>


    <link rel="stylesheet" href="${qtpath}/res/pdf/web/viewer.css">


<!-- This snippet is used in production (included from viewer.html) -->
<link rel="resource" type="application/l10n" href="${qtpath}/res/pdf/web/locale/locale.properties">
<script type="text/javascript" src="${qtpath}/res/js/jquery/jquery-2.1.4.min.js"></script>
<script type="text/javascript">
    var DEFAULT_URL = "";//注意，刪除的變量在這里重新定義  
    var PDFData = "";  
    $.ajax({  
        type:"post",  
        async:false,  //
        mimeType: 'text/plain; charset=x-user-defined',  
        url:'${qtpath}/doc2PdfController/doc2pdf',
        data:{'fileName':fileName},
        success:function(data){  
           PDFData = data;  
        }  
    });  
    var rawLength = PDFData.length;  
    //轉換成pdf.js能直接解析的Uint8Array類型,見pdf.js-4068  
    var array = new Uint8Array(new ArrayBuffer(rawLength));    
    for(i = 0; i < rawLength; i++) {  
      array[i] = PDFData.charCodeAt(i) & 0xff;  
    }  
    DEFAULT_URL = array;
</script>
<script type="text/javascript" src="${qtpath}/res/pdf/build/pdf.js"></script>
<script type="text/javascript" src="${qtpath}/res/pdf/web/viewer.js"></script>

  </head>

  ...

效果

分割線

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

　　本以為完美的實現了doc在線預覽，上測試環境后發現了一個大坑，我們的doc文件不是在本地office創建后上傳的，是其他同事用freemarker ftl模板生成的，這種生成的doc文件根本不是微軟標准的doc，本質是xml數據結構，openoffice拿這種文件去轉換pdf文件直接就報錯了

　　上網查資料查了半天也沒找到這種問題的解決方案，想想只能是放棄openoffice改用其他方法了（freemarker ftl生成doc這個肯定是不能動的）

　　看到一些博客使用word--html--pdf生成pdf，還有的使用freemarker ftl xml 生成pdf感覺還是太繁瑣了，我只是想拿現有的doc（雖然是freemarker ftl生成的）轉換成pdf啊

　　繼續看博客查資料，看到一種方法，使用aspose把doc轉換成pdf，抱着試一試的心態在本地測試了下，沒想到竟然成了，感覺太意外了，aspose方法超級簡單，只要導入jar包，幾行代碼就可以搞定，並且轉換速度比openoffice要快很多。很是奇怪，這么好用這么簡單的工具為什么沒在我一開始搜索word轉pdf的時候就出現呢

aspose doc轉pdf

　　在maven倉庫搜索aspose，然后把依賴加入pom.xml發現jar包下載不下來，沒辦法，最后在csdn下載aspose jar包，然后mvn deploy到倉庫

　　pom.xml

<!-- word轉pdf maven倉庫沒有需要本地jar包發布到私服 -->
        <dependency>
            <groupId>com.aspose.words</groupId>
            <artifactId>aspose-words-jdk16</artifactId>
            <version>14.9.0</version>
        </dependency>

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ConnectException;

import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import com.xxx.utils.Doc2PdfUtil;

@Controller
@RequestMapping("/doc2PdfController")
public class Doc2PdfController {
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfController.class);
    
    @RequestMapping("/doc2pdf")
    public void doc2pdf(String fileName,HttpServletResponse response){
        File pdfFile = null;
        OutputStream outputStream = null;
        BufferedInputStream bufferedInputStream = null;
        String docPath = fileName + ".doc";
        String pdfPath = fileName + ".pdf";
        try {
            pdfFile = Doc2PdfUtil.doc2Pdf(docPath, pdfPath);
            outputStream = response.getOutputStream();
            response.setContentType("application/pdf;charset=UTF-8");  
            bufferedInputStream = new BufferedInputStream(new FileInputStream(pdfFile));  
            byte buffBytes[] = new byte[1024];  
            outputStream = response.getOutputStream();  
            int read = 0;    
            while ((read = bufferedInputStream.read(buffBytes)) != -1) {    
                outputStream.write(buffBytes, 0, read);    
            }
        } catch (ConnectException e) {
            logger.info("****調用Doc2PdfUtil doc轉pdf失敗****");
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }  finally {
            if(outputStream != null){
                try {
                    outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }    
            }
            if(bufferedInputStream != null){
                try {
                    bufferedInputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

　　Doc2PdfUtil.java

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.aspose.words.License;
import com.aspose.words.SaveFormat;

public class Doc2PdfUtil {
    
    private static Logger logger = LoggerFactory.getLogger(Doc2PdfUtil.class);
    
    /**
     * doc轉pdf
     * @param docPath doc文件路徑，包含.doc
     * @param pdfPath pdf文件路徑，包含.pdf
     * @return
     */
    public static File doc2Pdf(String docPath, String pdfPath){
        File pdfFile = new File(pdfPath);
        try {
            String s = "<License><Data><Products><Product>Aspose.Total for Java</Product><Product>Aspose.Words for Java</Product></Products><EditionType>Enterprise</EditionType><SubscriptionExpiry>20991231</SubscriptionExpiry><LicenseExpiry>20991231</LicenseExpiry><SerialNumber>8bfe198c-7f0c-4ef8-8ff0-acc3237bf0d7</SerialNumber></Data><Signature>sNLLKGMUdF0r8O1kKilWAGdgfs2BvJb/2Xp8p5iuDVfZXmhppo+d0Ran1P9TKdjV4ABwAgKXxJ3jcQTqE/2IRfqwnPf8itN8aFZlV3TJPYeD3yWE7IT55Gz6EijUpC7aKeoohTb4w2fpox58wWoF3SNp6sK6jDfiAUGEHYJ9pjU=</Signature></License>";
            ByteArrayInputStream is = new ByteArrayInputStream(s.getBytes());
            License license = new License();
            license.setLicense(is);
            com.aspose.words.Document document = new com.aspose.words.Document(docPath);
            document.save(new FileOutputStream(pdfFile),SaveFormat.PDF);
        } catch (Exception e) {
            logger.info("****aspose doc轉pdf異常");
            e.printStackTrace();
        }
        return pdfFile;
    }
}

　　aspose-words-jdk16-14.9.0.jar下載地址

　　https://download.csdn.net/download/u013279345/10868189

window下正常，linux下亂碼的解決方案

　　使用com.aspose.words將word模板轉為PDF文件時，在開發平台window下轉換沒有問題，中文也不會出現亂碼。但是將服務部署在正式服務器（Linux）上，轉換出來的PDF中文就出現了亂碼。在網上找了很久，才找到原因，現將解決辦法分享給大家。

一、問題原因分析

在window下沒有問題但是在linux下有問題，就說明不是代碼或者輸入輸出流編碼的問題，根本原因是兩個平台環境的問題。出現亂碼說明linux環境中沒有相應的字體以供使用，所以就會導致亂碼的出現。將轉換無問題的windos主機中的字體拷貝到linux平台下進行安裝，重啟服務器后轉換就不會出現亂碼了。

二、window字體復制到linux環境並安裝

按照教程安裝完成后重啟linux服務器即可搞定亂碼問題。

1. From Windows

Windows下字體庫的位置為C:\Windows\fonts，這里面包含所有windows下可用的字體。

2. To Linux　　

linux的字體庫是 /usr/share/Fonts 。

在該目錄下新建一個目錄，比如目錄名叫 windows（根據個人的喜好，自己理解就行，當然這里是有權限要求的，你可以用sudo來執行）。

然后將 windows 字體庫中你要的字體文件復制到新建的目錄下(只需要復制*.ttc，和*.ttf的文件).

復制所有字體：
   sudo cp *.ttc /usr/share/fonts/windows/
   sudo cp *.ttf /usr/share/fonts/windows/

更改這些字體庫的權限：
    sudo chmod 755 /usr/share/fonts/windows/*

然后進入Linux字體庫：
cd /usr/share/fonts/windows/

接着根據當前目錄下的字體建立scale文件
    sudo mkfontscale

接着建立dir文件
   sudo mkfontdir

然后運行
   sudo fc-cache

重啟 Linux 操作系統就可以使用這些字體了。

linux下亂碼問題解決方案轉載自:

https://blog.csdn.net/hanchuang213/article/details/64905214

https://blog.csdn.net/shanelooli/article/details/7212812

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 文檔在線預覽開源實現方案二：OpenOffice + pdf.js 前端使用pdf.js預覽pdf文件,超級簡單使用PDF.JS實現pdf文件在線預覽時，報文件被損壞的錯誤移動端pdf在線預覽，使用pdf.js插件實現 PDF預覽插件pdf.js的使用 pc或者微信上用pdf.js在線預覽pdf和word 前台使用pdf.js插件實現預覽pdf Java平台下利用aspose轉word為PDF實現文檔在線預覽 App使用pdf.js實現pdf預覽 JAVA使用aspose實現word文檔轉pdf文件