Converting Word documents with openOffice, and the problems encountered

This article is a repost. Published 2018-09-10 14:32. Category: Java utility methods / openOffice

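Part 1: Requirements

The company needs to store contract files. Users upload contracts as Word documents; openOffice converts each Word document to PDF, the PDF is then converted to images, and all three are stored separately. Because the openOffice conversion uses a lot of memory, it is designed as a scheduled task that runs automatically in the early morning hours.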
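As a rough illustration of the early-morning task, the sketch below schedules a job once a day using only the JDK. The class name, the method names, and the 02:00 run time used in the usage note are assumptions for illustration; the original does not say which scheduler or time the project uses.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original project.
final class NightlyConversionTask {

    /** Time from now until the next occurrence of runAt (strictly in the future). */
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has passed, use tomorrow's
        }
        return Duration.between(now, next);
    }

    /** Runs job once a day at runAt until the returned scheduler is shut down. */
    static ScheduledExecutorService scheduleDaily(Runnable job, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntilNextRun(LocalDateTime.now(), runAt).toMillis();
        scheduler.scheduleAtFixedRate(job, initialDelay,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

For example, scheduleDaily(job, LocalTime.of(2, 0)) would start the conversion at 02:00 each day. A fixed 24-hour period drifts across daylight-saving changes, so a cron-style scheduler may be a better fit where that matters.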

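These notes record the problems I ran into while implementing this requirement.

Part 2: Process

1: Coding in the local environment (Windows)

Step 1: Because development was done locally on Windows, installing openOffice and starting the service caused no difficulties.

Step 2: The libraries needed for the conversion: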

<dependency>
    <groupId>commons-cli</groupId>
    <artifactId>commons-cli</artifactId>
    <version>1.2</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>1.4</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>juh</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>jurt</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>ridl</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>unoil</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>com.thoughtworks.xstream</groupId>
    <artifactId>xstream</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.8</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.8</version>
</dependency>

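Problem 1: This is where I hit the first problem: a key dependency jar could not be found in the Maven central repository.

The jodconverter-cli jar is not available in the central repository, and jodconverter there only goes up to version 2.2.1. Versions up to 2.2.1 cannot convert the docx format; support starts with 2.2.2.

After discussing it with a senior colleague, we added the jar to the company's internal Maven repository.

Step 3: Utility classes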

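import java.io.File;
import java.net.ConnectException;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

/**
 * Converts a Word document to PDF through a running openOffice service.
 *
 * @author GH
 */
public class WordToPdf {

    /**
     * @param inputFile  the Word document to convert
     * @param outputFile the PDF file to write
     */
    public static void docToPdf(File inputFile, File outputFile) {
        // Connect to the openOffice service listening on port 8100.
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        try {
            connection.connect();
            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(inputFile, outputFile);
        } catch (ConnectException cex) {
            // The service is not running or cannot be reached.
            cex.printStackTrace();
        } finally {
            // Only disconnect if the connection was actually established.
            if (connection.isConnected()) {
                connection.disconnect();
            }
        }
    }
}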

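import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
 * Renders every page of a PDF to an image file.
 *
 * @author GH
 */
public class PdfToImage {

    /**
     * @param srcFile         path of the PDF to convert
     * @param contractFromSrc directory in which the images are stored
     * @param name            name used as the prefix of each image file
     * @return the file names of the generated images, in page order
     */
    public static List<String> pdfToImagePath(String srcFile, String contractFromSrc, String name) {
        List<String> list = new ArrayList<>();
        File dir = new File(contractFromSrc);
        if (!dir.exists()) {
            dir.mkdirs(); // also creates missing parent directories
        }
        // try-with-resources closes the document even if rendering fails.
        try (PDDocument doc = PDDocument.load(new File(srcFile))) {
            PDFRenderer renderer = new PDFRenderer(doc);
            int pageCount = doc.getNumberOfPages();
            for (int i = 0; i < pageCount; i++) {
                // Option 1: the second argument is the resolution in DPI.
                // BufferedImage image = renderer.renderImageWithDPI(i, 296);
                // Option 2: the second argument is a scale factor (1 = 72 DPI).
                // A larger value gives a higher resolution but a longer conversion time.
                BufferedImage image = renderer.renderImage(i, 2f);
                String fileName = name + "-" + i + ".jpg";
                // The format name matches the .jpg extension of the file.
                ImageIO.write(image, "jpg", new File(dir, fileName));
                list.add(fileName);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }
}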
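To make the scale parameter concrete: PDF page sizes are measured in points, 72 to the inch, so a scale of 1 corresponds to 72 DPI and the 2f used in the utility class to about 144 DPI. The sketch below does this arithmetic for an A4 page rounded to 595 x 842 points. The class and method names are hypothetical, and rounding down to whole pixels is an assumption about how the renderer sizes its output.

```java
// Hypothetical helper for estimating the size of a rendered page image.
final class RenderSize {

    /** PDF user space has 72 points per inch, so scale 1 equals 72 DPI. */
    static float scaleToDpi(float scale) {
        return scale * 72f;
    }

    /** Pixel length of a page side given in points, rounded down, at least 1. */
    static int pixels(float points, float scale) {
        return (int) Math.max(Math.floor(points * scale), 1);
    }
}
```

With these helpers, pixels(595f, 2f) is 1190 and pixels(842f, 2f) is 1684, so each A4 page at scale 2 becomes an image of roughly 2 megapixels. Doubling the scale quadruples the pixel count, which is why the conversion time grows quickly with this parameter.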

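Step 4: Coding

First, read the list of documents that have not yet been converted from the database, then loop over them and download each file from OSS object storage into a designated temporary folder.

Use the utility class to convert the downloaded Word document to PDF, insert a database record for the PDF, and upload the PDF to OSS.

Use the utility class to convert the PDF into images, insert database records for the images, and upload the generated images.

Wrap the work in try/catch: if an exception occurs, roll back the database changes and delete the files already uploaded to OSS.

Finally, update the Word document's conversion status to converted.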
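The cleanup half of that error handling can be sketched as follows. ObjectStore is a hypothetical stand-in for the OSS client, not a real SDK interface, and the method name is made up for illustration. The idea is to remember which objects were uploaded so they can be deleted if a later step fails, then rethrow so the caller can roll back its database transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the original project's classes are not shown in the post.
final class ConversionCleanup {

    /** Stand-in for the object storage client (assumption, not a real SDK). */
    interface ObjectStore {
        void upload(String key, byte[] content);
        void delete(String key);
    }

    /**
     * Uploads every entry of files. If an upload fails, deletes the objects
     * uploaded so far and rethrows the exception to the caller.
     */
    static void uploadAllOrNothing(ObjectStore store, Map<String, byte[]> files) {
        List<String> uploaded = new ArrayList<>();
        try {
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                store.upload(entry.getKey(), entry.getValue());
                uploaded.add(entry.getKey());
            }
        } catch (RuntimeException e) {
            for (String key : uploaded) {
                store.delete(key); // compensate for the partial upload
            }
            throw e;
        }
    }
}
```

Object storage does not take part in the database transaction, which is why the deletion has to be done explicitly rather than relying on the rollback.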

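Problem 2: The test and production environments both run Linux, and this task works with files, but file paths differ between Linux and Windows. For example, a Windows path looks like C:\tmp\test.txt while the Linux equivalent is /tmp/test.txt.

So I used the following approach: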

// Temporary directory used during the Word-to-image conversion (Windows)
public final static String Convert_Tmp_Url = "C:" + File.separator + "temp" + File.separator + "contractToImg" + File.separator;
// Temporary directory used during the Word-to-image conversion (Linux)
public final static String Convert_Tmp_Url2 = File.separator + "tmp" + File.separator + "contractToImg" + File.separator;

File.separator is the system-dependent default name separator, represented as a string for convenience. Its value is "/" on Linux and "\" on Windows.
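An alternative that avoids keeping one constant per operating system is to let the JDK supply both the separator and the temporary directory. The sketch below is only an illustration with a hypothetical class name. Note that it resolves to the platform's own temp directory (java.io.tmpdir), which on Windows is usually the user's temp folder rather than the C:\temp used above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project.
final class TempDirs {

    /** Builds <system temp dir>/contractToImg without hard-coded separators. */
    static Path conversionTempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"), "contractToImg");
    }
}
```

Calling java.nio.file.Files.createDirectories(TempDirs.conversionTempDir()) before the first conversion would create the folder if it does not exist yet.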

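Step 5: Local testing showed no problems.

2: Testing in the test environment (Linux)

Problem 3: On Linux, Chinese text came out garbled or blank when converting Word documents to PDF. The cause is that the Linux system lacks Chinese fonts.

Solution:

Step 1: Create the directory.

On CentOS, create a new directory named fallback under /usr/java/jdk1.8.0_91/jre/lib/fonts.

Step 2: Upload the fonts.

Upload simhei.ttf (SimHei) and simsun.ttc (SimSun) to /usr/java/jdk1.8.0_91/jre/lib/fonts/fallback. On Windows you can locate these files with Everything.

Step 3: Find the system font directories.

They are listed in the fontconfig configuration file: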

[root@80ec6 fallback]# cat /etc/fonts/fonts.conf
<dir>/usr/share/fonts</dir>
<dir>/usr/share/X11/fonts/Type1</dir>
<dir>/usr/share/X11/fonts/TTF</dir>
<dir>/usr/local/share/fonts</dir>
<dir>~/.fonts</dir>

Step 4: Copy the fonts.

Copy the entire contents of /usr/java/jdk1.8.0_91/jre/lib/fonts into one of the directories found in Step 3. In my case the font directory is /usr/share/fonts.

Step 5: Refresh the font cache.

Run: fc-cache

Step 6: Kill the openOffice process.

[root@80ec6 fonts]# ps -ef | grep openoffice

root 3045 3031 0 06:19 pts/1 00:00:03 /opt/openoffice4/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard

Then kill it by its PID: kill -9 3045

Step 7: Restart openOffice in the background.

[root@a3cf78780ec6 openoffice4]# soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
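After the restart it can be useful to confirm that something is accepting connections on port 8100 before the conversion task runs. The following is a minimal sketch with a hypothetical class name. It only checks that a TCP connection can be opened, not that openOffice will actually convert documents.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the original project.
final class ServiceCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable on that port.
            return false;
        }
    }
}
```

For the setup above the call would be ServiceCheck.isListening("127.0.0.1", 8100, 1000).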

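3: The test and production environments run different systems, so the installation packages differ.

The test environment was installed from deb files: install all the deb files with the dpkg command, start the service, and it works.

On production the dpkg command was not found, so I switched to the rpm files. After installing them the service would not start. It turned out that the installation was incomplete: the RPMS directory contains a desktop-integration folder with four rpm files, and the one matching the system also has to be installed. I chose the redhat version.
Run: rpm -ivh openoffice4.1.5-redhat-menus-4.1.5-9789.noarch.rpm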
