Online file preview: converting doc and docx to PDF (Part 1)

This article is a repost. 2018-09-08 11:54

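1. Introduction
Document conversion is a tough nut to crack, but it is also unavoidable. We ran into it while building a knowledge base product, where document conversion, precise full-text search, and the knowledge conversion rate are basic requirements. When I first started reading up on it, I racked my brain over whether to build it ourselves or integrate a third party, a dilemma that many small and medium-sized companies face.
Searching online, I found many open-source examples built on poi. At first I used poi to convert every document to HTML. These are the options I came across along the way:
1) On GitHub I found a complete demo, https://github.com/litter-fish/transform, that covers basically every conversion you might want. Beginners can follow it to get a basic working conversion, but reaching general-purpose quality takes a lot of extra work. This open-source code converts using poi and itext (for PDF).
2) https://gitee.com/kekingcn/file-online-preview is source code hosted on Gitee (OSChina). It is based on jodconverter, which drives an installed office suite (OpenOffice or LibreOffice) and uses its save-as capability to perform the conversion.
3) Commercial products, for example Yozo Office (永中office), Office 365, idocv, and Aspose.Words for Java (https://downloads.aspose.com/words/java).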

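2. Conversion approach
After trying many options, including integrating Yozo's document conversion, I found that good preview quality requires a second rendering pass. Yozo has been doing document conversion for more than ten years, and we cannot match that. After some thought I settled on an approach that is only just reliable enough: convert both doc and docx to PDF for preview, in both cases building on poi.
2.1. Converting doc to PDF
1) Convert doc to XML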

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToFoConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.w3c.dom.Document;

	/**
	 * Converts a .doc file to XML (XSL-FO) and writes it to OUTFILEFO.
	 * PathMaster and OUTFILEFO are defined elsewhere in the project.
	 */
	public String toXML(String filePath) {
		// Directory that receives the images extracted from the document.
		final File imageDir = new File(PathMaster.getWebRootPath(), "temp" + File.separator + "images");

		try (POIFSFileSystem fs = new POIFSFileSystem(new File(filePath))) {
			HWPFDocument doc = new HWPFDocument(fs);
			WordToFoConverter converter = new WordToFoConverter(
					DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());

			// Point every image reference in the FO output at the extracted file,
			// as a file: URI such as file:/F:/20.vscode/iWorkP/temp/images/0.jpg
			converter.setPicturesManager(new PicturesManager() {
				public String savePicture(byte[] content, PictureType type, String suggestedName,
						float widthInches, float heightInches) {
					return new File(imageDir, suggestedName).toURI().toString();
				}
			});
			converter.processDocument(doc);

			// Write the embedded pictures to disk so the URIs above resolve.
			if (!imageDir.exists()) {
				imageDir.mkdirs();
			}
			for (Picture picture : doc.getPicturesTable().getAllPictures()) {
				File imageFile = new File(imageDir, picture.suggestFullFileName());
				try (OutputStream imageOut = new FileOutputStream(imageFile)) {
					picture.writeImageContent(imageOut);
				}
			}

			// Serialize the FO DOM to OUTFILEFO with an identity transformer.
			Document foDocument = converter.getDocument();
			Transformer transformer = TransformerFactory.newInstance().newTransformer();
			transformer.setOutputProperty(OutputKeys.ENCODING, "GBK");
			transformer.setOutputProperty(OutputKeys.INDENT, "yes"); // defined values are "yes" and "no"
			transformer.setOutputProperty(OutputKeys.METHOD, "xml");
			try (OutputStream foOut = new FileOutputStream(OUTFILEFO)) {
				transformer.transform(new DOMSource(foDocument), new StreamResult(foOut));
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		return "";
	}
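
The serialization step at the end of toXML can be tried on its own, without poi. The following is a minimal sketch using only the JDK: it builds a tiny DOM by hand as a stand-in for what WordToFoConverter produces (the class name FoSerializationSketch, the helper names, and the sample content are assumptions for illustration, and the sample is not a complete XSL-FO document), then writes it out through an identity transformer with the same output properties.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FoSerializationSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Builds a tiny hand-made DOM; a stand-in for converter.getDocument(). */
    static Document buildSampleFo(String text) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(FO_NS, "fo:root");
        Element block = doc.createElementNS(FO_NS, "fo:block");
        block.setTextContent(text);
        root.appendChild(block);
        doc.appendChild(root);
        return doc;
    }

    /** Serializes the DOM to XML text with an identity transformer. */
    static String serialize(Document doc, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildSampleFo("Hello"), "UTF-8"));
    }
}
```

Writing to a StringWriter instead of a FileOutputStream makes it easy to inspect the generated markup before handing it to the next step.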

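2) Convert XML to PDF
Here I use fop to turn the XML into PDF. This is a recent and welcome discovery: poi's official site recommends it, but I had never looked at it closely. Many of the jars it ships are identical to those in Yozo's high-definition packages, so this now looks like the right track. Anyone interested is welcome to dig further. My source code is on GitHub at https://github.com/liuxufeijidian/file.convert.master/tree/master with the environment already configured; you only need to prepare your doc and docx files.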

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;
import org.xml.sax.SAXException;

	/*
	 * XML (XSL-FO) to PDF.
	 * CONFIG, OUTFILEFO and OUTFILEPDF are defined elsewhere in the project.
	 */
	public void xmlToPDF() throws SAXException, TransformerException {
		try {
			// Step 1: Construct a FopFactory by specifying a reference to the configuration file
			// (reuse if you plan to render multiple documents!)
			FopFactory fopFactory = FopFactory.newInstance(new File(CONFIG));

			// Step 2: Set up output stream.
			// Note: Using BufferedOutputStream for performance reasons (helpful with FileOutputStreams).
			// try-with-resources closes the stream even if the transformation fails.
			try (OutputStream out = new BufferedOutputStream(new FileOutputStream(OUTFILEPDF))) {

				// Step 3: Construct fop with desired output format
				Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);

				// Step 4: Setup JAXP using identity transformer
				TransformerFactory factory = TransformerFactory.newInstance();
				Transformer transformer = factory.newTransformer(); // identity transformer

				// Step 5: Setup input and output for XSLT transformation
				// The input is the FO file written by toXML
				Source src = new StreamSource(new File(OUTFILEFO));

				// Resulting SAX events (the generated FO) must be piped through to FOP
				Result res = new SAXResult(fop.getDefaultHandler());

				// Step 6: Start XSLT transformation and FOP processing
				transformer.transform(src, res);
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
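
Steps 4 to 6 rely on one mechanism: an identity transformer reads the FO file and forwards it as SAX events to whatever handler sits inside the SAXResult. The sketch below shows that mechanism with the JDK only. RecordingHandler is a hypothetical stand-in for the handler returned by fop.getDefaultHandler(); it simply records element names instead of laying out pages, and the class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxPipeSketch {
    /** Records each element it is told about; FOP's handler would render instead. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }
    }

    /** Pipes XML text through an identity transformer into the recording handler. */
    static List<String> pipe(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<fo:block>Hello</fo:block></fo:root>";
        System.out.println(pipe(fo));
    }
}
```

Because the transformer streams events rather than building another DOM, the FO file never has to be held in memory twice, which is presumably why the fop embedding examples use this pattern.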

2.1.3
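Many people convert Word straight to HTML, but then you have to write the second-pass rendering code yourself, which is fairly complex. I take a detour instead: convert doc to XML, then convert the XML to PDF. Rendering the resulting PDF with pdfjs gives the same preview you get when a browser opens the file. For details on previewing with pdfjs, see https://blog.csdn.net/liuxufeijidian/article/details/82260199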
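
To make the detour concrete, here is a small sketch of the files one .doc passes through on its way to the preview: the source, the intermediate XML (FO) file, and the PDF handed to pdfjs. It only plans the paths and performs no conversion. The class name, the method names, and the .fo extension are assumptions for illustration and do not come from the repository.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class DocPreviewPlanSketch {
    /** File name without its last extension, e.g. "report.doc" gives "report". */
    static String baseName(Path source) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** True for legacy binary Word files, which take the doc, XML, PDF route. */
    static boolean isLegacyDoc(Path source) {
        return source.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".doc");
    }

    /** Returns the source, the intermediate FO file, and the final PDF, in order. */
    static List<Path> plan(Path source) {
        if (!isLegacyDoc(source)) {
            throw new IllegalArgumentException("only .doc takes this route: " + source);
        }
        String base = baseName(source);
        return Arrays.asList(
                source,
                source.resolveSibling(base + ".fo"),
                source.resolveSibling(base + ".pdf"));
    }

    public static void main(String[] args) {
        for (Path step : plan(Paths.get("docs", "report.doc"))) {
            System.out.println(step);
        }
    }
}
```

In a real service the second and third entries would be the targets of the toXML and xmlToPDF steps, and the last one is the file served to pdfjs.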

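Ending: everyone wants to see how well it works. Get the source code from GitHub at https://github.com/litter-fish/transform, set up your doc and docx files, and you can run the conversion. I will keep working on it and continue to optimize and update the document conversion.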
