Converting a Word document to HTML with PHP: an example
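Word documents do not display well on web pages, and copying their content into a page by hand, one document at a time, is tedious. This article walks through an example of converting a Word document to HTML with PHP; hopefully it is useful.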
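For the most faithful conversion of Office documents to PDF or HTML, Microsoft Office on Windows is still the best tool: LibreOffice does not convert every document cleanly, and WPS has no API.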
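First check whether the COM extension is enabled. If phpinfo() lists a com_dotnet section, it is enabled. If not, edit php.ini and remove the comment marker (the leading semicolon) from this line:
com.allow_dcom = true
Then restart the web server. The PHP manual says the COM extension was built in before PHP 5.4.5, but that does not hold for every release: the manual gives PHP 5.3.15 as the cutoff on the 5.3 branch, and the PHP 5.3.39 build downloaded from the official site did not have the extension built in.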
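If the extension is not built in, add the following line to php.ini, provided php_com_dotnet.dll is present in your ext folder:
extension=php_com_dotnet.dll
Then restart the web server.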
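The check described above can also be done from a script instead of reading through phpinfo(). The following is a minimal sketch; the function name com_extension_status() is hypothetical and not part of the original article.

```php
<?php
// Hypothetical helper: report whether the COM extension looks usable.
function com_extension_status()
{
    return array(
        // COM automation is only available on Windows
        'windows'    => strtoupper(substr(PHP_OS, 0, 3)) === 'WIN',
        // true when php_com_dotnet.dll is loaded (or built in)
        'com_dotnet' => extension_loaded('com_dotnet'),
        // the COM class only exists when the extension is loaded
        'com_class'  => class_exists('COM'),
        // current php.ini value; false when the extension is not loaded
        'allow_dcom' => ini_get('com.allow_dcom'),
    );
}

var_export(com_extension_status());
```

If com_dotnet or com_class comes back false, the conversion code will fail at the new COM(...) call, so it is worth running this first.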
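// Convert a Word document to HTML through COM automation.
// Requires Windows with Microsoft Word installed; try/finally needs PHP 5.5+.
function word2html($wordname, $htmlname)
{
    // new COM throws com_exception on failure, so "or die" would never run
    $word = new COM("word.application");
    try {
        $word->Visible = 0; // keep Word hidden; set to 1 to watch it while debugging
        $doc = $word->Documents->Open($wordname);
        $doc->SaveAs($htmlname, 8); // 8 = wdFormatHTML
        $doc->Close(false);         // close without a save prompt
    } finally {
        $word->Quit();              // always shut down winword.exe
    }
}
word2html('D:/www/test/6.docx', 'D:/www/test/6.html');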
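Notes:
1. The generated HTML is quite messy when you view the source.
2. The conversion launches winword.exe.
3. If the page keeps loading and never finishes, rename the document and run the conversion again.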
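The notes above suggest two adjustments that may be worth trying. This is only a sketch under stated assumptions: the function name word2html_quiet() is hypothetical, and it is an assumption (not confirmed by the article) that the endless loading comes from Word waiting on a dialog or a locked file. The sketch suppresses alerts, opens the document read-only, and saves with format 10 (wdFormatFilteredHTML), which Word documents as HTML with the Office-specific markup filtered out.

```php
<?php
// Hypothetical variant of word2html(); name and option choices are assumptions.
function word2html_quiet($wordname, $htmlname)
{
    // throws com_exception if Word cannot be started
    $word = new COM('word.application');
    try {
        $word->Visible = 0;       // no window on a server
        $word->DisplayAlerts = 0; // 0 = wdAlertsNone: suppress modal dialogs
        // Open(FileName, ConfirmConversions, ReadOnly): open read-only
        $doc = $word->Documents->Open($wordname, false, true);
        $doc->SaveAs($htmlname, 10); // 10 = wdFormatFilteredHTML
        $doc->Close(false);          // discard changes, no save prompt
    } catch (Exception $e) {
        $word->Quit(false);          // do not leave winword.exe running
        throw $e;
    }
    $word->Quit(false);
}

// Only attempt the conversion where the COM extension is available.
if (class_exists('COM')) {
    word2html_quiet('D:/www/test/6.docx', 'D:/www/test/6.html');
}
```

Even with filtered output the markup usually still needs some cleanup, and a winword.exe process left over from an earlier failed run may have to be ended by hand.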
A supplementary example: cleaning up the HTML that Word generates
// Strip Word-specific markup from exported HTML.
// $text is an array of lines, for example the result of file('6.html').
// ereg_replace()/eregi_replace() were removed in PHP 7, so preg_replace() is used.
function lego_clean($text) {
    $text = implode("\r", $text);
    // normalize white space
    $text = preg_replace('/\s+/', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // remove everything before <body>
    $text = strstr($text, "<body");
    if ($text === false) {
        return ''; // no <body> tag found, nothing to clean
    }
    // keep tags, strip attributes
    $text = preg_replace('#<p [^>]*BodyTextIndent[^>]*>([^\r\n]*)</p>#', '<p>$1</p>', $text);
    $text = preg_replace('#<p [^>]*margin-left[^>]*>([^\r\n]*)</p>#i', '<blockquote>$1</blockquote>', $text);
    // drop non-breaking-space entities
    $text = str_replace("&nbsp;", "", $text);
    // clean up whatever is left inside <p> and <li>
    $text = preg_replace('#<p [^>]*>#i', '<p>', $text);
    $text = preg_replace('#<li [^>]*>#i', '<li>', $text);
    // kill unwanted tags
    $text = preg_replace('#</?span[^>]*>#i', '', $text);
    $text = preg_replace('#</?body[^>]*>#i', '', $text);
    $text = preg_replace('#</?div[^>]*>#i', '', $text);
    $text = preg_replace('#<![^>]*>#', '', $text);
    $text = preg_replace('#</?[a-z]:[^>]*>#i', '', $text);
    // kill style and on mouse* attributes
    $text = preg_replace('#([ \f\r\t\n\'"])style=[^>]+#i', '$1', $text);
    $text = preg_replace('#([ \f\r\t\n\'"])on[a-z]+=[^>]+#i', '$1', $text);
    // remove empty paragraphs
    $text = str_replace("<p></p>", "", $text);
    // remove closing </html>
    $text = str_replace("</html>", "", $text);
    // clean up white space again
    $text = preg_replace('/\s+/', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // hand the cleaned markup back to the caller
    return $text;
}
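As a rough alternative to the regex chain above, the same idea (keep a few structural tags, drop everything Word-specific) can be sketched with strip_tags() and an allow-list. The function name simple_word_clean() and the choice of allowed tags are assumptions made for illustration; this is not the article's method, and because every attribute is removed, links and images are left out of the allow-list.

```php
<?php
// Hypothetical helper: allow-list cleanup of Word-generated HTML.
function simple_word_clean($html)
{
    // keep only the part from <body> onwards, if there is a <body> tag
    $body = stristr($html, '<body');
    if ($body !== false) {
        $html = $body;
    }
    // remove <style>/<script> blocks together with their contents
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);
    // remove HTML comments, including Word's conditional comments
    $html = preg_replace('#<!--.*?-->#s', '', $html);
    // drop every tag that is not on the allow-list (inner text is kept)
    $allowed = '<p><br><ul><ol><li><table><tr><td><th><b><i><strong><em><h1><h2><h3>';
    $html = strip_tags($html, $allowed);
    // strip all attributes from the remaining opening tags
    $html = preg_replace('#<([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', '<$1$2>', $html);
    // turn &nbsp; into plain spaces and collapse runs of white space
    $html = str_replace('&nbsp;', ' ', $html);
    $html = preg_replace('/\s+/', ' ', $html);
    // remove paragraphs that are now empty
    $html = preg_replace('#<p>\s*</p>#i', '', $html);
    return trim($html);
}

$sample = "<html><head><style>p{color:red}</style></head>"
        . "<body lang=EN-US><p class=MsoNormal style='margin:0'>"
        . "<span style='font-size:12pt'>Hello</span> <o:p></o:p></p>"
        . "<p class=MsoNormal>&nbsp;</p></body></html>";
echo simple_word_clean($sample), "\n";
```

The allow-list approach is shorter and easier to adjust, but it is less selective than the original function: it does not turn indented paragraphs into blockquote elements, for example.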