Reading Word documents with PHP on Linux

This article is a repost. Reposted 2017-07-01 10:48, tag: linux.

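A few days ago I helped a friend with a technical problem: on Linux, read the contents of a Word document, match the text with regular expressions, and assemble SQL statements from the matches to insert into a database.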
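The regex-and-SQL step could look roughly like the following Python sketch. The record layout (a "Name:" line followed by a "Phone:" line), the contacts table and the use of SQLite are all hypothetical; the point is to match records with a regular expression and insert them with a parameterized query instead of concatenating the matched text into SQL.

```python
import re
import sqlite3

# Hypothetical extracted text (e.g. antiword or python-docx output).
SAMPLE_TEXT = """Name: Zhang San
Phone: 13800000000
Name: Li Si
Phone: 13900000000
"""

# Assumed layout: a "Name:" line immediately followed by a "Phone:" line.
RECORD_RE = re.compile(
    r"^Name:[ \t]*(?P<name>[^\n]+?)[ \t]*\nPhone:[ \t]*(?P<phone>\d+)[ \t]*$",
    re.MULTILINE,
)


def extract_records(text):
    """Return a (name, phone) tuple for every record the pattern matches."""
    return [(m.group("name"), m.group("phone")) for m in RECORD_RE.finditer(text)]


def insert_records(conn, records):
    """Insert records with placeholders so the driver handles quoting."""
    conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, phone TEXT)")
    conn.executemany("INSERT INTO contacts (name, phone) VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    insert_records(conn, extract_records(SAMPLE_TEXT))
    print(conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0])
```

With MySQL or another database the same idea applies; only the driver and its placeholder style change.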

After reading some English-language material and searching Google, I worked out the following steps (run as root):

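    wget http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
    tar zxvf antiword-0.37.tar.gz
    cd antiword-0.37
    make
    make install

Running antiword with no arguments should now print its usage message. Because make install, run as root, puts the program in /root/bin and its mapping files in /root/.antiword, copy them to system-wide locations:

    cp /root/bin/*antiword /usr/local/bin/
    mkdir /usr/share/antiword
    cp -R /root/.antiword/* /usr/share/antiword/
    chmod 755 /usr/local/bin/*antiword
    chmod 755 /usr/share/antiword/*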
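To check the result, confirm (as the user that will actually run antiword) that the binary is on PATH and that the UTF-8 mapping file sits in one of the two directories used above. A minimal Python sketch; the directory list simply mirrors this article and is an assumption, not antiword's complete search order, and the function names are hypothetical.

```python
import os
import shutil

# Directories used in this article: the current user's ~/.antiword and the
# system-wide copy. Assumed list, not antiword's full documented search order.
CANDIDATE_DIRS = [
    os.path.expanduser("~/.antiword"),
    "/usr/share/antiword",
]


def find_mapping_file(name="UTF-8.txt", dirs=None):
    """Return the first path where the mapping file exists, or None."""
    for directory in (CANDIDATE_DIRS if dirs is None else dirs):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


def check_install():
    """Report where (if anywhere) the antiword binary and mapping file are visible."""
    return {
        "binary": shutil.which("antiword"),  # None if antiword is not on PATH
        "mapping": find_mapping_file(),      # None if UTF-8.txt was not found
    }


if __name__ == "__main__":
    print(check_install())
```

Run it as the web server account, for example sudo -u www-data python3 check_antiword.py on Debian or Ubuntu (both the account name and the script name are assumptions).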

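These copies matter for web use: PHP runs antiword as the web server's user, and antiword looks for its mapping files in that user's own ~/.antiword before falling back to /usr/share/antiword, so files left in /root/.antiword are never found. Instead of copying by hand, you can run make global_install as root, which installs the program into /usr/local/bin and the mapping files into /usr/share/antiword.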
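Since antiword reads only the legacy binary Word format, a caller may want to check a file's format before handing it over. Legacy .doc files are OLE2 compound documents and start with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives starting with "PK\x03\x04". A minimal Python sketch with hypothetical function names; these are only heuristics, since other Office formats such as .xls also use OLE2 and any ZIP file starts with PK.

```python
# File signatures ("magic numbers") checked by this sketch.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .doc (OLE2)
ZIP_MAGIC = b"PK\x03\x04"                          # .docx (Office Open XML, a ZIP)


def sniff_word_format(header):
    """Classify the first bytes of a file as 'doc', 'docx', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "doc"    # candidate for antiword
    if header.startswith(ZIP_MAGIC):
        return "docx"   # antiword cannot read this; use python-docx instead
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_word_format(f.read(8))
```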

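    <?php
    // Extract the text of test.doc with antiword and display it in the browser
    header("Content-type: text/html; charset=utf-8");

    $filename = 'test.doc';
    // Without -m, antiword uses its default character mapping, which may not be UTF-8:
    // $content = shell_exec('antiword ' . escapeshellarg($filename));
    // -mUTF-8 asks for UTF-8 output; escapeshellarg() quotes the filename for the shell
    $content = shell_exec('antiword -mUTF-8 ' . escapeshellarg($filename));

    echo '<pre>';
    // Escape the text so characters such as < and & in the document display literally
    echo htmlspecialchars((string) $content, ENT_QUOTES, 'UTF-8');
    echo '</pre>';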
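The PHP code above builds a shell command string, so the filename has to be quoted for the shell. For comparison, here is a Python sketch that calls antiword with an argument list, so no shell is involved and no quoting is needed. The function names are hypothetical; "-m UTF-8.txt" names antiword's UTF-8 mapping file, intended to select the same mapping as -mUTF-8 in the PHP example.

```python
import subprocess


def build_antiword_cmd(path, mapping="UTF-8.txt", exe="antiword"):
    """Build the argument list; each item is passed to antiword unchanged."""
    return [exe, "-m", mapping, path]


def antiword_text(path, mapping="UTF-8.txt", exe="antiword"):
    """Run antiword on a .doc file and return its text output as a str.

    Raises FileNotFoundError if the executable is missing and
    subprocess.CalledProcessError if antiword exits with an error.
    """
    result = subprocess.run(
        build_antiword_cmd(path, mapping, exe),
        capture_output=True,   # collect stdout/stderr instead of printing them
        check=True,            # turn a non-zero exit status into an exception
        encoding="utf-8",      # decode the UTF-8 output requested via -m
    )
    return result.stdout
```

A filename such as "my report.doc" is passed as a single argument, spaces included.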

antiword only reads the older binary .doc format. For .docx files, the following Python script reads the paragraphs and tables with the python-docx library:

    #!/usr/bin/env python3
    # Usage: python3 <script_name> <docxFilePath>
    # Requires the python-docx library: pip install python-docx
    import os
    import sys

    from docx import Document

    # Name of this script, for the usage message
    script_name = os.path.basename(sys.argv[0])
    usage = "\n Usage: python3 " + script_name + " <docxFilePath>"

    if len(sys.argv) != 2:
        print("Warning:\n no .docx file path given" + usage)
        sys.exit(1)
    docx_path = sys.argv[1]
    if not os.path.exists(docx_path):
        print("Warning:\n .docx file does not exist" + usage)
        sys.exit(1)

    # Open the document
    document = Document(docx_path)

    # Read the text of every paragraph
    paragraphs = [paragraph.text for paragraph in document.paragraphs]
    # Print the text to inspect it; it can also be processed further (e.g. regex matching)
    for text in paragraphs:
        print(text)

    # Read the tables and print each row's cells separated by tabs
    for table in document.tables:
        for row in table.rows:
            print("\t".join(cell.text for cell in row.cells))
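python-docx is the convenient route, but a .docx file is just a ZIP archive whose body lives in word/document.xml, so the paragraph text can also be pulled out with the standard library alone. A simplified Python sketch (function names are hypothetical): it joins the w:t text runs of each w:p element, ignores tabs, line breaks and fields, and, unlike document.paragraphs in python-docx, also returns the paragraphs inside table cells.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each <w:p> paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate every <w:t> text run inside this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def build_sample_docx():
    """Demo helper: an in-memory archive holding only word/document.xml."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        '<w:body><w:p><w:r><w:t>Hello</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        '<w:p/></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_paragraphs(build_sample_docx()))  # ['Hello world', '']
```

For real files, pass a path instead, e.g. docx_paragraphs("test.docx").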
