Installing and Using Tesseract and pytesseract

Reposted article, 2017-02-07 11:41. Tags: tesseract, pytesseract, image recognition, scraping, OCR

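Tesseract is an open-source OCR engine that recognizes text in images. It supports Unicode (UTF-8) and more than 100 languages; each language requires downloading the corresponding trained data.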

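Installation:

There are two ways to install it. One is to compile from source, which is rather tedious. I used the other: on Windows, install a prebuilt binary.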

Installer download: https://sourceforge.net/projects/tesseract-ocr-alt/files/

Latest trained data: https://github.com/tesseract-ocr/tessdata

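I recommend the stable 3.0 release; the 4.0 development build I tried raised errors.

In the installer, make sure Registry settings is checked, so that the Path and TESSDATA_PREFIX environment variables are configured automatically.

To recognize Chinese, also select the Chinese trained data.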
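To confirm that the installer set things up as described, the following is a minimal sketch (not from the original article) that looks for a language's .traineddata file under TESSDATA_PREFIX. The helper name find_traineddata is hypothetical. The sketch assumes that the expected layout differs between versions (3.x treats TESSDATA_PREFIX as the parent of the tessdata directory, 4.x as the tessdata directory itself), so it checks both locations.

```python
import os


def find_traineddata(lang, prefix=None):
    """Return the path of <lang>.traineddata under TESSDATA_PREFIX, or None.

    Hypothetical helper for illustration. Both <prefix>/tessdata/ and
    <prefix>/ are checked, because the meaning of TESSDATA_PREFIX is
    assumed to differ between Tesseract 3.x and 4.x.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    filename = lang + ".traineddata"
    candidates = [
        os.path.join(prefix, "tessdata", filename),
        os.path.join(prefix, filename),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print("TESSDATA_PREFIX:", os.environ.get("TESSDATA_PREFIX"))
    for lang in ("eng", "chi_sim"):
        print(lang, "->", find_traineddata(lang))
```

If chi_sim is reported as None, the Chinese trained data was probably not selected during installation and can be downloaded from the tessdata repository.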

Usage:

Once installation is complete, images can be recognized from the command line.

Run on the command line:

tesseract test.png stdout

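The text in the image is recognized and printed to standard output.

For Chinese, or for mixed Chinese and English text, however, the recognition rate is not high: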

tesseract cs.png stdout -l eng+chi_sim
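The same command can be driven from a script. The sketch below is an illustration under stated assumptions, not the article's own method: it builds the argument list for the command shown above and runs it with the standard subprocess module. The function names build_tesseract_command and run_tesseract are hypothetical, and the output is assumed to be UTF-8.

```python
import subprocess


def build_tesseract_command(image_path, langs=None):
    """Build the argument list for: tesseract <image> stdout [-l a+b]."""
    command = ["tesseract", image_path, "stdout"]
    if langs:
        # Several languages are joined with "+", e.g. eng+chi_sim.
        command += ["-l", "+".join(langs)]
    return command


def run_tesseract(image_path, langs=None):
    """Run tesseract and return the recognized text (assumed UTF-8)."""
    result = subprocess.run(
        build_tesseract_command(image_path, langs),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.decode("utf-8")


# Example call (requires tesseract on PATH and the image to exist):
# print(run_tesseract("cs.png", ["eng", "chi_sim"]))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.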

The Python wrapper module pytesseract:

Tesseract has wrapper packages for many languages; only Python's pytesseract is covered here.

Source code: https://github.com/madmaze/pytesseract

It can be installed directly with pip:

pip install pytesseract

Usage example:

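from PIL import Image
import pytesseract

# Recognize text using tesseract's default language (eng).
print(pytesseract.image_to_string(Image.open('test.png')))
# Recognize French text; the 'fra' trained data must be installed.
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))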
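For the mixed Chinese and English case from the command-line section, a possible pytesseract equivalent is sketched below. It assumes that the lang argument is forwarded to tesseract's -l option, so that a value such as eng+chi_sim works the same way. The function name ocr_image and its parameters are hypothetical. Setting pytesseract.pytesseract.tesseract_cmd is only needed when the executable is not on Path.

```python
def ocr_image(image_path, langs=("eng",), tesseract_cmd=None):
    """Recognize text in an image with pytesseract (illustrative sketch)."""
    # Imported inside the function so that a missing dependency only
    # fails when OCR is actually requested.
    from PIL import Image
    import pytesseract

    if tesseract_cmd:
        # Point pytesseract at the executable when it is not on Path.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as image:
        # Languages are joined with "+", as on the command line.
        return pytesseract.image_to_string(image, lang="+".join(langs))


# Example call (requires tesseract, Pillow, pytesseract, and the image):
# print(ocr_image("cs.png", langs=("eng", "chi_sim")))
```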

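Notes:

PIL (available as the Pillow package) and tesseract must be installed first, and the tesseract command must work from the command line.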
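These prerequisites can be checked before any OCR is attempted. The following is a minimal sketch using only the standard library; the function name missing_prerequisites and its message format are hypothetical choices for illustration.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("PIL", "pytesseract"), executable="tesseract"):
    """Return a list describing every prerequisite that was not found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append("Python module " + name)
    # shutil.which returns None when the executable is not on Path.
    if shutil.which(executable) is None:
        missing.append("executable " + executable)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("missing:", problems if problems else "nothing")
```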
