Image text recognition with pytesseract + Tesseract-OCR

Reposted article; originally published 2020-03-07 15:44

The image to recognize:

Code:

from PIL import Image
import pytesseract

# Run OCR on the image with the Simplified Chinese model (chi_sim)
text = pytesseract.image_to_string(Image.open('denggao.jpeg'), lang='chi_sim')
print(text)

Screenshot of the result:

Main steps:

1. Two libraries are required: pytesseract and PIL (provided by the Pillow package)

(1) They can be installed from the command line

pip install Pillow
pip install pytesseract
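
After installing, it is worth confirming that both packages are visible to the interpreter you actually run the script with. The snippet below is only a small sketch; `missing_modules` is a helper name made up for this example and is not part of either library.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the script: the Pillow package is imported as "PIL".
missing = missing_modules(["PIL", "pytesseract"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("PIL and pytesseract are both importable")
```

If either name is reported as not installed, the pip commands were probably run against a different Python environment than the one running the script.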

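(2) If you use the PyCharm IDE, you can install the packages directly from PyCharm.
On the PyCharm Settings page, follow the steps below:

Pillow, the package that provides PIL, can be installed with the same steps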

2. Install the recognition engine, tesseract-ocr

https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0-alpha.20200223.exe

3. Recognizing Chinese: the engine on its own cannot recognize Chinese, so the language data files have to be downloaded separately

https://github.com/tesseract-ocr/tessdata

Copy the files chi_sim.traineddata, chi_sim_vert.traineddata, chi_tra.traineddata and chi_tra_vert.traineddata from that repository into the tessdata directory of the tesseract-ocr installation
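
A quick way to check that the four files ended up in the right place is to look for them from Python. This is a rough sketch: `missing_traineddata` is a hypothetical helper written for this post, and the tessdata path in the example call is an assumption based on the D:/Tesseract-OCR install location used in step 4.

```python
from pathlib import Path

# The four Chinese models mentioned above (simplified/traditional, horizontal/vertical).
CHINESE_MODELS = ["chi_sim", "chi_sim_vert", "chi_tra", "chi_tra_vert"]


def missing_traineddata(tessdata_dir, languages=CHINESE_MODELS):
    """Return the language codes whose .traineddata file is absent from tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # Assumed install location; adjust to wherever tesseract-ocr was installed.
    print(missing_traineddata("D:/Tesseract-OCR/tessdata"))
```

An empty list means all four files were found; any code that is printed still needs its .traineddata file copied over.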

4. Modify pytesseract.py

In the Lib\site-packages\pytesseract directory of your Python installation, find pytesseract.py and edit the tesseract_cmd setting in it

Change it to:

tesseract_cmd = 'D:/Tesseract-OCR/tesseract.exe'

After that, the script can be run.
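
Editing the installed pytesseract.py works, but the change is lost whenever the package is reinstalled or upgraded. A possible alternative is to set `pytesseract.pytesseract.tesseract_cmd` from your own script at runtime. The sketch below assumes that approach; `find_tesseract` and `configure_pytesseract` are helper names invented for this example, and the candidate paths you pass in are your own guesses about where Tesseract lives.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first existing path in `candidates`, else the one on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates):
    """Point pytesseract at the Tesseract executable for the current process."""
    import pytesseract  # imported here so find_tesseract can be used without it

    cmd = find_tesseract(candidates)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

With the install location used in this post, calling configure_pytesseract(['D:/Tesseract-OCR/tesseract.exe']) once before image_to_string should have the same effect as the manual edit, without touching site-packages.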

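Errors encountered during configuration:

1. If the recognition engine is not installed, this error is reported:

2. If the recognition engine version does not match, this error is reported: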

pytesseract.pytesseract.TesseractError: (1, "Error, unknown command line argument '-psm'")
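
As far as I can tell, this message usually means a single-dash -psm option reached a Tesseract 4 or later binary, which expects the double-dash form --psm (the same goes for -oem and --oem). If the flag comes from a config string in your own code, a small helper like the one below can rewrite it. This is only a sketch under that assumption; `modernize_config` is a made-up name, not a pytesseract function.

```python
import re


def modernize_config(config):
    """Rewrite legacy single-dash -psm/-oem flags to the double-dash form."""
    # The lookbehind skips flags that already start with "--".
    return re.sub(r"(?<![-\w])-(psm|oem)\b", r"--\1", config)


print(modernize_config("-psm 6"))
```

The result can then be passed through the config argument, for example image_to_string(img, lang='chi_sim', config=modernize_config('-psm 6')). If your own code never passes -psm, the flag may be coming from an outdated pytesseract release, in which case upgrading pytesseract is the more likely fix.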
