http://blog.csdn.net/u012566751/article/details/54094692 Tesseract-OCR: Getting Started, Part 1
http://blog.csdn.net/u012566751/article/details/54136836 Tesseract-OCR: Getting Started, Part 2
http://blog.csdn.net/u012566751/article/details/54141109 Tesseract-OCR: Getting Started, Part 3
https://github.com/tesseract-ocr/tesseract/wiki/APIExample Tesseract API Example
Environment: Windows 7, Python 3.6.0, PyCharm 4.5. Python is installed in c:/python3/.
Installation:
1. Install the pytesseract library (plus Pillow, which provides the PIL module used in step 4):
cd c:/python3/Scripts/
pip install pytesseract pillow
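To confirm that step 1 worked, a small check can be run from the same interpreter PyCharm uses. This is a minimal sketch; the helper name missing_modules is hypothetical, and it only tests whether the pytesseract and PIL (Pillow) modules can be found, not whether the Tesseract program itself is installed.

```python
import importlib.util


def missing_modules(names=("pytesseract", "PIL")):
    """Return the module names from `names` that this interpreter cannot find."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling print(missing_modules()) should print an empty list once both packages are installed.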
2. Install the Tesseract program:
https://github.com/UB-Mannheim/tesseract/wiki
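This is an unofficial build; download and install version 4.0: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.0.0-alpha.20170804.exe
During installation, make sure to tick Simplified Chinese and otherwise keep the default settings. After installation, run the following commands to see how the installation went and which languages are supported: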
cd C:\Program Files (x86)\Tesseract-OCR
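tesseract
tesseract -v
tesseract --list-langs
(With no arguments, tesseract prints its usage help; -v prints the version; --list-langs lists the languages Tesseract-OCR supports. chi_sim should appear in that list if Simplified Chinese was ticked.)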
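The --list-langs check can also be scripted from Python, for example to verify that chi_sim is present before running OCR. This is a sketch under assumptions: the install path below is the default one used in this walkthrough, the helper names are hypothetical, and the parser simply treats every single-token line as a language code (which skips the "List of available languages ..." header). stderr is merged into stdout because some Tesseract versions print this list to stderr; the subprocess arguments used here also work on Python 3.6.

```python
import subprocess

# Assumed default install location from this walkthrough.
TESSERACT = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    langs = []
    for line in output.splitlines():
        parts = line.split()
        # Language codes are single tokens; the header line has several words.
        if len(parts) == 1:
            langs.append(parts[0])
    return langs


def list_langs(cmd=TESSERACT):
    """Run `tesseract --list-langs` and return the installed language codes."""
    result = subprocess.run(
        [cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # older versions write the list to stderr
        universal_newlines=True,
    )
    return parse_list_langs(result.stdout)
```

With Tesseract installed, "chi_sim" in list_langs() should be True if Simplified Chinese was selected in the installer.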
3. Edit the file:
Open C:\Python3\Lib\site-packages\pytesseract\pytesseract.py and find these two lines:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'
Change them to:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
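Instead of editing the installed library file (which is overwritten when pytesseract is upgraded), the same setting can be made at runtime by assigning pytesseract.pytesseract.tesseract_cmd in your own script. The sketch below is one way to do that; find_tesseract and configure_pytesseract are hypothetical helper names, and the default path is the install location assumed in this walkthrough. It prefers a tesseract already on PATH, then falls back to the listed candidate paths.

```python
import os
import shutil

# Assumed default install location from this walkthrough.
DEFAULT_TESSERACT = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"


def find_tesseract(candidates=(DEFAULT_TESSERACT,)):
    """Return a tesseract executable: PATH first, then the first existing candidate."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=(DEFAULT_TESSERACT,)):
    """Point pytesseract at tesseract; return the path used, or None if not configured."""
    try:
        import pytesseract
    except ImportError:
        return None  # pytesseract itself is not installed (see step 1)
    cmd = find_tesseract(candidates)
    if cmd is not None:
        # Same effect as editing tesseract_cmd in pytesseract.py.
        pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once at the top of the OCR script replaces the manual file edit.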
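4. Run the following in PyCharm to perform text recognition:
(First, in Paint, write a few digits and a line of poetry in the Microsoft YaHei font, and save the image as ci.png.)
from PIL import Image
import pytesseract

# lang='chi_sim' selects the Simplified Chinese language data installed in step 2
text = pytesseract.image_to_string(Image.open('ci.png'), lang='chi_sim')
print(text)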
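With lang='chi_sim', the recognized text may contain spaces between adjacent Chinese characters. The sketch below wraps the call from step 4 and removes such gaps with a regular expression. It is an assumption-laden example, not part of pytesseract: ocr_image and strip_cjk_spaces are hypothetical names, and the character range covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). Spaces next to digits or Latin text, and line breaks, are left untouched.

```python
import re

# Spaces or tabs that sit between two CJK Unified Ideographs.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]+(?=[\u4e00-\u9fff])")


def strip_cjk_spaces(text):
    """Remove spaces that Tesseract may insert between adjacent Chinese characters."""
    return _CJK_GAP.sub("", text)


def ocr_image(path, lang="chi_sim"):
    """OCR an image file with pytesseract and clean up the Chinese spacing."""
    # Imported here so strip_cjk_spaces() works without Pillow or pytesseract.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        text = pytesseract.image_to_string(img, lang=lang)
    return strip_cjk_spaces(text).strip()
```

For the test image from step 4, print(ocr_image('ci.png')) prints the recognized text with any inter-character spaces removed.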
...