Installing on CentOS:
1. Install dependencies
    yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
2. Install Leptonica
    wget http://www.leptonica.org/source/leptonica-1.76.0.tar.gz
    tar -zxvf leptonica-1.76.0.tar.gz
    cd leptonica-1.76.0
    ./configure
    make && make install
    # Configure environment variables: append the following to the end of /etc/profile
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
    export LIBLEPT_HEADERSDIR=/usr/local/include
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
    # Reload the profile so the variables take effect in the current shell
    . /etc/profile
3. Install Tesseract-OCR
    # -O names the download so that it matches the tar command below
    wget -O tesseract-4.0.0-beta.3.tar.gz https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.3.tar.gz
    tar -zxvf tesseract-4.0.0-beta.3.tar.gz
    cd tesseract-4.0.0-beta.3
    ./autogen.sh
    ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
    make && sudo make install
    # Environment variable (location of the language data)
    export TESSDATA_PREFIX=/usr/local/share/tessdata # linux
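Once the build finishes, it can be worth confirming from Python that the binary actually runs before wiring it into anything else. The sketch below is one way to do that. `tesseract_version` is a hypothetical helper name, and its default path assumes the /usr/local/bin/tesseract location used in these notes.

```python
import subprocess


def tesseract_version(cmd="/usr/local/bin/tesseract"):
    """Return the first line of `cmd --version`, or None if cmd cannot be run."""
    try:
        proc = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, missing execute permission, or a hung process.
        return None
    lines = proc.stdout.strip().splitlines()
    return lines[0] if lines else None
```

Calling `tesseract_version()` after the install should return a version line if the binary is in place, and `None` if the path is wrong or the binary cannot be executed.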
Installing on Windows:
1. Install Tesseract from https://digi.bib.uni-mannheim.de/tesseract/
2. Set the environment variable (location of the language data)
    TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR\tessdata # windows
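If changing the system-wide environment is inconvenient, the variable can probably also be set from inside the script before any OCR call, since processes started afterwards normally inherit the script's environment. This is a minimal sketch under that assumption. `ensure_tessdata_prefix` and `DEFAULT_TESSDATA` are hypothetical names, and the paths are simply the ones listed in these notes.

```python
import os
import platform

# Paths taken from the notes above; adjust them if Tesseract lives elsewhere.
DEFAULT_TESSDATA = {
    "Windows": r"C:\Program Files (x86)\Tesseract-OCR\tessdata",
    "Linux": "/usr/local/share/tessdata",
}


def ensure_tessdata_prefix(system=None, env=None):
    """Set TESSDATA_PREFIX if it is unset and return the value now in effect.

    Returns None when the variable is unset and no default is known
    for the platform.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    default = DEFAULT_TESSDATA.get(system)
    if "TESSDATA_PREFIX" not in env and default is not None:
        env["TESSDATA_PREFIX"] = default
    return env.get("TESSDATA_PREFIX")
```

An existing value is never overwritten, so a machine that already has TESSDATA_PREFIX configured keeps its own setting.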
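Downloading language data:
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
On Windows, put the files in the tessdata folder of the installation directory.
On Linux, put them in /usr/local/share/tessdata. Running /usr/local/bin/tesseract --list-langs lists the language packs that Tesseract can find.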
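A quick way to see which language files were actually copied is to list the .traineddata files in the tessdata folder. The sketch below only inspects one folder on disk, so it complements `--list-langs` rather than replacing it. `installed_languages` is a hypothetical helper, and the default folder is the Linux location given above.

```python
from pathlib import Path


def installed_languages(tessdata_dir="/usr/local/share/tessdata"):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in folder.glob("*.traineddata"))
```

For example, `"chi_sim" in installed_languages()` gives a cheap check before requesting `lang='chi_sim'`.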
Installing the Python libraries:
    pip3 install pillow # required by pytesseract
    pip3 install pytesseract
Usage:
    import pytesseract
    from PIL import Image

    # On Windows, point this at tesseract.exe
    # pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    # On Linux, point this at the tesseract binary
    pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'

    # chi_sim = Simplified Chinese
    res = pytesseract.image_to_string(Image.open('xx.jpg'), lang='chi_sim')
    print(res)
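Scanned text often mixes Chinese and English, and Tesseract accepts several language codes joined with `+`. The following is a rough sketch of wrapping the call above for that case. `build_lang_arg` and `ocr_image` are hypothetical helper names, and the default assumes that both the chi_sim and eng language files are installed.

```python
def build_lang_arg(langs):
    """Join language codes into the 'a+b' form that Tesseract accepts."""
    codes = [code.strip() for code in langs if code and code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(path, langs=("chi_sim", "eng")):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so build_lang_arg stays usable without pytesseract installed.
    import pytesseract
    from PIL import Image

    # The context manager closes the image file once OCR has finished.
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=build_lang_arg(langs))
```

With this in place, `print(ocr_image('xx.jpg'))` would run the same recognition as above, but with both languages enabled.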