Installing on CentOS:
1. Install the dependencies
yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
2. Install Leptonica
wget http://www.leptonica.org/source/leptonica-1.76.0.tar.gz
tar -zxvf leptonica-1.76.0.tar.gz
cd leptonica-1.76.0
./configure
make && make install
# Environment variables: append the following to the end of /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
# then reload the profile in the current shell
. /etc/profile
3. Install Tesseract-OCR
wget -O tesseract-4.0.0-beta.3.tar.gz https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.3.tar.gz
tar -zxvf tesseract-4.0.0-beta.3.tar.gz
cd tesseract-4.0.0-beta.3
./autogen.sh
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && sudo make install
sudo ldconfig    # refresh the shared-library cache
# Environment variable: location of the language data (Linux)
export TESSDATA_PREFIX=/usr/local/share/tessdata
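A quick way to confirm the build is to ask the installed binary for its version. The Python sketch below does that; `parse_tesseract_version` and `installed_version` are hypothetical helper names, and it assumes the binary is reachable as `tesseract` on PATH or through an explicit path such as `/usr/local/bin/tesseract`. stderr is merged into stdout because some older Tesseract builds print the version banner to stderr.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches the first line of `tesseract --version`, e.g. "tesseract 4.0.0-beta.3".
_VERSION_RE = re.compile(r"^tesseract\s+v?(\S+)", re.MULTILINE)


def parse_tesseract_version(output: str) -> Optional[str]:
    """Return the version token from `tesseract --version` output, or None."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def installed_version(cmd: str = "tesseract") -> Optional[str]:
    """Run `<cmd> --version` and parse it; None if the binary cannot be found."""
    path = shutil.which(cmd)
    if path is None:
        return None
    # Merge stderr into stdout: some builds print the version banner to stderr.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    print(installed_version("/usr/local/bin/tesseract") or "tesseract not found")
```

Once pytesseract is installed, `pytesseract.get_tesseract_version()` offers a similar check from inside the library.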
Installing on Windows:
1. Install Tesseract with the installer from https://digi.bib.uni-mannheim.de/tesseract/
2. Set the environment variable pointing at the language data:
TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR\tessdata
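On both platforms TESSDATA_PREFIX tells Tesseract where the language data lives. The sketch below resolves that directory the way this guide sets it up: the environment variable wins, otherwise it falls back to this guide's default install location for the platform. `resolve_tessdata_dir` and its fallback table are illustrative assumptions, not part of Tesseract or pytesseract.

```python
import os
import sys
from typing import Mapping, Optional

# Default language-data locations used in this guide (assumed install paths).
_DEFAULT_TESSDATA = {
    "linux": "/usr/local/share/tessdata",
    "win32": r"C:\Program Files (x86)\Tesseract-OCR\tessdata",
}


def resolve_tessdata_dir(
    environ: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
) -> Optional[str]:
    """Return TESSDATA_PREFIX if set, else this guide's default for the platform."""
    env = os.environ if environ is None else environ
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        return prefix
    # sys.platform is "linux" on Linux (Python 3.3+) and "win32" on Windows.
    plat = sys.platform if platform is None else platform
    return _DEFAULT_TESSDATA.get(plat)


if __name__ == "__main__":
    print(resolve_tessdata_dir())
```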
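Downloading language data:
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
On Windows, put the downloaded .traineddata files in the tessdata folder of the installation directory; on Linux, put them in /usr/local/share/tessdata. Run /usr/local/bin/tesseract --list-langs to check which language packs are installed.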
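`--list-langs` reports the languages whose `.traineddata` files Tesseract finds in its tessdata directory, and each file is named after its language code (for example `chi_sim.traineddata`). A rough Python approximation, handy for checking that a download landed in the right folder, is sketched below. `installed_languages` is a hypothetical helper; it only looks at top-level file names and does not validate the data itself.

```python
from pathlib import Path
from typing import List


def installed_languages(tessdata_dir: str) -> List[str]:
    """Return sorted language codes for *.traineddata files in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in directory.glob("*.traineddata") if p.is_file())


if __name__ == "__main__":
    print(installed_languages("/usr/local/share/tessdata"))
```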
Installing the Python libraries:
pip3 install pillow       # dependency of pytesseract
pip3 install pytesseract
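To confirm that both packages are visible to the same interpreter that `pip3` installed into, they can be looked up with `importlib` without actually importing them. Note that Pillow installs the top-level package `PIL`, not `pillow`. `missing_modules` is a hypothetical helper name used only for this sketch.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported as "PIL".
    missing = missing_modules(["PIL", "pytesseract"])
    print("missing:", missing or "none")
```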
Usage:
import pytesseract
from PIL import Image
# On Windows, point tesseract_cmd at tesseract.exe:
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
# On Linux, point it at the tesseract binary:
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
res = pytesseract.image_to_string(Image.open('xx.jpg'), lang='chi_sim')  # chi_sim = Simplified Chinese
print(res)
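The two `tesseract_cmd` settings differ only by platform, and `lang` also accepts several languages joined with `+` (for example `chi_sim+eng` for text that mixes Chinese and English), since pytesseract passes it to Tesseract's `-l` option. The sketch below picks the binary automatically and builds the language string. `default_tesseract_cmd`, `build_lang` and `ocr_file` are hypothetical names, and the fallback paths are simply the install locations used in this guide.

```python
import shutil
import sys
from typing import Iterable, Optional

# Install locations used earlier in this guide (assumptions; adjust to your setup).
_FALLBACK_CMD = {
    "win32": r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    "linux": "/usr/local/bin/tesseract",
}


def default_tesseract_cmd(platform: Optional[str] = None) -> str:
    """Prefer a tesseract found on PATH, else fall back to this guide's install path."""
    found = shutil.which("tesseract")
    if found:
        return found
    plat = sys.platform if platform is None else platform
    return _FALLBACK_CMD.get(plat, "tesseract")


def build_lang(codes: Iterable[str]) -> str:
    """Join language codes the way Tesseract's -l option expects, e.g. 'chi_sim+eng'."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_file(image_path: str, langs: Iterable[str] = ("chi_sim", "eng")) -> str:
    """OCR one image; needs Pillow, pytesseract and the language data installed."""
    # Imported here so the helpers above work even without the OCR packages.
    import pytesseract
    from PIL import Image

    pytesseract.pytesseract.tesseract_cmd = default_tesseract_cmd()
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang(langs))


if __name__ == "__main__":
    print(ocr_file("xx.jpg"))
```

Every language listed in `langs` must have its `.traineddata` file in the tessdata directory, otherwise Tesseract reports that it cannot load that language.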