Installing tesseract-ocr on CentOS 7 for CAPTCHA recognition

Reposted from the original article. 2017-10-12 11:53. Category: development environment setup / NLP

Summary:

  Installing dependencies on CentOS 7

  Configuring tesseract

  Code example

Installing dependencies on CentOS 7

Install the CentOS system dependencies

    yum install -y automake autoconf libtool gcc gcc-c++
    yum install -y libpng-devel libjpeg-devel libtiff-devel

Install leptonica

    wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
    tar xvzf leptonica-1.72.tar.gz
    cd leptonica-1.72/
    ./configure
    make && make install

Install tesseract-ocr

    wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
    unzip 3.04.zip
    cd tesseract-3.04/
    ./configure
    make && make install
    sudo ldconfig

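Deploy the models
- Download the model file for the language you need from https://github.com/tesseract-ocr/tessdata (the model files must match the installed Tesseract version; for 3.04, use the files from the repository's 3.04.00 tag)
- Move the model file to /usr/local/share/tessdata

Install the Python dependencies listed in requirements.txt: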
```
pip install -r requirements.txt
```
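
Before running OCR, it can help to confirm that the model files really ended up in the tessdata directory. The following is a minimal sketch and not part of the original setup: the helper name `missing_models` is hypothetical, and it assumes the models are named `<lang>.traineddata`, as they are in the tessdata repository.

```python
import os


def missing_models(tessdata_dir, langs):
    """Return the languages whose <lang>.traineddata file is absent."""
    missing = []
    for lang in langs:
        model_path = os.path.join(tessdata_dir, lang + ".traineddata")
        if not os.path.isfile(model_path):
            missing.append(lang)
    return missing


if __name__ == "__main__":
    # /usr/local/share/tessdata is the directory used in this article.
    absent = missing_models("/usr/local/share/tessdata", ["eng"])
    print("missing models:", absent if absent else "none")
```

An empty result means every requested language has a model file in place.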

Configuring tesseract

In /usr/local/share/tessdata, create a file named eng.user-patterns with the following content:
```
\n\n\n\n\n\n
```
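In Tesseract's user-pattern syntax, \n stands for one letter or digit, so this pattern describes a string of six alphanumeric characters.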
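
To see what the pattern accepts, here is a rough Python equivalent. It is only an approximation for illustration: Tesseract applies user patterns during recognition, whereas this sketch checks a string after the fact. The ASCII-only character class and the name `matches_six_alnum` are assumptions.

```python
import re

# Rough equivalent of the user pattern \n\n\n\n\n\n:
# exactly six characters, each an ASCII letter or digit.
SIX_ALNUM = re.compile(r"[A-Za-z0-9]{6}")


def matches_six_alnum(text):
    """Return True if text is exactly six ASCII letters or digits."""
    return SIX_ALNUM.fullmatch(text) is not None
```

A check like this can also serve as a sanity filter on OCR results, rejecting output that cannot be a valid six-character code.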

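In /usr/local/share/tessdata/configs, create a file named myconfig with the following content:

    # Whitelist of characters to recognize
    tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz0123456789
    # Suffix of the user-patterns file used for pattern matching
    user_patterns_suffix user-patterns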
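
The two files described above could also be generated by a script, which makes the setup easier to repeat on another machine. This is a sketch under assumptions: the function name `write_captcha_config` is hypothetical, and the target directory is a parameter because writing to /usr/local/share/tessdata normally requires root privileges.

```python
import os

# Same character set as in the article: lowercase letters and digits.
WHITELIST = "abcdefghijklmnopqrstuvwxyz0123456789"


def write_captcha_config(tessdata_dir, config_name="myconfig"):
    """Create eng.user-patterns and configs/<config_name> under tessdata_dir."""
    configs_dir = os.path.join(tessdata_dir, "configs")
    if not os.path.isdir(configs_dir):
        os.makedirs(configs_dir)

    patterns_path = os.path.join(tessdata_dir, "eng.user-patterns")
    with open(patterns_path, "w") as f:
        # Raw string: the file must hold literal backslash-n pairs, not newlines.
        f.write(r"\n\n\n\n\n\n" + "\n")

    config_path = os.path.join(configs_dir, config_name)
    with open(config_path, "w") as f:
        f.write("tessedit_char_whitelist " + WHITELIST + "\n")
        f.write("user_patterns_suffix user-patterns\n")
    return patterns_path, config_path
```

Calling `write_captcha_config("/usr/local/share/tessdata")` as root would reproduce the manual steps.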

psm parameter reference

-psm N: Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
- 0 = Orientation and script detection (OSD) only.
- 1 = Automatic page segmentation with OSD.
- 2 = Automatic page segmentation, but no OSD, or OCR.
- 3 = Fully automatic page segmentation, but no OSD. (Default)
- 4 = Assume a single column of text of variable sizes.
- 5 = Assume a single uniform block of vertically aligned text.
- 6 = Assume a single uniform block of text.
- 7 = Treat the image as a single text line.
- 8 = Treat the image as a single word.
- 9 = Treat the image as a single word in a circle.
- 10 = Treat the image as a single character.
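
When Tesseract is driven from Python, the psm value and a config file name are typically passed together as one string of extra command-line arguments. The helper below is a minimal sketch of building that string for Tesseract 3.x; the name `build_tesseract_config` is hypothetical. Note that Tesseract 4 and later spell the flag `--psm`.

```python
def build_tesseract_config(psm, config_name=None):
    """Build an extra-arguments string for Tesseract 3.x, e.g. '-psm 7 myconfig'."""
    if not isinstance(psm, int) or not 0 <= psm <= 10:
        raise ValueError("psm must be an integer from 0 to 10")
    parts = ["-psm", str(psm)]
    if config_name:
        # Config file names go last on the tesseract command line.
        parts.append(config_name)
    return " ".join(parts)
```

The result can be passed as the `config` argument of `pytesseract.image_to_string`. For a short CAPTCHA, modes 7 (single line) and 8 (single word) are the obvious ones to try first, though which works better depends on the images.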

Code example

    import pytesseract
    from PIL import Image

    # Load the CAPTCHA image and run OCR with the default settings
    image = Image.open('code.png')
    code = pytesseract.image_to_string(image)
    print(code)
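
The example above runs Tesseract with its defaults and does not yet use the myconfig file or a psm value. A sketch of how both might be passed through pytesseract's `config` argument follows. The function names are hypothetical, psm 7 is an assumption about the CAPTCHA layout, and the `-psm` spelling applies to Tesseract 3.x (Tesseract 4 and later expect `--psm`).

```python
def clean_ocr_text(text):
    """Remove all whitespace, which OCR output often contains."""
    return "".join(text.split())


def recognize_captcha(path, psm=7, config_name="myconfig"):
    """OCR one image using the given page segmentation mode and config file."""
    # Imported lazily so clean_ocr_text can be used without these packages.
    import pytesseract
    from PIL import Image

    image = Image.open(path).convert("L")  # convert to grayscale
    # Extra arguments are appended to the tesseract command line.
    config = "-psm {} {}".format(psm, config_name)
    raw = pytesseract.image_to_string(image, lang="eng", config=config)
    return clean_ocr_text(raw)
```

With the setup from this article in place, `recognize_captcha('code.png')` would return the recognized string with spaces and line breaks removed.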
