I. Introduction to tesseract-OCR

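1. tesseract-OCR is an open-source OCR engine that can recognize more than 100 languages. It is built to recognize text in images and return it as plain text. Its main weakness is handwriting, which it recognizes poorly.
2. tesseract works best on images whose text has the following characteristics: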

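It uses a standard typeface
It may be a photocopy or a photograph, but the text must be sharp and free of marks or smudges
It contains no skewed or slanted text
No text runs past the edge of the image, and no characters are cut off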
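
When a photo does not quite meet these conditions, one common way to get closer is to convert it to pure black and white before recognition. The sketch below is only an illustration and is not part of the original workflow: the helper names threshold_table and binarize, the default threshold of 128, and the output file name are all assumptions, and binarize needs the Pillow package.

```python
def threshold_table(threshold=128):
    """Build a 256-entry lookup table: values below threshold map to 0, the rest to 255."""
    return [0 if value < threshold else 255 for value in range(256)]


def binarize(src_path, dst_path, threshold=128):
    """Save a black-and-white copy of src_path at dst_path (requires Pillow)."""
    # Imported here so threshold_table stays usable without Pillow installed.
    from PIL import Image

    gray = Image.open(src_path).convert("L")  # 8-bit grayscale
    # point() applies the lookup table to every pixel of the grayscale image.
    gray.point(threshold_table(threshold)).save(dst_path)
```

For example, binarize('9.jpg', '9_bw.png') would write a thresholded copy of the sample image used later in this article. A fixed threshold is a crude choice, and a different value may suit a given photo better.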

II. Installing tesseract-OCR on Mac

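1. There are four ways to install it with Homebrew:

brew install --with-training-tools tesseract # install tesseract together with the training tools
brew install --all-languages tesseract # install tesseract together with all language data
brew install --all-languages --with-training-tools tesseract # install tesseract with both all language data and the training tools
brew install tesseract # install tesseract without the training tools; this is the option I chose

Note: the --with-training-tools and --all-languages options belong to the Homebrew formula that was current for tesseract 4.0.0. Newer Homebrew releases appear to have dropped formula options, so there only the plain brew install tesseract form is expected to work, with additional language data provided by the separate tesseract-lang formula.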

2. After installing tesseract, test it:

tesseract -v
tesseract is installed under: /usr/local/Cellar/tesseract/4.0.0/
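
The same check can be done from Python, which is handy before running any OCR script. This is a minimal sketch using only the standard library; the function name tesseract_version is an assumption made for illustration, and it calls the --version flag, which prints the same version banner.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of the version banner, or None if the binary is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases print the banner to stderr instead of stdout, so read both.
    output = (result.stdout + result.stderr).strip()
    return output.splitlines()[0] if output else ""


print(tesseract_version() or "tesseract not found on PATH")
```

If this reports that tesseract is not found even though Homebrew installed it, the Homebrew bin directory is probably missing from PATH.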

3. Basic usage of the tesseract command

tesseract 9.jpg result # result is the output base name; the recognized text is written to result.txt
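
The command above can also be driven from a script without any extra packages. The following is a rough sketch under a few assumptions: build_command and run_ocr are hypothetical helper names, and the optional -l flag is used to pick the language.

```python
import shutil
import subprocess


def build_command(image_path, output_base, lang=None):
    """Build the argument list for: tesseract <image> <output_base> [-l <lang>]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def run_ocr(image_path, output_base, lang=None):
    """Run tesseract and return the text it wrote to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not installed or not on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

Calling run_ocr('9.jpg', 'result') would then mirror the command line shown above and hand back the contents of result.txt as a string.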

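4. Download language data. You can download whichever language files you need; for example, chi_sim.traineddata is Simplified Chinese:
Download from: https://github.com/tesseract-ocr/tessdata
After downloading chi_sim.traineddata, place it in the /usr/local/Cellar/tesseract/4.0.0/share/tessdata directory.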
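
A quick way to confirm the file landed in the right place is to look for it from Python. This is a small sketch, not an official check: missing_languages is a hypothetical helper, and the path is the one from this article, which will differ for other versions or install locations. It accepts specs joined with +, such as chi_sim+eng, the form tesseract uses for multiple languages.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang_spec):
    """Return the languages in lang_spec that have no .traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [
        lang
        for lang in lang_spec.split("+")
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]


# Path taken from this article; adjust it to match your own installation.
TESSDATA_DIR = "/usr/local/Cellar/tesseract/4.0.0/share/tessdata"
print(missing_languages(TESSDATA_DIR, "chi_sim"))
```

An empty list means every requested language file is present. Running tesseract --list-langs on the command line gives a similar answer straight from tesseract itself.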

III. Installing pytesseract on Mac

1. Python offers a more convenient way to call tesseract. First, install the pytesseract module.

2. Installation command

pip install pytesseract
pytesseract is installed under: /usr/local/lib/python3.7/site-packages/pytesseract

3. The pytesseract module is used together with PIL, which is provided by the Pillow package (pip install Pillow)

4. Example 1:

from PIL import Image
import pytesseract

if __name__ == '__main__':
    # lang='chi_sim' requires chi_sim.traineddata in the tessdata directory
    text = pytesseract.image_to_string(Image.open('9.jpg'), lang='chi_sim')
    print(text)

Result: running the script prints the text recognized from 9.jpg.
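
If the script fails because pytesseract cannot locate the tesseract binary, the module can be pointed at it explicitly. The sketch below is one possible variant of Example 1, not a replacement for it: the function name recognize and its parameters are assumptions, while pytesseract.pytesseract.tesseract_cmd and TesseractNotFoundError come from pytesseract itself.

```python
def recognize(image_path, lang="chi_sim", tesseract_cmd=None):
    """Return the text found in image_path, or None if the tesseract binary is missing."""
    # Imported inside the function so a missing package only matters when OCR is requested.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Tell pytesseract which binary to run when it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    try:
        with Image.open(image_path) as image:
            return pytesseract.image_to_string(image, lang=lang)
    except pytesseract.TesseractNotFoundError:
        return None
```

For instance, recognize('9.jpg', tesseract_cmd='/usr/local/bin/tesseract') would use a specific binary; that path is only a guess at a typical Homebrew location and should be replaced with the one on your machine.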

Original article: https://blog.csdn.net/wodedipang_/article/details/84585914
