Installing the pytesseract module on Ubuntu for image text recognition

Reposted article, published 2020-01-23 09:50, tagged ubuntu.

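The goal is to recognize the text in images offline. Python offers pytesseract for this. It is a wrapper around the Tesseract OCR engine, which has to be installed separately as a system package.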

1. Install the tesseract-ocr package

sudo apt-get install tesseract-ocr
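After installing, you can check from Python that the tesseract executable is on PATH and which version it is. This is a minimal sketch using only the standard library. The helper names parse_version and tesseract_version are hypothetical, and it assumes tesseract --version prints a first line such as "tesseract 4.1.1" (old 3.x releases may print it to stderr, so both streams are read).

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from text like 'tesseract 4.1.1'."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def tesseract_version() -> Optional[Tuple[int, ...]]:
    """Return the installed tesseract version, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Read both streams because older builds write the version to stderr.
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(tesseract_version())
```

A result of None means the package is missing or not on PATH.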

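2. Install PIL

PIL (Python Imaging Library) is an image-processing library for Python. The original PIL is no longer maintained; its fork Pillow provides the same PIL module name. The python-imaging package used in older guides was a Python 2 package and is not available on recent Ubuntu releases, so install the Python 3 package instead (pip install Pillow also works):

sudo apt-get install python3-pil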
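Pillow is also handy for cleaning up an image before OCR: converting it to grayscale and applying a fixed threshold often helps Tesseract on noisy screenshots. This is a minimal sketch assuming Pillow is installed; the function name binarize and the default threshold of 128 are illustrative choices, not values from the original article.

```python
from PIL import Image


def binarize(image: Image.Image, threshold: int = 128) -> Image.Image:
    """Convert to 8-bit grayscale ('L'), then map each pixel to pure black or white."""
    gray = image.convert("L")
    # Pixels brighter than the threshold become white (255), the rest black (0).
    return gray.point(lambda p: 255 if p > threshold else 0)


if __name__ == "__main__":
    # Build a tiny 2x1 image in memory instead of reading a real file.
    img = Image.new("RGB", (2, 1))
    img.putpixel((0, 0), (200, 200, 200))  # light gray
    img.putpixel((1, 0), (10, 10, 10))     # near black
    out = binarize(img)
    print(out.mode, out.getpixel((0, 0)), out.getpixel((1, 0)))
```

The returned image can be passed straight to pytesseract.image_to_string, which accepts PIL images.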

3. Install pytesseract

pip install pytesseract
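Because pytesseract is only a wrapper, a working setup needs both the Python modules and the tesseract executable. The sketch below (the helper name missing_modules is hypothetical) uses importlib.util.find_spec to report which top-level modules cannot be found, without actually importing them.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot find on its import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "PIL" is provided by Pillow; "pytesseract" by the pip package of the same name.
    print("missing Python modules:", missing_modules(["PIL", "pytesseract"]))
    print("tesseract executable:", shutil.which("tesseract"))
```

If tesseract is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the executable's full path.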

4. Test the code

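from PIL import Image
import pytesseract

# Recognize Simplified Chinese (requires the chi_sim language data, see step 5)
with Image.open('chinese.png') as img:
    text = pytesseract.image_to_string(img, lang='chi_sim')
print(text)

# Recognize English (the default language, eng)
with Image.open('english.png') as img:
    text = pytesseract.image_to_string(img)
print(text)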
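Two small refinements are often useful on top of the test script. Tesseract accepts several languages joined with "+" (for example chi_sim+eng for mixed Chinese and English text), and chi_sim output may contain spaces between adjacent Chinese characters. The sketch below wraps both; ocr_file, join_langs and strip_cjk_spaces are hypothetical helper names, and the regular expression only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import re
from typing import Sequence

# Matches whitespace that sits between two CJK Unified Ideographs.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def join_langs(langs: Sequence[str]) -> str:
    """Build Tesseract's multi-language argument, e.g. ['chi_sim', 'eng'] -> 'chi_sim+eng'."""
    return "+".join(langs)


def strip_cjk_spaces(text: str) -> str:
    """Remove whitespace that Tesseract may insert between adjacent Chinese characters."""
    return _CJK_GAP.sub("", text)


def ocr_file(path: str, langs: Sequence[str] = ("eng",)) -> str:
    """OCR one image file; needs Pillow, pytesseract and the tesseract binary."""
    # Imported here so the helpers above work even without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        text = pytesseract.image_to_string(img, lang=join_langs(langs))
    return strip_cjk_spaces(text)


if __name__ == "__main__":
    print(join_langs(["chi_sim", "eng"]))
    print(strip_cjk_spaces("識 別 中 文 and English"))
```

For example, ocr_file('chinese.png', ['chi_sim', 'eng']) would OCR a mixed-language image, assuming both language data files are installed.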

5. Add the Chinese language data to recognize Chinese text

Locate the tessdata folder on the Ubuntu system and copy the Chinese language data file, chi_sim.traineddata, into it.
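Where the tessdata folder lives depends on the Tesseract version and on how it was installed. The sketch below searches any directories you pass in, then the TESSDATA_PREFIX environment variable, then a few default directories; those default paths are assumptions based on typical Ubuntu package layouts, so adjust them for your system. find_traineddata and candidate_dirs are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional

# Assumed locations; the actual path depends on the Tesseract version and install method.
DEFAULT_DIRS = [
    "/usr/share/tesseract-ocr/5/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tesseract-ocr/tessdata",
    "/usr/local/share/tessdata",
]


def candidate_dirs(extra: Iterable[str] = ()) -> List[Path]:
    """Directories to search: explicit extras, then TESSDATA_PREFIX, then assumed defaults."""
    dirs = [Path(d) for d in extra]
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        # Tesseract 4+ expects the tessdata folder itself, 3.x its parent; try both.
        dirs += [Path(prefix), Path(prefix) / "tessdata"]
    dirs += [Path(d) for d in DEFAULT_DIRS]
    return dirs


def find_traineddata(lang: str = "chi_sim", extra: Iterable[str] = ()) -> Optional[Path]:
    """Return the path of <lang>.traineddata in the first directory that has it."""
    for directory in candidate_dirs(extra):
        path = directory / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_traineddata("chi_sim"))
```

A result of None means chi_sim.traineddata was not found in any of the searched directories.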

Alternatively, install the Chinese language data from the Ubuntu repositories:

sudo apt-get install tesseract-ocr-chi-sim
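To confirm from Python that the package was picked up, you can parse the output of tesseract --list-langs. This sketch assumes the output format of recent releases, a header line starting with "List of available languages" followed by one language code per line; parse_langs and installed_langs are hypothetical names.

```python
import shutil
import subprocess
from typing import Set


def parse_langs(output: str) -> Set[str]:
    """Collect language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Codes such as 'eng' or 'chi_sim' contain no spaces; this skips the
        # header line and any warning messages.
        if line and " " not in line:
            langs.add(line)
    return langs


def installed_langs() -> Set[str]:
    """Run `tesseract --list-langs`; return an empty set if tesseract is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return set()
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some older versions may print the list to stderr, so read both streams.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print("chi_sim installed:", "chi_sim" in installed_langs())
```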

6. Command-line recognition

The Tesseract engine itself can also be run from the command line, without Python.

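Commands:
Recognize English:
tesseract e.png 1   # 1 is the output base name; the recognized text is saved to 1.txt in the current directory
Recognize Chinese:
tesseract --help  # show help
tesseract --list-langs  # check whether the Chinese language data chi_sim is installed
tesseract -l chi_sim c.png 1  # the result is again saved to 1.txt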
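The same command-line calls can be scripted from Python with subprocess. In this sketch (build_command and recognize are hypothetical names), the arguments follow the order shown by tesseract --help (image, output base, then options), and passing stdout as the output base makes Tesseract print the text instead of writing <outputbase>.txt.

```python
import subprocess
from typing import List, Optional


def build_command(image: str, outputbase: str = "stdout", lang: Optional[str] = None) -> List[str]:
    """Assemble a tesseract command line, e.g. ['tesseract', 'c.png', 'stdout', '-l', 'chi_sim']."""
    cmd = ["tesseract", image, outputbase]
    if lang:
        cmd += ["-l", lang]
    return cmd


def recognize(image: str, lang: Optional[str] = None) -> str:
    """Run tesseract on one image and return the recognized text from stdout."""
    result = subprocess.run(
        build_command(image, "stdout", lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Equivalent to: tesseract c.png stdout -l chi_sim
    print(build_command("c.png", lang="chi_sim"))
```

With an output base such as 1, as in the commands above, the text goes to 1.txt instead and has to be read back from that file.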

To install offline by compiling from source, see this tutorial:

https://www.cnblogs.com/yanhai307/p/10791490.html

