Python image recognition

Reposted article, 2018-07-09 21:56

Python image-processing modules
1. Installing the pytesseract module automatically installs the Pillow module.
Pillow is Python's de facto standard image-processing library.

Manual: http://pillow-cn.readthedocs.io/zh_CN/latest/index.html
The pytesseract module is used for text recognition (OCR).
pip3 install pytesseract
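2. Install tesseract-ocr, the engine that actually performs the text recognition.
pytesseract is a wrapper that calls this program, so it must be installed separately.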
https://github.com/tesseract-ocr/tesseract/wiki
References: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014320027235877860c87af5544f25a8deeb55141d60c5000
https://blog.csdn.net/dcba2014/article/details/78969658
https://blog.csdn.net/iodjSVf8U1J7KYc/article/details/79308086
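3. Common errors:
1. Check which Python version you are running and which versions of the modules are installed for that interpreter.
2. Import ImageOps with from PIL import ImageOps;
writing PIL.ImageOps after a bare import PIL does not work.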
3. Importing these first:
from lxml import html
from pyquery import PyQuery as pq
and only then:
# Image recognition
from PIL import ImageOps
from PIL import Image
import pytesseract
raised OSError: codec configuration error when reading image file
This is a rather strange problem.
Fix: import the image libraries before pyquery.
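Steps 1 and 2 install two separate pieces (the Python packages and the tesseract program), and common error 1 is about version mismatches. A quick way to see what is actually available to the interpreter you are running is a small check like the one below. This is a minimal sketch, assuming Python 3.8+ for importlib.metadata; the function name check_ocr_setup is hypothetical, and it only looks for tesseract on PATH rather than running it.

```python
import shutil
import sys
from importlib import metadata


def check_ocr_setup():
    """Collect the versions that matter for pytesseract-based OCR.

    Returns a dict with the running Python version, the installed Pillow and
    pytesseract versions (None when a package is missing for this interpreter),
    and whether a `tesseract` executable can be found on PATH.
    """
    report = {"python": sys.version.split()[0]}
    for dist in ("Pillow", "pytesseract"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None
    # pytesseract runs this external program; pip does not install it.
    report["tesseract_on_path"] = shutil.which("tesseract") is not None
    return report


if __name__ == "__main__":
    for key, value in check_ocr_setup().items():
        print(f"{key}: {value}")
```

If tesseract is installed but not on PATH (common on Windows), pytesseract can be pointed at the executable by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.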

Example 1

Reposted from: https://www.cnblogs.com/MrRead/p/7656800.html, with small changes so that the code runs on Python 3.

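1. CAPTCHA recognition has to be tailored to the target: CAPTCHAs differ, sometimes slightly and sometimes a lot, between systems and applications. As long as the image is preprocessed well and pytesseract is used properly, most ordinary CAPTCHAs can be recognized.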

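2. I took many detours while working on CAPTCHA recognition. The effort should go into processing the image into a form that pytesseract can read easily, which raises the success rate.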

3. The original image (8fnp.png in the code below) is processed as follows:

# Image recognition
from PIL import ImageOps
from PIL import Image
import pytesseract

def initTable(threshold=140):
    # Lookup table for Image.point(): gray values below the threshold map to 0, the rest to 1
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
    return table

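im = Image.open('8fnp.png')
# Image processing steps
im = im.convert('L')                      # convert to grayscale
binaryImage = im.point(initTable(), '1')  # binarize with the threshold table
im1 = binaryImage.convert('L')
im2 = ImageOps.invert(im1)                # swap black and white
im3 = im2.convert('1')
im4 = im3.convert('L')
# Crop the image to keep only the characters
box = (30,10,90,28)
region = im4.crop(box)
# Enlarge the characters
out = region.resize((120,38))
text = pytesseract.image_to_string(out)
print(text)
out.show()  # show() opens a viewer and returns None, so don't print() it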

With this preprocessing, the code above recognizes the CAPTCHA in the sample image.
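Because the threshold, crop box and output size only suit this particular CAPTCHA, it can help to package Example 1's steps as functions with those values as parameters. This is a minimal sketch, not the original author's code: the names make_threshold_table, preprocess and recognize are hypothetical, the defaults are copied from Example 1, and the table maps straight to 0/255 in mode "L". That yields the same black-and-white pixels as the original's 0/1 table followed by the "1" → "L" conversions, while skipping the redundant convert('1')/convert('L') pair after the inversion.

```python
from PIL import Image, ImageOps


def make_threshold_table(threshold=140):
    """256-entry lookup table: gray values below threshold -> 0 (black), else 255 (white)."""
    return [0 if i < threshold else 255 for i in range(256)]


def preprocess(im, threshold=140, box=(30, 10, 90, 28), size=(120, 38)):
    """Binarize, invert, crop and enlarge, following the steps of Example 1."""
    gray = im.convert("L")                                # grayscale
    binary = gray.point(make_threshold_table(threshold))  # pure black/white, still mode "L"
    inverted = ImageOps.invert(binary)                    # swap black and white
    return inverted.crop(box).resize(size)                # keep the characters, enlarge them


def recognize(path, config="", **kwargs):
    """OCR one CAPTCHA file. Requires pytesseract plus the tesseract program."""
    import pytesseract  # imported lazily so preprocess() works without it

    with Image.open(path) as im:
        img = preprocess(im, **kwargs)
    return pytesseract.image_to_string(img, config=config).strip()
```

`recognize("8fnp.png")` corresponds to Example 1. Passing `config="--psm 7"` asks tesseract to treat the image as a single line of text, which may or may not help on a given CAPTCHA. Since preprocess() needs only Pillow, its output can be inspected (for example with `.show()`) and the threshold or crop box tuned before any OCR is run.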
