Post: Python CAPTCHA recognition: installing Pillow, tesseract-ocr, and pytesseract, and fixing the errors that come up
1: Installing the dependencies
pip install Pillow
pip3 install pytesseract
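Before going further, it can help to confirm that the packages are actually importable from the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and `PIL`, `pytesseract`, and `cv2` are the import names used by the code later in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["PIL", "pytesseract", "cv2"])
    print("missing:", missing if missing else "none")
```

If a name is reported as missing, the package was probably installed into a different Python environment than the one running the script.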
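2: Installing tesseract-ocr
(1) Introduction
pytesseract is only a wrapper: it calls the tesseract program directly, so tesseract itself has to be installed as well.
Otherwise an error like this is raised: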
pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
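One way to check for this situation up front is to look for the executable on PATH before calling pytesseract. This is a sketch, not part of pytesseract: the helper name `find_tesseract` is hypothetical, and it assumes the executable is named `tesseract`.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; add its install directory to PATH")
    else:
        print("tesseract found at", path)
```

As an alternative to changing PATH, pytesseract exposes `pytesseract.pytesseract.tesseract_cmd`, which can be set to the full path of the executable.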
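(2) Download
GitHub: https://github.com/tesseract-ocr/tesseract
(3) Download the traineddata files
GitHub: https://github.com/tesseract-ocr/tessdata
Note: the location of the traineddata files also has to be set through an environment variable (TESSDATA_PREFIX, as the message below says). Otherwise pytesseract raises: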
pytesseract.TesseractError: (1, 'Error opening data file \\OtherEnv\\tesseract-Win32\\tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
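After setting the variable, open a new cmd window. Running the Python file from cmd now returns the recognized CAPTCHA text.
When running from PyCharm, however, the Python file still has to be modified:
the environment variable must also be set inside the program itself.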
os.environ['TESSDATA_PREFIX'] = "C:/OtherEnv/tesseract-Win32/tessdata"
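The line above assumes the directory exists and contains the language file. A slightly more defensive sketch checks for `eng.traineddata` first and only then sets the variable; the helper name `set_tessdata_prefix` is hypothetical, and the path in the usage example is the one used in this post.

```python
import os


def set_tessdata_prefix(tessdata_dir, lang="eng"):
    """Set TESSDATA_PREFIX if <tessdata_dir>/<lang>.traineddata exists.

    Returns True when the variable was set, False otherwise.
    """
    model = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(model):
        return False
    os.environ["TESSDATA_PREFIX"] = tessdata_dir
    return True


if __name__ == "__main__":
    ok = set_tessdata_prefix("C:/OtherEnv/tesseract-Win32/tessdata")
    print("TESSDATA_PREFIX set" if ok else "eng.traineddata not found")
```

Failing early with a clear message is usually easier to debug than the long TesseractError shown earlier.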
3: Code implementation
import cv2 as cv
import numpy as np
from PIL import Image
import os
import pytesseract as tess

os.environ['TESSDATA_PREFIX'] = "C:/OtherEnv/tesseract-Win32/tessdata"

def recognize_text(image):
    # Convert to grayscale, then binarize with Otsu's threshold (inverted: text becomes white)
    gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # Two morphological openings (1x2, then 2x1 kernel) to remove thin noise
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (1, 2))
    mid1 = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 1))
    open_out = cv.morphologyEx(mid1, cv.MORPH_OPEN, kernel)
    cv.imshow("bin1", open_out)
    cv.bitwise_not(open_out, open_out)  # invert in place so the background is white
    textImage = Image.fromarray(open_out)
    text = tess.image_to_string(textImage)
    print("result:%s" % text)

src = cv.imread("./y4.png")  # read the image
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)  # create a GUI window that sizes itself to the image
cv.imshow("input image", src)  # show the image in the window with this name
recognize_text(src)
cv.waitKey(0)  # wait for a key press; the argument is in milliseconds, and 0 means wait indefinitely
cv.destroyAllWindows()  # destroy all windows
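The string returned by `image_to_string` often carries whitespace or stray symbols around the recognized characters. A small post-processing step can strip them before the result is used. This is a sketch under an assumption not stated in the original post: that the CAPTCHA contains only ASCII letters and digits. The helper name `clean_captcha_text` is hypothetical.

```python
import string

# Assumption: the CAPTCHA alphabet is ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)


def clean_captcha_text(raw):
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(ch for ch in raw if ch in ALLOWED)


if __name__ == "__main__":
    print(clean_captcha_text(" a B3\n"))  # prints "aB3"
```

In the code above, this would be applied to `text` before printing it; if the CAPTCHA uses a different alphabet, `ALLOWED` needs to change accordingly.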