Python: Recognizing Image CAPTCHAs with the Third-Party Library pytesseract

Reposted article · 2019-11-06 11:40 · Python automated testing

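Environment setup:

1. Install the Tesseract OCR engine

Download page (Windows installers): https://digi.bib.uni-mannheim.de/tesseract/

 Baidu Netdisk download:

  Link: https://pan.baidu.com/s/16RoJ19WynWOKI4Zpr0bKzA
  Extraction code: 5hst

After downloading, right-click the installer and run it.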

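2. Configure the environment variable:

  Edit the system variable Path and add the Tesseract installation directory, for example D:\Program Files\Tesseract-OCR (use your own installation path).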
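A quick way to confirm that the Path change took effect is to ask Python where it finds the executable. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` is made up for illustration, and a new terminal usually has to be opened before an edited Path becomes visible.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; re-check the environment variable")
    else:
        print("tesseract found at:", path)
```

If the function returns None, the directory added to Path is probably not the one that contains tesseract.exe.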

3. Install the third-party Python libraries:

  pip install pillow  # Python imaging library that pytesseract depends on
  pip install pytesseract
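To check that both packages ended up in the interpreter you intend to use, the installed versions can be read from the package metadata. This is a small sketch using the standard library only; `installed_version` is a hypothetical helper name, not part of either package.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pillow", "pytesseract"):
        version = installed_version(name)
        print(name, "->", version if version else "not installed")
```

A "not installed" line usually means pip ran against a different Python installation than the one executing the script.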

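4. Point pytesseract at tesseract.exe

Edit pytesseract.py so that tesseract_cmd holds the full path to tesseract.exe, not just the installation directory. If pytesseract cannot locate the executable, the code fails with a "tesseract is not installed or it's not in your PATH" error:

tesseract_cmd = r'D:\Program Files\Tesseract-OCR\tesseract.exe'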
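An alternative that avoids touching the installed library file is to set the same attribute from your own script, since pytesseract reads it from `pytesseract.pytesseract.tesseract_cmd`. The sketch below assumes the example installation directory used in this article; the helper names `tesseract_exe` and `configure_pytesseract` are hypothetical and only serve the illustration.

```python
from pathlib import PureWindowsPath


def tesseract_exe(install_dir):
    """Build the full Windows path to tesseract.exe inside install_dir."""
    return str(PureWindowsPath(install_dir) / "tesseract.exe")


def configure_pytesseract(install_dir):
    """Tell pytesseract which executable to run and return that path."""
    # Imported here so that tesseract_exe() stays usable without pytesseract.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = tesseract_exe(install_dir)
    return pytesseract.pytesseract.tesseract_cmd
```

Calling `configure_pytesseract(r"D:\Program Files\Tesseract-OCR")` once at the top of a script should be enough, and the setting survives a reinstall or upgrade of pytesseract because nothing inside the package was edited.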

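Implementation

This is one way to recognize CAPTCHAs. It suits simple CAPTCHAs, and the code can be used as is once the URL is replaced with a real CAPTCHA address.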

import requests
from PIL import Image
import pytesseract

# CAPTCHA URL
url = "http://cloud.xxxx.com/checkCode?0.7337270680854053"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the download failed

# Write the image to a file
with open('test.png', 'wb') as f:
    f.write(response.content)

# Recognize the CAPTCHA
# Step 1: open the file with Pillow (PIL), a third-party library
image = Image.open('test.png')
image = image.convert('L')  # convert to grayscale
threshold = 160             # binarization threshold

# Lookup table for point(): gray levels below the threshold map to 0 (black),
# all other levels map to 1 (white)
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

image = image.point(table, '1')  # binarize the grayscale image using the table
image.show()
result = pytesseract.image_to_string(image)  # run OCR on the binarized image
print('Image content:', result.strip())
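The two pure-Python parts of the script, building the threshold table and tidying the OCR output, can be pulled into small helpers so the threshold is easy to tune. The sketch below is an illustration only: both function names are hypothetical, and the cleanup step assumes the CAPTCHA contains only ASCII letters and digits, which the original article does not state.

```python
import re


def build_threshold_table(threshold):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


def clean_ocr_text(raw):
    """Drop whitespace and every character that is not an ASCII letter or digit.

    Assumption: the CAPTCHA alphabet is limited to 0-9, a-z and A-Z.
    """
    return re.sub(r"[^0-9A-Za-z]", "", raw)


if __name__ == "__main__":
    table = build_threshold_table(160)
    print(len(table), table[159], table[160])
    print(clean_ocr_text(" a B3-9\n\x0c"))
```

In the script above, `image.point(build_threshold_table(160), '1')` would replace the explicit loop, and `clean_ocr_text(result)` would strip the trailing newline or form-feed characters that Tesseract often appends. If recognition is poor, trying a few different thresholds on a saved sample image is usually the first thing to adjust.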
