CAPTCHA Recognition in Python

This article is a repost. 2019-12-19 16:34

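Recognizing CAPTCHAs with pytesseract in Python

1. A sample CAPTCHA

2. What kind of CAPTCHA is this suitable for?

It only works on simple, static CAPTCHAs whose characters do not overlap and which contain only digits and letters.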
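
Because only digits and letters are expected, anything else in the OCR output can be treated as noise. The following is a minimal sketch of that idea, not part of the original article; the function names and the default length of 4 are assumptions made for illustration.

```python
import re

# Assumption: the CAPTCHA contains only ASCII letters and digits,
# so every other character in the OCR output is treated as noise.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def clean_captcha_text(raw: str) -> str:
    """Drop whitespace, punctuation and other noise from raw OCR output."""
    return _NON_ALNUM.sub("", raw)


def looks_valid(text: str, expected_length: int = 4) -> bool:
    """Cheap plausibility check; expected_length is a hypothetical default."""
    return len(text) == expected_length


if __name__ == "__main__":
    cleaned = clean_captcha_text(" a B1-2\n")
    print(cleaned, looks_valid(cleaned))  # prints: aB12 True
```

A result that fails the length check is usually a sign that the image needs more preprocessing or that the CAPTCHA is too complex for this approach.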

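3. Practical uses: simulating a manual login, recognizing page content, and scraping information with a crawler

Step 1:

Download the Tesseract-OCR installer from https://digi.bib.uni-mannheim.de/tesseract/. Once it has downloaded, run it with the default options and install into a path that contains only English characters.

After installation a directory such as D:\syspath\tesseract\Tesseract-OCR appears. Add this installation path to the PATH environment variable.

Step 2:

Add a tessdata variable, and make sure it is a system variable (not a user variable): name TESSDATA_PREFIX, value D:\syspath\tesseract\Tesseract-OCR\tessdata

In a command prompt, run: tesseract --version. If version information is printed, the installation is working.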
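
The same two checks (executable on PATH, TESSDATA_PREFIX set) can also be done from Python. This is a minimal sketch using only the standard library; the function name check_tesseract_setup and its parameters are hypothetical and only exist to make the check easy to test.

```python
import os
import shutil


def check_tesseract_setup(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # Step 1: the install directory must be on PATH.
    if which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    # Step 2: TESSDATA_PREFIX must point at the tessdata directory.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append("TESSDATA_PREFIX does not point to a directory: " + prefix)
    return problems


if __name__ == "__main__":
    for problem in check_tesseract_setup():
        print("WARNING:", problem)
```

If the script prints nothing, both settings were found. Environment variable changes on Windows only affect command prompts opened after the change.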

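Verifying image recognition:

In the same command prompt, run: tesseract C:\a.png D:\1

a.png is the input image. The second argument is the output base name: Tesseract appends .txt to it, so the recognized text is written to D:\1.txt.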
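
The command-line call can also be driven from a script. The sketch below wraps it with subprocess; the helper names are hypothetical, and it relies on the fact that Tesseract appends .txt to the output base name it is given.

```python
import subprocess


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Build the argument list for: tesseract <image> <output base>."""
    return [exe, image_path, output_base]


def ocr_to_file(image_path, output_base, exe="tesseract"):
    """Run Tesseract and return the path of the text file it writes.

    Tesseract adds the ".txt" extension to the output base itself.
    """
    command = build_tesseract_command(image_path, output_base, exe)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return output_base + ".txt"


# Usage (requires Tesseract on PATH; paths are examples, adjust as needed):
#   result_path = ocr_to_file(r"C:\a.png", r"D:\1")
#   with open(result_path, encoding="utf-8") as fh:
#       print(fh.read())
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.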

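How to recognize information on a web page and scrape it

Install the pytesseract module: pip install pytesseract
Edit the tesseract_cmd setting in pytesseract.py so that it points to your Tesseract executable.
For example, change tesseract_cmd = 'tesseract' to tesseract_cmd = 'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe' (use the path where you actually installed Tesseract). The same attribute can also be set from your own script as pytesseract.pytesseract.tesseract_cmd, which avoids editing the library file and survives package upgrades.
The pytesseract module is now ready to use.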
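
As an alternative to editing pytesseract.py, the executable path can be assigned at runtime through pytesseract.pytesseract.tesseract_cmd. The sketch below picks the first candidate path that exists; the helper names and the candidate list are assumptions (the paths are derived from the two install locations mentioned in this article).

```python
import os


def first_existing(paths):
    """Return the first path that exists as a file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(exe_path):
    """Point pytesseract at a specific tesseract executable at runtime."""
    import pytesseract  # imported lazily so first_existing works without it

    pytesseract.pytesseract.tesseract_cmd = exe_path


# Hypothetical candidate locations; adjust them to your own installation.
CANDIDATES = [
    r"D:\syspath\tesseract\Tesseract-OCR\tesseract.exe",
    r"D:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]

# Usage:
#   exe = first_existing(CANDIDATES)
#   if exe is not None:
#       configure_pytesseract(exe)
```

If Tesseract is already on PATH, no configuration is needed, because the default value 'tesseract' is resolved through PATH.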
Usage:
A script that captures the CAPTCHA from a login page:

# -*- coding: utf-8 -*-

import pytesseract
from PIL import Image, ImageEnhance
from selenium import webdriver
from selenium.webdriver.common.by import By

screenImg = "D:/checkcode.png"
driver = webdriver.Chrome()  # create the browser instance (the parentheses are required)
driver.get("https://192.168.1.1:8080/login")
driver.maximize_window()
driver.save_screenshot(screenImg)  # screenshot the current page, which contains the CAPTCHA we need
# Locate the CAPTCHA image once, then read its position and size
element = driver.find_element(By.XPATH, "//img[@class='check-code']")
location = element.location
size = element.size
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
# Crop the screenshot down to the CAPTCHA region
img = Image.open(screenImg).crop((left, top, right, bottom))
img = img.convert('L')  # convert to grayscale (mode 'L')
enhancer = ImageEnhance.Contrast(img)  # contrast enhancer for the grayscale image
img = enhancer.enhance(2.0)  # a factor of 2.0 increases the contrast
img.save(screenImg)
# Read the cropped image back
img = Image.open(screenImg)
# Recognize the CAPTCHA text
code = pytesseract.image_to_string(img)
# Print the recognized text as a list of characters
print(list(code))
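
The crop rectangle in the script is built from the element's location and size. The sketch below isolates that arithmetic in a helper so it can be checked on its own. The scale parameter is an assumption, not something from the original script: on high-DPI displays the screenshot can contain more pixels than the page has CSS pixels, in which case the coordinates need to be multiplied (for example by 2.0).

```python
def crop_box(location, size, scale=1.0):
    """Convert Selenium's element location/size into a PIL crop box.

    location: dict with 'x' and 'y'; size: dict with 'width' and 'height'.
    scale is a hypothetical correction factor for high-DPI screenshots.
    Returns (left, top, right, bottom) as integers.
    """
    left = int(location["x"] * scale)
    top = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    bottom = int((location["y"] + size["height"]) * scale)
    return (left, top, right, bottom)


if __name__ == "__main__":
    loc, dims = {"x": 10, "y": 20}, {"width": 100, "height": 40}
    print(crop_box(loc, dims))             # prints: (10, 20, 110, 60)
    print(crop_box(loc, dims, scale=2.0))  # prints: (20, 40, 220, 120)
```

The returned tuple can be passed directly to Image.crop. Selenium's WebElement also offers a screenshot(filename) method that saves only the element, which may avoid the cropping step altogether.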


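The script above captures the CAPTCHA from a single page; the same approach can be used inside a crawler.
I am still a beginner, so please go easy on me, haha.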
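
In a crawler, OCR will sometimes return an empty or wrong string, so it can help to fetch a fresh CAPTCHA and try again. This is a minimal sketch of such a retry loop, not something from the original script; fetch_image, ocr and is_valid are hypothetical callables supplied by the caller (for example a screenshot function and pytesseract.image_to_string).

```python
def recognize_with_retry(fetch_image, ocr, is_valid, attempts=3):
    """Fetch a fresh CAPTCHA and OCR it until the result looks plausible.

    Returns the first accepted text, or None if every attempt fails.
    """
    for _ in range(attempts):
        text = ocr(fetch_image()).strip()
        if is_valid(text):
            return text
    return None


if __name__ == "__main__":
    # Stand-ins for a real screenshot function and a real OCR call.
    fake_results = iter(["", "a1\n", "Xk9P\n"])
    text = recognize_with_retry(
        fetch_image=lambda: None,
        ocr=lambda image: next(fake_results),
        is_valid=lambda t: len(t) == 4 and t.isalnum(),
    )
    print(text)  # prints: Xk9P
```

Keeping the three steps as separate callables makes the loop independent of Selenium and Tesseract, so it can be tested without a browser.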
