Brilliant: recognizing CAPTCHAs with ddddocr (帶帶弟弟 OCR)

This article is a repost. 2022-01-17 17:04 9635 python接口自動化 (Python API test automation)

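Preface

The previous article covered recognizing CAPTCHAs with Python's pytesseract module, but it can only handle simple CAPTCHAs, such as this one.

With anything slightly more complex, recognition fails.

So what now?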

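I searched around online and found write-ups on various third-party CAPTCHA-recognition platforms, such as Baidu OCR, 打碼兔, and 超級鷹. Of these, Baidu OCR is the most widely recommended.

Link: https://cloud.baidu.com/product/ocr_others/webimage

But when I tried it, it could not even recognize the 3T9Q CAPTCHA above...

And it is not cheap either.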

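Never mind. I later found an open-source library on GitHub, and a powerful one at that. The library is called ddddocr,
short for 帶帶弟弟 (dai dai didi).
哲哥's (Zhe-ge's) blog: https://wenanzhe.com/

There you can pick up not only technical skills but also quite a few life lessons: technical posts on the left, life lessons on the right.

Users speak highly of this library.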

Installing and using ddddocr

Requirements:

python <= 3.9
Windows/Linux/macOS
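
The Python version ceiling listed above can be checked before installing. This is only a minimal sketch: it assumes the `<= 3.9` limit quoted in this article still applies to the ddddocr release being installed (newer releases may differ), and `python_supported` is a hypothetical helper name, not part of ddddocr.

```python
import sys

# Upper bound copied from the requirement list in this article.
# Assumption: it may be outdated for newer ddddocr releases.
MAX_SUPPORTED = (3, 9)


def python_supported(version_info=None, max_supported=MAX_SUPPORTED):
    """Return True if the (major, minor) version is at or below the ceiling."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) <= max_supported


# Report whether the running interpreter satisfies the stated requirement.
print("supported:", python_supported())
```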

The steps below are for installing on Windows.

Upgrade pip

python -m pip install --upgrade pip -i https://pypi.douban.com/simple

If pip is not upgraded, the installation may fail.

Install ddddocr

pip install ddddocr -i https://pypi.douban.com/simple

Usage

# -*- coding:utf-8 -*-
import ddddocr                       # import ddddocr
ocr = ddddocr.DdddOcr()              # create the OCR instance
with open('002.png', 'rb') as f:     # open the image in binary mode
    img_bytes = f.read()             # read the image bytes
res = ocr.classification(img_bytes)  # recognize the text
print(res)

Recognizing a single image:

Recognizing multiple images:

# -*- coding:utf-8 -*-
import ddddocr                       # import ddddocr
ocr = ddddocr.DdddOcr()
for i in range(1, 4):
    with open(str(i) + '.png', 'rb') as f:
        img_bytes = f.read()
    res = ocr.classification(img_bytes)
    print(res)
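
The loop above relies on files being named 1.png, 2.png, 3.png. When the file names are not sequential, one option is to walk a folder instead. The sketch below is illustrative only: `recognize_folder` is a hypothetical helper, and `classify` stands for any callable that takes image bytes and returns text, such as `ocr.classification`.

```python
from pathlib import Path


def recognize_folder(folder, classify, pattern="*.png"):
    """Run `classify` on every file matching `pattern`; return {file name: text}."""
    results = {}
    # Sort so the output order is stable across runs.
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = classify(path.read_bytes())
    return results
```

With ddddocr installed, this could be called as `recognize_folder('.', ocr.classification)`, reusing the `ocr` instance created earlier.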

Some upper- and lowercase letters are still not recognized correctly.
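
If the target site compares CAPTCHAs case-insensitively (an assumption that does not hold for every site), one possible workaround for case mistakes is to normalize the recognized text before submitting it. `normalize_code` below is a hypothetical helper, and the optional length check assumes the CAPTCHA length is known in advance.

```python
def normalize_code(text, length=None):
    """Strip non-alphanumeric characters and upper-case the OCR result.

    Returns None when `length` is given and the cleaned text does not match it,
    so the caller can decide to fetch a new CAPTCHA.
    """
    cleaned = "".join(ch for ch in text if ch.isalnum()).upper()
    if length is not None and len(cleaned) != length:
        return None
    return cleaned
```

This does not make the recognition itself more accurate. It only removes case as a source of mismatch where the site ignores it.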

Wrapping it in a function:

# -*- coding:utf-8 -*-
import ddddocr
ocr = ddddocr.DdddOcr()

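def ddocr(file):
    """Read an image file and return the recognized text, or None on failure."""
    try:
        with open(file, 'rb') as f:
            img_bytes = f.read()
        return ocr.classification(img_bytes)
    except Exception as e:  # a bare except would also swallow KeyboardInterrupt
        print(f"Failed to get the CAPTCHA ({e}); continuing.")
        return None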

r = ddocr('3.png')
print(r)
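
Because a single attempt can fail or misread a character, a wrapper like the one above is often paired with a retry loop that requests a fresh CAPTCHA each time. The following is a minimal sketch under stated assumptions: `fetch_image` and `classify` are hypothetical callables (for example, a function that downloads a new CAPTCHA image and `ocr.classification`), and the expected length of 4 is only an example value.

```python
def solve_with_retry(fetch_image, classify, expected_length=4, attempts=3):
    """Fetch and classify a fresh image until the text has the expected length.

    Returns the recognized text, or None if every attempt fails the length check.
    """
    for _ in range(attempts):
        text = classify(fetch_image())
        if text and len(text) == expected_length:
            return text
    return None
```

The length check is only a cheap plausibility filter. A result of the right length can still be wrong, so the final judge is whether the site accepts the code.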

It can be combined with image cropping: capture the CAPTCHA image from the page, then recognize it with ddddocr.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from PIL import Image
import ddddocr
ocr = ddddocr.DdddOcr()


# Crop the CAPTCHA out of a page screenshot
def matting():
    # Open Chrome
    browser = webdriver.Chrome()
    # Open the site's login page
    # browser.get("https://v3pro.houjiemeishi.com/PC/pages/login/login.html")
    browser.get("http://192.168.139.129:8081/jpress/admin/login")
    # Maximize the window
    browser.maximize_window()
    # File name for the login-page screenshot
    picture_name1 = 'login'+'.png'
    # Save the first screenshot
    browser.save_screenshot(picture_name1)
    # Locate the CAPTCHA element (find_element_by_id is gone from newer Selenium 4 releases)
    ce = browser.find_element(By.ID, "captchaImg")
    # ce = browser.find_element(By.XPATH, '//*[@class="codeImg"]')
    # Print the element's position and size
    print(ce.location, ce.size)
    # Compute the crop box from the element's position and size
    left = ce.location.get('x')
    top = ce.location.get('y')
    right = ce.size.get('width') + left
    bottom = ce.size.get('height') + top
    # Open the screenshot saved above
    im = Image.open(picture_name1)
    # Crop out the CAPTCHA region
    img = im.crop((left, top, right, bottom))
    # File name for the cropped CAPTCHA image
    picture_name2 = 'code'+'.png'
    # Save the cropped image
    img.save(picture_name2)
    time.sleep(5)
    browser.close()


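# Recognize the CAPTCHA with the ddddocr module
def ddocr(file):
    """Read an image file and return the recognized text, or None on failure."""
    try:
        with open(file, 'rb') as f:
            img_bytes = f.read()
        return ocr.classification(img_bytes)
    except Exception as e:  # a bare except would also swallow KeyboardInterrupt
        print(f"Failed to get the CAPTCHA ({e}); continuing.")
        return None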


if __name__ == '__main__':
    print("Cropping")
    matting()
    print("Recognizing")
    code = ddocr('code.png')
    print(code)
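
The crop in the script above treats the element's position and size as screenshot pixel coordinates. On displays with scaling enabled, screenshot pixels and page coordinates may differ, which can shift the crop off the CAPTCHA. The sketch below shows one way to account for that. `crop_box` is a hypothetical helper, and `scale` is assumed to be the page's device pixel ratio.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels for Image.crop().

    `location` and `size` are dicts shaped like Selenium's element.location
    and element.size; `scale` converts page coordinates to screenshot pixels.
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

In the script, the ratio could be read with `browser.execute_script("return window.devicePixelRatio")` and the result passed straight to `im.crop(crop_box(ce.location, ce.size, scale))`. With a ratio of 1 the box is identical to the one computed by hand.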

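Divider----------------------------------------------

You may run into the following problem when running this.

After a project that uses the ddddocr module is packaged with pyinstaller, it fails with: ImportError: Microsoft Visual C++ Redistributable for Visual Studio 2019 not installed on the machine.

Solution:
Install Microsoft Visual C++ Redistributable 2019

https://aka.ms/vs/16/release/VC_redist.x64.exe

Clicking the link starts the download. Once it finishes, run the installer.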
