Recognizing CAPTCHAs with Selenium

Reposted article, 2019-11-19 12:29

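Our project team recently asked me to scrape data from a website and load it into a designated database table. This post records how I used an online API to recognize the site's CAPTCHA along the way: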

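The CAPTCHA is regenerated every time the page loads, so each visit to the login page shows a different one. The login page therefore has to be captured as a screenshot, the CAPTCHA region cropped out of that screenshot, and the cropped image submitted to the server for recognition.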

1. Take a screenshot of the login page

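import os
import time

from selenium import webdriver

url = ''
driver = webdriver.PhantomJS()
driver.get(url)
# A fixed window size must be set here; otherwise the script may misbehave on other machines
driver.set_window_size(1200, 800)
# Pause 10 s to make sure the login page has finished loading
time.sleep(10)
# Save a screenshot of the login page, replacing any previous one
screenshot_path = 'screenshot.png'
if os.path.exists(screenshot_path):
    os.remove(screenshot_path)
driver.save_screenshot(screenshot_path)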
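
The fixed 10-second pause above always costs the full 10 seconds and can still be too short on a slow connection. One alternative is to poll until the CAPTCHA element is actually present. Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for this purpose; the sketch below is only a library-free stand-in for the same idea, and the name wait_until and its parameters are hypothetical, not part of the original script.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the timeout elapses.

    Returns the truthy value, or None if the deadline passes first.
    """
    deadline = time.time() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.time() >= deadline:
            return None
        time.sleep(interval)


# Possible use with the driver above (captcha_xpath is a hypothetical variable):
#   found = wait_until(lambda: driver.find_elements_by_xpath(captcha_xpath))
#   if not found:
#       raise RuntimeError('CAPTCHA image did not appear in time')
```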

2. Crop the CAPTCHA out of the screenshot

from PIL import Image

# Locate the CAPTCHA element and read its position and size
element = driver.find_element_by_xpath('//*[@id="login-content"]/div[2]/div[4]/img')
left = int(element.location['x'])
top = int(element.location['y'])
right = int(element.location['x'] + element.size['width'])
bottom = int(element.location['y'] + element.size['height'])
# Crop the CAPTCHA region out of the screenshot
captcha_path = 'captcha.png'
if os.path.exists(captcha_path):
    os.remove(captcha_path)
img = Image.open(screenshot_path)
img = img.crop((left, top, right, bottom))
img.save(captcha_path)
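
The crop above assumes that one unit of element.location equals one pixel of the screenshot and that the element lies entirely inside the image. With other drivers or on high-DPI displays the screenshot can contain more pixels than the page coordinates suggest, so the box may need scaling. The helper below is a minimal sketch of that calculation; crop_box and its scale parameter are hypothetical names, and the right scale value has to be determined for the environment in use.

```python
def crop_box(location, size, image_size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location and size are dicts shaped like Selenium's element.location and
    element.size; image_size is the screenshot's (width, height); scale is
    the number of screenshot pixels per page-coordinate unit.
    """
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    # Clamp the box to the screenshot bounds.
    width, height = image_size
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if right <= left or bottom <= top:
        raise ValueError('element lies outside the screenshot')
    return (left, top, right, bottom)


# Possible use with the objects above:
#   img = img.crop(crop_box(element.location, element.size, img.size))
```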

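3. Call the online API to recognize the CAPTCHA

The code below is taken from the sample code for the CAPTCHA recognition service of Juhe Data (聚合數據); see the official documentation for details: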

# Python 2 code (urllib2, print statement)
import json
import os
import time
import urllib2


def captcha_recognition(appkey, codeType, imagePath):
    """
    Call the online CAPTCHA recognition API.
    :param appkey: appkey obtained from the platform
    :param codeType: CAPTCHA type
    :param imagePath: path to the CAPTCHA image
    :return: the recognition result ('' on failure)
    """
    if not os.path.exists(imagePath):
        return ''
    captcha_result = ''
    submitUrl = 'http://op.juhe.cn/vercode/index'  # API endpoint
    # Build the multipart/form-data POST body
    boundary = '----------%s' % hex(int(time.time() * 1000))
    data = []
    data.append('--%s' % boundary)
    data.append('Content-Disposition: form-data; name="%s"\r\n' % 'key')
    data.append(appkey)
    data.append('--%s' % boundary)
    data.append('Content-Disposition: form-data; name="%s"\r\n' % 'codeType')
    data.append(codeType)
    data.append('--%s' % boundary)
    data.append('Content-Disposition: form-data; name="%s"; filename="b.png"' % 'image')
    data.append('Content-Type: %s\r\n' % 'image/png')
    # Read the image as raw bytes; the file is closed even if reading fails
    with open(imagePath, 'rb') as fr:
        data.append(fr.read())
    data.append('--%s--\r\n' % boundary)
    http_body = '\r\n'.join(data)
    try:
        req = urllib2.Request(submitUrl, data=http_body)
        req.add_header('Content-Type', 'multipart/form-data; boundary=%s' % boundary)
        req.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36')
        req.add_header('Referer', 'http://op.juhe.cn/')
        resp = urllib2.urlopen(req, timeout=60)
        qrcont = resp.read()
        result = json.loads(qrcont, 'utf-8')
        error_code = result['error_code']
        if error_code == 0:
            # Success: the recognized text is in the 'result' field
            captcha_result = result['result']
        else:
            errorinfo = u"Error code: %s, description: %s" % (result['error_code'], result['reason'])
            print errorinfo
    except Exception as e:
        print e
    return captcha_result
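
The function above is Python 2 code: it uses urllib2 and joins text and raw image bytes in a single list, which only works where both are the same string type. Under Python 3 the request body has to be assembled as bytes. The sketch below shows one way to produce the same body layout there; build_multipart_body is a hypothetical helper, not part of the Juhe sample, and it only builds the body without sending anything.

```python
def build_multipart_body(fields, image_bytes, boundary, filename='b.png'):
    """Assemble a multipart/form-data body as bytes.

    fields is a list of (name, value) text pairs; image_bytes is the raw
    PNG content, sent under the form field name "image".
    """
    crlf = b'\r\n'
    delimiter = ('--%s' % boundary).encode('ascii')
    parts = []
    for name, value in fields:
        parts.append(delimiter)
        parts.append(('Content-Disposition: form-data; name="%s"' % name).encode('utf-8'))
        parts.append(b'')  # blank line separates part headers from content
        parts.append(value.encode('utf-8'))
    parts.append(delimiter)
    parts.append(('Content-Disposition: form-data; name="image"; filename="%s"'
                  % filename).encode('utf-8'))
    parts.append(b'Content-Type: image/png')
    parts.append(b'')
    parts.append(image_bytes)
    parts.append(delimiter + b'--')  # closing delimiter
    parts.append(b'')  # trailing CRLF after the closing delimiter
    return crlf.join(parts)


# Possible use: pass the result as data= to urllib.request.Request together
# with the header 'Content-Type: multipart/form-data; boundary=<boundary>'.
```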
