Automatic website login in Python (using selenium, with captcha recognition)

Reposted article, 2021-10-28 15:37. Category: automated testing

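I. Introduction

This is my first blog post, written to summarize what I have learned recently. It uses the selenium tool to log in automatically to a website I use at work.
[Screenshot: the site's login page]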

1. Environment

Operating system: Windows 10
Python version: Python 3.7

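2. Required third-party Python libraries

1. selenium

Install: pip install selenium

selenium is a powerful open-source, browser-based automated testing tool that originated at ThoughtWorks. Supported browsers include IE, Chrome, and Firefox.
You also need to download a browser driver; I use the driver for Google Chrome (ChromeDriver).

ChromeDriver download (mirror site): http://npm.taobao.org/mirrors/chromedriver/

Pick the driver that matches your browser version and Windows, then put the downloaded exe file into the Python installation directory (or any other directory on PATH).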
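Before launching the browser, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch using only the standard library. The helper name `find_driver` is made up for this illustration, and it assumes the executable is called `chromedriver` (on Windows, `shutil.which` also tries the extensions listed in PATHEXT, such as `.exe`).

```python
import shutil


def find_driver(name='chromedriver'):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == '__main__':
    path = find_driver()
    if path is None:
        print('chromedriver was not found on PATH')
    else:
        print('chromedriver found at: ' + path)
```

If this reports that the driver was not found, the Python installation directory is probably not on PATH, and the exe should be moved to a directory that is.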

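2. baidu_aip

Install: pip install baidu_aip

Note: an earlier version of this post gave the package name as baidu_api, which is wrong. The correct name is baidu_aip.

baidu_aip is the Python SDK for Baidu's OCR (Optical Character Recognition) text recognition service. Many people would reach for the tesseract library, but in my tests the Baidu API recognized this captcha better. To learn more about the library, see the official documentation: Baidu OCR API official documentation.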

3. pillow

Install: pip install pillow

pillow is the most commonly used third-party image processing library for Python.

II. Implementation

1. Importing the third-party libraries

 from selenium import webdriver
 from PIL import Image
 from aip import AipOcr

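2. Getting and processing the captcha

Before fetching the captcha, first instantiate a browser object:

 browser = webdriver.Chrome()  # instantiate the browser object

The captcha is obtained by using the browser object's element lookup method to find the captcha element and then taking a screenshot of it.
[Screenshot: the captcha tag in the page source]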

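The code:

 from selenium.webdriver.common.by import By

 url = 'http://120.77.44.123/Login/Login.aspx?title=exit&recode=1'
 browser.get(url)
 png = browser.find_element(By.ID, 'captcha_img')  # locate the captcha element (find_element_by_id was removed in Selenium 4.3)
 png.screenshot('capt.png')  # take a screenshot of the captcha and save it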

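[Image: the captured captcha]

To improve the recognition rate, we preprocess the captcha image with the pillow library.
The processing steps are:
1. Convert the image to grayscale.
2. Apply a threshold, tuned so that the noise pixels disappear.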

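 img = Image.open('capt.png')
 img = img.convert('L')  # convert to grayscale (mode 'L')
 count = 165  # threshold
 table = []
 for i in range(256):
     if i < count:
         table.append(0)  # gray levels below the threshold map to 0
     else:
         table.append(1)  # all other gray levels map to 1

 img = img.point(table, '1')  # binarize through the lookup table
 img.save('captcha.png')  # save the processed captcha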

[Image: the captcha after processing]

Adjusting the threshold can give the program a higher recognition rate.
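One way to choose a threshold is to write out several binarized copies and compare them by eye. The sketch below is only an illustration: the helper names and the output file pattern `captcha_t<N>.png` are invented here, the threshold range is arbitrary, and it maps bright pixels to 255 rather than 1, which is my own choice for expressing white in mode '1'.

```python
def build_threshold_table(threshold):
    """Map each gray level 0-255 to 0 (below the threshold) or 255 (otherwise)."""
    return [0 if level < threshold else 255 for level in range(256)]


def save_variants(src_path, thresholds):
    """Save one binarized copy of the image per threshold for visual comparison."""
    from PIL import Image  # imported here so the table helper works without Pillow

    gray = Image.open(src_path).convert('L')
    for t in thresholds:
        binary = gray.point(build_threshold_table(t), '1')
        binary.save('captcha_t{}.png'.format(t))


if __name__ == '__main__':
    # 'capt.png' is the screenshot saved earlier; the range below is arbitrary.
    save_variants('capt.png', range(140, 191, 10))
```

The variant in which the characters stay solid while the background speckles vanish is usually the one to keep.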

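3. Recognizing the captcha

With the captcha processed, we can now recognize it by calling the general text recognition endpoint of baidu_aip. The official documentation describes the calls in detail. At the time of writing, the endpoint could be used 5,000 times a day for free, which is more than enough.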

 # credentials
 APP_ID = '***'
 API_KEY = '***'
 SECRET_KEY = '***'
 # initialize the client
 client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

 # read the image as bytes
 def get_file_content(file_path):
     with open(file_path, 'rb') as f:
         return f.read()

 image = get_file_content('captcha.png')
 # request options
 options = {'language_type': 'ENG'}  # recognition language; the default is 'CHN_ENG' (mixed Chinese and English)
 # call the general text recognition endpoint
 result = client.basicGeneral(image, options)  # basicAccurate is the high-accuracy endpoint
 captcha = ''
 for word in result['words_result']:
     captcha = word['words']  # the captcha is a single line of text

 print('Recognition result: ' + captcha)

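[Image: the printed recognition result]

Overall the recognition works reasonably well, although this particular captcha is not a hard one.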
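OCR output often contains stray spaces or punctuation, and a failed request returns no text at all. The following is a rough sketch of a cleanup step, not part of the original program. The function name `extract_captcha` is hypothetical, the sample dictionaries are made up, and two things are assumptions: that the captcha contains only ASCII letters and digits, and that a failed request returns a dictionary with `error_code` and `error_msg` keys.

```python
import re


def extract_captcha(result):
    """Pull the captcha text out of an OCR response dict.

    Returns an empty string when the response has no usable text.
    """
    if 'error_code' in result:
        # Assumed error shape: {'error_code': ..., 'error_msg': ...}
        print('OCR request failed: {}'.format(result.get('error_msg')))
        return ''
    words = result.get('words_result') or []
    text = ''.join(item.get('words', '') for item in words)
    # Keep letters and digits only; everything else is treated as noise.
    return re.sub(r'[^0-9A-Za-z]', '', text)


if __name__ == '__main__':
    sample = {'words_result': [{'words': ' 7k 3P'}], 'words_result_num': 1}
    print(extract_captcha(sample))  # 7k3P
```

Unlike the loop above, this joins every recognized line, which only matters if the OCR splits the captcha into several pieces.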

4. Entering the credentials and logging in

This again uses the browser object's element lookup method:

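 from selenium.webdriver.common.by import By

 browser.find_element(By.ID, 'j_username').send_keys('***')  # find the account field and type the account name
 browser.find_element(By.ID, 'j_password').send_keys('***')  # find the password field and type the password
 browser.find_element(By.ID, 'j_captcha').send_keys(captcha)  # find the captcha field and type the captcha
 browser.find_element(By.ID, 'login_ok').click()  # find the login button and click it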

Once the code reaches this point, the automatic login is complete. The full program follows.

 from selenium import webdriver
 from selenium.webdriver.common.by import By
 from PIL import Image
 from aip import AipOcr


 # get and process the captcha
 def get_captcha():
     # fetch the captcha image
     url = 'http://120.77.44.123/Login/Login.aspx?title=exit&recode=1'
     browser.get(url)
     png = browser.find_element(By.ID, 'captcha_img')
     png.screenshot('capt.png')  # screenshot the element and save it
     # process the captcha
     img = Image.open('capt.png')
     img = img.convert('L')  # convert to grayscale (mode 'L')
     count = 160  # threshold
     table = []
     for i in range(256):
         if i < count:
             table.append(0)
         else:
             table.append(1)

     img = img.point(table, '1')
     img.save('captcha.png')  # save the processed captcha


 # recognize the captcha
 def discern_captcha():
     # credentials
     APP_ID = '***'
     API_KEY = '***'
     SECRET_KEY = '***'
     # initialize the client
     client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

     # read the image as bytes
     def get_file_content(file_path):
         with open(file_path, 'rb') as f:
             return f.read()

     image = get_file_content('captcha.png')
     # request options
     options = {'language_type': 'ENG'}  # recognition language; the default is 'CHN_ENG' (mixed Chinese and English)
     # call the general text recognition endpoint
     result = client.basicGeneral(image, options)  # basicAccurate is the high-accuracy endpoint
     # the captcha is a single line, so take the first recognized line (empty if none)
     words = result.get('words_result') or []
     captcha = words[0]['words'] if words else ''

     print('Recognition result: ' + captcha)

     return captcha


 # log in to the page
 def login(captcha):
     browser.find_element(By.ID, 'j_username').send_keys('***')  # find the account field and type the account name
     browser.find_element(By.ID, 'j_password').send_keys('***')  # find the password field and type the password
     browser.find_element(By.ID, 'j_captcha').send_keys(captcha)  # find the captcha field and type the captcha
     browser.find_element(By.ID, 'login_ok').click()  # find the login button and click it


 def get_file():
     browser.find_element(By.XPATH, '/html/body/header/div/nav/ul/li[6]/a').click()  # find the file button and click it


 if __name__ == '__main__':
     browser = webdriver.Chrome()  # instantiate the browser object

     get_captcha()
     captcha = discern_captcha()
     login(captcha)
     get_file()
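Because OCR occasionally misreads a character, a single attempt like the one above can fail. One possible way to handle that is a small retry loop, sketched here. The name `login_with_retry` and both callback parameters are invented for this illustration and are not part of the original program.

```python
def login_with_retry(attempt_login, is_logged_in, max_attempts=3):
    """Call attempt_login until is_logged_in() is true or attempts run out.

    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        attempt_login()
        if is_logged_in():
            return attempt
    return 0


if __name__ == '__main__':
    # Simulated run: the third attempt succeeds.
    outcomes = iter([False, False, True])
    used = login_with_retry(lambda: None, lambda: next(outcomes))
    print(used)  # 3
```

In the real program, attempt_login could wrap get_captcha(), discern_captcha() and login(). As a guess about this particular site, is_logged_in could check that browser.current_url no longer contains 'Login.aspx'.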
