1. Install pillow and pytesseract
pip install pillow
pip install pytesseract
2. Recognize the CAPTCHA
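# Imports needed at the top of the module
import re

import pytesseract
from PIL import Image

def get_verifycode(self):
    '''Recognize the CAPTCHA.'''
    # 1. Locate the CAPTCHA element and read its position and size
    verifycode_element = self.verifycode_image_element  # the CAPTCHA element
    location = verifycode_element.location  # x, y coordinates of the CAPTCHA
    size = verifycode_element.size  # height and width of the CAPTCHA
    # zuobiao = crop box (left, top, right, bottom)
    zuobiao = (
        int(location['x']),
        int(location['y']),
        int(location['x'] + size['width']),
        int(location['y'] + size['height']))
    # 2. Take a screenshot, crop the CAPTCHA region out of it, and save it again
    image_name = self.save_screenshot()  # take the screenshot
    img = Image.open(image_name).crop(zuobiao)  # open the screenshot and crop it
    img = img.convert('RGB')
    img.save(image_name)
    # 3. Read the cropped image back and run OCR on it
    code = pytesseract.image_to_string(Image.open(image_name))
    # Use a regular expression to drop spaces and other special characters
    # pattern = re.compile(r'[a-zA-Z0-9]')
    pattern = re.compile(r'[0-9]')  # CAPTCHAs in this system are digits only, so match digits only
    b = ''
    for i in code.strip():
        if pattern.search(i) is not None:
            b += i
    return b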
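The crop-box arithmetic and the digit filtering in the method above can be tried on their own, without a browser or Tesseract. The following is only a minimal sketch: the helper names `element_box` and `keep_digits` are hypothetical, and the `scale` parameter is an assumption added here for high-DPI screens, where screenshot pixels may not match the coordinates the element reports. With `scale=1.0` the box is the same as the one computed above.

```python
import re


def element_box(location, size, scale=1.0):
    """Build a (left, top, right, bottom) crop box from an element's
    location and size dicts. `scale` is an assumed screenshot-to-page
    pixel ratio; 1.0 reproduces the plain arithmetic."""
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    return (left, top, right, bottom)


def keep_digits(text):
    """Keep only the characters 0-9 from raw OCR output."""
    return ''.join(re.findall(r'[0-9]', text))
```

The tuple returned by `element_box` can be passed directly to `Image.crop`, and `keep_digits` can be applied to the string returned by `pytesseract.image_to_string`.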
3. If the pytesseract module raises the error "tesseract is not installed or it's not in your path", fix it as follows:
1) Download tesseract-ocr. Download page: https://github.com/tesseract-ocr/tesseract/wiki
2) Install tesseract-ocr: double-click the .exe file to install it, and note the installation path
3) Edit the pytesseract.py file under the Python installation path and change tesseract_cmd to r'F:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
File path: Python installation path\Lib\site-packages\pytesseract\pytesseract.py
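As an alternative to editing the installed file, `tesseract_cmd` can also be set from your own code through the `pytesseract.pytesseract.tesseract_cmd` attribute, so the change survives a reinstall of the package. The sketch below assumes that approach. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical, and the path passed in should be whatever installation path you noted in step 2.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return the path of a tesseract executable, or None.
    Checks PATH first, then each candidate path in order."""
    found = shutil.which('tesseract')
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=()):
    """Point pytesseract at the executable found by find_tesseract.
    Returns the path used, or None if nothing was found."""
    path = find_tesseract(candidates)
    if path is not None:
        import pytesseract  # imported here so find_tesseract works without it
        pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract([r'F:\Program Files (x86)\Tesseract-OCR\tesseract.exe'])` once before the first `image_to_string` call should be enough. A return value of `None` means Tesseract still has to be installed.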