A must-ask interview question for 20K/month test automation roles: recognizing and handling CAPTCHAs

Reposted article, 2018-05-31 14:46

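This article was written by Xiao Hetao (小核桃), a star student of teacher Happy from the 4th cohort of the Java full-stack test automation course. As the saying goes, strict teachers produce outstanding students~~ Let's go win over the interviewer together~~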

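When doing automated testing you often run into places that require a CAPTCHA. Sometimes the developers can disable it for you, but sometimes they cannot. In that case we can call tesseract to recognize the image.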

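There are two main ways to call tesseract from Java: through the command line (cmd), or through tess4j. This article covers the tess4j way of recognizing the image and extracting the CAPTCHA text.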
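For comparison, the command-line approach could look roughly like the sketch below. It assumes a separately installed `tesseract` executable on the PATH; the class name `TesseractCli` and its methods are hypothetical names chosen for illustration, not part of the original article.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that shells out to an installed tesseract executable. */
final class TesseractCli {
    private TesseractCli() {}

    /** Builds "tesseract <image> stdout -l <language>"; "stdout" makes tesseract print the text. */
    static List<String> buildCommand(String imagePath, String language) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", language);
    }

    /** Runs tesseract and returns the recognized text (assumes tesseract is on the PATH). */
    static String recognize(String imagePath, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, language))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // let warnings go to our stderr
                .start();
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text.toString().trim();
    }
}
```

The trade-off is that every test machine needs tesseract installed, which is one reason the rest of this article sticks with tess4j.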

First, add the tess4j jar to the project. For a Maven project you can get it from the central repository: go to https://mvnrepository.com/ and search for tess4j.

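Click the search result to open it.

Pick one of the more widely used versions and click into it.

Copy the dependency snippet shown there and paste it into the pom.xml of the Maven project.

Wait for the download to finish.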
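The dependency copied from the repository page typically looks like the fragment below. The group and artifact IDs are the published tess4j coordinates; the version number is only an assumption for illustration, so use whichever version the search page lists.

```xml
<!-- Goes inside the <dependencies> element of pom.xml -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <!-- Assumed version; replace with the one you picked on mvnrepository.com -->
    <version>3.4.8</version>
</dependency>
```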

Once the install finishes, the tess4j jar shows up under Maven Dependencies. The official description of tess4j is: A Java JNA wrapper for Tesseract OCR API.

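In other words, tess4j is a Java API that wraps tesseract. Once the dependency is in place there is no need to install tesseract-ocr separately, because the tess4j jar bundles the tesseract native libraries (this holds on Windows at least; on other platforms the native library may still have to be installed).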

After installing, if the tessdata language data has not been added to the project, the call fails with an error like this:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

Could not initialize tesseract.

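The message says that an environment variable is not set, but that hint is aimed at setups that call a separately installed tesseract-ocr. If you follow it and add the environment variable, the problem is still there (it took me a long time to work this out).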

When calling tesseract through the dependency, the correct fix is to add the tessdata folder under the Maven project's resources path (src/main/resources).

eng.traineddata is the English language pack; it recognizes letters and digits.

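To recognize Chinese (digits plus Chinese characters), you also need to download the chi_sim.traineddata language pack and place it in the same tessdata folder. With that, tess4j is ready to use.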
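A quick check like the one below can catch a missing tessdata folder or language pack before tesseract fails with the error above. This is only a sketch: the class `TessdataLocator` and its method names are hypothetical, and it assumes the resources are unpacked on the file system (as when running from an IDE or `target/classes`), not packed inside a jar.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

/** Hypothetical helper: resolves a classpath directory such as "tessdata" to a file-system path. */
final class TessdataLocator {
    private TessdataLocator() {}

    /** Returns the absolute path of the resource directory, or empty if it is not a plain directory on disk. */
    static Optional<String> locate(String resourceName) {
        URL url = ClassLoader.getSystemResource(resourceName);
        if (url == null || !"file".equals(url.getProtocol())) {
            return Optional.empty();
        }
        try {
            // File(URI) decodes the URL correctly on Windows, Linux and macOS
            return Optional.of(new File(url.toURI()).getAbsolutePath());
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    /** True if <tessdataDir>/<language>.traineddata exists, e.g. eng or chi_sim. */
    static boolean hasLanguage(String tessdataDir, String language) {
        return new File(tessdataDir, language + ".traineddata").isFile();
    }
}
```

For example, `TessdataLocator.locate("tessdata")` followed by `hasLanguage(path, "chi_sim")` tells you whether the Chinese pack is actually where tess4j will look for it.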

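Next comes the actual call. To recognize a CAPTCHA, the main steps are: obtain the CAPTCHA image, run recognition on it, and output the result.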

The whole process breaks down into three steps:

1. Take a screenshot of the page that shows the CAPTCHA

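import java.io.IOException;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;

// Capture the current browser page as PNG bytes
public byte[] takeScreenshot(WebDriver driver) throws IOException {
    return ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
}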

2. The screenshot covers the whole page, so we crop it and keep only the part that contains the CAPTCHA

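import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import javax.imageio.ImageIO;
import org.openqa.selenium.Dimension; // Selenium's Dimension, not java.awt.Dimension
import org.openqa.selenium.WebElement;

// x, y: top-left corner of the crop; width, height: extra pixels added to the element's own size
public BufferedImage createElementImage(WebDriver driver, WebElement webElement,
        int x, int y, int width, int height) throws IOException {
    Dimension size = webElement.getSize();
    BufferedImage originalImage = ImageIO.read(
            new ByteArrayInputStream(takeScreenshot(driver)));
    // Crop the screenshot down to the CAPTCHA region
    return originalImage.getSubimage(x, y,
            size.getWidth() + width, size.getHeight() + height);
}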
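One weak spot in the cropping step is that `getSubimage` throws if the rectangle extends past the screenshot, which is easy to hit with hand-picked coordinates. A minimal sketch of a bounds-safe crop follows; the class `CaptchaCropper` is a hypothetical name. The coordinates could plausibly come from Selenium's `webElement.getLocation()` and `webElement.getSize()` rather than fixed numbers, though on high-DPI screens the screenshot pixels may not match those values one to one.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: crops a region from a screenshot without running past its edges. */
final class CaptchaCropper {
    private CaptchaCropper() {}

    static BufferedImage crop(BufferedImage screenshot, int x, int y, int width, int height) {
        Rectangle bounds = new Rectangle(0, 0, screenshot.getWidth(), screenshot.getHeight());
        // Keep only the part of the requested region that lies inside the screenshot
        Rectangle region = bounds.intersection(new Rectangle(x, y, width, height));
        if (region.isEmpty()) {
            throw new IllegalArgumentException(
                    "Crop region lies outside the screenshot: " + x + "," + y);
        }
        return screenshot.getSubimage(region.x, region.y, region.width, region.height);
    }
}
```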

3. Let tesseract read the image and return the CAPTCHA text. The default language is English; to use the Chinese pack, add instance.setLanguage("chi_sim");

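import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.openqa.selenium.By;

// "driver" is the WebDriver field of the enclosing test class
private String getVerificationCode(String path) {
    File imageFile = new File(path);
    // Locate the CAPTCHA image element on the page
    WebElement element = driver.findElement(By.cssSelector("img[id='codeImg']"));
    try {
        // Crop the CAPTCHA out of the screenshot and save it as a PNG
        BufferedImage image = createElementImage(driver, element, 687, 362, 54, 18);
        ImageIO.write(image, "png", imageFile);
    } catch (IOException e) {
        e.printStackTrace();
    }
    ITesseract instance = new Tesseract();
    // Find the tessdata folder on the classpath
    URL url = ClassLoader.getSystemResource("tessdata");
    if (url == null) {
        throw new IllegalStateException("tessdata folder not found on the classpath");
    }
    String result = "";
    try {
        // File(URI) gives a valid path on every OS; substring(1) only suited Windows paths like /C:/...
        instance.setDatapath(new File(url.toURI()).getAbsolutePath());
        // English is the default; for the Chinese pack add instance.setLanguage("chi_sim");
        result = instance.doOCR(imageFile);
    } catch (URISyntaxException | TesseractException e) {
        e.printStackTrace();
    }
    // Strip everything except letters and digits (the extra ^ in the old pattern let '^' through)
    return result.replaceAll("[^a-zA-Z0-9]", "");
}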
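The clean-up of the OCR output can be pulled into a small helper that also rejects results of the wrong length, so a test can refresh the CAPTCHA and try again instead of submitting an obviously bad value. This is a sketch under assumptions: the class `OcrResultCleaner` is a hypothetical name, and the expected length (4 in the usage below) depends on the site under test.

```java
import java.util.Optional;

/** Hypothetical helper: normalizes raw OCR output and sanity-checks its length. */
final class OcrResultCleaner {
    private OcrResultCleaner() {}

    /** Removes every character that is not an ASCII letter or digit; null becomes "". */
    static String clean(String raw) {
        return raw == null ? "" : raw.replaceAll("[^a-zA-Z0-9]", "");
    }

    /** Returns the cleaned text only if it has exactly the expected number of characters. */
    static Optional<String> cleanAndCheck(String raw, int expectedLength) {
        String cleaned = clean(raw);
        return cleaned.length() == expectedLength ? Optional.of(cleaned) : Optional.empty();
    }
}
```

With a four-character CAPTCHA, `OcrResultCleaner.cleanAndCheck(result, 4)` comes back empty whenever tesseract dropped or invented a character, which is a cheap signal to request a new image.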

Result of the run: the cropped image

The recognized CAPTCHA text

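Summary: tess4j is fairly easy to set up. You only need to pull in the jar, with no other software to install. tess4j also ships utility classes for image processing, such as scaling and rotation (I have not used these yet).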

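Recognition is still fairly error-prone, though: pairs such as t and l, i and l, or e and o are easily misread. I hope someone more experienced can improve on my approach and raise the accuracy.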
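One common way to try to reduce such misreads is to preprocess the image before OCR: enlarge it, then convert it to pure black and white. The sketch below uses plain Java 2D instead of the tess4j utilities; the class `CaptchaPreprocessor`, the scale factor and the threshold are all assumptions to tune per CAPTCHA style, and there is no guarantee it fixes every confusion.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: simple image clean-up steps to run before OCR. */
final class CaptchaPreprocessor {
    private CaptchaPreprocessor() {}

    /** Enlarges the image by an integer factor using bicubic interpolation. */
    static BufferedImage scale(BufferedImage src, int factor) {
        BufferedImage out = new BufferedImage(
                src.getWidth() * factor, src.getHeight() * factor, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, out.getWidth(), out.getHeight(), null);
        g.dispose();
        return out;
    }

    /** Turns pixels darker than the threshold (0-255) black and all others white. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted brightness, so green counts more than blue
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

A possible usage is `CaptchaPreprocessor.binarize(CaptchaPreprocessor.scale(image, 3), 128)` on the cropped image just before it is written to disk and handed to `doOCR`.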

That's all for today's article. Thanks to Xiao Hetao for sharing; if there is anything you would like to discuss, leave a message in the comments.
