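When working on a crawler project, you sometimes run into CAPTCHAs. Some sites generate their CAPTCHA dynamically: even for the same link, visits at different times may produce different CAPTCHAs.
My first idea was to open the CAPTCHA link, save the image locally with a GET request from Java code, decode it with a CAPTCHA-solving tool, and enter the result into the verification box automatically.
The problem is that every request to the same address produces a different CAPTCHA image, so the downloaded image does not match the one shown on the page and this approach does not work. The image therefore has to be captured from the browser itself with Selenium WebDriver
and saved locally. The solving tool then decodes it, the field is filled in automatically, and the CAPTCHA problem is solved.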
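Why the download approach fails can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `CaptchaSimulation` and its methods are hypothetical, no real network calls are made, and it assumes the server remembers the expected answer per session, which the post does not confirm for any particular site.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a CAPTCHA endpoint (hypothetical, for illustration only). */
final class CaptchaSimulation {

    // Assumption: the server stores the expected answer per session.
    private final Map<String, String> expectedBySession = new HashMap<>();
    private int counter = 0;

    /** Every request to the CAPTCHA URL produces a new code for the requesting session. */
    String requestCaptcha(String sessionId) {
        String code = String.format("%04d", ++counter);
        expectedBySession.put(sessionId, code);
        return code;
    }

    /** An answer is accepted only if it matches the code issued to that session. */
    boolean submit(String sessionId, String answer) {
        return answer.equals(expectedBySession.get(sessionId));
    }

    public static void main(String[] args) {
        CaptchaSimulation server = new CaptchaSimulation();

        // The browser page loads its CAPTCHA image.
        String shownInBrowser = server.requestCaptcha("browser");
        // A separate GET request from Java code receives a different image.
        String downloaded = server.requestCaptcha("http-client");

        System.out.println(server.submit("browser", downloaded));     // false
        System.out.println(server.submit("browser", shownInBrowser)); // true
    }
}
```

Only the image that the browser is actually displaying corresponds to the answer the page expects, which is why a screenshot of the browser works where a second download does not.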
package com.entrym.main;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import com.entrym.crawler.util.verifyCode.Captcha;
import com.entrym.crawler.util.verifyCode.DamaUtil;
import com.entrym.util.ConfigUtil;

public class WebTest {
    private static final String GET_TITLE = "/titles/getxiaoshuo";
    private static final String PATH = new File("config/config.properties").getAbsolutePath();
    private static final String CHROME_HOME = new File("config/chromedriver.exe").getAbsolutePath();
    private static final String CHROME_HOME_LINUX = new File("config/chromedriver").getAbsolutePath();
    private static final String BASEURL = ConfigUtil.reads(PATH, "baseurl");

    public static void main(String[] args) throws IOException {
        System.out.println(PATH);
        // Pick the chromedriver binary that matches the operating system
        String osname = System.getProperty("os.name").toLowerCase();
        if (osname.contains("linux")) {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME_LINUX);
        } else {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME);
        }
        WebDriver driver = new ChromeDriver();
        driver.get("http://weixin.sogou.com/antispider/?from=%2fweixin%3Ftype%3d2%26query%3dz+%26ie%3dutf8%26s_from%3dinput%26_sug_%3dy%26_sug_type_%3d");
        WebElement ele = driver.findElement(By.id("seccodeImage"));
        // Get entire page screenshot
        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        BufferedImage fullImg = ImageIO.read(screenshot);
        // Get the location of element on the page
        Point point = ele.getLocation();
        // Get width and height of the element
        int eleWidth = ele.getSize().getWidth();
        int eleHeight = ele.getSize().getHeight();
        // Crop the entire page screenshot to get only element screenshot
        BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(),
                eleWidth, eleHeight);
        ImageIO.write(eleScreenshot, "png", screenshot);
        // Copy the element screenshot to disk
        File screenshotLocation = new File("D:/captcha/test.png");
        FileUtils.copyFile(screenshot, screenshotLocation);

        WebElement classelement = driver.findElement(By.className("p2"));
        String errorText = classelement.getText();
        System.out.println("Page message: " + errorText);
        // The matched text is the site's anti-spider notice ("your visits are too frequent;
        // to confirm that this visit is normal user behavior ..."); it must stay in the page's language.
        if (errorText.indexOf("用戶您好,您的訪問過於頻繁,為確認本次訪問為正常用戶行為") >= 0) {
            String code = ""; // decoded CAPTCHA text
            Captcha captcha = new Captcha();
            // NOTE: the screenshot above was written to D:/captcha/test.png. The file name "test.png"
            // only works if DamaUtil resolves it against that directory; otherwise pass
            // screenshotLocation.getAbsolutePath() instead.
            captcha.setFilePath("test.png");
            code = DamaUtil.getCaptchaResult(captcha);
            System.out.println("CAPTCHA decoded by the solving service: " + code);
            WebElement elementsumbit = driver.findElement(By.id("seccodeInput"));
            // Type the decoded CAPTCHA into the input box
            elementsumbit.sendKeys(code);
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing the interruption
                Thread.currentThread().interrupt();
            }
            // Submit the form that contains the input
            elementsumbit.submit();
            System.out.println("Done");
        }
    }
}
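That is the complete code. The key part came from an answer on Stack Overflow, found through a Google search, which shows how powerful Google is.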

driver.get("http://www.google.com");
WebElement ele = driver.findElement(By.id("hplogo"));
// Get entire page screenshot
File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
BufferedImage fullImg = ImageIO.read(screenshot);
// Get the location of element on the page
Point point = ele.getLocation();
// Get width and height of the element
int eleWidth = ele.getSize().getWidth();
int eleHeight = ele.getSize().getHeight();
// Crop the entire page screenshot to get only element screenshot
BufferedImage eleScreenshot= fullImg.getSubimage(point.getX(), point.getY(),
eleWidth, eleHeight);
ImageIO.write(eleScreenshot, "png", screenshot);
// Copy the element screenshot to disk
File screenshotLocation = new File("C:\\images\\GoogleLogo_screenshot.png");
FileUtils.copyFile(screenshot, screenshotLocation);
The above is the key element-cropping code. The original Stack Overflow question is at http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin
Anyone interested can study it further.
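
One fragile spot in the cropping step is `BufferedImage.getSubimage`, which throws a `RasterFormatException` when the requested rectangle is not fully inside the screenshot, for example when the element is partly outside the captured area. On high-DPI displays the screenshot may also be in device pixels while the element coordinates are in CSS pixels. The helper below is a minimal sketch that handles both cases; `ElementCropper`, `crop`, and the `scale` parameter are hypothetical names introduced here, not part of the original code or of Selenium.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for cropping an element out of a full-page screenshot. */
final class ElementCropper {

    private ElementCropper() {}

    /**
     * Crops the rectangle (x, y, width, height) out of the full screenshot.
     * The rectangle is given in page coordinates, as returned by
     * WebElement.getLocation() and WebElement.getSize().
     * scale is the assumed ratio of screenshot pixels to page pixels
     * (1.0 on a standard display).
     */
    static BufferedImage crop(BufferedImage full, int x, int y,
                              int width, int height, double scale) {
        int left = (int) Math.round(x * scale);
        int top = (int) Math.round(y * scale);
        int right = (int) Math.round((x + width) * scale);
        int bottom = (int) Math.round((y + height) * scale);

        // Clamp the rectangle to the image bounds so getSubimage cannot throw.
        left = Math.max(0, Math.min(left, full.getWidth()));
        top = Math.max(0, Math.min(top, full.getHeight()));
        right = Math.max(0, Math.min(right, full.getWidth()));
        bottom = Math.max(0, Math.min(bottom, full.getHeight()));

        if (right <= left || bottom <= top) {
            throw new IllegalArgumentException("Element lies outside the screenshot");
        }
        return full.getSubimage(left, top, right - left, bottom - top);
    }
}
```

In the code above, the call to `fullImg.getSubimage(...)` could then be replaced with `ElementCropper.crop(fullImg, point.getX(), point.getY(), eleWidth, eleHeight, 1.0)`, with the last argument adjusted if the cropped image turns out to be offset on a scaled display.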
