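Crawler notes: cracking the slider captcha on the w3cschool registration page (very simple slider position recognition, no mouse trajectory simulation)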


 

1. Background

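My first attempt at cracking captchas was the captcha on w3cschool's "recover password by phone number" page; see: "Captcha recognition: w3cschool character image captcha (easy level)". This time I crack the slider captcha on their registration page. I am a little nervous: I hope picking on them like this won't get me beaten up...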

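Before reading, please note: this article only covers recognizing the slider position in a slider captcha. It focuses on image processing and does not cover simulating mouse trajectories.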

 

2. Analysis

First open this page: https://www.w3cschool.cn/register and see what the slider captcha looks like:

image

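Slider captchas of this kind generally send a request to the server each time the slider is dragged and the mouse is released, to check whether that drag succeeded. Open the developer tools (F12), fail one drag and succeed on another, and observe the network requests and responses:

Verification failed: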

Request:

image

Response:

image

Verification succeeded:

Request:

image

Response:

image 

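The request must carry a point parameter. From the second request we can roughly infer that verification succeeds only when this value matches data in the response. Experiments show that this value is the distance from the small dark block to the left edge of the image: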

image

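So all we need is the x coordinate, within the whole image, of the leftmost column of the small dark block in the background image.

Next, let's see how to download this background image. Press Ctrl+Shift+C and select the image: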

image

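Look at what this background-image looks like:

Anyone who has used CSS sprites will understand this at a glance: many narrow strip images are stitched into one large image, and each div on the page with the gt_cut_fullbg_slice class corresponds to one strip. Each strip is 13px*58px. Every div uses a CSS background offset to locate its own strip, and the background-position value gives the coordinates (x, y) of the strip's top-left corner in the large image. So the next step is to parse the URL of the large image out of the page, download it, and reassemble it according to the CSS offsets.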
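The mapping from a slice's inline style to its strip origin in the sprite can be sketched as follows. This is only an illustration: the class name SliceOffsetParser and the sample style string are hypothetical, and it assumes the offsets are pixel values where a negative CSS offset -x means the strip starts at +x in the sprite.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SliceOffsetParser {

    // Tolerates optional whitespace after the colon and an optional minus sign
    private static final Pattern POSITION =
            Pattern.compile("background-position:\\s*(-?\\d+)px\\s+(-?\\d+)px");

    /** Returns {x, y} of the strip's top-left corner in the sprite, or null if the style has no position. */
    static int[] tileOrigin(String style) {
        Matcher m = POSITION.matcher(style);
        if (!m.find()) {
            return null;
        }
        // Assumption: the sign only encodes the direction of the CSS shift, so the magnitude is the coordinate
        int x = Math.abs(Integer.parseInt(m.group(1)));
        int y = Math.abs(Integer.parseInt(m.group(2)));
        return new int[]{x, y};
    }

    public static void main(String[] args) {
        int[] origin = tileOrigin("background-position:-157px -58px;");
        System.out.println(origin[0] + "," + origin[1]);
    }
}
```

Returning null for a style without a position lets the caller decide whether to skip the slice or abort.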

Searching the HTML for .gt_cut_fullbg_slice locates the image:

image

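Extract the URL with a regular expression and download the image. Then parse the DOM, collect the background-position of every div with the gt_cut_fullbg_slice class, cut the 13px*58px region at that position out of the large image, and write the pieces one after another into a new image. The reassembled image looks like this:

image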
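The reassembly step can be sketched on its own, independent of the page. The class name TileReassembler and the tiny synthetic sprite are hypothetical; the sketch assumes the strips are listed in display order and fill the output row by row.

```java
import java.awt.image.BufferedImage;

class TileReassembler {

    /** Copies each strip from the sprite into the output, left to right and then top to bottom. */
    static BufferedImage reassemble(BufferedImage sprite, int[][] origins,
                                    int tileW, int tileH, int outW, int outH) {
        BufferedImage out = new BufferedImage(outW, outH, BufferedImage.TYPE_INT_RGB);
        int nextX = 0;
        int nextY = 0;
        for (int[] origin : origins) {
            // Read the strip's pixels from the sprite and write them at the next free slot
            int[] pixels = sprite.getRGB(origin[0], origin[1], tileW, tileH, null, 0, tileW);
            out.setRGB(nextX, nextY, tileW, tileH, pixels, 0, tileW);
            nextX += tileW;
            if (nextX >= outW) {
                // Row is full, continue on the next row
                nextX = 0;
                nextY += tileH;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 2x1 sprite holding two 1x1 "strips" stored in reverse display order
        BufferedImage sprite = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        sprite.setRGB(0, 0, 0x0000FF);
        sprite.setRGB(1, 0, 0xFF0000);
        BufferedImage fixed = reassemble(sprite, new int[][]{{1, 0}, {0, 0}}, 1, 1, 2, 1);
        System.out.println(Integer.toHexString(fixed.getRGB(0, 0) & 0xFFFFFF));
    }
}
```

For the captcha in this article the call would use a strip size of 13 by 58 and an output size of 260 by 116, with the origins taken from the background-position values.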

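Now comes the harder part: how to locate the small dark block with high accuracy. I first tried several approaches based on brightness, saturation, and color, but each of them misjudged some cases and the results were not satisfactory. Then I noticed that although the background image keeps changing, it seems to cycle through only a few images, with only the position of the dark block differing. Download a few more images to look for the pattern: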

image

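So I only need to restore the backgrounds (without the dark block) and then diff an image that has the dark block against the one that does not. This locates the block exactly, and in theory the accuracy can reach 100%.

This idea rests on the assumption that images that look identical really are identical, because some images look the same to the naked eye yet still differ slightly in brightness or saturation. To support the assumption I first ran an experiment: take two images that look the same, diff them, and look at how the differing pixels are distributed. The diff result: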

image

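The third image is the diff of the first and second images: white marks pixels that are identical and black marks pixels that differ. Only the dark-block regions differ, which shows that both come from the same image with a dark block added at a random position.

The rest is fairly simple. First find a way to restore the original image without the dark block from these images that contain it. The approach is to download 1000 images first: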
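The same check can be expressed as a number instead of a picture. The sketch below counts differing pixels between two same-sized images; DiffStats and the synthetic 10x10 images are hypothetical stand-ins for two downloaded captchas.

```java
import java.awt.image.BufferedImage;

class DiffStats {

    /** Counts the pixels whose RGB values differ; both images must have the same size. */
    static int countDifferent(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same dimensions");
        }
        int different = 0;
        for (int x = 0; x < a.getWidth(); x++) {
            for (int y = 0; y < a.getHeight(); y++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    different++;
                }
            }
        }
        return different;
    }

    /** Builds an all-white test image. */
    static BufferedImage whiteImage(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        // Two synthetic images on the same background, each with one dark pixel standing in for the block
        BufferedImage first = whiteImage(10, 10);
        BufferedImage second = whiteImage(10, 10);
        first.setRGB(2, 2, 0x000000);
        second.setRGB(7, 5, 0x000000);
        System.out.println(countDifferent(first, second) + " of 100 pixels differ");
    }
}
```

If two captchas really share a background, the count should be no larger than the combined area of the two dark blocks; a count close to the full image size would mean the backgrounds only look alike.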

image

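Then take a few pixels at fixed positions in each image as its feature. Here I use the four corner pixels, which are unlikely to be covered by the dark block, and concatenate their hexadecimal values as the feature; then group the 1000 images by feature: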

image
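A compact sketch of the fingerprint-and-group step, assuming the images are already in memory. CornerFingerprint is a hypothetical name, and padding each pixel to six hex digits is my choice here so that every fingerprint has the same length.

```java
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CornerFingerprint {

    /** Concatenates the RGB values of the four corner pixels as fixed-width hex. */
    static String of(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        int[][] corners = {{0, 0}, {w - 1, 0}, {0, h - 1}, {w - 1, h - 1}};
        StringBuilder sb = new StringBuilder();
        for (int[] c : corners) {
            // Mask off the alpha byte; %06x keeps each pixel exactly six hex digits wide
            sb.append(String.format("%06x", img.getRGB(c[0], c[1]) & 0xFFFFFF));
        }
        return sb.toString();
    }

    /** Groups images that share the same corner fingerprint. */
    static Map<String, List<BufferedImage>> group(List<BufferedImage> images) {
        return images.stream().collect(Collectors.groupingBy(CornerFingerprint::of));
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        BufferedImage c = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        c.setRGB(0, 0, 0x0000FF); // different top-left corner, so a different group
        group(List.of(a, b, c)).forEach((key, value) -> System.out.println(key + " -> " + value.size()));
    }
}
```

The number of keys in the resulting map is the number of distinct backgrounds, provided the corner pixels are never covered by the dark block.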

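As I expected, there are only four distinct background images. Each group corresponds to a folder, and all images in a folder share the same background; they differ only in the position of the dark block: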

image

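Look at the images above. Suppose position (100,100) of the background is covered by the dark block in the first image; it is not necessarily covered in the second. The probability that (100,100) is covered in every image is negligible, so we only need to collect the pixel at (100,100) from all images and take the most frequent value, i.e. select rgb from all_image_100_100_rgb_value group by value order by count(1) desc limit 1. Doing the same for every pixel of the image yields the original image.

Processing each group this way gives the original image for each group: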
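The per-pixel "most frequent value" rule can be sketched in a few lines. PixelMajority is a hypothetical name; the sketch assumes all samples belong to the same group and have the same dimensions.

```java
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PixelMajority {

    /** Rebuilds the background by taking, for every pixel, the value that occurs most often across the samples. */
    static BufferedImage restore(List<BufferedImage> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        int w = samples.get(0).getWidth();
        int h = samples.get(0).getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Map<Integer, Integer> votes = new HashMap<>();
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                votes.clear();
                int best = 0;
                int bestCount = 0;
                for (BufferedImage img : samples) {
                    int rgb = img.getRGB(x, y);
                    // merge returns the updated vote count for this color
                    int count = votes.merge(rgb, 1, Integer::sum);
                    if (count > bestCount) {
                        bestCount = count;
                        best = rgb;
                    }
                }
                out.setRGB(x, y, best);
            }
        }
        return out;
    }
}
```

With three samples of the same background where each dark block covers a different area, every pixel is clean in at least two of them, so the vote recovers the background everywhere.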

image

It looks a bit magical, but the principle really is simple. Next, take an image with the dark block:

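image

Extract its feature (the hexadecimal concatenation of the RGB values of the four corner pixels), then use the feature to find the corresponding original image, i.e. the image without the dark block: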

ff6161ff4b4bfc5252fc5858

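Diff the two images, scanning column by column from left to right, and take the index of the first column that contains a pixel whose RGB value differs as the offset. This value is point.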
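A worked example of the column scan on synthetic data. FirstDiffColumn and the 20x10 test images are hypothetical; a gray rectangle stands in for the dark block.

```java
import java.awt.image.BufferedImage;

class FirstDiffColumn {

    /** Returns the index of the leftmost column where the two images differ, or -1 if they are identical. */
    static int find(BufferedImage captcha, BufferedImage original) {
        for (int x = 0; x < captcha.getWidth(); x++) {
            for (int y = 0; y < captcha.getHeight(); y++) {
                if (captcha.getRGB(x, y) != original.getRGB(x, y)) {
                    return x;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Both images start out black; the "captcha" gets a gray block covering columns 5..8
        BufferedImage original = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage captcha = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        for (int x = 5; x <= 8; x++) {
            for (int y = 2; y <= 5; y++) {
                captcha.setRGB(x, y, 0x808080);
            }
        }
        System.out.println("point = " + find(captcha, original));
    }
}
```

Scanning columns in the outer loop is what makes the first hit the leftmost edge of the block, regardless of the block's vertical position.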

 

3. Code implementation

The implementation of the analysis above:

package cc11001100.crawler.w3cschool;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.io.FilenameUtils;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

import static java.lang.Integer.parseInt;
import static java.util.Collections.emptyMap;
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;
import static jodd.io.FileNameUtil.concat;

/**
 * Cracks the simple slider captcha of w3cschool
 *
 * <a>https://www.w3cschool.cn/register</a>
 *
 * @author CC11001100
 */
public class W3cSchoolRegisterCaptcha {

	private static final Logger log = LoggerFactory.getLogger(W3cSchoolRegisterCaptcha.class);

	private static final Pattern extractBgImgUrlPattern = Pattern.compile("\\.gt_cut_fullbg_slice[\\s\\S]+?background-image: url\\(\"(.+)\"\\)");
	private static final Pattern cssStyleBackgroundPositionPattern = Pattern.compile("background-position:-?(\\d+)px -?(\\d+)px;");

	private Map<String, BufferedImage> fingerprintToPerfectBackgroundImageMap = new HashMap<>();

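	// Load the mapping from fingerprint to the corresponding complete (block-free) image
	public boolean load(String perfectImageDir) {
		File[] perfectFiles = new File(perfectImageDir).listFiles();
		if (perfectFiles == null) {
			log.error("load fingerprint mapping failed, dir {} empty", perfectImageDir);
			return false;
		}
		for (File file : perfectFiles) {
			try {
				BufferedImage img = ImageIO.read(file);
				if (img == null) {
					// ImageIO.read returns null when the file is not a readable image
					log.warn("skip non-image file {}", file.getName());
					continue;
				}
				String fingerprint = extractBackgroundImageFingerprint(img);
				fingerprintToPerfectBackgroundImageMap.put(fingerprint, img);
			} catch (IOException e) {
				log.error("IOException", e);
			}
		}
		// Report success only if at least one image was loaded
		return !fingerprintToPerfectBackgroundImageMap.isEmpty();
	}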

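	// Set the "slider verification passed" flag on the session currently held by HttpUtil
	public boolean touch() {
		String htmlContent = HttpUtil.downloadText("https://www.w3cschool.cn/register", emptyMap(), null, null);
		if (htmlContent == null) {
			throw new RuntimeException("cant download register page");
		}
		Document document = Jsoup.parse(htmlContent);
		String imgUrl = extractBgImgUrl(htmlContent);
		if (imgUrl == null) {
			throw new RuntimeException("cant find background image url");
		}
		imgUrl = "https://www.w3cschool.cn/" + imgUrl;
		BufferedImage garbledImage = HttpUtil.downloadImage(imgUrl);
		if (garbledImage == null) {
			throw new RuntimeException("cant download background image, url=" + imgUrl);
		}
		BufferedImage normalImage = splice(document, garbledImage);
		String fingerprint = extractBackgroundImageFingerprint(normalImage);
		BufferedImage perfectImage = fingerprintToPerfectBackgroundImageMap.get(fingerprint);
		if (perfectImage == null) {
			// Unknown background: load() was not called or this background was never collected
			throw new RuntimeException("no perfect image for fingerprint " + fingerprint);
		}
		int firstDiffColumnIndex = firstDiffColumnIndex(normalImage, perfectImage);
		return tellServerIamOk(firstDiffColumnIndex);
	}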

	private boolean tellServerIamOk(int offset) {
		Map<String, String> params = new HashMap<>();
		params.put("point", Integer.toString(offset));
		String responseContent = HttpUtil.downloadText("https://www.w3cschool.cn/dragcheck", emptyMap(),
				connection -> connection.method(Connection.Method.POST).data(params), null);
		if (responseContent == null) {
			throw new RuntimeException("drag check response null");
		}
		JSONObject o = JSON.parseObject(responseContent);
		if (o.getIntValue("statusCode") != 200) {
			throw new RuntimeException("drag check failed, response=" + responseContent);
		}
		int data = o.getIntValue("data");
		// Logged so the scanned offset can be compared with the server's value and corrected if needed
		log.info("offset={}, data={}", offset, data);
		return offset == data;
	}

	// Extract the URL of the captcha image from the HTML page
	private String extractBgImgUrl(String htmlContent) {
		Matcher matcher = extractBgImgUrlPattern.matcher(htmlContent);
		if (matcher.find()) {
			return matcher.group(1);
		}
		return null;
	}

	// Reassemble the scrambled background image into the normal one
	private BufferedImage splice(Document document, BufferedImage img) {
		Elements blockElts = document.select(".gt_cut_fullbg_slice");
		if (blockElts.isEmpty()) {
			throw new RuntimeException("cannot find captcha elements, ensure in register page.");
		}
		// The captcha background image is 260px*116px
		SpliceImage spliceImage = new SpliceImage(260, 116);
		blockElts.forEach(elt -> {
			String style = elt.attr("style");
			Matcher matcher = cssStyleBackgroundPositionPattern.matcher(style);
			if (matcher.find()) {
				int x = parseInt(matcher.group(1));
				int y = parseInt(matcher.group(2));
				// Each block that makes up the background image is 13px*58px
				BufferedImage block = img.getSubimage(x, y, 13, 58);
				spliceImage.append(block);
			} else {
				log.info("style:{}, cannot extract background-position", style);
			}
		});
		return spliceImage.getBufferedImage();
	}

	// Scan from left to right and return the index of the first column that differs
	public int firstDiffColumnIndex(BufferedImage src, BufferedImage dest) {
		int w = src.getWidth();
		int h = src.getHeight();
		for (int x = 0; x < w; x++) {
			for (int y = 0; y < h; y++) {
				if (src.getRGB(x, y) != dest.getRGB(x, y)) {
					return x;
				}
			}
		}
		return -1;
	}

	public class SpliceImage {
		private BufferedImage bufferedImage;
		private int nextX;
		private int nextY;

		public SpliceImage(int width, int height) {
			bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
		}

		public void append(BufferedImage img) {
			bufferedImage.getGraphics().drawImage(img, nextX, nextY, null);
			nextX += img.getWidth();
			// new line
			if (nextX >= bufferedImage.getWidth()) {
				nextX = 0;
				nextY += img.getHeight();
			}
		}

		public BufferedImage getBufferedImage() {
			return bufferedImage;
		}

	}

	/*----------------------------------------- Preprocessing code below ----------------------------------------------------*/

	// Download some background images to local disk
	public void downloadBackgroundImage(String saveBaseDir, int num) {
		log.info("download prepare");
		ExecutorService executorService = Executors.newFixedThreadPool(3);
		for (int i = 0; i < num; i++) {
			executorService.execute(() -> {
				long threadId = Thread.currentThread().getId();
				log.info("prepare " + threadId);
				String htmlContent = HttpUtil.downloadText("https://www.w3cschool.cn/register", emptyMap(), null, null);
				Document document = Jsoup.parse(htmlContent);
				String imgUrl = extractBgImgUrl(htmlContent);
				if (imgUrl == null) {
					throw new RuntimeException("cant find bg img url");
				}
				imgUrl = "https://www.w3cschool.cn/" + imgUrl;
				BufferedImage bgImg = HttpUtil.downloadImage(imgUrl);
				BufferedImage perfectImg = splice(document, bgImg);
				// A random name avoids collisions when several threads finish within the same millisecond
				String filePath = FilenameUtils.concat(saveBaseDir, java.util.UUID.randomUUID() + ".png");
				try {
					ImageIO.write(perfectImg, "png", new File(filePath));
				} catch (IOException e) {
					log.error("download background image failed", e);
				}
				log.info("end " + threadId);
			});
		}
		executorService.shutdown();
		try {
			executorService.awaitTermination(10, TimeUnit.DAYS);
		} catch (InterruptedException e) {
			log.error("InterruptedException", e);
		}
		log.info("download done.");
	}

	// Compare two background images to check whether restoring the background and diffing against it is feasible
	public void diff(String srcImagePath, String destImagePath, String resultSavePath) throws IOException {
		BufferedImage srcImage = ImageIO.read(new FileInputStream(srcImagePath));
		BufferedImage destImage = ImageIO.read(new FileInputStream(destImagePath));
		int w = srcImage.getWidth();
		int h = srcImage.getHeight();

		BufferedImage resultImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
		for (int x = 0; x < w; x++) {
			for (int y = 0; y < h; y++) {
				// Identical pixels become white, differing pixels become black
				if (srcImage.getRGB(x, y) == destImage.getRGB(x, y)) {
					resultImage.setRGB(x, y, 0X00FFFFFF);
				} else {
					resultImage.setRGB(x, y, 0X00000000);
				}
			}
		}
		ImageIO.write(resultImage, "png", new File(resultSavePath));
	}

	// Group the images by feature
	public void groupBy(String srcImageDir, String groupByResultDir) {
		File[] imageFiles = new File(srcImageDir).listFiles();
		if (imageFiles == null) {
			log.error("no image file in " + srcImageDir);
			return;
		}
		Stream.of(imageFiles).map(f -> {
			try {
				return ImageIO.read(f);
			} catch (IOException e) {
				log.error("IOException", e);
			}
			return null;
		}).filter(Objects::nonNull)
				// Group the images by feature
				.collect(groupingBy(this::extractBackgroundImageFingerprint))
				.forEach((key, value) -> {
					String basePath = concat(groupByResultDir, key);
					File basePathDirFile = new File(basePath);
					if (basePathDirFile.exists()) {
						basePathDirFile.delete();
					}
					basePathDirFile.mkdirs();
					value.forEach(img -> {
						// A random name avoids overwriting when several images are written within the same millisecond
						String imgPath = concat(basePath, java.util.UUID.randomUUID() + ".png");
						try {
							ImageIO.write(img, "png", new File(imgPath));
						} catch (IOException e) {
							log.error("IOException", e);
						}
					});
				});
	}

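	// Extract the background image feature, using the colors of a few pixels as the feature
	private String extractBackgroundImageFingerprint(BufferedImage img) {
		int w = img.getWidth();
		int h = img.getHeight();
		// For now, use the four corner pixels as the feature and see how well it works
		int[][] points = {
				{0, 0},
				{w - 1, 0},
				{0, h - 1},
				{w - 1, h - 1}
		};
		StringBuilder sb = new StringBuilder();
		for (int[] point : points) {
			// Zero-pad each pixel to six hex digits so that every fingerprint has the same width
			sb.append(String.format("%06x", img.getRGB(point[0], point[1]) & 0X00FFFFFF));
		}
		return sb.toString();
	}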

	// Merge the images of each group into one original image without the missing block
	public void merge(String groupByDir, String mergeResultSaveDir) throws IOException {
		File[] groups = new File(groupByDir).listFiles();
		if (groups == null) {
			log.error(groupByDir + " empty");
			return;
		}
		for (File group : groups) {
			mergeSingleGroup(group, mergeResultSaveDir);
		}
	}

	private void mergeSingleGroup(File group, String mergeResultSaveDir) {
		String[] imgs = group.list();
		if (imgs == null) {
			log.warn("group {} empty", group.getName());
			return;
		}
		List<BufferedImage> imgList = Stream.of(imgs).limit(100).map(imgPath -> {
			try {
				return ImageIO.read(new File(concat(group.getPath(), imgPath)));
			} catch (IOException e) {
				log.error("IOException", e);
			}
			return null;
		}).filter(Objects::nonNull)
				.collect(toList());
		if (imgList.isEmpty()) {
			log.warn("group {} has no readable image", group.getName());
			return;
		}
		int w = imgList.get(0).getWidth();
		int h = imgList.get(0).getHeight();
		ImageMerge imageMerge = new ImageMerge(w, h);
		for (int x = 0; x < w; x++) {
			for (int y = 0; y < h; y++) {
				imageMerge.prepare(x, y);
				for (BufferedImage img : imgList) {
					imageMerge.vote(img.getRGB(x, y));
				}
				imageMerge.declareTheResult();
			}
		}
		// Create the output directory if needed; results of other groups already in it are kept
		new File(mergeResultSaveDir).mkdirs();

		String path = concat(mergeResultSaveDir, group.getName() + ".png");
		try {
			ImageIO.write(imageMerge.getBufferedImage(), "png", new File(path));
		} catch (IOException e) {
			log.error("IOException", e);
		}
	}

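	// Merge several incomplete images from the same group into one complete original image
	// Pixel-level election: each incomplete image casts the RGB value of its (x,y) pixel as a vote, and the value with the most votes wins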
	public class ImageMerge {
		private BufferedImage bufferedImage;
		private int x;
		private int y;
		private Map<Integer, Integer> voteCountMap = new HashMap<>();

		public ImageMerge(int width, int height) {
			this.bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
		}

		public void prepare(int x, int y) {
			this.x = x;
			this.y = y;
			this.voteCountMap.clear();
		}

		public void vote(int voteValue) {
			int count = voteCountMap.getOrDefault(voteValue, 0);
			voteCountMap.put(voteValue, count + 1);
		}

		public void declareTheResult() {
			int theWinnerVoteValue = voteCountMap.entrySet().stream().max(comparingInt(Map.Entry::getValue)).get().getKey();
			bufferedImage.setRGB(x, y, theWinnerVoteValue);
		}

		public BufferedImage getBufferedImage() {
			return this.bufferedImage;
		}

	}

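	// To avoid creating lots of useless accounts on their side (the product manager would be thrilled by the surging user count, haha), the registration endpoint is not called here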
	private void test() {
		int totalTimes = 100;
		int successTimes = 0;
		for (int i = 0; i < totalTimes; i++) {
			if (touch()) {
				successTimes++;
			}
			HttpUtil.clearCookie();
		}
		System.out.println("success rate " + (100.0 * successTimes / totalTimes) + "%");
	}

	public static void main(String[] args) throws IOException {

		W3cSchoolRegisterCaptcha captchaBackgroundImage = new W3cSchoolRegisterCaptcha();

		// First download some background images locally to observe their pattern
//		captchaBackgroundImage.downloadBackgroundImage("data/w3c/raw", 1000);

		// Verify that everything except the slider block is identical
//		captchaBackgroundImage.diff("data/w3c/diff/1545119857099.png", "data/w3c/diff/1545119857138.png", "data/w3c/diff/diff-result.png");

		// Group the downloaded images so that identical backgrounds end up in the same group
//		captchaBackgroundImage.groupBy("data/w3c/raw", "data/w3c/groupBy");

		// Each group holds images with a missing block; merge them into the original image without it
//		captchaBackgroundImage.merge("data/w3c/groupBy", "data/w3c/merge");

		// Test how well it works
		captchaBackgroundImage.load("data/w3c/merge");
		captchaBackgroundImage.test();

	}

}

The HttpUtil utility class used above:

package cc11001100.crawler.w3cschool;


import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * @author CC11001100
 */
public class HttpUtil {

	private static final Logger log = LoggerFactory.getLogger(HttpUtil.class);

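	// Persists cookies to keep the session; a concurrent map because downloadBackgroundImage calls this class from several threads
	private static Map<String, String> cookieMap = new java.util.concurrent.ConcurrentHashMap<>();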

	public static byte[] downloadBytes(String url, Map<String, String> params, ConnectionSetting connectionSetting, ResponseCheck responseCheck) {
		for (int i = 1; i <= 5; i++) {
			long start = System.currentTimeMillis();
			try {
				Connection connection = Jsoup.connect(url)
						.ignoreContentType(true)
						.ignoreHttpErrors(true)
						.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
						.data(params)
						.cookies(cookieMap);
				if (connectionSetting != null) {
					connectionSetting.setting(connection);
				}
				Connection.Response response = connection.execute();
				byte[] responseBody = response.bodyAsBytes();
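				if (responseCheck != null && !responseCheck.check(response, responseBody)) {
					// Throw IOException so that a failed check is caught below and the request is retried
					throw new IOException("response check failed, url=" + url);
				}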
				cookieMap.putAll(response.cookies());
				long cost = System.currentTimeMillis() - start;
				log.info("request ok, tryTimes={}, url={}, cost={}", i, url, cost);
				return responseBody;
			} catch (IOException e) {
				long cost = System.currentTimeMillis() - start;
				log.info("request failed, tryTimes={}, url={}, cost={}", i, url, cost);
			}
		}
		return null;
	}

	public static String downloadText(String url, Map<String, String> params, ConnectionSetting connectionSetting, ResponseCheck responseCheck) {
		byte[] responseContent = downloadBytes(url, params, connectionSetting, responseCheck);
		if (responseContent == null) {
			return null;
		}
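		// Decode explicitly instead of relying on the platform default charset (assumes the site serves UTF-8)
		return new String(responseContent, java.nio.charset.StandardCharsets.UTF_8);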
	}

	public static BufferedImage downloadImage(String url) {
		byte[] imgBytes = downloadBytes(url, Collections.emptyMap(), null, null);
		if (imgBytes == null) {
			return null;
		}
		try {
			return ImageIO.read(new ByteArrayInputStream(imgBytes));
		} catch (IOException e) {
			log.error("download image error, img url=" + url, e);
		}
		return null;
	}

	public static void clearCookie(){
		cookieMap.clear();
	}

	@FunctionalInterface
	public static interface ResponseCheck {
		boolean check(Connection.Response response, byte[] responseBody);
	}

	@FunctionalInterface
	public static interface ConnectionSetting {
		void setting(Connection connection);
	}

}

Run it and check how well the recognition works:

image

As expected, the recognition rate reaches 100% in this test run.

 
