Crawler Notes: The teambition Login Captcha


I. Background

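I have too many things I want to do and my plans are a mess, so I went looking for a tool to sort them out. I remembered using teambition a long time ago and decided to take another look, and on the login page I ran into a rather interesting captcha: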

[Image: the captcha on the teambition login page]

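This one is rather fun. It looks like an imitation of the 12306 captcha. I cannot break the 12306 one (even as a real human I need several attempts to get it right...), but surely this simplified version can be broken, and that gave me a strong urge to crack it.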


II. Analysis

Open the developer tools and inspect the captcha element first:

[Image: the captcha element selected in the developer tools]

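The first awkward thing is that the prompt "地球" (earth) is rendered on the page as plain text. That hardly matters, though, and rendering it as an image would not help much either. The key point is that the prompt should look different on every response (the more it varies, the safer it is; otherwise simple statistics-based approaches can break it easily). If it does not vary, it can be used as an identifier at very low cost.

Next comes the display of the candidate images. Each image URL contains a uid and an index. Where do these two variables come from? Click the refresh button and watch the network requests, and a few of them show up: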

[Image: the captcha requests in the network panel]

Combining this uid with an array index of values gives the URL of an icon shown on the page. So far nothing looks wrong.
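The URL assembly described above can be sketched roughly as follows. This is a minimal illustration, not the site's own code: the class name `CaptchaUrlSketch` and the method `buildImageUrls` are hypothetical, while the endpoint and query parameters are the ones that appear in the crawler code later in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
final class CaptchaUrlSketch {

    // Image endpoint as used by the crawler code in this post.
    static final String IMAGE_ENDPOINT = "https://auth_services.teambition.com/captcha/image";

    // Builds one image URL per index of the values array in a setup response.
    static List<String> buildImageUrls(String uid, int valueCount) {
        List<String> urls = new ArrayList<>();
        for (int index = 0; index < valueCount; index++) {
            urls.add(IMAGE_ENDPOINT + "?uid=" + uid + "&lang=zh&index=" + index);
        }
        return urls;
    }

    public static void main(String[] args) {
        // uid copied from a sample response; a real uid expires quickly.
        buildImageUrls("27419090-0b57-11ea-8fd8-e31d1dab49d9", 5).forEach(System.out::println);
    }
}
```

Note that only the position in the values array is used in the URL; the value strings themselves are not.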


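Then comes the question of how to crack it. These icons are drawn in such a distinctive style that they are probably hand-drawn, and if so their number must be limited, a few hundred at most. Labeling them is therefore feasible, but a few hundred is still a lot to label by hand, and plain manual labeling is too mundane to be worth writing about.

There is a way that needs no manual labeling. In the response above, the image for "地球" is guaranteed to be among the images behind the values array. So I only need to call this endpoint many times and group the responses by imageName. The "地球" group, for example, then holds many values arrays, and each of them contains one image that really is "地球". Which one? The image common to all of them, that is, the intersection of the image sets in the group (this relies on the decoy images changing from response to response).

The pipeline is: group by imageName --> intersect the image sets within each group --> obtain the image feature for each imageName. This mapping serves as the model. At recognition time, look up the feature for the given imageName in the model, then find which image among the values of the freshly requested response has a matching feature. That gives recognition from imageName to image.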
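The grouping and intersection idea can be tried on toy data before touching the network. The sketch below is only an illustration under assumptions: the names `IntersectionSketch`, `Sample`, `derive` and `solve` are hypothetical, the labels are plain English strings, and small integers stand in for real image features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; all names are hypothetical and features are made-up integers.
final class IntersectionSketch {

    // One captcha response reduced to its label and the features of its candidate images.
    static final class Sample {
        final String imageName;
        final Set<Integer> features;

        Sample(String imageName, Integer... features) {
            this.imageName = imageName;
            this.features = new HashSet<>(Arrays.asList(features));
        }
    }

    // Groups samples by imageName and intersects the feature sets within each group.
    // A label enters the model only when exactly one feature survives.
    static Map<String, Integer> derive(List<Sample> samples) {
        Map<String, Set<Integer>> survivors = new LinkedHashMap<>();
        for (Sample s : samples) {
            Set<Integer> current = survivors.get(s.imageName);
            if (current == null) {
                survivors.put(s.imageName, new HashSet<>(s.features));
            } else {
                current.retainAll(s.features);
            }
        }
        Map<String, Integer> model = new LinkedHashMap<>();
        survivors.forEach((name, set) -> {
            if (set.size() == 1) {
                model.put(name, set.iterator().next());
            }
        });
        return model;
    }

    // Returns the index of the candidate whose feature matches the model, or -1 if none does.
    static int solve(Map<String, Integer> model, String imageName, List<Integer> candidateFeatures) {
        Integer target = model.get(imageName);
        return target == null ? -1 : candidateFeatures.indexOf(target);
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample("earth", 1, 2, 3, 4, 5));
        samples.add(new Sample("earth", 5, 6, 7, 8, 9));
        samples.add(new Sample("lock", 2, 6, 10, 11, 12));
        samples.add(new Sample("lock", 3, 7, 10, 13, 14));
        Map<String, Integer> model = derive(samples);
        System.out.println(model);                                              // {earth=5, lock=10}
        System.out.println(solve(model, "earth", Arrays.asList(9, 8, 5, 1, 2))); // 2
    }
}
```

A label with only one sample, or whose samples still share several features, stays out of the model, which is the signal to collect more responses for that label.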


III. Implementation

First, call the captcha endpoint to collect a batch of images:

package cc11001100.misc.crawler.captcha.teambition;

import cc11001100.misc.crawler.utils.HttpUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.jsoup.Connection;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Downloads a batch of raw captcha images for analysis.
 *
 * https://account.teambition.com/login
 *
 * @author CC11001100
 */
public class TeambitionCaptchaCrawler {

	public static void handleSingleCaptcha(String captchaResponseJsonStr) throws IOException {
		JSONObject responseJson = JSON.parseObject(captchaResponseJsonStr);
		String uid = responseJson.getString("uid");
		JSONArray values = responseJson.getJSONArray("values");
		for (int i = 0; i < values.size(); i++) {
			String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
			byte[] captchaBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
			String outputLocation = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
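			// try-with-resources closes the stream even if the write fails
			try (FileOutputStream out = new FileOutputStream(outputLocation)) {
				IOUtils.write(captchaBytes, out);
			}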
		}
	}

	public static void downloadRawData() throws IOException {
		for (int i = 0; i < 10000; i++) {
			String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
			String responseBody = HttpUtil.request(url, connection -> {
				connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
			}, Connection.Response::body);
			FileUtils.writeStringToFile(new File("data/captcha/teambition/raw-captcha.jsonl"), responseBody + "\n", "UTF-8", true);
			handleSingleCaptcha(responseBody);
		}
	}

	public static void main(String[] args) throws IOException {

		downloadRawData();

	}

}

This yields a batch of raw captcha images:

[Image: the downloaded captcha images]

raw-captcha.jsonl:

{"values":["f38d8ea6e916762a57be2108c7c0b29c027650f3","cc6fc6dd8ef8f17b7fa99f44dfaa65df221af4f4","801d66a2673d0ba30ba9d02412d2321e1ba1de94","66cecc1d439a74d10a6a2d628f2a358fa90df1df","5d8029335a0330a1ff9f0c5766f3502dbba1ad1f"],"imageName":"飛機","uid":"27419090-0b57-11ea-8fd8-e31d1dab49d9"}
{"values":["f54416fb674ae827c35c004e35057bdbc14fc0fe","ca1cc93bdc4236889e154b7542bf5675c7ffedc0","368a7a84132450e43f5020802b397b071dfe7840","0a449565c02a16adbe17b799c30947c8c904ad73","29d6a3b0ff1c42af3a9bb47fd759872a5e8f5931"],"imageName":"鎖","uid":"27852940-0b57-11ea-a598-c15471c1be2e"}
{"values":["84ba343b153e45c4c9aae9b260cfefa297587eda","fffe2e988c105b50c902d6372340a306abda0ce5","04ab1f94a0a73fdbf37d6aa40beb9878ef737c8f","ec2b8e8612ced4cd43f656bce3050dc3ef58c656","e61b104c2e7fbc1c6ab1dfa1b2a23f2c6fde1680"],"imageName":"相機","uid":"27bab830-0b57-11ea-8fd8-e31d1dab49d9"}
{"values":["295da38531279e1938aa4a87f354f27f62feb159","b3ed0b06f53373ba6d8ffdccea65e807d26b53e6","f6baa4dcea4f4d845f4d13301a39dbbe9bbe9fe9","61ce83c8c500d9cf96b79e34e9147993a0c6b359","93244a67523c11554f3ce8b49950d8fcbdbbf8fd"],"imageName":"鎖","uid":"27ecebc0-0b57-11ea-8fd8-e31d1dab49d9"}
{"values":["b2c5fa8ec472914cbdaff3d790fc0eb0c8a45adf","5ebd5d74430628a34fb189e9efc919c12afa069e","369adc4406bd49e7876e760b3e13dcc2637daa87","d664b3e2334880096f88fd50349e7d5b9e4e0fcd","cef15533490a94c27eeb8ae8dd97efb47ba717ce"],"imageName":"相機","uid":"282587f0-0b57-11ea-8fd8-e31d1dab49d9"}
...

Next, build a map from imageName to image feature out of the captcha images just downloaded:

package cc11001100.misc.crawler.captcha.teambition;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;

import javax.imageio.ImageIO;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * @author CC11001100
 */
@Slf4j
public class DerivationLabel {

    // Find the image that corresponds to each imageName and record its feature as the label
    private static void derivationLabel() throws IOException {
        Map<String, Integer> imageNameToHashCodeMap = new HashMap<>();
        FileUtils.readLines(new File("data/captcha/teambition/raw-captcha.jsonl"), "UTF-8").stream()
                .filter(StringUtils::isNotBlank)
                .collect(Collectors.groupingBy(line -> JSON.parseObject(line).getString("imageName")))
                .forEach((imageName, lineList) -> {
                    Set<Integer> interceptingSet = new HashSet<>();
                    for (String line : lineList) {
                        JSONArray values = JSON.parseObject(line).getJSONArray("values");
                        Set<Integer> currentSet = new HashSet<>();
                        // The download was forcibly interrupted a few times to check on progress, so the images of one
                        // values array may be incomplete; such an array is simply skipped
                        boolean hasError = false;
                        for (int i = 0; i < values.size(); i++) {
                            String f = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
                            try {
                                // ImageIO.read(File) opens and closes the file itself, so no stream is leaked
                                currentSet.add(ImageUtil.hash(ImageIO.read(new File(f))));
                            } catch (Exception e) {
                                log.error("Exception, path=" + f, e);
                                hasError = true;
                                break;
                            }
                        }
                        if (hasError) {
                            // Skip only this response; a return here would abandon the whole imageName group
                            continue;
                        }
                        if (interceptingSet.isEmpty()) {
                            interceptingSet.addAll(currentSet);
                        } else {
                            interceptingSet.retainAll(currentSet);
                            if (interceptingSet.isEmpty()) {
                                log.info("insufficient data, imageName={}", imageName);
                                break;
                            }
                        }
                    }
                    if (interceptingSet.size() != 1) {
                        log.info("imageName={}, derivation failed", imageName);
                    } else {
                        log.info("imageName={}, set={}", imageName, interceptingSet);
                        imageNameToHashCodeMap.put(imageName, interceptingSet.iterator().next());
                    }
                });
        imageNameToHashCodeMap.forEach((k, v) -> System.out.printf("map.put(\"%s\", %d);\n", k, v));
    }

    public static void main(String[] args) throws IOException {
        derivationLabel();
    }

}

Here the image feature is simply a hash of the pixel values. The utility class used is as follows:

package cc11001100.misc.crawler.captcha.teambition;

import lombok.extern.slf4j.Slf4j;

import java.awt.image.BufferedImage;

/**
 * @author CC11001100
 */
@Slf4j
public class ImageUtil {

    public static int hash(BufferedImage image) {
        StringBuilder msg = new StringBuilder();
        for (int i = 0; i < image.getWidth(); i++) {
            for (int j = 0; j < image.getHeight(); j++) {
                msg.append(image.getRGB(i, j)).append("|");
            }
        }
        return msg.toString().hashCode();
    }

}
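As a side note, the same kind of exact-match pixel feature can be computed without building a large intermediate string. The sketch below is an alternative under assumptions, not the method used in this post: `PixelHashSketch` is a hypothetical name, and the numbers it produces differ from those of `ImageUtil.hash`, so the two must not be mixed in one model.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

// Illustrative alternative; values are NOT compatible with ImageUtil.hash.
final class PixelHashSketch {

    // Hashes the ARGB value of every pixel, read in one bulk call.
    static int hash(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        int[] pixels = image.getRGB(0, 0, w, h, null, 0, w);
        // Mix in the dimensions so images with the same pixel sequence but different shapes differ.
        return 31 * (31 * w + h) + Arrays.hashCode(pixels);
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        a.setRGB(1, 1, 0x00FF00);
        b.setRGB(1, 1, 0x00FF00);
        System.out.println(hash(a) == hash(b)); // identical pixels give identical hashes
        b.setRGB(2, 2, 0x0000FF);
        System.out.println(hash(a) == hash(b)); // a single changed pixel changes the hash
    }
}
```

Like the original, this only matches byte-identical pictures. It works here because the server appears to return the same pixels for the same icon every time; any re-encoding or added noise would call for a perceptual hash instead.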

Running DerivationLabel generates the following map:

map.put("音符", 182834422);
map.put("鎖", 825168351);
map.put("機器人", -1714422141);
map.put("汽車", -769011042);
map.put("鑰匙", 975258806);
map.put("樹葉", -179264444);
map.put("信封", -702966573);
map.put("相機", 663652535);
map.put("文件夾", -1425863546);
map.put("雲朵", 2106631124);
map.put("飛機", 1640044711);
map.put("T恤", -258338857);
map.put("眼睛", 1647675580);
map.put("樹", -2063289315);
map.put("放大鏡", -1715725768);
map.put("鬧鍾", 1335715652);
map.put("回形針", 1654053339);
map.put("地球", -1592219546);
map.put("腳印", -1438760947);
map.put("標簽", 761482882);
map.put("剪刀", 1998833602);
map.put("燈泡", 418507311);
map.put("傘", -2104015908);
map.put("圖表", -824773152);
map.put("氣球", 1423728112);
map.put("太陽眼鏡", 1204904862);
map.put("椅子", 193112560);
map.put("打印機", -939522792);
map.put("旗幟", 834329993);
map.put("貓", 1911236121);
map.put("女人", 2047088238);
map.put("男人", 664214693);
map.put("卡車", -1453025175);
map.put("電腦", -1970735883);
map.put("褲子", -337658120);
map.put("鉛筆", 1993614559);
map.put("房子", -1299209990);

Then comes the recognition part. It only prints the answer and does not submit it, because submitting would actually send the SMS:

package cc11001100.misc.crawler.captcha.teambition;

import cc11001100.misc.crawler.utils.HttpUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import lombok.Builder;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.jsoup.Connection;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * @author CC11001100
 */
@Slf4j
public class TeambitionCaptchaCracker {

    private static final Map<String, Integer> map = new HashMap<>();

    static {
        map.put("音符", 182834422);
        map.put("鎖", 825168351);
        map.put("機器人", -1714422141);
        map.put("汽車", -769011042);
        map.put("鑰匙", 975258806);
        map.put("樹葉", -179264444);
        map.put("信封", -702966573);
        map.put("相機", 663652535);
        map.put("文件夾", -1425863546);
        map.put("雲朵", 2106631124);
        map.put("飛機", 1640044711);
        map.put("T恤", -258338857);
        map.put("眼睛", 1647675580);
        map.put("樹", -2063289315);
        map.put("放大鏡", -1715725768);
        map.put("鬧鍾", 1335715652);
        map.put("回形針", 1654053339);
        map.put("地球", -1592219546);
        map.put("腳印", -1438760947);
        map.put("標簽", 761482882);
        map.put("剪刀", 1998833602);
        map.put("燈泡", 418507311);
        map.put("傘", -2104015908);
        map.put("圖表", -824773152);
        map.put("氣球", 1423728112);
        map.put("太陽眼鏡", 1204904862);
        map.put("椅子", 193112560);
        map.put("打印機", -939522792);
        map.put("旗幟", 834329993);
        map.put("貓", 1911236121);
        map.put("女人", 2047088238);
        map.put("男人", 664214693);
        map.put("卡車", -1453025175);
        map.put("電腦", -1970735883);
        map.put("褲子", -337658120);
        map.put("鉛筆", 1993614559);
        map.put("房子", -1299209990);
    }

    public static Answer getAnswer(JSONObject responseJsonObject) throws IOException {
        String imageName = responseJsonObject.getString("imageName");
        Integer targetHashcode = map.get(imageName);
        if (targetHashcode == null) {
            return null;
        }
        JSONArray values = responseJsonObject.getJSONArray("values");
        String uid = responseJsonObject.getString("uid");
        for (int i = 0; i < values.size(); i++) {
            String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
            byte[] imageBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
            if (imageBytes == null) {
                log.info("image download failed, imageName={}, uid={}, index={}", imageName, uid, i);
                continue;
            }
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
            int currentImageHashcode = ImageUtil.hash(image);
            if (currentImageHashcode == targetHashcode) {
                return Answer.builder().imageName(imageName).imageUrl(url).index(i).build();
            }
        }
        return null;
    }

    public static void test() throws IOException {
        String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
        String responseJsonStr = HttpUtil.request(url, connection -> {
            connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
        }, Connection.Response::body);
        JSONObject responseJsonObject = JSON.parseObject(responseJsonStr);
        Answer answer = getAnswer(responseJsonObject);
        if (answer == null) {
            log.info("answer not found, responseJsonStr={}", responseJsonStr);
        } else {
            System.out.println(JSON.toJSONString(answer, true));
        }
    }

    @Data
    @Builder
    public static class Answer {
        private String imageName;
        private String imageUrl;
        private int index;
    }

    public static void main(String[] args) throws IOException {
        test();
    }

}

The output looks like this:

{
	"imageName":"氣球",
	"imageUrl":"https://auth_services.teambition.com/captcha/image?uid=814b9d80-0b5a-11ea-8fd8-e31d1dab49d9&lang=zh&index=3",
	"index":3
}

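Open the image URL (use the one printed on your own console; captcha images expire, so the link shown here stops working before long) and it is indeed a balloon (氣球). Several more tries all came out correct as well, so the crack is complete.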


Related links:

1. https://account.teambition.com/login

