Recognizing and Restoring a Shuffled Jigsaw CAPTCHA (puzzle-captcha)

Reposted article, 2021-12-09 15:02. Tags: python, captcha, jigsaw CAPTCHA, bots, shuffled jigsaw, tips, technical research, puzzle-captcha, opencv, Python, Dingxiang

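I. Introduction

The shuffled jigsaw is a relatively uncommon CAPTCHA defense; drag-the-slider CAPTCHAs are far more widespread. Many sliders have already been fully defeated, mostly by investing heavily in simulating the drag trajectory. This article leaves trajectory simulation aside and studies only how to restore the puzzle.

As a test case we take Dingxiang's shuffled jigsaw, which is fairly widely deployed. It is advertised with 4 stars for defense and 3 stars for user experience. Our study shows that the puzzle can be restored with a high success rate and that the idea is simple. The restoration process is explained step by step below.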

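II. Environment Setup

1. Dependencies

Collection and browser simulation: selenium
Feature matching: python + opencv

2. Installing the environment

!pip install setuptools
!pip install selenium
!pip install numpy matplotlib
!pip install opencv-python

3. Downloading chromedriver

Find the driver that matches your browser version and platform. On macOS it is recommended to place it in /usr/local/bin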

!wget https://npm.taobao.org/mirrors/chromedriver/95.0.4638.69/chromedriver_mac64.zip

III. Collecting Samples

Import the dependencies, then use webdriver to open the product demo page on the official site

import os
import cv2
import time
import urllib.request
import matplotlib.pyplot as plt
import numpy as np

from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

Next comes the sample-download code. The flow is: open the demo page on the official site, reveal the puzzle, then download and save its image

# Collection code
from selenium.webdriver.chrome.service import Service  # Selenium 4 way of passing the driver path

class CrackPuzzleCaptcha():
    # Initialize the webdriver
    def init(self):
        self.url = 'https://www.dingxiang-inc.com/business/captcha'
        chrome_options = webdriver.ChromeOptions()
        # chrome_options.add_argument("--start-maximized")
        chrome_options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors","enable-automation"]) # set to developer mode
        path = r'/usr/local/bin/chromedriver' #macOS
#         path = r'D:\Anaconda3\chromedriver.exe' #windows
        self.browser = webdriver.Chrome(service=Service(path), options=chrome_options)
        # Explicit wait timeout
        self.wait = WebDriverWait(self.browser, 20)
        self.browser.get(self.url)
    # Open the CAPTCHA demo and force the element into the visible area of the browser
    def openTest(self):
        time.sleep(1)
        self.browser.execute_script('setTimeout(function(){document.querySelector("body > div.wrapper-main > div.wrapper.wrapper-content > div > div.captcha-intro > div.captcha-intro-header > div > div > ul > li.item-8").click();},0)')
        self.browser.execute_script('setTimeout(function(){document.querySelector("body > div.wrapper-main > div.wrapper.wrapper-content > div > div.captcha-intro > div.captcha-intro-body > div > div.captcha-intro-demo").scrollIntoView();},0)')
        time.sleep(1)
    # Locate the source image (WebP format) and download it directly
    def download(self):
        onebtn = self.browser.find_element(By.CSS_SELECTOR, '#dx_captcha_oneclick_bar-logo_2 > span')
        ActionChains(self.browser).move_to_element(onebtn).perform()
        time.sleep(1)
        # Download the WebP image
        img_url = self.browser.find_element(By.CSS_SELECTOR, '#dx_captcha_jigsaw_fragment-top-left_3 > img').get_attribute("src")
        img_address = "test.png" # sample file; the bytes are WebP, cv2.imread detects the format from the content, not the extension
        response = urllib.request.urlopen(img_url)
        img = response.read()
        with open(img_address, 'wb') as f:
            f.write(img)
            print('Saved', img_address)
        return self.browser

    def crack(self):
        pass

Start collecting

crack = CrackPuzzleCaptcha()
crack.init()
crack.openTest()

browser2 = crack.download()

Saved test.png

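IV. Findings

Key point 1: the source image behind the displayed puzzle is already in its shuffled state
Key point 2: the source image is a single picture, so downloading it, cutting it up and numbering the tiles reproduces exactly what the puzzle shows
Key point 3: the puzzle needs only one swap to solve. Because it is a 2x2 grid, permuting [1,2,3,4] enumerates every possible arrangement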
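If key point 3 holds and a single swap always suffices, the search space shrinks from all 24 permutations to the 6 arrangements reachable by exchanging one pair of tiles. The following is a minimal sketch of that enumeration; the function name `single_swap_arrangements` is illustrative and does not come from the original project.

```python
import itertools

def single_swap_arrangements(tiles):
    """Return every arrangement reachable from `tiles` by exactly one swap."""
    results = []
    # Each unordered pair of positions (i, j) corresponds to one possible swap.
    for i, j in itertools.combinations(range(len(tiles)), 2):
        swapped = list(tiles)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        results.append(swapped)
    return results

candidates = single_swap_arrangements([1, 2, 3, 4])
for candidate in candidates:
    print(candidate)
print(len(candidates), "single-swap candidates")
```

The article itself searches all 24 permutations, which also covers the case where the one-swap assumption turns out not to hold.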

V. Analysis

1. Helper functions

Define a few helper functions to make the parameters easy to obtain

# Display images side by side
def show_images(images: list , title = '') -> None:
    if title!='':
        print(title)
    n: int = len(images)
    f = plt.figure()
    for i in range(n):
        f.add_subplot(1, n, i + 1)
        plt.imshow(images[i])
    plt.show(block=True)
  
# Get the basic properties of an image: rows (height), columns (width), channels
def getSize(p):
    sum_rows = p.shape[0]
    sum_cols = p.shape[1]
    channels = p.shape[2]
    return sum_rows,sum_cols,channels

2. Splitting the image

# Input sample
file = 'test.png'
img = cv2.imread(file)

sum_rows,sum_cols,channels = getSize(img)
part_rows,part_cols = round(sum_rows/2),round(sum_cols/2)
print('Sample image height, width, channels',sum_rows,sum_cols,channels)
print('Four-way split, centre of the source image',part_rows,part_cols)

part1 = img[0:part_rows, 0:part_cols]
part2 = img[0:part_rows, part_cols:sum_cols]
part3 = img[part_rows:sum_rows, 0:part_cols]
part4 = img[part_rows:sum_rows, part_cols:sum_cols]

print('H/W/C of the 4 tiles, numbered: top-left=1, top-right=2, bottom-left=3, bottom-right=4\n',getSize(part1),getSize(part2),getSize(part3),getSize(part4))

show_images([img],'Source image')
show_images([part1,part2],'Tiles')
show_images([part3,part4])

Sample image height, width, channels 150 300 3
Four-way split, centre of the source image 75 150
H/W/C of the 4 tiles, numbered: top-left=1, top-right=2, bottom-left=3, bottom-right=4
(75, 150, 3) (75, 150, 3) (75, 150, 3) (75, 150, 3)

Source image

Tiles

After splitting, the four tiles still have to be recombined so that each candidate arrangement can be tested for the best match

3. Merging the image

# Merge function: paste the four tiles into a blank canvas of the original size
def merge(sum_rows,sum_cols,channels,p1,p2,p3,p4):
    final_matrix = np.zeros((sum_rows, sum_cols,channels), np.uint8)
    part_rows,part_cols = round(sum_rows/2),round(sum_cols/2)

    final_matrix[0:part_rows, 0:part_cols] = p1
    final_matrix[0:part_rows, part_cols:sum_cols] = p2
    final_matrix[part_rows:sum_rows, 0:part_cols] = p3
    final_matrix[part_rows:sum_rows, part_cols:sum_cols] = p4
    return final_matrix
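As a sanity check on the split-and-merge idea, the sketch below runs a round trip on a synthetic array of the same shape as the sample (150 x 300 x 3). It is only an illustration: `split4` and `merge4` are hypothetical names, and `merge4` uses `np.hstack`/`np.vstack` instead of the preallocated canvas used in the article.

```python
import numpy as np

def split4(image):
    """Cut an image into four tiles: top-left, top-right, bottom-left, bottom-right."""
    rows, cols = image.shape[:2]
    r, c = rows // 2, cols // 2
    return image[:r, :c], image[:r, c:], image[r:, :c], image[r:, c:]

def merge4(p1, p2, p3, p4):
    """Reassemble four tiles in reading order (p1 p2 on top, p3 p4 below)."""
    return np.vstack([np.hstack([p1, p2]), np.hstack([p3, p4])])

# Synthetic stand-in for the downloaded sample.
demo = (np.arange(150 * 300 * 3) % 256).astype(np.uint8).reshape(150, 300, 3)

p1, p2, p3, p4 = split4(demo)
same = merge4(p1, p2, p3, p4)      # identity arrangement [1,2,3,4]
swapped = merge4(p4, p2, p3, p1)   # arrangement [4,2,3,1]

# Swapping tiles 1 and 4 a second time restores the input.
q1, q2, q3, q4 = split4(swapped)
restored = merge4(q4, q2, q3, q1)

print(np.array_equal(same, demo), np.array_equal(restored, demo))
```

Both comparisons should hold for any image with even height and width; odd sizes would need the same rounding rule in the split and the merge.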

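Judging by the numbering, rearranging [1,2,3,4] into [4,2,3,1] should give the correct image. Let's test the restoration

# Restored image
f = merge(sum_rows,sum_cols,channels,part4,part2,part3,part1)
show_images([f],'Restored image [4,2,3,1]')

Restored image [4,2,3,1]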

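4. Permutations

Generating permutations in Python is straightforward. Test code:

import itertools

# Labels for the 4 tiles of the puzzle
puzzle_list = [
    "1:top-left","2:top-right",
    "3:bottom-left","4:bottom-right"
]

result = itertools.permutations(puzzle_list,4)
cnt=0
for x in result:
    cnt+=1
    print(x)
print('Total:',cnt,'permutations')

('1:top-left', '2:top-right', '3:bottom-left', '4:bottom-right')
('1:top-left', '2:top-right', '4:bottom-right', '3:bottom-left')
('1:top-left', '3:bottom-left', '2:top-right', '4:bottom-right')
('1:top-left', '3:bottom-left', '4:bottom-right', '2:top-right')
('1:top-left', '4:bottom-right', '2:top-right', '3:bottom-left')
('1:top-left', '4:bottom-right', '3:bottom-left', '2:top-right')
('2:top-right', '1:top-left', '3:bottom-left', '4:bottom-right')
('2:top-right', '1:top-left', '4:bottom-right', '3:bottom-left')
('2:top-right', '3:bottom-left', '1:top-left', '4:bottom-right')
('2:top-right', '3:bottom-left', '4:bottom-right', '1:top-left')
('2:top-right', '4:bottom-right', '1:top-left', '3:bottom-left')
('2:top-right', '4:bottom-right', '3:bottom-left', '1:top-left')
('3:bottom-left', '1:top-left', '2:top-right', '4:bottom-right')
('3:bottom-left', '1:top-left', '4:bottom-right', '2:top-right')
('3:bottom-left', '2:top-right', '1:top-left', '4:bottom-right')
('3:bottom-left', '2:top-right', '4:bottom-right', '1:top-left')
('3:bottom-left', '4:bottom-right', '1:top-left', '2:top-right')
('3:bottom-left', '4:bottom-right', '2:top-right', '1:top-left')
('4:bottom-right', '1:top-left', '2:top-right', '3:bottom-left')
('4:bottom-right', '1:top-left', '3:bottom-left', '2:top-right')
('4:bottom-right', '2:top-right', '1:top-left', '3:bottom-left')
('4:bottom-right', '2:top-right', '3:bottom-left', '1:top-left')
('4:bottom-right', '3:bottom-left', '1:top-left', '2:top-right')
('4:bottom-right', '3:bottom-left', '2:top-right', '1:top-left')
Total: 24 permutations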

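5. Feature extraction

Use the merge function to recombine the tiles, then convert the result to grayscale and extract its edges.

# Recombined image (as-served order)
f = merge(sum_rows,sum_cols,channels,part1,part2,part3,part4)
show_images([f],'Recombined image [1,2,3,4]')
# Grayscale (cv2.imread returns a 3-channel BGR image)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
show_images([gray],'Grayscale')
# Edge extraction
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Edge extraction')

Recombined image [1,2,3,4]

Grayscale

Edge extraction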

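Now test another arrangement, [1,3,2,4], and compare its edge features, together with those of the as-served order [1,2,3,4], against the correct image [4,2,3,1]

f = merge(sum_rows,sum_cols,channels,part1,part3,part2,part4)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Edge extraction')

f = merge(sum_rows,sum_cols,channels,part1,part2,part3,part4)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Edge extraction')

# The correct arrangement
f = merge(sum_rows,sum_cols,channels,part4,part2,part3,part1)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Correct: edge extraction')

Edge extraction

Correct: edge extraction

The extracted edges show clear straight lines where tiles meet. A wrong arrangement has at least one line along the horizontal or vertical centre axis, while the correct arrangement has essentially none. (The position, length and number of line segments determine the accuracy, so the parameters need tuning and the detected lines need filtering.)

This happens because the source image has a pronounced background gradient. It was designed that way for usability, so that people can 'easily' tell the tiles apart and find where each one belongs.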

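f = merge(sum_rows,sum_cols,channels,part1,part2,part3,part4)
show_images([f],'Background gradient')
show_images([part3,part2,part1,part4],'After splitting')
f = merge(sum_rows,sum_cols,channels,part1,part2,part3,part4)
lf = f.copy()
# Draw the horizontal and vertical centre lines of the 300x150 sample in red (BGR)
cv2.line(lf, (0, 75), (300, 75), (0, 0, 255), 2)
cv2.line(lf, (150, 0), (150, 150), (0, 0, 255), 2)
show_images([lf],"Shuffled: the gradient becomes a 'cross' feature line")

Background gradient

After splitting

Shuffled: the gradient becomes a 'cross' feature line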

6. Feature matching

With the feature identified, what remains is to detect it. One option is to compute the colour difference along the cross at x/2 and y/2; another is to use OpenCV's line detection. Test code:

f = merge(sum_rows,sum_cols,channels,part1,part2,part3,part4)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Edge extraction')

# Probabilistic Hough transform: look for straight segments at least 50 px long
lines = cv2.HoughLinesP(edges,0.01,np.pi/360,60,minLineLength=50,maxLineGap=10)
if lines is None:
    print('No lines found')
else:
    lf = f.copy()
    for line in lines:
        x1, y1, x2, y2 = line[0]
        cv2.line(lf, (x1, y1), (x2, y2), (0, 0, 255), 2)
    show_images([lf])

Edge extraction

Now try the correct arrangement [4,2,3,1]

f = merge(sum_rows,sum_cols,channels,part4,part2,part3,part1)
gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 35, 80, apertureSize=3)
show_images([edges],'Edge extraction')

lines = cv2.HoughLinesP(edges,0.01,np.pi/360,60,minLineLength=50,maxLineGap=10)
if lines is None:
    print('No lines found')
else:
    lf = f.copy()
    for line in lines:
        x1, y1, x2, y2 = line[0]
        cv2.line(lf, (x1, y1), (x2, y2), (0, 0, 255), 2)
    show_images([lf])

Edge extraction

No lines found
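The colour-difference alternative mentioned at the start of this section is not implemented in the article. A minimal sketch of one way it could work, assuming a smooth background gradient, is to measure the mean intensity jump across the two centre seams and prefer arrangements with a small jump. The name `seam_score` and the synthetic test image are assumptions made for illustration only.

```python
import numpy as np

def seam_score(image):
    """Mean absolute intensity jump across the horizontal and vertical centre seams."""
    img = image.astype(np.float64)
    r, c = img.shape[0] // 2, img.shape[1] // 2
    # Compare the last row above the seam with the first row below it,
    # and the last column left of the seam with the first column right of it.
    horizontal = np.abs(img[r] - img[r - 1]).mean()
    vertical = np.abs(img[:, c] - img[:, c - 1]).mean()
    return horizontal + vertical

# Synthetic 150x300 diagonal gradient standing in for a real background.
ys, xs = np.mgrid[0:150, 0:300]
smooth = (xs + ys) / 2.0

# Swap the top-left and bottom-right tiles to imitate a shuffled puzzle.
shuffled = smooth.copy()
shuffled[:75, :150] = smooth[75:, 150:]
shuffled[75:, 150:] = smooth[:75, :150]

print(seam_score(smooth), seam_score(shuffled))
```

On real photos the score would be noisier because of texture near the seams, so comparing the scores of all candidate arrangements is likely more reliable than applying a fixed threshold.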

7. Matching procedure

import itertools

print('Source order')
print(1,2)
print(3,4)
show_images([img])

# Put the numbered tiles in a list and permute them
list1 = [
    [1,part1],
    [2,part2],
    [3,part3],
    [4,part4]
]

result = itertools.permutations(list1,4)
idx =1
found = False
finalResult = []
for x in result:
    # Merge the tiles in this arrangement
    f = merge(sum_rows,sum_cols,channels,x[0][1],x[1][1],x[2][1],x[3][1])
    # Extract image features
    gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 35, 80, apertureSize=3)
    # Line detection
    lines = cv2.HoughLinesP(edges,0.01,np.pi/360,60,minLineLength=50,maxLineGap=10)
    if lines is None:
        # No seam lines: treat this arrangement as the restored image
        print('Restored image')
        show_images([f])
        show_images([gray])
        show_images([edges])
        print('Correct order')
        print(x[0][0],x[1][0])
        print(x[2][0],x[3][0])
        print('Done!!')
        found = True
        finalResult =[x[0][0],x[1][0],x[2][0],x[3][0]] # the arrangement judged correct
        break
    else:
        print(idx, 'arrangement:' , x[0][0],x[1][0],x[2][0],x[3][0] , 'lines:', len(lines))
        lf = f.copy()
        for line in lines:
            x1, y1, x2, y2 = line[0]
            cv2.line(lf, (x1, y1), (x2, y2), (0, 0, 255), 2)
#         show_images([lf])
    idx+=1

print('Attempts',idx,'final state',found,finalResult)

Source order
1 2
3 4

1 arrangement: 1 2 3 4 lines: 4
2 arrangement: 1 2 4 3 lines: 5
3 arrangement: 1 3 2 4 lines: 4
4 arrangement: 1 3 4 2 lines: 2
5 arrangement: 1 4 2 3 lines: 3
6 arrangement: 1 4 3 2 lines: 4
7 arrangement: 2 1 3 4 lines: 3
8 arrangement: 2 1 4 3 lines: 5
9 arrangement: 2 3 1 4 lines: 3
10 arrangement: 2 3 4 1 lines: 3
11 arrangement: 2 4 1 3 lines: 1
12 arrangement: 2 4 3 1 lines: 1
13 arrangement: 3 1 2 4 lines: 2
14 arrangement: 3 1 4 2 lines: 2
15 arrangement: 3 2 1 4 lines: 3
16 arrangement: 3 2 4 1 lines: 3
17 arrangement: 3 4 1 2 lines: 5
18 arrangement: 3 4 2 1 lines: 3
19 arrangement: 4 1 2 3 lines: 4
20 arrangement: 4 1 3 2 lines: 3
21 arrangement: 4 2 1 3 lines: 2

Restored image

Correct order
4 2
3 1

Done!!
Attempts 22 final state True [4, 2, 3, 1]
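The loop above accepts an arrangement only when no line is detected anywhere, so a long straight edge inside the picture itself could make it reject the correct answer. One possible refinement, following the earlier remark that line position matters, is to count only segments lying on the centre cross. The sketch below is an assumption rather than part of the original project: `count_seam_lines` and its tolerance `tol` are hypothetical, and the segments are plain `(x1, y1, x2, y2)` tuples such as those obtained from `line[0]` in the `cv2.HoughLinesP` result.

```python
def count_seam_lines(segments, width, height, tol=3):
    """Count segments that run along the vertical or horizontal centre line.

    A segment counts when both of its endpoints are within `tol` pixels
    of the same centre axis.
    """
    cx, cy = width // 2, height // 2
    count = 0
    for x1, y1, x2, y2 in segments:
        on_vertical = abs(x1 - cx) <= tol and abs(x2 - cx) <= tol
        on_horizontal = abs(y1 - cy) <= tol and abs(y2 - cy) <= tol
        if on_vertical or on_horizontal:
            count += 1
    return count

# Two seam segments and one unrelated edge inside the picture.
segments = [(0, 75, 140, 75), (150, 2, 150, 70), (20, 10, 90, 12)]
print(count_seam_lines(segments, width=300, height=150))
```

An arrangement could then be accepted when this count is zero, or the arrangement with the lowest count could be chosen; the right tolerance would have to be found by experiment.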

8. Extracting the result

Now consider how this puzzle is operated: there are 12 ordered (from, to) pairs of positions that could be swapped

list1 = [1,2,3,4]

result = itertools.permutations(list1,2)
idx=0
for x in result:
    idx+=1
    print(idx,x)

1 (1, 2)
2 (1, 3)
3 (1, 4)
4 (2, 1)
5 (2, 3)
6 (2, 4)
7 (3, 1)
8 (3, 2)
9 (3, 4)
10 (4, 1)
11 (4, 2)
12 (4, 3)

# Swap detection: return the entries of b that differ from a at the same index
def change_check(a,b):
    diffs = []
    if len(a)!=len(b):
        return diffs
  
    for i in range(len(a)):
        if a[i]!=b[i]:
            diffs.append(b[i])
    return diffs

ab = change_check([1,2,3,4],finalResult)
print('Original',[1,2,3,4])
print('Final',finalResult)
print('Positions to swap',ab)

Original [1, 2, 3, 4]
Final [4, 2, 3, 1]
Positions to swap [4, 1]

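Convert the 'positions to swap' into offset coordinates of the tile centres, using a lookup table

# Tile size (each tile is 150x75; the full image is 300x150)
pwidth = 150 
pheight = 75 
# Tile centre = half the tile size, i.e. 1/4 of the full image width/height
px = round(pwidth/2)
py = round(pheight/2)
# Build the coordinate table, indexed by tile number - 1
offset_points = [
    [px,py],[px+pwidth,py],
    [px,py+pheight],[px+pwidth,py+pheight]
]
print(offset_points)
print(ab)
# Use the result as an index into the coordinate table
drag_start = offset_points[ ab[0] -1 ]
drag_end = offset_points[ ab[1] -1 ]

print('Start offset',drag_start,'End offset',drag_end)

[[75, 38], [225, 38], [75, 113], [225, 113]]
[4, 1]
Start offset [225, 113] End offset [75, 38]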
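The lookup table above is hard-coded for one 300x150 sample. A small sketch of how the same table could be derived from the image size and grid shape follows; `tile_centers` is a hypothetical helper, and it uses integer division, so its y values come out one pixel smaller than the `round`-based values shown above (37 and 112 instead of 38 and 113).

```python
def tile_centers(image_width, image_height, rows=2, cols=2):
    """Return the centre (x, y) of every tile, in reading order."""
    tile_w, tile_h = image_width // cols, image_height // rows
    return [
        (c * tile_w + tile_w // 2, r * tile_h + tile_h // 2)
        for r in range(rows)
        for c in range(cols)
    ]

centers = tile_centers(300, 150)
print(centers)  # [(75, 37), (225, 37), (75, 112), (225, 112)]

# Tile numbers are 1-based, so tile n maps to centers[n - 1].
swap = [4, 1]
print(centers[swap[0] - 1], centers[swap[1] - 1])
```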

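9. Simulating the interaction

The analysis of the puzzle restoration is now complete. For the interaction itself we take a simpler route than pixel offsets: ActionChains' built-in drag_and_drop, which drags DOM element A onto DOM element B (move_to_element is used only to hover over the button). Let's test the result

# Hover over the button so that the puzzle is displayed
onebtn = browser2.find_element(By.CSS_SELECTOR, '#dx_captcha_oneclick_bar-logo_2 > span')
ActionChains(browser2).move_to_element(onebtn).perform() 
time.sleep(1)

Get the final result

ab = change_check([1,2,3,4],finalResult)
print(ab)

[4, 1]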

Find the DOM elements of the puzzle tiles on the page and keep them for the swap. The find_element_by_* commands are deprecated in Selenium 4, so find_element() is used here

d1 = browser2.find_element(By.CSS_SELECTOR, '#dx_captcha_jigsaw_fragment-top-left_3 > div')
d2 = browser2.find_element(By.CSS_SELECTOR, '#dx_captcha_jigsaw_fragment-top-right_3 > div')
d3 = browser2.find_element(By.CSS_SELECTOR, '#dx_captcha_jigsaw_fragment-bottom-left_3 > div')
d4 = browser2.find_element(By.CSS_SELECTOR, '#dx_captcha_jigsaw_fragment-bottom-right_3 > div')
drag_elements = [d1,d2,d3,d4]

Pick the two DOM elements to drag and hand them to webdriver

drag_start = drag_elements[ ab[0] -1 ]
drag_end = drag_elements[ ab[1] -1 ]
print('drag_start',drag_start, 'drag_end',drag_end)

drag_start <selenium.webdriver.remote.webelement.WebElement (session="1d7d691bd509cd03cd8b1483da2056ea", element="8439005e-eb70-4b02-856e-eebbe2526d6d")> drag_end <selenium.webdriver.remote.webelement.WebElement (session="1d7d691bd509cd03cd8b1483da2056ea", element="f9239df5-9aa3-43ae-a6af-afacf81eb670")>

ActionChains(browser2).drag_and_drop(drag_start,drag_end).perform()
# browser2.close()

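After a simple drag the target site accepted the solution as correct, but it still judged the session to be suspicious and popped up a different kind of CAPTCHA. Recognizing and restoring the puzzle is evidently only the beginning. How to forge a runtime environment that the site accepts is a separate area of technical research, one worth studying and discussing together.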

VI. Closing

I am learning as I go. If you spot any mistakes, please point them out. Thank you!

Project repository: https://github.com/suifei/puzzle-captcha
