[Algorithm] Basic CAPTCHA Recognition: Method and Source Code


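      Background

      A friend of mine has been building a tool that already works quite well. To polish it further, he suggested knocking out this kind of CAPTCHA, so that is what I did. The result recognizes a single image in under 200 ms, and a manual check of 500 samples gave an accuracy of 95%. I had no prior experience in this area and felt my way forward step by step. In the spirit of sharing experience, here is the whole line of analysis; experts, please bear with me.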

 

      Some recognition results

      Look familiar?

 

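      Step 1: removing background noise and binarizing

      I considered a couple of approaches here.

      Approach 1: compute the color distribution of the image and treat colors with a low share as background noise. Because the background noise and the foreground color are not clearly separated, I tried many ways of picking out the foreground and none of them removed the noise well, so I eventually gave up on this approach.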
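      For reference, the idea behind approach 1 can be sketched as follows. This is only an illustration of "rare colors are noise", not the code that was actually tried: the class name `ColorHistogram`, the `minShare` parameter and the packed 0xRRGGBB pixel representation are all assumptions made here.

```csharp
using System.Collections.Generic;

public static class ColorHistogram
{
    // Counts how often each packed 0xRRGGBB color occurs in the image.
    public static Dictionary<int, int> Count(int[,] rgb)
    {
        var counts = new Dictionary<int, int>();
        foreach (int p in rgb)
        {
            int n;
            counts.TryGetValue(p, out n);   // n stays 0 for a new color
            counts[p] = n + 1;
        }
        return counts;
    }

    // Marks a pixel as noise when its color covers less than minShare
    // (a hypothetical tuning parameter) of the whole image.
    public static bool[,] NoiseMask(int[,] rgb, double minShare)
    {
        var counts = Count(rgb);
        int width = rgb.GetLength(0);
        int height = rgb.GetLength(1);
        double total = rgb.Length;
        var noise = new bool[width, height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                noise[x, y] = counts[rgb[x, y]] / total < minShare;
        return noise;
    }
}
```

      A sketch like this only works when noise colors really are rare, which is the weakness described above: noise drawn in a color close to the foreground cannot be separated by frequency alone.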

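      Approach 2: I later looked around online, and the currently popular technique is to convert the image to grayscale and then binarize it against a threshold. A grayscale conversion simply weights the channels by the human eye's sensitivity to each color, and those weights mean nothing to a computer. A moment's thought shows that the two passes can be merged, so I removed the background noise and binarized in a single step: a pixel counts as foreground when the sum of its R, G and B components is below 500. The result was very satisfying.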
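      The single-pass thresholding can be sketched as below. The threshold of 500 is the one stated in the article; the names `Binarizer` and `IsForeground` and the packed 0xRRGGBB input are assumptions made here so that the sketch does not depend on System.Drawing.

```csharp
public static class Binarizer
{
    // Threshold from the article: foreground when R + G + B < 500.
    public const int Threshold = 500;

    // Hypothetical helper: classifies one pixel from its three channels.
    public static bool IsForeground(int r, int g, int b)
    {
        return r + g + b < Threshold;
    }

    // Converts packed 0xRRGGBB pixels, indexed [x, y], into a boolean
    // table in which true marks a dark (foreground) pixel.
    public static bool[,] ToTable(int[,] rgb)
    {
        int width = rgb.GetLength(0);
        int height = rgb.GetLength(1);
        var table = new bool[width, height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
            {
                int p = rgb[x, y];
                table[x, y] = IsForeground((p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF);
            }
        return table;
    }
}
```

      With this rule a mid-gray pixel such as 0xA0A0A0 (sum 480) is kept as foreground, while a slightly lighter 0xB0B0B0 (sum 528) is dropped as background.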

 

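      Step 2: building character samples

      Samples matter a great deal to a computer, because a computer can hardly reason logically, and even if it could, it would need long training before it satisfied you. So the image is compared against samples prepared in advance. If you look closely at these CAPTCHAs you will spot a flaw: almost all of them use the same font. I therefore built a sample set for that one font by hand, reusing the noise-free output of the previous step directly. Making samples is simple and tedious, but it demands care: one slip can lower the recognition rate of a particular symbol. Only 31 distinct characters turned up in the 500 samples. Fortunately, someone at a certain department had at least thought about easily confused characters such as 1 and I, or 0 and O; otherwise that department would be taking even more abuse.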
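      The constructor in the listing further down reads the samples from a gzip-compressed byte array: for each sample a character, a width byte, a height byte, then width * height booleans, with a '\0' character ending the list. The article does not show how that array was produced; the following is a sketch of a writer for the same layout, with `SampleWriter` and `Pack` being names invented here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class SampleWriter
{
    // Packs (character, bitmap) pairs into the layout the Cracker
    // constructor reads back. Assumes every width and height fits
    // in one byte (below 256).
    public static byte[] Pack(IEnumerable<KeyValuePair<char, bool[,]>> samples)
    {
        using (var stream = new MemoryStream())
        {
            using (var gzip = new GZipStream(stream, CompressionMode.Compress))
            using (var writer = new BinaryWriter(gzip))
            {
                foreach (var sample in samples)
                {
                    bool[,] map = sample.Value;
                    int width = map.GetLength(0);
                    int height = map.GetLength(1);
                    writer.Write(sample.Key);
                    writer.Write((byte)width);
                    writer.Write((byte)height);
                    // Same order as the reader: x outer, y inner.
                    for (int i = 0; i < width; i++)
                        for (int j = 0; j < height; j++)
                            writer.Write(map[i, j]);
                }
                writer.Write('\0'); // terminator the reader looks for
            }
            // The gzip stream is closed here, so all data has been flushed.
            return stream.ToArray();
        }
    }
}
```

      The returned bytes could then be pasted into the source as an array literal, which is how the listing embeds its samples.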

 

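      Step 3: matching

      Matching a single character uses the simplest, most primitive binary comparison, but what is compared is a match rate rather than a match count. I defined a scoring rule whose gist is: a pixel that should be set and is set adds points; a pixel that should be set but is missing subtracts points; a pixel that should not be set but is set subtracts a smaller amount; and anything outside the reachable area is not scored.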
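      Read against the `FixedMatch` method in the listing, the rule can be written as a formula. The symbols are introduced here for illustration: among the positions that fall inside the image, $T$ is the number of sample pixels that are set, $H$ of them are also set in the image (hits), $M$ are not (misses), and $E$ is the number of image pixels that are set where the sample is clear (extras). The weight 0.55 is the constant used in the code.

$$
\text{rate} = \frac{H - M - 0.55\,E}{T}, \qquad T = H + M
$$

      As a made-up example, a sample with $T = 10$ set pixels, of which $H = 9$ are hit and $M = 1$ is missed, plus $E = 2$ extra pixels, scores $(9 - 1 - 1.1) / 10 = 0.69$. That is above the 0.6 cut-off the listing uses to accept a character.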

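      Because a partial match of one symbol can look much like a complete match of another, the single-character match has to pick the best result within a widened region. Within a certain range, find the best match; that best match is the symbol at the current position.

      Once a best match is found, the matching position advances a large step to the right; if no acceptable match is found, it advances a small step.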
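      The stepping logic on its own can be sketched as below, with the matcher abstracted away. The default values (acceptance rate 0.6, big step 10, small step 1, right margin 7) are taken from the `Read` method in the listing; `Scanner`, `Candidate` and the `matchAt` delegate are names invented for this sketch, and starting at column 0 is a simplification.

```csharp
using System;
using System.Text;

public static class Scanner
{
    // One match attempt: the best character near a column, the column
    // where it was found, and how well it matched.
    public struct Candidate
    {
        public char Char;
        public int X;
        public double Rate;
    }

    // Walks the image from left to right. matchAt stands in for the real
    // matcher and must return an X close to the column it was given,
    // otherwise the loop is not guaranteed to move forward.
    public static string Scan(int width, Func<int, Candidate> matchAt,
        double minRate = 0.6, int bigStep = 10, int smallStep = 1, int margin = 7)
    {
        var result = new StringBuilder();
        int next = 0;
        while (next < width - margin)
        {
            Candidate best = matchAt(next);
            if (best.Rate > minRate)
            {
                result.Append(best.Char);
                next = best.X + bigStep;   // accepted: jump past this character
            }
            else
            {
                next += smallStep;         // rejected: nudge right and retry
            }
        }
        return result.ToString();
    }
}
```

      The big step only makes sense because the characters share one font and have similar widths; a different CAPTCHA would likely need a different step size.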

 

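      Step 4: optimizing and tuning

      Every algorithm needs optimizing and tuning. At this point the task is to find the best parameter settings and the best way to organize the code. This step tends to take the most time and effort.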

 

      Step 5: verifying the results

      This step is purely manual: check the results by hand and work out the accuracy.
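      Once the hand-checked answers are written down, the tally itself is easy to automate. The sketch below assumes accuracy is counted per CAPTCHA, so one wrong character makes the whole image wrong; the article does not say which definition was used, and `Evaluation.Accuracy` is a name made up here.

```csharp
using System;
using System.Collections.Generic;

public static class Evaluation
{
    // Fraction of captchas whose recognized text equals the
    // hand-checked answer exactly.
    public static double Accuracy(IList<string> expected, IList<string> actual)
    {
        if (expected.Count != actual.Count)
            throw new ArgumentException("Both lists must have the same length.");
        if (expected.Count == 0)
            return 0;
        int correct = 0;
        for (int i = 0; i < expected.Count; i++)
            if (string.Equals(expected[i], actual[i], StringComparison.Ordinal))
                correct++;
        return (double)correct / expected.Count;
    }
}
```

      Under this definition, the 95% reported for 500 samples would correspond to 475 fully correct images.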

 

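      Thoughts

      The results are in, there is not much code, and it works well. People in this line of work often want something general-purpose. Whether a method generalizes depends largely on its level of abstraction. This method is plain matching, so naturally it does not generalize, but the approach and the thinking do. Each case needs its own analysis. Distorted text, hollow text and the like are far more complicated to handle. There are also methods online that use third-party image libraries, and those may be more general. When I have the time and the interest, I will come back to this topic.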

 

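      Source code

      I went back and forth for a while over whether to publish the source. Similar commercial services already exist online, the recognition itself is not very difficult, and given the inherent flaws of a certain system, this CAPTCHA is as good as absent. So I am publishing the code, for study and discussion only.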

using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.IO.Compression;

namespace Crack12306Captcha
{
    public class Cracker
    {
        List<CharInfo> words_ = new List<CharInfo>();

        public Cracker()
        {
            var bytes = new byte[] { 
                0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0xc5, 0x58, 0xd9, 0x92, 0x13, 0x31, 
                0x0c, 0x94, 0x9e, 0x93, 0x0c, 0x61, 0x97, 0x2f, 0xe1, 0x58, 0xe0, 0x91, 0x9b, 0x82, 0x62, 0x0b, 
                0x58, 0xee, 0xff, 0xff, 0x10, 0xd8, 0xcc, 0xc8, 0xea, 0x96, 0x6c, 0x8f, 0x13, 0x48, 0xe1, 0xaa, 
                0x4d, 0x46, 0x96, 0x6d, 0xb5, 0x8e, 0x96, 0x67, 0x73, 0x7f, 0x3b, 0x09, 0x0e, 0x25, 0x41, 0x49, 
                0xa3, 0xae, 0xd7, 0x5b, 0xa9, 0xa8, 0xd5, 0xb4, 0x76, 0x02, 0x6a, 0x5c, 0x52, 0x94, 0x54, 0xed, 
                0x18, 0x5a, 0x7f, 0x18, 0x00, 0x00, 0x84, 0x07, 0x1b, 0x80, 0x4a, 0x9a, 0x08, 0x35, 0xb8, 0x81, 
                0x50, 0xe7, 0xad, 0xbe, 0xc4, 0x8e, 0xb1, 0x4f, 0x2d, 0x5f, 0xba, 0x80, 0xbb, 0xfd, 0x9a, 0xad, 
                0x19, 0x36, 0xe5, 0xad, 0x87, 0xf1, 0x10, 0xc0, 0x8d, 0xc6, 0x50, 0x40, 0x52, 0xf8, 0xb3, 0x98, 
                0x2c, 0xd6, 0xec, 0x59, 0xe7, 0x0d, 0x3e, 0x0f, 0x93, 0x3e, 0x1d, 0x02, 0x7a, 0x18, 0x8f, 0xb6, 
                0xc7, 0x46, 0x4e, 0x01, 0xa3, 0x96, 0xdc, 0x3a, 0x20, 0x77, 0xbf, 0x2c, 0x24, 0xe4, 0x80, 0xa9, 
                0x20, 0x14, 0xe5, 0x2d, 0xb5, 0x68, 0xc9, 0x55, 0x89, 0x23, 0x96, 0x82, 0xaa, 0xba, 0x58, 0xa6, 
                0x03, 0x38, 0x71, 0x4b, 0x29, 0xd2, 0x47, 0x80, 0xe3, 0x84, 0x91, 0xf4, 0x78, 0x43, 0x64, 0x41, 
                0x7b, 0x73, 0x99, 0x80, 0x42, 0x48, 0x00, 0xde, 0x00, 0x12, 0x88, 0x80, 0xdb, 0x51, 0x4a, 0x49, 
                0x84, 0x43, 0xf6, 0x51, 0x90, 0x27, 0x21, 0xc9, 0xf8, 0xac, 0x00, 0x4d, 0xcd, 0x46, 0x09, 0x9d, 
                0x15, 0x78, 0xe0, 0x00, 0x1e, 0x44, 0x2a, 0x51, 0x8c, 0xbc, 0xd3, 0xa3, 0x68, 0x8a, 0xd5, 0x3a, 
                0x20, 0x79, 0xba, 0x4d, 0x71, 0x4c, 0x0b, 0x91, 0x98, 0x90, 0x7b, 0x2a, 0x42, 0xc5, 0x78, 0x7a, 
                0xfc, 0xd5, 0x1b, 0x4b, 0x09, 0xa7, 0x27, 0x99, 0x38, 0x05, 0x01, 0xc2, 0x80, 0x39, 0x9c, 0x67, 
                0xbb, 0x4e, 0x7f, 0x6c, 0x33, 0xdd, 0xed, 0x87, 0x55, 0xda, 0x5d, 0xb5, 0x56, 0x33, 0xc6, 0xf9, 
                0xea, 0x60, 0x64, 0xcf, 0xa7, 0x41, 0xe0, 0x5c, 0x1c, 0xc4, 0xb2, 0x25, 0xa3, 0x89, 0x88, 0x8d, 
                0x16, 0x00, 0xb5, 0xed, 0xa5, 0x22, 0x9d, 0x52, 0x41, 0x53, 0x8d, 0x92, 0x7f, 0x31, 0x51, 0x3f, 
                0xa8, 0x00, 0x85, 0x8a, 0x71, 0x10, 0x92, 0x78, 0xc4, 0x59, 0x08, 0x39, 0x69, 0xa9, 0x38, 0x41, 
                0x48, 0xf7, 0x40, 0x5a, 0x03, 0xd5, 0x3a, 0xf5, 0xe5, 0x9d, 0x33, 0x66, 0xc3, 0xd7, 0x1f, 0xef, 
                0x94, 0xa0, 0x53, 0xea, 0xf4, 0x15, 0xb2, 0x1c, 0x40, 0x2d, 0xcf, 0xaf, 0xce, 0xe9, 0xd4, 0x7a, 
                0x89, 0x09, 0xe6, 0xdd, 0xdb, 0x0e, 0xb8, 0x58, 0xa7, 0x60, 0x37, 0xfd, 0xf2, 0xfa, 0x2c, 0x4e, 
                0x51, 0x87, 0x0d, 0xfc, 0x16, 0x72, 0x2a, 0x5f, 0xc0, 0x80, 0xf0, 0x54, 0xa7, 0xde, 0xfc, 0x15, 
                0x8b, 0x9a, 0x36, 0x3a, 0x2c, 0x62, 0xfc, 0xd4, 0x8c, 0x31, 0xb7, 0xea, 0xd7, 0x26, 0xc4, 0xaf, 
                0x75, 0xea, 0xdb, 0x8b, 0xff, 0x9b, 0x9b, 0x50, 0x7e, 0xfe, 0x15, 0xab, 0x17, 0x2f, 0x96, 0x96, 
                0xbd, 0xaa, 0x87, 0xdd, 0x77, 0xa3, 0x77, 0xd3, 0x85, 0xf0, 0xe0, 0x58, 0xd5, 0xf6, 0x8c, 0xcd, 
                0xc4, 0x63, 0x52, 0x12, 0x48, 0x46, 0x0f, 0x93, 0x5a, 0xe3, 0xea, 0x24, 0x67, 0x73, 0x63, 0xa0, 
                0xdf, 0xdf, 0x3d, 0x67, 0xf6, 0xa9, 0xfc, 0xed, 0x08, 0xe3, 0x82, 0x57, 0x08, 0x35, 0x47, 0x68, 
                0x9c, 0x01, 0x40, 0x87, 0x8b, 0xbd, 0x0c, 0xb3, 0xf4, 0xe1, 0x72, 0xd7, 0x54, 0x62, 0xfd, 0x40, 
                0xed, 0x99, 0xa6, 0x7e, 0x2b, 0xe4, 0xb4, 0xc4, 0x62, 0x0d, 0x79, 0xae, 0x1b, 0xd7, 0xf4, 0x09, 
                0xb7, 0xe1, 0x7c, 0x44, 0x09, 0x9a, 0xda, 0xff, 0x52, 0x6a, 0x3c, 0xe1, 0xc8, 0xd7, 0xbd, 0xbb, 
                0xbe, 0x37, 0xfc, 0xd6, 0xd5, 0x4e, 0x3c, 0x40, 0x2a, 0x4b, 0x39, 0x1a, 0xbd, 0x2a, 0xcd, 0xc1, 
                0x18, 0x59, 0x40, 0x62, 0x78, 0xec, 0x63, 0x19, 0x72, 0xf0, 0xcf, 0xf8, 0x38, 0xfa, 0x42, 0x3a, 
                0xc8, 0x02, 0xec, 0x5b, 0xeb, 0x8d, 0xae, 0xf1, 0x45, 0xdd, 0x32, 0x98, 0x35, 0x3c, 0x9f, 0xa6, 
                0x3d, 0xce, 0x13, 0xce, 0x94, 0x38, 0x87, 0x00, 0x8d, 0x85, 0xc4, 0x70, 0x17, 0x26, 0x0e, 0xa6, 
                0x1e, 0x16, 0xcb, 0xbf, 0x52, 0xdf, 0x29, 0x63, 0xc4, 0xf6, 0x8c, 0x35, 0xba, 0xf2, 0xf9, 0x1f, 
                0xbf, 0x73, 0x1f, 0x91, 0x1b, 0x9e, 0x24, 0x5e, 0x63, 0x22, 0x82, 0x23, 0x05, 0x19, 0xb9, 0x71, 
                0x73, 0xdc, 0xcf, 0x05, 0x88, 0x94, 0x71, 0xdb, 0xdd, 0x48, 0x10, 0xd5, 0x55, 0xb3, 0x52, 0xc3, 
                0x1b, 0x01, 0x94, 0x13, 0x74, 0x94, 0x3a, 0x80, 0x2f, 0x39, 0xe2, 0x75, 0x0e, 0xf2, 0xc6, 0x18, 
                0xdc, 0x46, 0xfc, 0xf3, 0xea, 0x14, 0x80, 0xc1, 0xce, 0x24, 0xee, 0x72, 0xed, 0x94, 0xaf, 0xfb, 
                0xa9, 0xaa, 0x4a, 0xe0, 0xd4, 0x22, 0xc6, 0xf0, 0x57, 0x1d, 0x8e, 0xd2, 0x90, 0xc6, 0x0c, 0xd3, 
                0x9a, 0x53, 0xfb, 0xd6, 0xb7, 0xdd, 0x14, 0xd4, 0xbd, 0x41, 0xa7, 0x80, 0x7b, 0x23, 0xfe, 0x34, 
                0x56, 0x0d, 0x96, 0x46, 0x02, 0xfe, 0xfd, 0xb2, 0x00, 0x5f, 0x01, 0x9c, 0xa0, 0x32, 0x39, 0xd7, 
                0x90, 0xc2, 0x6c, 0xc7, 0x4e, 0x68, 0x88, 0x7d, 0x9f, 0x9b, 0xcf, 0xa7, 0xbe, 0xa0, 0xfc, 0x18, 
                0x7d, 0x07, 0x5b, 0xa9, 0xbe, 0x56, 0x1f, 0x67, 0x1a, 0x4a, 0x91, 0x9c, 0x04, 0x38, 0x53, 0x6b, 
                0x70, 0x68, 0x8f, 0xea, 0xf4, 0x34, 0x87, 0x7f, 0x6e, 0x82, 0xc3, 0xc1, 0xab, 0x40, 0xc4, 0x50, 
                0x13, 0x0e, 0x33, 0x5d, 0x67, 0x7d, 0x01, 0x1f, 0xdb, 0xc0, 0x7f, 0xed, 0x87, 0x7f, 0xbc, 0x0f, 
                0x75, 0xe0, 0xa5, 0xba, 0xc0, 0x84, 0x3d, 0x24, 0x04, 0xe0, 0xf1, 0x16, 0x41, 0x3b, 0x74, 0xd2, 
                0x52, 0xc5, 0xf8, 0x7c, 0x12, 0xfb, 0xe4, 0x37, 0x5b, 0xfb, 0x57, 0x11, 0xa1, 0x18, 0x00, 0x00, 
            };
            // The samples are stored gzip-compressed. Each record is a character,
            // its width and height (one byte each), then width * height booleans
            // (x outer, y inner). A '\0' character ends the list.
            using (var stream = new MemoryStream(bytes))
            using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
            using (var reader = new BinaryReader(gzip))
            {
                while (true)
                {
                    char ch = reader.ReadChar();
                    if (ch == '\0')
                        break;
                    int width = reader.ReadByte();
                    int height = reader.ReadByte();

                    bool[,] map = new bool[width, height];
                    for (int i = 0; i < width; i++)
                        for (int j = 0; j < height; j++)
                            map[i, j] = reader.ReadBoolean();
                    words_.Add(new CharInfo(ch, map));
                }
            }
        }

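        // Reads the captcha text by scanning the binarized image from left to right.
        public string Read(Bitmap bmp)
        {
            var result = string.Empty;
            var width = bmp.Width;
            var table = ToTable(bmp);
            // Start at the first column that contains a foreground pixel.
            var next = SearchNext(table, -1);

            while (next < width - 7)
            {
                var matched = Match(table, next);
                if (matched.Rate > 0.6)
                {
                    // Good match: keep the character and take a big step right.
                    result += matched.Char;
                    next = matched.X + 10;
                }
                else
                {
                    // No acceptable match here: take a small step right.
                    next += 1;
                }
            }

            return result;
        }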

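        // Removes background noise and binarizes in one pass:
        // a pixel is foreground (true) when R + G + B is below 500.
        private bool[,] ToTable(Bitmap bmp)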
        {
            var table = new bool[bmp.Width, bmp.Height];
            for (int i = 0; i < bmp.Width; i++)
                for (int j = 0; j < bmp.Height; j++)
                {
                    var color = bmp.GetPixel(i, j);
                    table[i, j] = (color.R + color.G + color.B < 500);
                }
            return table;
        }

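        // Returns the first column after start that contains a foreground pixel,
        // or the image width if there is none.
        private int SearchNext(bool[,] table, int start)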
        {
            var width = table.GetLength(0);
            var height = table.GetLength(1);
            for (start++; start < width; start++)
                for (int j = 0; j < height; j++)
                    if (table[start, j])
                        return start;

            return start;
        }

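        // Scores the sample (target) placed at offset (x0, y0) in the image (source).
        // Positions that fall outside the image are not scored.
        private double FixedMatch(bool[,] source, bool[,] target, int x0, int y0)
        {
            double total = 0;   // sample pixels that are set and inside the image
            double count = 0;   // running score
            int targetWidth = target.GetLength(0);
            int targetHeight = target.GetLength(1);
            int sourceWidth = source.GetLength(0);
            int sourceHeight = source.GetLength(1);
            int x, y;

            for (int i = 0; i < targetWidth; i++)
            {
                x = i + x0;
                if (x < 0 || x >= sourceWidth)
                    continue;
                for (int j = 0; j < targetHeight; j++)
                {
                    y = j + y0;
                    if (y < 0 || y >= sourceHeight)
                        continue;

                    if (target[i, j])
                    {
                        total++;
                        if (source[x, y])
                            count++;        // should be set and is set
                        else
                            count--;        // should be set but is missing
                    }
                    else if (source[x, y])
                        count -= 0.55;      // should be clear but is set
                }
            }

            // No overlap with any set sample pixel: treat as no match
            // instead of dividing by zero.
            if (total == 0)
                return 0;
            return count / total;
        }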

        // Tries the sample at every offset in a small window around start
        // and returns the position with the highest match rate.
        private MatchedChar ScopeMatch(bool[,] source, bool[,] target, int start)
        {
            int targetHeight = target.GetLength(1);
            int sourceHeight = source.GetLength(1);

            double max = 0;
            var matched = new MatchedChar();
            for (int i = -2; i < 6; i++)
                for (int j = -3; j < sourceHeight - targetHeight + 5; j++)
                {
                    double rate = FixedMatch(source, target, i + start, j);
                    if (rate > max)
                    {
                        max = rate;
                        matched.X = i + start;
                        matched.Y = j;
                        matched.Rate = rate;
                    }
                }
            return matched;
        }

        // Runs ScopeMatch for every sample and keeps the best-scoring character.
        private MatchedChar Match(bool[,] source, int start)
        {
            MatchedChar best = null;
            foreach (var info in words_)
            {
                var matched = ScopeMatch(source, info.Table, start);
                matched.Char = info.Char;
                if (best == null || best.Rate < matched.Rate)
                    best = matched;
            }
            return best;
        }

        private class CharInfo
        {
            public char Char { get; private set; }
            public bool[,] Table { get; private set; }

            public CharInfo(char ch, bool[,] table)
            {
                Char = ch;
                Table = table;
            }
        }

        private class MatchedChar
        {
            public int X { get; set; }
            public int Y { get; set; }
            public char Char { get; set; }
            public double Rate { get; set; }
        }
    }
}

      Usage

// img is a System.Drawing.Bitmap containing the captcha image.
var cracker = new Cracker();
var result = cracker.Read(img);

