Convolutional Neural Network (CNN) Learning Algorithms: Chinese CAPTCHA Recognition Based on the LeNet Network

Reposted article, originally published 2016-12-19 12:02. Tags: python, caffe, image recognition, LeNet, Deep Learning

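Our company needed Chinese CAPTCHA image recognition, and I have only just finished getting it into production. Now that things have finally quietened down, I am following up my previous article, "Setting up and learning Caffe on Windows 10 x64 + Visual Studio 2013 + Python 2.7.12", with a record of how the recognizer was built with Caffe. This post concentrates on development and implementation, so CNN theory is not the focus; please look up any unfamiliar concepts and terms yourself. The model we have trained so far recognizes a single character correctly close to 96% of the time, so a four-character CAPTCHA is solved correctly roughly 80% of the time. That is good enough for our purposes. Collecting more samples per character should raise the accuracy further, although more samples also mean longer training and higher hardware requirements. The other route is to improve the LeNet network structure; at present only three convolutional layers are used.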
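As a rough check on those figures, and assuming the four characters are recognized independently (an assumption; the text does not say so), the whole-CAPTCHA accuracy is the single-character accuracy raised to the fourth power. The function name below is only illustrative.

```python
def captcha_accuracy(char_accuracy, length=4):
    """Probability that all `length` characters are right, assuming independent errors."""
    return char_accuracy ** length


whole = captcha_accuracy(0.96)
print(round(whole, 3))  # 0.849
```

Under that assumption the estimate is about 0.85, in the same range as the roughly 80% quoted above.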

(I) Preparation

(1) Development environment

Software: Visual Studio 2013 + Python 2.7.12 + Caffe

Hardware: Intel Core i7-4790 + GTX 1080 + 32 GB RAM

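(2) Training images

Roughly 3,666 common Chinese characters can appear in a CAPTCHA. Each character needs at least 50 training images, so the total comes to about 200,000 samples, of which 80% go to the training set and 20% to the test set. Collecting samples is tedious and time-consuming because every image has to be labeled by hand. I used a manual CAPTCHA-solving platform; the cheapest rate is 0.04 RMB per CAPTCHA, so collecting this many samples cost close to 10,000 RMB, and a GTX 1080 graphics card costs about another 6,000 RMB. That is manageable for a company but a substantial outlay for an individual, so deep learning is still a fairly expensive pursuit for students in a lab!

Training set: 260,000 sample images

Test set: 130,000 sample images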

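(II) Processing the image samples

CAPTCHAs come in a huge variety: digits, letters, Chinese characters, pictures and so on. This article covers only Chinese CAPTCHAs, whose designers mainly rely on the following kinds of interference:

(1) Background-color interference

(2) Slanted and distorted characters

(3) Interference lines

(4) Chinese characters mixed with pinyin (Baidu's nine-grid CAPTCHA)

(5) Overlapping characters

Each type of CAPTCHA has to be handled separately, and this handling is collectively called image preprocessing. There is no universal preprocessing recipe; each CAPTCHA style needs its own treatment, but the general sequence is always grayscale conversion, binarization, interference-line removal, segmentation into single characters, and normalization. All of these are easy to implement in Python, so I will not go into detail and will go straight to the code, which requires import cv2: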

# -*- coding: utf-8 -*-
# Written for Python 2.7: CutImage relies on filename being a GBK byte string.
import random
import time

import cv2

# outpath: directory for the cut-out character images. It must be defined by
# the calling script before CutImage is used.


class PreProcess(object):
    """Preprocessing for one CAPTCHA image: grayscale, binarize,
    remove the interference line, cut into four characters."""

    def ConvertToGray(self, Image, filename):
        # cv2.imread returns a BGR image; convert it to single-channel gray.
        GrayImage = cv2.cvtColor(Image, cv2.COLOR_BGR2GRAY)
        return GrayImage

    def ConvertTo1Bpp(self, GrayImage, filename):
        # cv2.threshold returns a tuple: (threshold used, binary image).
        Bpp = cv2.threshold(GrayImage, 127, 255, cv2.THRESH_BINARY)
        cv2.imwrite('D://' + '1.jpg', Bpp[1])  # debug copy of the binary image
        return Bpp[1]  # the image itself, so that .shape works in later steps

    def InterferLine(self, Bpp, filename):
        # Whiten everything left of column 76 and from column 161 onward;
        # in this CAPTCHA style the characters sit between those columns.
        Bpp[:, 0:76] = 255
        Bpp[:, 161:] = 255
        # m and n are the two rows treated as the interference line in the
        # current column. They are not reset between columns, so the search
        # in each column resumes from where the previous one stopped.
        m = 1
        n = 1
        for i in range(76, 161):
            while m < Bpp.shape[0] - 1:
                if Bpp[m][i] == 0:
                    # Black pixel found: pick the neighboring row as n.
                    if Bpp[m + 1][i] == 0:
                        n = m + 1
                    elif m > 0 and Bpp[m - 1][i] == 0:
                        n = m
                        m = n - 1
                    else:
                        n = m + 1
                    break
                elif m != Bpp.shape[0]:
                    # White pixel: count black pixels reachable from row m,
                    # then decide whether to move m up or down.
                    l = 0
                    k = 0
                    ll = m
                    kk = m
                    while ll > 0:
                        if Bpp[ll][i] == 0:
                            ll = ll - 1
                            l = l + 1
                        else:
                            break
                    while kk > 0:
                        if Bpp[kk][i] == 0:
                            kk = kk - 1
                            k = k + 1
                        else:
                            break
                    if (l <= k and l != 0) or (k == 0 and l != 0):
                        m = m - 1
                    else:
                        m = m + 1
                else:
                    break
            # Skip the column when the pixels directly above rows m and n are
            # both black (presumably a character stroke crossing the line);
            # otherwise whiten rows m and n.
            if m > 0 and Bpp[m - 1][i] == 0 and Bpp[n - 1][i] == 0:
                continue
            else:
                Bpp[m][i] = 255
                Bpp[n][i] = 255
        return Bpp

    def CutImage(self, Bpp, filename):
        # Column ranges [start, end) of the four characters in this CAPTCHA style.
        columns = [(78, 98), (99, 118), (119, 138), (139, 158)]
        # The file name starts with the four labeled characters (GBK-encoded).
        chars = filename.decode('gbk')
        for index, (start, end) in enumerate(columns):
            piece = Bpp[:, start:end]
            # Output name: <character>_<millisecond timestamp><4 random digits>.png
            name = (chars[index].encode('gbk') + '_' + '%d' % (time.time() * 1000)
                    + str(random.randint(1000, 9999)) + '.png')
            cv2.imwrite(outpath + name, piece)

Preprocessing

The code that calls the preprocessing methods:

import os

import cv2

# inpath: directory that holds the raw CAPTCHA images; set it before running.
PP = PreProcess()
for root, dirs, files in os.walk(inpath):
    for filename in files:
        # Gotcha: inpath must not contain Chinese characters, or cv2.imread fails.
        Img = cv2.imread(root + '/' + filename)
        GrayImage = PP.ConvertToGray(Img, filename)
        Bpp = PP.ConvertTo1Bpp(GrayImage, filename)
        Bpp_new = PP.InterferLine(Bpp, filename)
        b = PP.CutImage(Bpp_new, filename)

Batch-processing the images

[Figure: CAPTCHA image before processing]

[Figure: CAPTCHA image after preprocessing]

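(III) Caffe model configuration

The model configuration stage has three steps: preparing the training and test sets, converting the data to the format Caffe needs, and configuring the LeNet network structure.

(1) Preparing the training and test sets

After preprocessing has cut each CAPTCHA into four images, every image must be normalized to 32*32 pixels, otherwise the Caffe model cannot be trained. Once normalized, the images of each character are divided between the training set and the test set. I will not paste that code here; as a matter of personal preference I put 80% of each character's images in the training set and 20% in the test set. Finally, a dictionary maps every character to a numeric ID, so that when the model returns a result we can look up the corresponding Chinese character.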
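The split and the character-to-ID dictionary are not shown in the article. The following is a minimal sketch of one way to do both, not the author's code: the function names, the fixed random seed and the sample file names are all assumptions made for illustration.

```python
import random


def build_label_map(characters):
    """Map each distinct character to a numeric ID (0, 1, 2, ...)."""
    return {ch: idx for idx, ch in enumerate(sorted(set(characters)))}


def split_per_character(samples, train_ratio=0.8, seed=0):
    """Split (path, character) pairs so that each character keeps
    about train_ratio of its images for training."""
    by_char = {}
    for path, ch in samples:
        by_char.setdefault(ch, []).append(path)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for ch in sorted(by_char):
        paths = sorted(by_char[ch])
        rng.shuffle(paths)
        cut = int(round(len(paths) * train_ratio))
        train.extend((p, ch) for p in paths[:cut])
        test.extend((p, ch) for p in paths[cut:])
    return train, test


# Hypothetical file names: 10 images for each of two characters.
samples = [("img/a_%d.png" % i, u"\u4e2d") for i in range(10)]
samples += [("img/b_%d.png" % i, u"\u6587") for i in range(10)]
label_map = build_label_map(ch for _, ch in samples)
train, test = split_per_character(samples)
```

Splitting per character, rather than over the whole pool, keeps every character represented in both sets even when some characters have only the minimum number of images.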

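(2) Data in Caffe format

Generating the data format Caffe needs requires the convert_imageset project, which was already compiled in the first article on setting up Caffe and can be used as is. The Python code that calls it: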

    import os

    path = os.getcwd()  # remember the current directory
    os.chdir("./caffe-master/caffe-master/Build/x64/Debug")  # switch to the folder that contains caffe.exe
    # Make glog write to stderr. The variable is set in this process's
    # environment so that the commands started by os.system inherit it.
    os.environ['GLOG_logtostderr'] = '1'
    # build the training set
    os.system('convert_imageset.exe --shuffle ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/train  ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/train.txt  ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/trainldb 0')
    # build the test set
    os.system('convert_imageset.exe --shuffle ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/val  ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/val.txt  ./caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/testldb 0')

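Calling convert_imageset to generate Caffe-format data

When generation succeeds, the training-set and test-set folders each contain two files, data.mdb and lock.mdb, which are Caffe's standard LMDB data format.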
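convert_imageset reads a list file (train.txt and val.txt in the commands above) with one image per line: a path relative to the root folder, a space, and the integer label. A minimal sketch for producing those lines follows; the helper names and the sample file names are hypothetical.

```python
def format_list_lines(samples):
    """Return list-file lines of the form '<relative path> <integer label>'."""
    return ["%s %d" % (path, label) for path, label in samples]


def write_list_file(list_path, samples):
    """Write one '<relative path> <label>' line per sample."""
    with open(list_path, "w") as handle:
        for line in format_list_lines(samples):
            handle.write(line + "\n")


# Hypothetical samples: (path relative to the image root folder, numeric label).
samples = [("0001.png", 0), ("0002.png", 1)]
lines = format_list_lines(samples)
```

The labels here are the numeric IDs from the character dictionary, which is why that dictionary has to be fixed before the list files are written.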

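(3) The LeNet network model

The LeNet model is by now very mature. The most commonly used variant is LeNet-5 (five layers), which is entirely adequate for CNNs that do not need many layers. More powerful models exist today, of course: AlexNet, GoogLeNet, VGG and ResNet. ResNet is the newest of these; according to benchmark tests it comfortably beats the other networks at face recognition, and it can be as deep as 200 layers. If you are interested, see the benchmark project on GitHub. There is also a web-based visual editor for laying out networks such as LeNet: http://ethereon.github.io/netscope/#/editor.

The model configured here contains three convolutional layers and two pooling layers in total. The most important parameters, num_output, kernel_size and stride, must be set for each layer; apart from the layer structure, the quality of the model depends on whether these parameters are chosen sensibly. I will not explain the specific settings here, since there are plenty of articles on the subject and many excellent papers to draw on. The model definition is as follows: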

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "E:/work/meb/Deeplearning/caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/trainldb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "E:/work/meb/Deeplearning/caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/testldb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    kernel_size: 7
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 256
    pad:1
    kernel_size: 6
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 1024
    pad:1
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv3"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 3666
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 3666
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
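The spatial sizes implied by the configuration above can be checked by hand. The sketch below applies Caffe's output-size rules (convolution rounds down, pooling rounds up) to a 32*32 input, using the kernel_size, stride and pad values from the layer definitions; the helper names are only illustrative.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Convolution output size: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Pooling output size: ceil((size + 2*pad - kernel) / stride) + 1."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


size = 32                                    # 32*32 input image
size = conv_out(size, kernel=7)              # conv1 -> 26
size = pool_out(size, kernel=2, stride=2)    # pool1 -> 13
size = conv_out(size, kernel=6, pad=1)       # conv2 -> 10
size = conv_out(size, kernel=5, pad=1)       # conv3 -> 8
size = pool_out(size, kernel=2, stride=2)    # pool2 -> 4
features = 1024 * size * size                # conv3 has 1024 output channels
print(size, features)
```

So pool2 hands 1024 * 4 * 4 = 16,384 values to ip1, which means ip1 alone holds 16,384 * 3,666, or about 60 million, weights.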


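(IV) Training the model

With the preparation done, the model can now be trained from Python. Training speed depends mainly on your GPU. I started with a GTX 650, on which 5,000 iterations took half a day. I could not put up with that, so I asked the company to buy a GTX 1080, and the difference is enormous: 5,000 iterations finish in half an hour. The code that starts training: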

    cmd = 'caffe.exe train -solver=./caffe-master/caffe-master/windows/CaptchaTest/dpsample/solver/lenet_solver.prototxt'  # training command
    os.system(cmd)
    os.chdir(path)  # return to the directory saved before the chdir

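The main values printed during training are loss and accuracy. If the loss keeps converging and the accuracy reported every 500 iterations keeps rising, the model design is sound; otherwise it has to be redesigned. When training finishes you obtain the trained model file.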
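The solver file lenet_solver.prototxt is not shown in this article. As a rough sketch only, the snippet below assembles a solver definition as text. Only test_interval (500) and max_iter (5000) come from the text, and the snapshot prefix is inferred from the iterate_iter_5000.caffemodel file name used in the next section. The net file name, test_iter and all learning-rate settings are placeholder assumptions borrowed from Caffe's LeNet MNIST example, not the author's values.

```python
# Solver fields whose values are strings and therefore need quotes.
QUOTED_KEYS = {"net", "lr_policy", "snapshot_prefix"}


def solver_text(settings):
    """Render (key, value) pairs as solver prototxt lines."""
    lines = []
    for key, value in settings:
        if key in QUOTED_KEYS:
            lines.append('%s: "%s"' % (key, value))
        else:
            lines.append("%s: %s" % (key, value))
    return "\n".join(lines) + "\n"


settings = [
    ("net", "lenet_train_test.prototxt"),  # assumed name for the network file
    ("test_iter", 100),                    # assumption
    ("test_interval", 500),                # from the text: accuracy every 500 iterations
    ("base_lr", 0.01),                     # assumption (MNIST example value)
    ("momentum", 0.9),                     # assumption
    ("weight_decay", 0.0005),              # assumption
    ("lr_policy", "inv"),                  # assumption
    ("gamma", 0.0001),                     # assumption
    ("power", 0.75),                       # assumption
    ("display", 100),                      # assumption
    ("max_iter", 5000),                    # from the text: 5,000 iterations
    ("snapshot", 5000),                    # assumption
    ("snapshot_prefix", "iterate"),        # inferred from the model file name
    ("solver_mode", "GPU"),                # training ran on a GPU
]
text = solver_text(settings)
print(text)
```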

(V) Using the model

Once the model is trained, it can be tried out on a test image. The test code:

    import caffe

    # load the model
    deploy = r'.\dpsample\solver\lenet_deploy.prototxt'    # deploy file
    caffe_model = r'.\dpsample\iterate_iter_5000.caffemodel'   # trained caffemodel
    imgtest = './dpsample/data/val/685_363.png'    # a test image picked at random

    net = caffe.Net(deploy, caffe_model, caffe.TEST)
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})  # shape of the input blob, (1,3,32,32)
    transformer.set_transpose('data', (2,0,1))    # reorder dimensions from the loaded image's (32,32,3) to (3,32,32)
    #transformer.set_mean('data', np.load(mean_file).mean(1).mean(1))    # mean subtraction; training did not subtract a mean, so it is not used here
    #transformer.set_raw_scale('data', 1)    # scaling to [0,1]; not needed, the scale is already set in the network
    transformer.set_channel_swap('data', (2,1,0))   # swap channels from RGB to BGR
    im = caffe.io.load_image(imgtest)                   # load the image
    net.blobs['data'].data[...] = transformer.preprocess('data', im)       # apply the preprocessing set up above and copy the image into the blob
    out = net.forward()
    prob = net.blobs['prob'].data[0].flatten()  # probabilities from the final (Softmax) layer, one per class
    print(prob)
    order = prob.argsort()[-1]  # index of the most probable class
    print(order)

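The final order value is the ID of the character the model considers most likely. Look it up in the character-to-ID dictionary to see whether the recognition is correct!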
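The lookup from ID back to character can be sketched as follows. This is an illustration rather than the author's code: the function name, the three-entry dictionary and the probabilities are made up, and returning the top k candidates instead of only the best one is an added convenience.

```python
def top_k_characters(prob, id_to_char, k=3):
    """Return the k most probable (character, probability) pairs, best first."""
    ranked = sorted(range(len(prob)), key=lambda idx: prob[idx], reverse=True)
    return [(id_to_char[idx], prob[idx]) for idx in ranked[:k]]


# Hypothetical three-class example; the real model has 3666 classes.
id_to_char = {0: u"\u4e2d", 1: u"\u6587", 2: u"\u5b57"}
prob = [0.10, 0.85, 0.05]
best = top_k_characters(prob, id_to_char, k=2)
```

Here id_to_char is the inverse of the character-to-ID dictionary built when the training set was prepared, and prob can be the array taken from the prob blob.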

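# Final note # I am a loyal Visual Studio user, and all of this code was written in the VS editor, which needs the PTVS plug-in to work with Python. When editing Python code there you have to be very careful with Chinese text encoding, or you will suffer for it. Trust me, though: whatever other editors can do, VS can do too. You just need patience, and when you hit a problem, think it through and search for its root cause.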

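Statement of originality:

My ID on cnblogs is marso and my blog is at http://www.cnblogs.com/marso/. All posts carrying an originality statement are my own original work. Apart from cited references that are marked as such, the content of the blog is the result of my own independent research. Unless otherwise noted, it is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 China Mainland license.

The works and their derivatives may not be used commercially without permission. Personal use requires no permission, but the original author must be credited wherever the work is quoted (code, core ideas, implementation approach and so on), and this statement must accompany any publication. (Works released under GPLv3 may quote my work without permission and without marking the quoted parts or attaching this statement.)

On "original": before publishing an article I run a preliminary search, and I publish only if I find no similar work by others. Because something may have been missed, I cannot guarantee that my blog content is the first of its kind, but I can guarantee that it is original.

Reposting is welcome; please include the following:

Reposted from marso's blog on cnblogs, at http://%博客URL%, licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 China Mainland license, where "%博客URL%" is replaced with the actual URL of the reposted post.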
