Recently I had some personal invoices to reimburse, which required verifying them on the State Taxation Administration's national VAT invoice verification platform. The captcha there contains so many interference pixels that it is easy to misread even by eye, so I became interested in building a recognition model for it with Keras deep learning. This example recognizes English characters only: the captcha contains relatively few Chinese characters, so accurately recognizing the English ones is enough to meet the requirement. If I find the time and energy later, I may build a version that also handles Chinese characters.
# 1 Preparation
As the saying goes, sharp tools make good work: a sample set is indispensable for captcha recognition with deep learning. First, open the official tax bureau verification site and see what the captcha looks like.
The captchas come in four kinds: all-black characters, red characters, yellow characters, and blue characters. My approach trains one model per kind: the black-character model recognizes only black characters, the red-character model only red characters, and so on.
We prepare 100,000 captchas per kind, 400,000 in total across the four kinds; ideally you should download real captchas from the site and label them. Some bloggers generate their own captchas instead; if the generated ones are roughly 95% similar to the real thing, they are usable. At company scale, with budget to spare, you can pay a labeling company to label the hundreds of thousands of images. As a small blogger, I cannot afford to hire labelers and did not want to label them all myself, so I also used generated captchas.
Next, labeling. A simple way to label a captcha is to rename the file. For all-black captchas, the six characters become the file name, as shown below:
For captchas of other colors, for example red-character captchas, only the colored characters are labeled, as shown below:
Once a few hundred thousand captchas are labeled this way, model training can begin.
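With this naming scheme, the label can be read straight back out of the file name when building the training set. A minimal sketch (the `./chinatax/00/` folder layout and `.png` extension are assumptions from my own setup):

```python
import glob
import os


def label_from_filename(path):
    """The captcha's label is simply its file name without the extension."""
    return os.path.splitext(os.path.basename(path))[0].lower()


def load_labeled_samples(sample_dir):
    """Collect (path, label) pairs for every labeled PNG in a folder."""
    return [(p, label_from_filename(p))
            for p in sorted(glob.glob(os.path.join(sample_dir, '*.png')))]
```

For example, `label_from_filename('./chinatax/00/x3k9pz.png')` returns `'x3k9pz'`, which can be fed directly to the training pipeline as the ground-truth string.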
# 2 Training the Recognition Model
For text recognition the natural choice is CRNN+CTC, an end-to-end text recognition algorithm that is used not only for character recognition but also in other areas such as speech recognition.
My CRNN convolutional network is as follows:
```python
from tensorflow.keras import models
from tensorflow.keras.layers import (BatchNormalization, Bidirectional, Conv2D,
                                     Dense, Dropout, Flatten, GRU, MaxPooling2D,
                                     Permute, TimeDistributed, ZeroPadding2D)

rnn_unit = 256
n_class = 37  # 26 letters + 10 digits + 1 CTC blank

model = models.Sequential()
# convolutional feature extractor: height is fixed at 32, width is variable
model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu', input_shape=(32, None, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid'))
model.add(Conv2D(128, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(ZeroPadding2D(padding=(0, 1)))
# pool with stride (2, 1): halve the height but keep the width resolution
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid'))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(BatchNormalization(axis=1))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(BatchNormalization(axis=1))
model.add(ZeroPadding2D(padding=(0, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid'))
model.add(Conv2D(512, kernel_size=(2, 2), strides=(1, 1), padding='valid', activation='relu'))
# collapse the height dimension and treat the width as the time axis
model.add(Permute(dims=(2, 1, 3)))
model.add(TimeDistributed(Flatten()))
# recurrent layers run along the width (time) axis
model.add(Bidirectional(GRU(rnn_unit, return_sequences=True, kernel_initializer='he_normal')))
model.add(Dense(rnn_unit, activation='linear'))
model.add(Bidirectional(GRU(rnn_unit, return_sequences=True, kernel_initializer='he_normal')))
model.add(Dropout(rate=0.25))
# per-timestep class probabilities (characters plus the CTC blank)
model.add(Dense(n_class, activation='softmax', kernel_initializer='he_normal'))
```
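The Sequential network above only produces per-timestep class probabilities; for training, the CTC loss is typically attached through a Lambda layer wrapping `K.ctc_batch_cost`, with three extra inputs carrying the labels and the sequence lengths. A sketch of that wiring (the function and input names are my assumptions, chosen to match the layer names in the printed structure):

```python
from tensorflow.keras import backend as K
from tensorflow.keras import layers, models


def build_ctc_train_model(base_model, max_label_len=6):
    """Wrap a prediction network in a CTC training model.

    base_model maps an image to a (batch, timesteps, n_class) softmax sequence.
    """
    labels = layers.Input(name='the_labels', shape=(max_label_len,), dtype='float32')
    input_length = layers.Input(name='input_length', shape=(1,), dtype='int64')
    label_length = layers.Input(name='label_length', shape=(1,), dtype='int64')
    # ctc_batch_cost(y_true, y_pred, input_length, label_length)
    loss_out = layers.Lambda(
        lambda args: K.ctc_batch_cost(args[0], args[1], args[2], args[3]),
        output_shape=(1,), name='ctc_loss')(
        [labels, base_model.output, input_length, label_length])
    train_model = models.Model(
        inputs=[base_model.input, labels, input_length, label_length],
        outputs=loss_out)
    # the real loss is already computed inside the graph, so just pass it through
    train_model.compile(optimizer='adam',
                        loss={'ctc_loss': lambda y_true, y_pred: y_pred})
    return train_model
```

Calling `build_ctc_train_model(model)` on the network defined earlier yields the four-input training model whose structure is printed below.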
The full Keras CTC model has the following structure:
```bash
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
conv2d_input (InputLayer) [(None, 32, None, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 32, None, 64) 1792 conv2d_input[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, None, 64) 0 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 16, None, 128 73856 max_pooling2d[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 8, None, 128) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 8, None, 256) 295168 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 8, None, 256) 590080 conv2d_2[0][0]
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D) (None, 8, None, 256) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 4, None, 256) 0 zero_padding2d[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 4, None, 512) 1180160 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 4, None, 512) 16 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 4, None, 512) 2359808 batch_normalization[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, None, 512) 16 conv2d_5[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 4, None, 512) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 2, None, 512) 0 zero_padding2d_1[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 1, None, 512) 1049088 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
permute (Permute) (None, None, 1, 512) 0 conv2d_6[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 512) 0 permute[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, None, 512) 1182720 time_distributed[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, None, 256) 131328 bidirectional[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 512) 789504 dense[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, None, 512) 0 bidirectional_1[0][0]
__________________________________________________________________________________________________
the_labels (InputLayer) [(None, 6)] 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 37) 18981 dropout[0][0]
__________________________________________________________________________________________________
input_length (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
label_length (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
ctc_loss (Lambda) (None, 1) 0 the_labels[0][0]
dense_1[0][0]
input_length[0][0]
label_length[0][0]
==================================================================================================
Total params: 7,672,517
Trainable params: 7,672,501
Non-trainable params: 16
__________________________________________________________________________________________________
```
Training this network once per captcha type yields one model per type, four models in all.
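At inference time the softmax sequence still has to be decoded into the final six characters. The usual greedy CTC decoding takes the argmax class at each timestep, collapses consecutive repeats, then drops the blank. A pure-Python sketch (the alphabet ordering here is my assumption; the real model must use whatever ordering it was trained with):

```python
CHARS = '0123456789abcdefghijklmnopqrstuvwxyz'  # 36 characters; index 36 is the CTC blank
BLANK = len(CHARS)


def ctc_greedy_decode(argmax_ids):
    """Collapse consecutive repeats, then remove blanks, per the CTC decoding rule."""
    out = []
    prev = None
    for idx in argmax_ids:
        if idx != prev and idx != BLANK:
            out.append(CHARS[idx])
        prev = idx
    return ''.join(out)
```

For example, `ctc_greedy_decode([1, 1, 36, 1, 10, 10, 36])` returns `'11a'`: the repeated `1`s collapse, the blank separates the second `1` so it survives, and the trailing blank is dropped.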
# 3 Model Validation
I packaged the trained models as a service and deployed it on Alibaba Cloud. The address will stay open for a while and will be shut down later.
Service test URL: http://47.107.92.103:11000/yzmDetect
Request type: JSON
Request parameters:
```python
json_data = {
    'img_str': img_str,
    'yzm_lx': 'captcha type'
}
```
Parameter description: `img_str` is the captcha image as a base64-encoded string; `yzm_lx` is the captcha type to recognize: 00 for black characters, 01 for red, 02 for yellow, 03 for blue.
The following Python code recognizes a local captcha image:
```python
import base64

import requests

if __name__ == '__main__':
    # read the image and convert it to a base64 string
    img_str = None
    with open('./chinatax/00/1 (1).png', 'rb') as f:
        # f.read() yields the raw image bytes
        img_content = f.read()
        # encode the raw bytes as base64 bytes
        img_base64 = base64.b64encode(img_content)
        # decode the base64 bytes into a str for the JSON payload
        img_str = img_base64.decode()
    if img_str is not None:
        json_data = {
            'img_str': img_str,
            'yzm_lx': '00'
        }
        yzm_ret = requests.post(url='http://47.107.92.103:11000/yzmDetect', json=json_data)
        print(yzm_ret.text)
```
Recognizing a black-character captcha:
Recognition result:
Recognizing a red-character captcha:
As you can see, both are recognized correctly; I will not list the yellow and blue cases. Feel free to message me for technical discussion.