Recently I had some invoices to submit for reimbursement, which meant verifying them on the State Taxation Administration's national VAT invoice verification platform. The captcha on that site has so many interference pixels that it is easy to misread even with the naked eye, so I got interested in building a recognition model for it with Keras. This example only recognizes English characters: the captchas contain relatively few Chinese characters, so accurately recognizing the English ones is already enough for verification. If I find the time and energy later, I may build a version that also handles Chinese characters.
# 1 Preparation
To do a good job, one must first sharpen one's tools: deep-learning captcha recognition naturally starts with a sample set. First, open the tax bureau's verification site and see what the captcha looks like.
The tax bureau captcha comes in four kinds: all-black characters, red characters, yellow characters and blue characters. My approach is to train one model per kind: the black-character model only recognizes black characters, the red-character model only recognizes red characters, and so on.
I prepared 100,000 captchas for each kind, 400,000 in total. Ideally you should download real captchas from the site and label them. Some bloggers generate their own captchas instead; that works as long as the generated ones are about 95% close to the real thing. A company willing to spend money can send a few hundred thousand captchas to a labeling company; as a small blogger I have neither the budget to pay for labeling nor the patience to label them myself, so I also went with generated captchas.
Next, the labeling. The simple approach is to rename each image so that the label is the filename. For all-black captchas, the six characters themselves become the file name, as shown below:
For the other colors, for example red-character captchas, only the red characters are labeled, as shown below:
Once the several hundred thousand captchas have been labeled this way, model training can begin.
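Since the label lives in the filename, building a training list is just a matter of walking the sample directory and reading each file stem back as a label. Below is a minimal sketch, assuming a flat folder of PNG files named by their characters and a 36-character set (digits plus lowercase letters, consistent with the 37 output classes, including the CTC blank, of the model later on); the helper name and the exact character ordering are my own assumptions, not the author's code.
```python
import os

# Assumed character set: 10 digits + 26 lowercase letters
CHARSET = '0123456789abcdefghijklmnopqrstuvwxyz'
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARSET)}

def load_labels(sample_dir='./chinatax/00'):
    """Collect (image path, label indices) pairs, taking the label from the filename."""
    samples = []
    for name in os.listdir(sample_dir):
        stem, ext = os.path.splitext(name)
        if ext.lower() != '.png':
            continue
        label = [CHAR_TO_INDEX[c] for c in stem.lower() if c in CHAR_TO_INDEX]
        samples.append((os.path.join(sample_dir, name), label))
    return samples
```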
# 2 Training the Recognition Model
For text recognition the natural choice is CRNN+CTC, an end-to-end text recognition approach that is used not only for character recognition but also in speech recognition and other areas.
My CRNN network is as follows:
```python
from tensorflow.keras import models
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, ZeroPadding2D,
                                     BatchNormalization, Permute, TimeDistributed,
                                     Flatten, Bidirectional, GRU, Dense, Dropout)

n_class = 37   # 36 characters plus the CTC blank
rununit = 256  # hidden units per GRU direction

model = models.Sequential()
# Convolutional feature extractor: input height is fixed at 32, width is variable
model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu', input_shape=(32, None, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid'))
model.add(Conv2D(128, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(ZeroPadding2D(padding=(0, 1)))
# Halve the height but keep the width resolution (stride 1 on the width axis)
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid'))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(BatchNormalization(axis=1))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'))
model.add(BatchNormalization(axis=1))
model.add(ZeroPadding2D(padding=(0, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1), padding='valid'))
model.add(Conv2D(512, kernel_size=(2, 2), strides=(1, 1), padding='valid', activation='relu'))
# Turn the (height=1, width, channels) feature map into a width-long sequence
model.add(Permute(dims=(2, 1, 3)))
model.add(TimeDistributed(Flatten()))
# Recurrent part: two bidirectional GRU layers
model.add(Bidirectional(GRU(rununit, return_sequences=True, kernel_initializer='he_normal')))
model.add(Dense(rununit, activation='linear'))
model.add(Bidirectional(GRU(rununit, return_sequences=True, kernel_initializer='he_normal')))
model.add(Dropout(rate=0.25))
# Per-timestep class probabilities (characters plus the CTC blank)
model.add(Dense(n_class, activation='softmax', kernel_initializer='he_normal'))
```
The structure of the full Keras CTC model is:
```bash
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
conv2d_input (InputLayer) [(None, 32, None, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 32, None, 64) 1792 conv2d_input[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, None, 64) 0 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 16, None, 128 73856 max_pooling2d[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 8, None, 128) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 8, None, 256) 295168 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 8, None, 256) 590080 conv2d_2[0][0]
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D) (None, 8, None, 256) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 4, None, 256) 0 zero_padding2d[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 4, None, 512) 1180160 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 4, None, 512) 16 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 4, None, 512) 2359808 batch_normalization[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, None, 512) 16 conv2d_5[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 4, None, 512) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 2, None, 512) 0 zero_padding2d_1[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 1, None, 512) 1049088 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
permute (Permute) (None, None, 1, 512) 0 conv2d_6[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 512) 0 permute[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, None, 512) 1182720 time_distributed[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, None, 256) 131328 bidirectional[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 512) 789504 dense[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, None, 512) 0 bidirectional_1[0][0]
__________________________________________________________________________________________________
the_labels (InputLayer) [(None, 6)] 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 37) 18981 dropout[0][0]
__________________________________________________________________________________________________
input_length (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
label_length (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
ctc_loss (Lambda) (None, 1) 0 the_labels[0][0]
dense_1[0][0]
input_length[0][0]
label_length[0][0]
==================================================================================================
Total params: 7,672,517
Trainable params: 7,672,501
Non-trainable params: 16
__________________________________________________________________________________________________
```
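The extra inputs in the summary (the_labels, input_length, label_length) and the ctc_loss Lambda follow the usual Keras pattern of wrapping K.ctc_batch_cost in a Lambda layer and compiling with a pass-through loss. The wiring below is my reconstruction from the layer names and shapes above, not the author's original training code; the optimizer choice is an assumption.
```python
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

def ctc_lambda_func(args):
    # y_pred: per-timestep softmax output of the CRNN defined above
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

# Extra inputs that exist only for computing the CTC loss
the_labels = Input(name='the_labels', shape=(6,), dtype='float32')    # 6 characters per captcha
input_length = Input(name='input_length', shape=(1,), dtype='int64')  # timesteps fed to CTC
label_length = Input(name='label_length', shape=(1,), dtype='int64')  # actual label length

y_pred = model.output
ctc_loss = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc_loss')(
    [y_pred, the_labels, input_length, label_length])

train_model = Model(inputs=[model.input, the_labels, input_length, label_length],
                    outputs=ctc_loss)
# The Lambda layer already outputs the loss, so the compiled loss just passes it through
train_model.compile(optimizer='adam', loss={'ctc_loss': lambda y_true, y_pred: y_pred})
```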
Training this network on each captcha type yields one model per type, four models in total.
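At prediction time only the CRNN itself is needed; its per-timestep softmax output is decoded with CTC decoding. A minimal sketch, assuming the trained base model from section 2 and the same assumed 36-character ordering as above:
```python
import numpy as np
from tensorflow.keras import backend as K

CHARSET = '0123456789abcdefghijklmnopqrstuvwxyz'  # assumed ordering; index 36 is the CTC blank

def predict_text(base_model, img_batch):
    """Decode the CRNN softmax output into character strings."""
    y_pred = base_model.predict(img_batch)                # (batch, timesteps, 37)
    timesteps = np.ones(y_pred.shape[0]) * y_pred.shape[1]
    decoded, _ = K.ctc_decode(y_pred, input_length=timesteps, greedy=True)
    decoded = K.get_value(decoded[0])                     # padded with -1 on the right
    texts = []
    for seq in decoded:
        texts.append(''.join(CHARSET[i] for i in seq if 0 <= i < len(CHARSET)))
    return texts
```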
# 3 Verifying the Model
I packaged the trained models as a service and deployed it on Alibaba Cloud. The address will stay open for a while and will be shut down after some time.
Test URL of the service: http://47.107.92.103:11000/yzmDetect
Request type: JSON
Request parameters:
json_data={
    'img_str':img_str,
    'yzm_lx':'captcha type'
}
Parameter description: img_str is the base64-encoded string of the captcha image; yzm_lx is the type of captcha to recognize: 00 for black characters, 01 for red, 02 for yellow and 03 for blue.
Below is Python code that sends a local captcha image to the service for recognition:
```python
import base64
import requests

if __name__ == '__main__':
    # Read the image and convert it to a base64 string
    img_str = None
    with open('./chinatax/00/1 (1).png', 'rb') as f:
        # Reading in binary mode returns raw bytes
        img_content = f.read()
        # Encode the raw bytes as base64 bytes
        img_base64 = base64.b64encode(img_content)
        # Convert the base64 bytes to a string
        img_str = img_base64.decode()
    if img_str is not None:
        json_data = {
            'img_str': img_str,
            'yzm_lx': '00'
        }
        yzm_ret = requests.post(url='http://47.107.92.103:11000/yzmDetect', json=json_data)
        print(yzm_ret.text)
```
Recognizing a black-character captcha:
Recognition result:
Recognizing a red-character captcha:
As you can see, both are recognized correctly, so I won't list the yellow and blue cases here. Feel free to message me for technical discussion.