Training a CAPTCHA recognition model with TensorFlow


Training samples for the TensorFlow CAPTCHA recognition model can be generated with the captcha Python library, which is easy to install on Linux:

pip install captcha


Generating the CAPTCHAs:

# -*- coding: utf-8 -*-
from captcha.image import ImageCaptcha  # pip install captcha
import numpy as np
from PIL import Image
import random
import cv2
import os

# Characters that can appear in the CAPTCHA (digits only; the letter sets are commented out)
number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
#             'v', 'w', 'x', 'y', 'z']
# ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
#             'V', 'W', 'X', 'Y', 'Z']

# Random CAPTCHA text, 4 characters long
def random_captcha_text(char_set=number, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):
        c = random.choice(char_set)
        captcha_text.append(c)
    return captcha_text


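# Generate a random CAPTCHA text and the matching image
def gen_captcha_text_and_image():
    # 160x60 is ImageCaptcha's default size; it matches IMAGE_WIDTH/IMAGE_HEIGHT in the training code
    image = ImageCaptcha(width=160, height=60)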

    captcha_text = random_captcha_text()
    captcha_text = ''.join(captcha_text)

    captcha = image.generate(captcha_text)

    captcha_image = Image.open(captcha)
    captcha_image = np.array(captcha_image)
    return captcha_text, captcha_image


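if __name__ == '__main__':
    # Output directory: run once with ./trainImage and once with ./validImage
    path = './trainImage'
    # path = './validImage'
    if not os.path.isdir(path):
        os.makedirs(path)  # cv2.imwrite does not raise for a missing directory, it just returns False
    for i in range(10000):
        text, image = gen_captcha_text_and_image()
        # The file name is the label; a repeated text overwrites the earlier file
        fullPath = os.path.join(path, text + ".jpg")
        # PIL returns RGB, OpenCV expects BGR
        cv2.imwrite(fullPath, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
        print("{0}/10000".format(i + 1))
    print("\nDone!")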


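Generate the training samples and the validation samples separately: run the script once with path = './trainImage' and once with path = './validImage'. Each image is saved under its own text (for example 2792.jpg), and the training code reads the label back from the file name.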
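Because each file is named after its text, a text drawn twice simply overwrites the earlier image. With 4 digits there are only 10^4 = 10000 possible texts, so 10000 random draws leave on average N(1 - (1 - 1/N)^k), about 6321, distinct files, and the training and validation folders end up sharing many texts. The sketch below is not from the original post: `expected_unique` and `distinct_captcha_texts` are hypothetical helpers that estimate this effect and draw distinct texts instead, so the two sets can be made disjoint.

```python
import random


def expected_unique(space, draws):
    """Expected number of distinct texts after `draws` uniform draws
    (with replacement) from `space` equally likely texts."""
    return space * (1 - (1 - 1 / space) ** draws)


def distinct_captcha_texts(count, length=4, charset='0123456789', seed=None):
    """Hypothetical helper: return `count` distinct random CAPTCHA texts."""
    space = len(charset) ** length
    if count > space:
        raise ValueError('only {} distinct texts exist'.format(space))
    rng = random.Random(seed)
    texts = []
    # Sample distinct integers without replacement, then write each in base len(charset)
    for n in rng.sample(range(space), count):
        chars = []
        for _ in range(length):
            n, r = divmod(n, len(charset))
            chars.append(charset[r])
        texts.append(''.join(reversed(chars)))
    return texts


# Disjoint train/validation texts: all 10000 four-digit texts, split 8000 / 2000
texts = distinct_captcha_texts(10000, seed=0)
train_texts, valid_texts = texts[:8000], texts[8000:]
```

Each text could then be rendered with ImageCaptcha().generate(text) in place of random_captcha_text().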




Training with TensorFlow:

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
import cv2
import os
import random
import time

number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
#             'v', 'w', 'x', 'y', 'z']
# ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
#             'V', 'W', 'X', 'Y', 'Z']

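def get_image_file_name(imgFilePath):
    # Collect the .jpg file names in a directory; each name (minus ".jpg") is the CAPTCHA text
    fileName = [f for f in os.listdir(imgFilePath) if f.endswith('.jpg')]
    # Shuffle the order
    random.shuffle(fileName)
    return fileName, len(fileName)


random.seed(time.time())
# One file-name list per directory, so training and validation batches
# are each drawn from the files that actually exist in their own folder
image_filename_lists = {}


# Read a random image and its label from imageFilePath
# (imageAmount is accepted for compatibility with get_next_batch but not needed)
def gen_captcha_text_and_image(imageFilePath, imageAmount=None):
    if imageFilePath not in image_filename_lists:
        image_filename_lists[imageFilePath] = get_image_file_name(imageFilePath)[0]
    name = random.choice(image_filename_lists[imageFilePath])
    img = cv2.imread(os.path.join(imageFilePath, name), 0)  # 0 = read as grayscale
    if img is None:  # unreadable file: return an empty image so the caller retries
        return None, np.zeros(0, np.float32)
    img = np.float32(img)
    text = name.split('.')[0]
    return text, img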

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160
MAX_CAPTCHA = 4

# Character set used for the text-to-vector encoding
char_set = number
CHAR_SET_LEN = len(char_set)


# For example, if the CAPTCHA is '0296', the corresponding label is
# [1 0 0 0 0 0 0 0 0 0
#  0 0 1 0 0 0 0 0 0 0
#  0 0 0 0 0 0 0 0 0 1
#  0 0 0 0 0 0 1 0 0 0]
def name2label(name):
    label = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
    for i, c in enumerate(name):
        idx = i * CHAR_SET_LEN + ord(c) - ord('0')
        label[idx] = 1
    return label


# Digit string -> array of digit values (not used below)
def label2name(digitalStr):
    digitalList = []
    for c in digitalStr:
        digitalList.append(ord(c) - ord('0'))
    return np.array(digitalList)


# Text to vector (one one-hot group per character)
def text2vec(text):
    text_len = len(text)
    if text_len > MAX_CAPTCHA:
        raise ValueError('CAPTCHA text is at most {} characters'.format(MAX_CAPTCHA))

    vector = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)

    def char2pos(c):
        if c == '_':
            k = 62
            return k
        k = ord(c) - 48
        if k > 9:
            k = ord(c) - 55
            if k > 35:
                k = ord(c) - 61
                if k > 61:
                    raise ValueError('No Map')
        return k

    for i, c in enumerate(text):
        idx = i * CHAR_SET_LEN + char2pos(c)
        vector[idx] = 1
    return vector


# Vector back to text
def vec2text(vec):
    char_pos = vec.nonzero()[0]
    text = []
    for i, c in enumerate(char_pos):
        char_at_pos = i  # c/63
        char_idx = c % CHAR_SET_LEN
        if char_idx < 10:
            char_code = char_idx + ord('0')
        elif char_idx < 36:
            char_code = char_idx - 10 + ord('A')
        elif char_idx < 62:
            char_code = char_idx - 36 + ord('a')
        elif char_idx == 62:
            char_code = ord('_')
        else:
            raise ValueError('error')
        text.append(chr(char_code))
    return "".join(text)


# Generate a batch
def get_next_batch(imageFilePath, batch_size=128):
    batch_x = np.zeros([batch_size, IMAGE_HEIGHT * IMAGE_WIDTH])
    batch_y = np.zeros([batch_size, MAX_CAPTCHA * CHAR_SET_LEN])

    def wrap_gen_captcha_text_and_image(imageFilePath, imageAmount):
        # Retry until an image of the expected size has been read
        while True:
            text, image = gen_captcha_text_and_image(imageFilePath, imageAmount)
            if image.shape == (IMAGE_HEIGHT, IMAGE_WIDTH):
                return text, image

    # Number of files in the directory
    imageAmount = len(os.listdir(imageFilePath))

    for i in range(batch_size):
        text, image = wrap_gen_captcha_text_and_image(imageFilePath, imageAmount)

        batch_x[i, :] = image.flatten() / 255  # or (image.flatten() - 128) / 128 for zero mean
        batch_y[i, :] = text2vec(text)

    return batch_x, batch_y


####################################################################

X = tf.placeholder(tf.float32, [None, IMAGE_HEIGHT * IMAGE_WIDTH])
Y = tf.placeholder(tf.float32, [None, MAX_CAPTCHA * CHAR_SET_LEN])
keep_prob = tf.placeholder(tf.float32)  # dropout


# Define the CNN
def crack_captcha_cnn(w_alpha=0.01, b_alpha=0.1):
    x = tf.reshape(X, shape=[-1, IMAGE_HEIGHT, IMAGE_WIDTH, 1])

    # 3 conv layer
    w_c1 = tf.Variable(w_alpha * tf.random_normal([3, 3, 1, 32]))
    b_c1 = tf.Variable(b_alpha * tf.random_normal([32]))
    conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1))
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv1 = tf.nn.dropout(conv1, keep_prob)

    w_c2 = tf.Variable(w_alpha * tf.random_normal([3, 3, 32, 64]))
    b_c2 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2))
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.dropout(conv2, keep_prob)

    w_c3 = tf.Variable(w_alpha * tf.random_normal([3, 3, 64, 64]))
    b_c3 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, w_c3, strides=[1, 1, 1, 1], padding='SAME'), b_c3))
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv3 = tf.nn.dropout(conv3, keep_prob)

    # Fully connected layer
    w_d = tf.Variable(w_alpha * tf.random_normal([8 * 20 * 64, 1024]))
    b_d = tf.Variable(b_alpha * tf.random_normal([1024]))
    dense = tf.reshape(conv3, [-1, w_d.get_shape().as_list()[0]])
    dense = tf.nn.relu(tf.add(tf.matmul(dense, w_d), b_d))
    dense = tf.nn.dropout(dense, keep_prob)

    w_out = tf.Variable(w_alpha * tf.random_normal([1024, MAX_CAPTCHA * CHAR_SET_LEN]))
    b_out = tf.Variable(b_alpha * tf.random_normal([MAX_CAPTCHA * CHAR_SET_LEN]))
    out = tf.add(tf.matmul(dense, w_out), b_out)
    # out = tf.nn.softmax(out)
    return out

# Training
def train_crack_captcha_cnn():
    output = crack_captcha_cnn()
    # loss
    # loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(output, Y))
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=Y))
    # Optimizer: to speed up training, the learning rate should ideally start large and then decrease (a fixed 0.001 is used here)
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

    predict = tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(Y, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    # Per-character accuracy: each of the MAX_CAPTCHA positions is scored separately
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        step = 0
        while True:
            batch_x, batch_y = get_next_batch('./trainImage', 128)
            _, loss_ = sess.run([optimizer, loss], feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.75})
            print(step, loss_)
            # Evaluate accuracy on a validation batch every 100 steps
            if step % 100 == 0:
                batch_x_test, batch_y_test = get_next_batch('./validImage', 128)
                acc = sess.run(accuracy, feed_dict={X: batch_x_test, Y: batch_y_test, keep_prob: 1.})
                print(step, acc)

                # Stopping condition
                if acc > 0.94 or step > 3000:
                    saver.save(sess, "./crack_capcha.model", global_step=step)
                    break
            step += 1


def predict_captcha(captcha_image):
    output = crack_captcha_cnn()

    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint('.'))

        predict = tf.argmax(tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
        text_list = sess.run(predict, feed_dict={X: [captcha_image], keep_prob: 1})

        text = text_list[0].tolist()
        vector = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
        i = 0
        for n in text:
            vector[i * CHAR_SET_LEN + n] = 1
            i += 1
        return vec2text(vector)

# Run training
train_crack_captcha_cnn()
print("Training complete.")

# -------------------------------------------------------------------
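text2vec turns a 4-digit label into a 40-element vector made of four one-hot groups of 10, and both the accuracy op and predict_captcha decode by taking the argmax within each group. A vectorized NumPy sketch of the same digits-only encoding follows; the names `text_to_onehot` and `onehot_to_text` are illustrative and do not appear in the post.

```python
import numpy as np

CHARSET = '0123456789'


def text_to_onehot(text, charset=CHARSET):
    """Encode text as len(text) one-hot rows of len(charset), flattened (like text2vec)."""
    onehot = np.zeros((len(text), len(charset)), dtype=np.float32)
    onehot[np.arange(len(text)), [charset.index(c) for c in text]] = 1.0
    return onehot.reshape(-1)


def onehot_to_text(vec, length, charset=CHARSET):
    """Decode by taking the argmax of each group; accepts one-hot labels or raw logits."""
    idx = np.asarray(vec).reshape(length, len(charset)).argmax(axis=1)
    return ''.join(charset[i] for i in idx)


label = text_to_onehot('0296')
hot_positions = np.flatnonzero(label)  # indices i * 10 + digit: 0, 12, 29, 36
decoded = onehot_to_text(label, 4)
```

Unlike vec2text, which only looks at nonzero entries, argmax decoding also works directly on the network's raw outputs.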


Training stopped after about 1600 iterations (batch size 128), once the accuracy on a validation batch exceeded 0.94.
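The loss printed at each step is tf.nn.sigmoid_cross_entropy_with_logits averaged over the batch and all 40 outputs, so every output is treated as an independent yes/no decision. TensorFlow documents the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) for logits x and labels z. The NumPy sketch below reproduces that formula for checking loss values by hand; it illustrates the math and is not TensorFlow's own implementation.

```python
import numpy as np


def sigmoid_cross_entropy_with_logits(logits, labels):
    """Element-wise sigmoid cross-entropy, stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))


def naive_sigmoid_cross_entropy(logits, labels):
    """Textbook form -z*log(p) - (1-z)*log(1-p); breaks down for large |x|."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-x))
    return -z * np.log(p) - (1 - z) * np.log(1 - p)


# One CAPTCHA position: true digit 2, logits favouring digit 2
labels = np.zeros(10)
labels[2] = 1.0
logits = np.full(10, -3.0)
logits[2] = 4.0
loss = sigmoid_cross_entropy_with_logits(logits, labels).mean()  # mean, like tf.reduce_mean
```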



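Training writes four files to the current directory: checkpoint, crack_capcha.model-<step>.data-00000-of-00001, crack_capcha.model-<step>.index and crack_capcha.model-<step>.meta, where <step> is the step at which training stopped.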




Testing a single CAPTCHA image:

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
import cv2
import os
import random
import time

number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160
MAX_CAPTCHA = 4
char_set = number
CHAR_SET_LEN = len(char_set)

X = tf.placeholder(tf.float32, [None, IMAGE_HEIGHT * IMAGE_WIDTH])
Y = tf.placeholder(tf.float32, [None, MAX_CAPTCHA * CHAR_SET_LEN])
keep_prob = tf.placeholder(tf.float32)  # dropout

# Define the CNN (must match the training script so the checkpoint variables line up)
def crack_captcha_cnn(w_alpha=0.01, b_alpha=0.1):
    x = tf.reshape(X, shape=[-1, IMAGE_HEIGHT, IMAGE_WIDTH, 1])

    # 3 conv layer
    w_c1 = tf.Variable(w_alpha * tf.random_normal([3, 3, 1, 32]))
    b_c1 = tf.Variable(b_alpha * tf.random_normal([32]))
    conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1))
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv1 = tf.nn.dropout(conv1, keep_prob)

    w_c2 = tf.Variable(w_alpha * tf.random_normal([3, 3, 32, 64]))
    b_c2 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2))
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.dropout(conv2, keep_prob)

    w_c3 = tf.Variable(w_alpha * tf.random_normal([3, 3, 64, 64]))
    b_c3 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, w_c3, strides=[1, 1, 1, 1], padding='SAME'), b_c3))
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv3 = tf.nn.dropout(conv3, keep_prob)

    # Fully connected layer
    w_d = tf.Variable(w_alpha * tf.random_normal([8 * 20 * 64, 1024]))
    b_d = tf.Variable(b_alpha * tf.random_normal([1024]))
    dense = tf.reshape(conv3, [-1, w_d.get_shape().as_list()[0]])
    dense = tf.nn.relu(tf.add(tf.matmul(dense, w_d), b_d))
    dense = tf.nn.dropout(dense, keep_prob)

    w_out = tf.Variable(w_alpha * tf.random_normal([1024, MAX_CAPTCHA * CHAR_SET_LEN]))
    b_out = tf.Variable(b_alpha * tf.random_normal([MAX_CAPTCHA * CHAR_SET_LEN]))
    out = tf.add(tf.matmul(dense, w_out), b_out)
    # out = tf.nn.softmax(out)
    return out

# Vector back to text
def vec2text(vec):
    char_pos = vec.nonzero()[0]
    text = []
    for i, c in enumerate(char_pos):
        char_at_pos = i  # c/63
        char_idx = c % CHAR_SET_LEN
        if char_idx < 10:
            char_code = char_idx + ord('0')
        elif char_idx < 36:
            char_code = char_idx - 10 + ord('A')
        elif char_idx < 62:
            char_code = char_idx - 36 + ord('a')
        elif char_idx == 62:
            char_code = ord('_')
        else:
            raise ValueError('error')
        text.append(chr(char_code))
    return "".join(text)

def predict_captcha(captcha_image):
    output = crack_captcha_cnn()

    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint('.'))

        predict = tf.argmax(tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
        text_list = sess.run(predict, feed_dict={X: [captcha_image], keep_prob: 1})

        text = text_list[0].tolist()
        vector = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
        i = 0
        for n in text:
            vector[i * CHAR_SET_LEN + n] = 1
            i += 1
        return vec2text(vector)

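# Predict a single image
# The file name is the expected text; use any file that exists in ./validImage
image = cv2.imread('./validImage/2792.jpg', 0)
text = '2792'
image = np.float32(image).flatten() / 255  # convert to float32 before scaling
predict_text = predict_captcha(image)
print("Expected: {0}  Predicted: {1}".format(text, predict_text))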


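Because the CAPTCHAs produced by captcha vary relatively little, the trained model is already more accurate than manual recognition even with an accuracy of only 0.94. Note that this is per-character accuracy on one validation batch of 128 images: the accuracy op scores each of the 4 positions separately, so the share of CAPTCHAs read completely correctly is lower. In this test the prediction is correct.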
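If errors were independent across positions, 0.94 per character would correspond to roughly 0.94^4 ≈ 0.78 of whole CAPTCHAs; in practice this has to be measured. A NumPy sketch that computes both numbers from logits and one-hot labels laid out as in the training code (`captcha_accuracies` is a hypothetical helper, not part of the post):

```python
import numpy as np


def captcha_accuracies(logits, labels, length=4, n_classes=10):
    """Return (per-character accuracy, whole-CAPTCHA accuracy) for a batch."""
    pred = np.asarray(logits).reshape(-1, length, n_classes).argmax(axis=2)
    true = np.asarray(labels).reshape(-1, length, n_classes).argmax(axis=2)
    correct = pred == true               # shape (batch, length)
    per_char = correct.mean()            # what the post's accuracy op reports
    whole = correct.all(axis=1).mean()   # every character of the CAPTCHA right
    return per_char, whole


# Two samples: the first fully correct, the second with one wrong character
labels = np.zeros((2, 40))
for s, text in enumerate(['0296', '1234']):
    for i, c in enumerate(text):
        labels[s, i * 10 + int(c)] = 1.0
logits = labels.copy()
logits[1, 0 * 10 + 9] = 5.0  # position 0 of the second sample now predicts '9' instead of '1'
per_char, whole = captcha_accuracies(logits, labels)  # 7/8 = 0.875 and 1/2 = 0.5
```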



When loading a test image for recognition, remember to convert it to float32 with np.float32() before scaling.
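A likely reason this matters: the original scripts target Python 2 (note the print statements), where `/` on an integer NumPy array is integer division, so a uint8 image divided by 255 would collapse to almost all zeros. Converting first also matches the tf.float32 placeholder. A minimal preprocessing sketch, assuming a grayscale image already sized 60x160 as returned by cv2.imread(path, 0); the function name `to_model_input` is hypothetical.

```python
import numpy as np

IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160


def to_model_input(gray):
    """Turn a 60x160 grayscale uint8 image into the flat float32 vector fed to X."""
    if gray is None:
        raise ValueError('image could not be read')  # cv2.imread returns None on failure
    if gray.shape != (IMAGE_HEIGHT, IMAGE_WIDTH):
        raise ValueError('expected shape {}, got {}'.format((IMAGE_HEIGHT, IMAGE_WIDTH), gray.shape))
    return np.float32(gray).flatten() / 255.0  # float division, values in [0, 1]


sample = np.full((IMAGE_HEIGHT, IMAGE_WIDTH), 200, dtype=np.uint8)
x = to_model_input(sample)
```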

The trained model files can be downloaded here: http://download.csdn.net/download/dcrmg/10195217



Update (2018-01-14): the training code explained in detail

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
import cv2
import os
import random
import time

# Digit character set
number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

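# Image size (every image is resized to this in gen_captcha_text_and_image)
IMAGE_HEIGHT = 60  #80
IMAGE_WIDTH = 160  #250
# Characters per CAPTCHA: this version is written for 8-character CAPTCHAs; set it to match
# your data (the generator at the top of this post produces 4)
MAX_CAPTCHA = 8

char_set = number
CHAR_SET_LEN = len(char_set)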

image_filename_list = []
total = 0

def get_image_file_name(imgFilePath):
    fileName = []
    total = 0
    for filePath in os.listdir(imgFilePath):
        captcha_name = filePath.split('/')[-1]
        fileName.append(captcha_name)
        total += 1
    random.seed(time.time())
    # Shuffle the order
    random.shuffle(fileName)
    return fileName, total

# File names of the training data
image_filename_list, total = get_image_file_name('./trainImage')
# File names of the validation data
image_filename_list_valid, total = get_image_file_name('./validImage')

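# Read a random image and its label
def gen_captcha_text_and_image(imageFilePath, image_filename_list, imageAmount):
    num = random.randint(0, imageAmount - 1)
    img = cv2.imread(os.path.join(imageFilePath, image_filename_list[num]), 0)  # 0 = grayscale
    if img is None:  # unreadable file: return an empty image so the caller retries
        return None, np.zeros(0, np.float32)
    img = cv2.resize(img, (IMAGE_WIDTH, IMAGE_HEIGHT))  # cv2.resize takes (width, height)
    img = np.float32(img)
    text = image_filename_list[num].split('.')[0]
    return text, img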

# Text to label vector
# For example, if the CAPTCHA is '0296', the corresponding label is
# [1 0 0 0 0 0 0 0 0 0
#  0 0 1 0 0 0 0 0 0 0
#  0 0 0 0 0 0 0 0 0 1
#  0 0 0 0 0 0 1 0 0 0]
def name2label(name):
    label = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
    for i, c in enumerate(name):
        idx = i * CHAR_SET_LEN + ord(c) - ord('0')
        label[idx] = 1
    return label

# Digit string -> array of digit values (not used below)
def label2name(digitalStr):
    digitalList = []
    for c in digitalStr:
        digitalList.append(ord(c) - ord('0'))
    return np.array(digitalList)

# Text to vector (one one-hot group per character)
def text2vec(text):
    text_len = len(text)
    if text_len > MAX_CAPTCHA:
        raise ValueError('CAPTCHA text is at most {} characters'.format(MAX_CAPTCHA))

    vector = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)

    def char2pos(c):
        if c == '_':
            k = 62
            return k
        k = ord(c) - 48
        if k > 9:
            k = ord(c) - 55
            if k > 35:
                k = ord(c) - 61
                if k > 61:
                    raise ValueError('No Map')
        return k

    for i, c in enumerate(text):
        idx = i * CHAR_SET_LEN + char2pos(c)
        vector[idx] = 1
    return vector

# Vector back to text
def vec2text(vec):
    char_pos = vec.nonzero()[0]
    text = []
    for i, c in enumerate(char_pos):
        char_at_pos = i  # c/63
        char_idx = c % CHAR_SET_LEN
        if char_idx < 10:
            char_code = char_idx + ord('0')
        elif char_idx < 36:
            char_code = char_idx - 10 + ord('A')
        elif char_idx < 62:
            char_code = char_idx - 36 + ord('a')
        elif char_idx == 62:
            char_code = ord('_')
        else:
            raise ValueError('error')
        text.append(chr(char_code))
    return "".join(text)

# Generate a batch
def get_next_batch(imageFilePath, image_filename_list=None, batch_size=128):
    batch_x = np.zeros([batch_size, IMAGE_HEIGHT * IMAGE_WIDTH])
    batch_y = np.zeros([batch_size, MAX_CAPTCHA * CHAR_SET_LEN])

    def wrap_gen_captcha_text_and_image(imageFilePath, imageAmount):
        # Retry until an image of the expected size has been read
        while True:
            text, image = gen_captcha_text_and_image(imageFilePath, image_filename_list, imageAmount)
            if image.shape == (IMAGE_HEIGHT, IMAGE_WIDTH):
                return text, image

    # Number of candidate files for this directory
    imageAmount = len(image_filename_list)

    for i in range(batch_size):
        text, image = wrap_gen_captcha_text_and_image(imageFilePath, imageAmount)

        batch_x[i, :] = image.flatten() / 255  # or (image.flatten() - 128) / 128 for zero mean
        batch_y[i, :] = text2vec(text)

    return batch_x, batch_y

####################################################################
# Placeholders: X holds the input images and Y their labels, each label an 8*10 vector
X = tf.placeholder(tf.float32, [None, IMAGE_HEIGHT * IMAGE_WIDTH])
Y = tf.placeholder(tf.float32, [None, MAX_CAPTCHA * CHAR_SET_LEN])
# Placeholder for the dropout keep probability
keep_prob = tf.placeholder(tf.float32)  # dropout

# Define the CNN
def crack_captcha_cnn(w_alpha=0.01, b_alpha=0.1):

    # Reshape X to IMAGE_HEIGHT*IMAGE_WIDTH*1; the input is a grayscale image, so there is a single channel;
    # -1 in the shape means "inferred at run time", here the number of images per iteration (the batch size)
    x = tf.reshape(X, shape=[-1, IMAGE_HEIGHT, IMAGE_WIDTH, 1])

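    # First convolutional layer
    # In shape [3, 3, 1, 32] the first two values are the kernel (patch) size,
    # the third is the number of input channels and the fourth the number of kernels in this layer;
    # each kernel produces one output feature map
    w_c1 = tf.Variable(w_alpha * tf.random_normal([3, 3, 1, 32]))
    # One bias per kernel: the layer needs as many biases as it has output channels
    b_c1 = tf.Variable(b_alpha * tf.random_normal([32]))
    # Convolve the image with the kernels and add the bias; with stride 1 and padding='SAME' the result is 60x160x32
    # tf.nn.conv2d() performs the convolution
    # padding in tf.nn.conv2d() controls how edge pixels are handled; TF has two modes, VALID and SAME
    # padding='SAME' pads the border with zeros so that every pixel, including edge pixels, is convolved
    # padding='VALID' discards edge pixels that the kernel cannot fully cover
    # strides: the step along each input dimension, a 1-D vector of length 4 with strides[0] = strides[3] = 1
    # tf.nn.bias_add() adds the bias b_c1 to the convolution result value;
    # b_c1 must be 1-D and its length must equal the size of the last dimension of value
    # tf.nn.relu() is the ReLU activation, a nonlinear transform features = max(features, 0); the output has the input's shape
    conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1))
    # tf.nn.max_pool() performs max pooling, which extracts more abstract features and reduces the feature size
    # ksize=[1, 2, 2, 1] sets a 2x2 pooling window; with stride 2 the pooled result is 30x80x32
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # tf.nn.dropout is TF's function for preventing or reducing overfitting, most often used on fully connected layers;
    # in each training step dropout randomly drops (masks) a fraction of the neurons with a set probability
    # (0.5 is a common choice for training), so they take no part in that iteration's computation (optimization):
    # their weights are kept but not updated in that step
    # keep_prob in tf.nn.dropout() is the probability of keeping a unit; it is a placeholder fed at run time
    # (kept units are scaled by 1/keep_prob, so keep_prob=1.0 is fed for evaluation)
    conv1 = tf.nn.dropout(conv1, keep_prob)
    # Input HEIGHT = 60, WIDTH = 160; after the first convolution (size unchanged, 32 feature maps)
    # and pooling (size halved, channel count unchanged) the output is 30*80*32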

    # Second convolutional layer
    w_c2 = tf.Variable(w_alpha * tf.random_normal([3, 3, 32, 64]))
    b_c2 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2))
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.dropout(conv2, keep_prob)
    # Input 60x160; after the first layer the output is 30*80*32
    # After the second layer it is 15*40*64 (2x2 pooling with stride 2 and padding='SAME' gives ceil(30/2) x ceil(80/2) = 15x40)

    # Third convolutional layer
    w_c3 = tf.Variable(w_alpha * tf.random_normal([3, 3, 64, 64]))
    b_c3 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, w_c3, strides=[1, 1, 1, 1], padding='SAME'), b_c3))
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv3 = tf.nn.dropout(conv3, keep_prob)
    # Input 60x160; after the first layer 30*80*32, after the second 15*40*64,
    # after the third 8*20*64 (ceil(15/2) = 8). This size matters: it fixes the input dimension of the fully connected layer

    # Fully connected layer
    # 2-D weight tensor: the first dimension, 8*20*64, is the flattened output size of the last convolutional layer;
    # the second, 1024, is the number of neurons in this layer, i.e. 1024 output features
    w_d = tf.Variable(w_alpha * tf.random_normal([8 * 20 * 64, 1024]))
    # The bias is 1-D, one value per neuron (1024)
    b_d = tf.Variable(b_alpha * tf.random_normal([1024]))
    # w_d.get_shape() returns the static shape of w_d as a TensorShape, and .as_list() turns it into a Python list
    # w_d has shape [8 * 20 * 64, 1024], so w_d.get_shape().as_list() is [10240, 1024] and its element [0] is 10240;
    # tf.reshape(conv3, [-1, w_d.get_shape().as_list()[0]]) therefore flattens each sample's last hidden-layer output to 1-D
    dense = tf.reshape(conv3, [-1, w_d.get_shape().as_list()[0]])
    # tf.matmul(dense, w_d) is a matrix product; the output shape is [batch_size, 1024]
    dense = tf.nn.relu(tf.add(tf.matmul(dense, w_d), b_d))
    dense = tf.nn.dropout(dense, keep_prob)
    # After the fully connected layer each sample is a 1024-dimensional vector

    # w_out has shape [1024, 8 * 10] = [1024, 80]
    w_out = tf.Variable(w_alpha * tf.random_normal([1024, MAX_CAPTCHA * CHAR_SET_LEN]))
    b_out = tf.Variable(b_alpha * tf.random_normal([MAX_CAPTCHA * CHAR_SET_LEN]))
    # out is an 8*10 vector per sample: 8 is the number of characters, 10 the possible values (0 to 9) at each position
    out = tf.add(tf.matmul(dense, w_out), b_out)
    # out = tf.nn.softmax(out)
    # Return the network's predictions (logits) for the current parameters
    return out

# Training
def train_crack_captcha_cnn():
    output = crack_captcha_cnn()
    # loss
    # loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(output, Y))
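    # tf.nn.sigmoid_cross_entropy_with_logits() computes the cross-entropy element-wise;
    # it returns a tensor with the same shape as the logits, not a single number;
    # cross-entropy measures the distance between the actual output (probabilities) and the expected output (probabilities):
    # the smaller it is, the closer the two distributions
    # tf.reduce_mean() averages over all elements
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=Y))
    # Optimizer: to speed up training, the learning rate should ideally start large and then decrease (a fixed 0.001 is used here)
    # tf.train.AdamOptimizer() implements the Adam optimization algorithm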
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

    predict = tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(Y, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        step = 0
        while True:
            batch_x, batch_y = get_next_batch('./trainImage',image_filename_list, 64)
            _, loss_ = sess.run([optimizer, loss], feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.75})
            print(step, loss_)
            # Evaluate accuracy on a validation batch every 100 steps
            if step % 100 == 0:
                batch_x_test, batch_y_test = get_next_batch('./validImage', image_filename_list_valid, 128)
                acc = sess.run(accuracy, feed_dict={X: batch_x_test, Y: batch_y_test, keep_prob: 1.})
                print(step, acc)

                # Stopping condition
                if acc > 0.97 or step > 5500:
                    saver.save(sess, "./crack_capcha.model", global_step=step)
                    break
            step += 1


def predict_captcha(captcha_image):
    output = crack_captcha_cnn()

    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint('.'))

        predict = tf.argmax(tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
        text_list = sess.run(predict, feed_dict={X: [captcha_image], keep_prob: 1})

        text = text_list[0].tolist()
        vector = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
        i = 0
        for n in text:
            vector[i * CHAR_SET_LEN + n] = 1
            i += 1
        return vec2text(vector)

# Run training
train_crack_captcha_cnn()
print("Training complete.")
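The shape comments above follow from TensorFlow's 'SAME' padding rule: along each spatial dimension the output size is ceil(input / stride), so the stride-1 convolutions keep 60x160 and each stride-2 pooling halves the size, rounding up. A short pure-Python sketch (`same_padding_shapes` is a hypothetical name) reproduces the 30x80, 15x40, 8x20 sequence and the 8*20*64 = 10240 input size of the fully connected layer:

```python
import math


def same_padding_shapes(height, width, n_pools=3, stride=2):
    """Spatial size after each stride-`stride` max-pool with padding='SAME': ceil(n / stride)."""
    shapes = []
    for _ in range(n_pools):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        shapes.append((height, width))
    return shapes


shapes = same_padding_shapes(60, 160)  # [(30, 80), (15, 40), (8, 20)]
h, w = shapes[-1]
fc_input = h * w * 64                  # 64 channels from the last conv layer -> 10240
```

With the commented-out 80x250 size, the last feature map would instead be 10x32, so the first dimension of w_d would have to become 10 * 32 * 64.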

