Since my deep learning environment is installed on Windows, everything below was done on Windows. These are just my personal study notes.
To train a model with Caffe, the first step is preparing the data.
Positive samples: for a face detection project, the positive samples are face images. To produce them, crop the faces out of the source images (the dataset already annotates the face coordinates in each image). After cropping, check that the data was produced correctly.
Negative samples: crop random patches and use IoU against the annotated face boxes to decide whether a patch is positive or negative, e.g. IoU < 0.3 means negative; ideally, take patches from images that contain no faces at all. A sketch of this labeling rule follows below.
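As an illustration of the sampling rule, here is a minimal sketch (the helper names, the 227 patch size, and the 0.3 threshold are illustrative assumptions, not from any specific library):

import random

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0

def sample_negative(img_w, img_h, face_boxes, size=227, max_tries=50):
    """Randomly crop a size*size patch; accept it as a negative sample
    only if its IoU with every annotated face box is below 0.3."""
    for _ in range(max_tries):
        x = random.randint(0, img_w - size)
        y = random.randint(0, img_h - size)
        patch = (x, y, x + size, y + size)
        if all(iou(patch, fb) < 0.3 for fb in face_boxes):
            return patch
    return None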
1. Preparing the Caffe data source:
Caffe accepts LMDB data, so before training, the training and validation sets must be converted to LMDB.
First, prepare two text files, train.txt and test.txt, with lines in the following format:
/path/to/folder/image_x.jpg 0 (i.e. the path of the image sample followed by its label; for binary classification the labels are 0 and 1. In this example, 0 means face and 1 means non-face.)
These files can be generated with a script. A simple script (producing train.txt and val.txt) is shown below:
(The txt files should contain only relative paths, i.e. lines in train.txt look like: xxxx.jpg label.) — updated 2019-07-28
import os

full_train_path = r"C:\Users\Administrator\Desktop\FaceDetection\train.txt"
full_val_path = r"C:\Users\Administrator\Desktop\FaceDetection\val.txt"

# Generate train.txt: each line is "subdir/image.jpg label", where the
# subdirectory name doubles as the class label (0 = face, 1 = non-face).
with open(full_train_path, 'w') as train_txt:
    train_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\train"
    for label_dir in os.listdir(train_root):
        for figure in os.listdir(os.path.join(train_root, label_dir)):
            train_txt.write(label_dir + "/" + figure + " " + label_dir + "\n")

# Generate val.txt: label by file name (files containing "faceimage" are faces).
with open(full_val_path, 'w') as val_txt:
    val_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\val"
    for val_file in os.listdir(val_root):
        label = "0" if val_file.find("faceimage") != -1 else "1"
        val_txt.write(val_file + " " + label + "\n")
2. Creating the LMDB data source:
Classification problems use LMDB data; regression problems use HDF5 data instead (an LMDB Datum carries a single integer label, while HDF5 can store arbitrary float targets).
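This project sticks with LMDB, but for reference, here is a minimal sketch of producing an HDF5 source with h5py (the file names and shapes are made-up illustrations). Caffe's HDF5Data layer looks up datasets named after its top blobs, conventionally "data" and "label", and its source parameter points at a text file listing the .h5 files:

import h5py
import numpy as np

# Hypothetical regression set: N inputs of shape 3*227*227 with float targets.
N = 100
data = np.random.rand(N, 3, 227, 227).astype(np.float32)
label = np.random.rand(N, 1).astype(np.float32)

with h5py.File(r"C:\Users\Administrator\Desktop\FaceDetection\train.h5", "w") as f:
    f.create_dataset("data", data=data)
    f.create_dataset("label", data=label)

# The HDF5Data layer's "source" is a text file listing the .h5 paths.
with open(r"C:\Users\Administrator\Desktop\FaceDetection\train_h5_list.txt", "w") as f:
    f.write(r"C:\Users\Administrator\Desktop\FaceDetection\train.h5" + "\n")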
Use Caffe's bundled tool, convert_imageset, to create the LMDB data source.
convert_imageset usage:
convert_imageset --options (e.g. resize_height/resize_width, shuffle) <image root directory> <image list txt> <output lmdb path>
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\train\ C:\Users\Administrator\Desktop\FaceDetection\train\train\train.txt C:\Users\Administrator\Desktop\FaceDetection\train_lmdb
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\val\ C:\Users\Administrator\Desktop\FaceDetection\train\val\val.txt C:\Users\Administrator\Desktop\FaceDetection\val_lmdb
3. Training the AlexNet network:
3.1 Configuring the Caffe files:
1. train.prototxt
Defines the AlexNet network structure in Caffe's prototxt format.
2. solver.prototxt
① net: the path to the network definition file (train.prototxt).
② test_iter: the number of batches run in one test pass. Ideally test_iter * batch_size equals the total number of validation samples, so one test pass covers the whole validation set exactly once.
③ base_lr: the base learning rate. The effective learning rate of a layer is base_lr * lr_mult (lr_mult is set per layer in train.prototxt). The learning rate must not be too large, or the loss will oscillate or diverge instead of decreasing. A sample solver.prototxt is given after the note below.
Note: in the Windows version, use "/" as the path separator inside configuration files, e.g.: source: "C:/Users/Administrator/Desktop/FaceDetection/train_lmdb"
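For reference, a minimal solver.prototxt along these lines; the schedule values (test_interval, stepsize, etc.) are illustrative guesses, not the exact settings used in this project:

net: "C:/Users/Administrator/Desktop/FaceDetection/train.prototxt"
test_iter: 100          # test_iter * test batch_size should cover the validation set
test_interval: 500      # run a test pass every 500 training iterations
base_lr: 0.001          # per-layer rate = base_lr * lr_mult
lr_policy: "step"       # drop the rate by gamma every stepsize iterations
gamma: 0.1
stepsize: 10000
momentum: 0.9
weight_decay: 0.0005
display: 100
max_iter: 36000         # matches the final snapshot _iter_36000.caffemodel
snapshot: 4000
snapshot_prefix: "C:/Users/Administrator/Desktop/FaceDetection/model2/"
solver_mode: GPU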
3.2 Write a script to train the model and obtain the weights (_iter_36000.caffemodel):
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
caffe.exe train --solver=C:\Users\Administrator\Desktop\FaceDetection\solver.prototxt
Caffe then prints the training progress (the loss and, at each test interval, the accuracy) to the console while it runs.
4. The face detection algorithm framework:
4.1 Sliding window:
For the input image, generate 227*227 windows at different positions (up to this point, only a fixed input size is supported: a convolutional network's fully connected layers have a fixed number of parameters. The fully convolutional network discussed below can take input images of any size).
To detect faces of different sizes in an image, the image must be rescaled over multiple scales (an image pyramid).
An FCN (fully convolutional network) produces a heatmap: each point of the heatmap corresponds to a region of the original image, and its value is the probability that the region is a face. The heatmap is obtained with a forward pass via forward_all().
Set a threshold α and keep a window whenever its probability exceeds α (e.g. α = 0.9). This typically leaves multiple overlapping boxes; NMS (non-maximum suppression) reduces them to a single final box, as sketched below.
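For reference, a minimal sketch of classic score-ordered NMS; note that the full script in 4.3 actually uses a cv2.groupRectangles-based merging variant instead:

import numpy as np

def nms(boxes, overlap_thresh=0.3):
    """Classic NMS: boxes is an (N, 5) array of (x1, y1, x2, y2, score).
    Greedily keep the highest-scoring box and drop boxes that overlap it."""
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2, scores = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= overlap_thresh]
    return boxes[keep]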
4.2 Converting the trained fully connected AlexNet model into a fully convolutional network (FCN):
This follows Caffe's official net surgery example: https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
First, in the network's deploy.prototxt, replace each fully connected layer (InnerProduct) with a convolutional layer (Convolution) and work out the kernel size to set for it.
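As an example of the kernel-size calculation: for a 227*227 input, AlexNet's pool5 output is 256*6*6, so fc6's replacement convolution needs kernel_size 6, while fc7/fc8 see 1*1 spatial inputs and use kernel_size 1. A sketch of the replacement layer (the layer names match the conversion code below; treat the exact prototxt as illustrative):

layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6     # pool5 output is 256 x 6 x 6 for a 227 x 227 input
  }
}
# fc7-conv and fc8-conv use kernel_size: 1, with fc8-conv having
# num_output: 2 for the face / non-face classes of this project.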
Then run the following code to transplant the weights, producing the fully convolutional model (full_conv.caffemodel):
import caffe  # assumes pycaffe is on sys.path (see the detection script in 4.3)

# Load the original fully connected network with its trained weights.
net = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy.prototxt",
                r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8_flickr']
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

# Load the fully convolutional definition against the same weights file;
# the renamed layers (fc6-conv, ...) start out uninitialized.
net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                           r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                           caffe.TEST)
params_fully_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
conv_params = {pr: (net_fully_conv.params[pr][0].data, net_fully_conv.params[pr][1].data)
               for pr in params_fully_conv}
for conv in params_fully_conv:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

# Copy the fc weights into the conv layers: the parameters are identical,
# only their shape changes (flat copy). Then save the converted model.
for pr, pr_conv in zip(params, params_fully_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat
    conv_params[pr_conv][1][...] = fc_params[pr][1]
net_fully_conv.save(r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel")
4.3 Using the trained model to implement face detection:
import os
import sys
import math
import numpy as np
import cv2

# Make pycaffe importable and silence Caffe's logging below warnings.
caffe_root = r"C:\Program Files\caffe-windows"
sys.path.insert(0, caffe_root + r"\python")
os.environ['GLOG_minloglevel'] = '2'
import caffe

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Rect(object):
    def __init__(self, p1, p2):
        """Store the top, bottom, left, right values;
        p1, p2 are the left-top and right-bottom points of the rectangle."""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)

def calcDistance(x1, y1, x2, y2):
    """Euclidean distance between two points."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether two intervals intersect on one dimension."""
    return (a_min <= b_max) and (a_max >= b_min)

def rect_overlaps(r1, r2):
    """Judge whether two rectangles intersect."""
    return (range_overlap(r1.left, r1.right, r2.left, r2.right) and
            range_overlap(r1.bottom, r1.top, r2.bottom, r2.top))

def rect_merge(r1, r2, mergeThresh):
    """Return 1 if the overlap ratio of the two rectangles exceeds mergeThresh."""
    if rect_overlaps(r1, r2):
        SI = (abs(min(r1.right, r2.right) - max(r1.left, r2.left)) *
              abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom)))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        ratio = float(SI) / float(SA + SB - SI)
        if ratio > mergeThresh:
            return 1
    return 0

def generateBoundingBox(featureMap, scale):
    """Map each high-probability heatmap cell back to a 227*227 box in the
    original image. The stride (32) follows from the AlexNet architecture;
    227 is the network's training input size."""
    boundingBox = []
    stride = 32
    cellSize = 227
    for (y, x), prob in np.ndenumerate(featureMap):
        if prob >= 0.50:
            # Record (x1, y1, x2, y2, prob), rescaled back to the original image.
            boundingBox.append([float(stride * x) / scale,
                                float(stride * y) / scale,
                                float(stride * x + cellSize - 1) / scale,
                                float(stride * y + cellSize - 1) / scale,
                                prob])
    return boundingBox

def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    """Merge overlapping detections: first cluster the raw boxes with
    cv2.groupRectangles, then average rectangles that still overlap."""
    rects = []
    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # groupRectangles expects integer (x, y, w, h) rectangles.
            rects.append([int(boxes[i, 0]), int(boxes[i, 1]),
                          int(boxes[i, 2] - boxes[i, 0]),
                          int(boxes[i, 3] - boxes[i, 1])])
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)

    rectangles = []
    for i in range(len(rects)):
        rectangles.append(Rect(Point(rects[i, 0], rects[i, 1]),
                               Point(rects[i, 0] + rects[i, 2],
                                     rects[i, 1] + rects[i, 3])))
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if not matched:
            clusters.append(rect)
    return [[c.left, c.bottom, c.right, c.top, 1] for c in clusters]

def face_detection(imgFile):
    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel",
                               caffe.TEST)
    img = cv2.imread(imgFile)
    print(img.shape)

    # Build the image pyramid: keep shrinking by `factor` until the shorter
    # side would fall below the 227-pixel network input.
    scales = []
    factor = 0.793700526
    largest = min(2, 4000.0 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor

    total_boxes = []
    for scale in scales:
        # cv2.resize takes (width, height).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        cv2.imwrite(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg", scale_img)
        im = caffe.io.load_image(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg")

        # Resize the input blob (N, C, H, W) to the scaled image size.
        net_fully_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        transformer = caffe.io.Transformer({'data': net_fully_conv.blobs['data'].data.shape})
        transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
        transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
        transformer.set_raw_scale('data', 255.0)         # [0,1] -> [0,255]

        out = net_fully_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        print(out['prob'][0, 1].shape)
        # Take the heatmap of the channel treated as "face" (channel 1 here)
        # and map its cells back to boxes in the original image.
        boxes = generateBoundingBox(out['prob'][0, 1], scale)
        if boxes:
            total_boxes.extend(boxes)

    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    if not true_boxes == []:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
    cv2.namedWindow('face detection', flags=0)
    cv2.imshow('face detection', img)
    cv2.waitKey(0)

if __name__ == "__main__":
    img = r"C:\Users\Administrator\Desktop\FaceDetection\tmp9055.jpg"
    face_detection(img)
Because my computer is badly underpowered, training took very long; the machine ran for several days and still completed relatively few iterations, so the model is not trained especially well. For this example, after some tuning, generating bounding boxes with prob >= 0.5 gave the best results.
I have also written training code with TensorFlow, but on this machine the training was too slow and the accuracy too poor; something to study further later.
Note: I am still learning AI. This example implements face detection by following a video course together with my own hands-on practice, and is only a record for my own study.