Face Detection Based on Sliding Windows and a Fully Convolutional Network


Since my deep learning environment is installed on Windows, everything below was done on a Windows system. These notes are only a record of my own learning.

To train a model with Caffe, the data has to be prepared first.

Positive samples: for a face detection project, the positive samples are face images. To produce them, crop the faces out of the source images (the data source already annotates the face coordinates in each image). After cropping, check that the resulting data looks correct.

Negative samples: crop regions at random and use the IoU against the annotated face box to decide whether a crop is positive or negative. For example, a crop with IoU < 0.3 is treated as a negative sample; ideally, negatives are taken from images that contain no faces at all.
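As a reference, the IoU between a random crop and an annotated face box can be computed as in the minimal sketch below (the helper name and the (x1, y1, x2, y2) box format are my own assumptions, not part of the original preparation script):

import numpy as np

def iou(boxA, boxB):
    # boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1]
    ix1, iy1 = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    ix2, iy2 = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    union = ((boxA[2] - boxA[0]) * (boxA[3] - boxA[1]) +
             (boxB[2] - boxB[0]) * (boxB[3] - boxB[1]) - inter)
    return inter / union if union > 0 else 0.0

# a random crop whose IoU with every annotated face is below 0.3 is kept as a negative sample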

1. Preparing the Caffe data source:

Caffe supports LMDB data, so the training set and validation set must be converted to LMDB before training.

First prepare two text files, train.txt and val.txt, in the following format:

/path/to/folder/image_x.jpg 0 (the path of the image sample followed by its label. For binary classification the labels are 0 and 1; in this example, 0 means face and 1 means non-face.)
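For example, with the folder layout assumed by the script below ("0" holding face images, "1" holding non-face images), train.txt could contain lines like these (file names are made up for illustration):

0/face_000001.jpg 0
0/face_000002.jpg 0
1/nonface_000001.jpg 1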

The txt files can be generated with a script. A simple script to build train.txt and val.txt is shown below:

(The txt files should contain only relative paths, i.e. each line of train.txt should look like "xxxx.jpg label"; the code below is slightly off in this respect.) (updated 2019-07-28)

import os

train_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\train"
val_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\val"

full_train_path = r"C:\Users\Administrator\Desktop\FaceDetection\train.txt"
full_val_path = r"C:\Users\Administrator\Desktop\FaceDetection\val.txt"

# train.txt: the training images are grouped in sub-folders named after their label
# ("0" for face, "1" for non-face), so each line is "<label-folder>/<image> <label>"
with open(full_train_path, 'w') as train_txt:
    for label in os.listdir(train_root):
        for figure in os.listdir(os.path.join(train_root, label)):
            train_txt.write(label + "/" + figure + " " + label + "\n")

# val.txt: the validation images sit in a single folder; the label is inferred from
# the file name ("faceimage..." means face, i.e. label 0, everything else is label 1)
with open(full_val_path, 'w') as val_txt:
    for val_file in os.listdir(val_root):
        if "faceimage" in val_file:
            val_txt.write(val_file + " 0\n")
        else:
            val_txt.write(val_file + " 1\n")

2. Creating the LMDB data source:

Classification problems typically use LMDB data, while regression problems use HDF5 data.

Use the conversion tool that ships with Caffe (convert_imageset) to build the LMDB data source.

convert_imageset usage:
convert_imageset --<options> (e.g. --resize_height/--resize_width, --shuffle) <image root folder> <image list txt file> <output lmdb path>
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\train\ C:\Users\Administrator\Desktop\FaceDetection\train\train\train.txt C:\Users\Administrator\Desktop\FaceDetection\train_lmdb
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\val\ C:\Users\Administrator\Desktop\FaceDetection\train\val\val.txt C:\Users\Administrator\Desktop\FaceDetection\val_lmdb
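To sanity-check the conversion, one record can be read back from the generated LMDB. A minimal sketch, assuming the lmdb Python package and pycaffe are installed (paths follow the ones above):

import lmdb
import caffe

env = lmdb.open(r"C:\Users\Administrator\Desktop\FaceDetection\train_lmdb", readonly=True)
with env.begin() as txn:
    key, value = next(iter(txn.cursor()))       # first record in the database
    datum = caffe.proto.caffe_pb2.Datum()
    datum.ParseFromString(value)
    img = caffe.io.datum_to_array(datum)        # numpy array of shape (C, H, W)
    print(key, datum.label, img.shape)          # should print a 3 x 227 x 227 image and its label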

3. Training the AlexNet network:

3.1 Configure the Caffe files:

1. train.prototxt

Defines the AlexNet network structure in Caffe's prototxt format.

2. solver.prototxt

① net: the path to the network definition file.

② test_iter: the number of batches run in one test pass; ideally test_iter * batch_size equals the total number of validation samples.

③ base_lr: the base learning rate. The effective learning rate of a layer is base_lr * lr_mult (the lr_mult specified per layer in train.prototxt). The learning rate must not be too large, otherwise the loss may oscillate or diverge instead of converging.

Note: in the Windows build, use "/" in the paths inside the configuration files, e.g. source: "C:/Users/Administrator/Desktop/FaceDetection/train_lmdb"
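For reference, a minimal solver.prototxt along these lines might look like the sketch below (the values are illustrative, not necessarily the ones used for the 36000-iteration run described later):

net: "C:/Users/Administrator/Desktop/FaceDetection/train.prototxt"
test_iter: 100            # test_iter * batch_size should cover the validation set
test_interval: 1000
base_lr: 0.001            # effective per-layer lr = base_lr * lr_mult
lr_policy: "step"
gamma: 0.1
stepsize: 10000
momentum: 0.9
weight_decay: 0.0005
display: 100
max_iter: 36000
snapshot: 4000
snapshot_prefix: "C:/Users/Administrator/Desktop/FaceDetection/model2/"
solver_mode: CPU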

3.2 Write a script to train the model and obtain the trained weights (_iter_36000.caffemodel):

cd C:\Program Files\caffe-windows\scripts\build\tools\Release
caffe.exe train --solver=C:\Users\Administrator\Desktop\FaceDetection\solver.prototxt

The training process looks like this:

4. The face detection algorithm framework:

4.1 Sliding window:

Slide 227*227 windows over the input image (so far only a fixed input size is supported: in an ordinary convolutional neural network the final fully connected layers have a fixed number of parameters. The fully convolutional network discussed below accepts images of arbitrary size).

To detect faces of different sizes, the image is rescaled over multiple scales (an image pyramid).

FCN (fully convolutional network): the network outputs a heatmap in which every point corresponds to a region of the original image, and its value is the probability that the region contains a face. The heatmap is obtained with a forward pass, forward_all().

Set a threshold α, e.g. keep a box when its probability exceeds 0.9. This usually yields several overlapping boxes; NMS (non-maximum suppression) is then used to reduce them to a single final box.

4.2 Convert the fully connected AlexNet used during training into a fully convolutional network (FCN) model:

This follows the net surgery example from the Caffe repository: https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb

First, replace the fully connected layers (InnerProduct) in the original deploy.prototxt with convolution layers (Convolution) and work out the appropriate kernel sizes.
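For example, following the net surgery notebook, fc6 (an InnerProduct layer with num_output 4096) becomes a Convolution layer whose kernel size matches the 6*6 spatial output of pool5, while fc7 and fc8 become 1*1 convolutions. A sketch of the converted fc6 layer in deploy_full_conv.prototxt:

layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6
  }
}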

Then convert the trained weights into the fully convolutional model (full_conv.caffemodel) with the following code:

    net = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy.prototxt",
                    r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                    caffe.TEST)
    params = ['fc6', 'fc7', 'fc8_flickr']

    fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

    for fc in params:
        print("{} weights are {} dimensional and biases are {} dimensional".format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                               caffe.TEST)
    params_fully_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']

    conv_params = {pr:(net_fully_conv.params[pr][0].data, net_fully_conv.params[pr][1].data) for pr in params_fully_conv}
    for conv in params_fully_conv:
        print("{} weights are {} dimensional and biases are {} dimensional".format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

    for pr, pr_conv in zip(params, params_fully_conv):
        conv_params[pr_conv][0].flat = fc_params[pr][0].flat
        conv_params[pr_conv][1][...] = fc_params[pr][1]
    net_fully_conv.save(r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel")

4.3 Use the trained model to implement face detection:

import os
import sys
import numpy as np
import math
import cv2
import random

caffe_root = r"C:\Program Files\caffe-windows\python"
sys.path.insert(0, caffe_root)          # make pycaffe importable
os.environ['GLOG_minloglevel'] = '2'    # silence Caffe's INFO/WARNING logging
import caffe

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Rect(object):
    def __init__(self, p1, p2):
        """Store the top, bottom, left, right values for points
        p1, p2 are the left-top and right-bottom points of the rectangle"""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" %(self.left, self.top, self.right, self.bottom)

def calcDistance(x1, y1, x2, y2):
    dist = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    return dist

def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether there is intersection on one dimension"""
    return (a_min <= b_max) and (a_max >= b_min)

def rect_overlaps(r1, r2):
    """Judge whether the two rectangles have intersection"""
    return range_overlap(r1.left, r1.right, r2.left, r2.right) and range_overlap(r1.bottom, r1.top, r2.bottom, r2.top)

def rect_merge(r1, r2, mergeThresh):
    """Calculate the merge area of two rectangles"""
    if rect_overlaps(r1, r2):
        SI = abs(min(r1.right, r2.right) - max(r1.left, r2.left)) * abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        S = SA + SB - SI

        ratio = float(SI) / float(S)

        if ratio > mergeThresh:
            return 1
    return 0

def generateBoundingBox(featureMap, scale):
    boundingBox = []
    # stride of the heatmap relative to the input image; it can be derived from
    # the AlexNet architecture (product of the conv/pool strides)
    stride = 32
    # each heatmap cell corresponds to a 227 * 227 window of the scaled input,
    # the same size the AlexNet was trained with
    cellSize = 227

    for (x, y), prob in np.ndenumerate(featureMap):
        if prob >= 0.50:
            # map the heatmap cell back to original-image coordinates:
            # [x_min, y_min, x_max, y_max, prob]
            boundingBox.append([float(stride * y) / scale, float(stride * x) / scale,
                                float(stride * y + cellSize - 1) / scale,
                                float(stride * x + cellSize - 1) / scale, prob])
    return boundingBox

def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    rects = []

    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # convert [x_min, y_min, x_max, y_max] into the (x, y, w, h) format
            # expected by cv2.groupRectangles
            rects.append([int(boxes[i, 0]), int(boxes[i, 1]),
                          int(boxes[i, 2] - boxes[i, 0]), int(boxes[i, 3] - boxes[i, 1])])

    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)

    rectangles = []
    for i in range(len(rects)):
        testRect = Rect(Point(rects[i, 0], rects[i, 1]), Point(rects[i, 0] + rects[i, 2], rects[i, 1] + rects[i, 3]))
        rectangles.append(testRect)
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if (rect_merge(rect, cluster, 0.2)):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if (not matched):
            clusters.append(rect)

    result_boxes = []
    for i in range(len(clusters)):
        result_boxes.append([clusters[i].left, clusters[i].bottom, clusters[i].right, clusters[i].top, 1])

    return result_boxes

def face_detection(imgFile):
    # load the fully convolutional network with the converted weights
    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel",
                               caffe.TEST)

    # build the image pyramid: keep shrinking by `factor` until the shorter side
    # of the scaled image would drop below the 227-pixel window size
    scales = []
    factor = 0.793700526

    img = cv2.imread(imgFile)
    print(img.shape)

    largest = min(2, 4000 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor
    total_boxes = []

    for scale in scales:
        # cv2.resize expects the target size as (width, height)
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        cv2.imwrite(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg", scale_img)
        im = caffe.io.load_image(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg")

        # reshape the input blob (N, C, H, W) to the size of the scaled image
        net_fully_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        transformer = caffe.io.Transformer({'data': net_fully_conv.blobs['data'].data.shape})
        transformer.set_transpose('data', (2, 0, 1))
        transformer.set_channel_swap('data', (2, 1, 0))
        transformer.set_raw_scale('data', 255.0)

        # forward pass; out['prob'][0, 1] is the face-probability heatmap
        out = net_fully_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        print(out['prob'][0, 1].shape)

        boxes = generateBoundingBox(out['prob'][0, 1], scale)

        if (boxes):
            total_boxes.extend(boxes)
    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)

    if true_boxes:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
        win = cv2.namedWindow('face detection', flags=0)
        cv2.imshow('face detection', img)
        cv2.waitKey(0)

if __name__ == "__main__":
    img = r"C:\Users\Administrator\Desktop\FaceDetection\tmp9055.jpg"
    face_detection(img)

Because my computer is really underpowered, training took a very long time; the machine ran for several days and still did not get through many iterations, so the model is not trained very well. In this example, after some tuning, I found that a probability threshold of 0.5 or higher when generating the bounding boxes gave better results. The results are as follows:

I also wrote training code with TensorFlow, but because of the weak machine the training was too slow and the accuracy too poor. I will keep studying this and revisit it later.

 

Note: I am currently learning AI-related topics. This example implements face detection by following a video course and experimenting on my own, and is only a record for my own learning.

 

