Since my deep learning environment is installed on Windows, everything below was done on Windows. These are just my personal study notes.
To train a model with Caffe, the first step is preparing the data.
Positive samples: for a face detection project, the positive samples are face images. To produce them, crop the faces out of the source images (the dataset already annotates the face coordinates in each image). After cropping, check that the data was produced correctly.
Negative samples: crop random patches and use IoU against the annotated face boxes to decide whether a patch is positive or negative, e.g. IoU < 0.3 means negative; ideally, take patches from images that contain no faces at all. A sketch of this labeling rule follows below.
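As an illustration of the sampling rule, here is a minimal sketch (the helper names, the 227 patch size, and the 0.3 threshold are illustrative assumptions, not from any specific library):

import random

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0

def sample_negative(img_w, img_h, face_boxes, size=227, max_tries=50):
    """Randomly crop a size*size patch; accept it as a negative sample
    only if its IoU with every annotated face box is below 0.3."""
    for _ in range(max_tries):
        x = random.randint(0, img_w - size)
        y = random.randint(0, img_h - size)
        patch = (x, y, x + size, y + size)
        if all(iou(patch, fb) < 0.3 for fb in face_boxes):
            return patch
    return None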
1. Preparing the Caffe data source:
Caffe accepts LMDB data, so before training, the training and validation sets must be converted to LMDB.
First, prepare two text files, train.txt and test.txt, with lines in the following format:
/path/to/folder/image_x.jpg 0 (i.e. the path of the image sample followed by its label; for binary classification the labels are 0 and 1. In this example, 0 means face and 1 means non-face.)
These files can be generated with a script. A simple script (producing train.txt and val.txt) is shown below:
(The txt files should contain only relative paths, i.e. lines in train.txt look like: xxxx.jpg label.) — updated 2019-07-28
import os

full_train_path = r"C:\Users\Administrator\Desktop\FaceDetection\train.txt"
full_val_path = r"C:\Users\Administrator\Desktop\FaceDetection\val.txt"

# Generate train.txt: each line is "subdir/image.jpg label", where the
# subdirectory name doubles as the class label (0 = face, 1 = non-face).
with open(full_train_path, 'w') as train_txt:
    train_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\train"
    for label_dir in os.listdir(train_root):
        for figure in os.listdir(os.path.join(train_root, label_dir)):
            train_txt.write(label_dir + "/" + figure + " " + label_dir + "\n")

# Generate val.txt: label by file name (files containing "faceimage" are faces).
with open(full_val_path, 'w') as val_txt:
    val_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\val"
    for val_file in os.listdir(val_root):
        label = "0" if val_file.find("faceimage") != -1 else "1"
        val_txt.write(val_file + " " + label + "\n")
2. Creating the LMDB data source:
Classification problems use LMDB data; regression problems use HDF5 data instead (an LMDB Datum carries a single integer label, while HDF5 can store arbitrary float targets).
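This project sticks with LMDB, but for reference, here is a minimal sketch of producing an HDF5 source with h5py (the file names and shapes are made-up illustrations). Caffe's HDF5Data layer looks up datasets named after its top blobs, conventionally "data" and "label", and its source parameter points at a text file listing the .h5 files:

import h5py
import numpy as np

# Hypothetical regression set: N inputs of shape 3*227*227 with float targets.
N = 100
data = np.random.rand(N, 3, 227, 227).astype(np.float32)
label = np.random.rand(N, 1).astype(np.float32)

with h5py.File(r"C:\Users\Administrator\Desktop\FaceDetection\train.h5", "w") as f:
    f.create_dataset("data", data=data)
    f.create_dataset("label", data=label)

# The HDF5Data layer's "source" is a text file listing the .h5 paths.
with open(r"C:\Users\Administrator\Desktop\FaceDetection\train_h5_list.txt", "w") as f:
    f.write(r"C:\Users\Administrator\Desktop\FaceDetection\train.h5" + "\n")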
Use Caffe's bundled tool, convert_imageset, to create the LMDB data source.
convert_imageset usage:
convert_imageset --options (e.g. resize_height/resize_width, shuffle) <image root directory> <image list txt> <output lmdb path>
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\train\ C:\Users\Administrator\Desktop\FaceDetection\train\train\train.txt C:\Users\Administrator\Desktop\FaceDetection\train_lmdb
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\val\ C:\Users\Administrator\Desktop\FaceDetection\train\val\val.txt C:\Users\Administrator\Desktop\FaceDetection\val_lmdb
3. Training the AlexNet network:
3.1 Configuring the Caffe files:
1. train.prototxt
Defines the AlexNet network structure in Caffe's prototxt format.
2. solver.prototxt
① net: the path to the network definition file (train.prototxt).
② test_iter: the number of batches run in one test pass. Ideally test_iter * batch_size equals the total number of validation samples, so one test pass covers the whole validation set exactly once.
③ base_lr: the base learning rate. The effective learning rate of a layer is base_lr * lr_mult (lr_mult is set per layer in train.prototxt). The learning rate must not be too large, or the loss will oscillate or diverge instead of decreasing. A sample solver.prototxt is given after the note below.
Note: in the Windows version, use "/" as the path separator inside configuration files, e.g.: source: "C:/Users/Administrator/Desktop/FaceDetection/train_lmdb"
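For reference, a minimal solver.prototxt along these lines; the schedule values (test_interval, stepsize, etc.) are illustrative guesses, not the exact settings used in this project:

net: "C:/Users/Administrator/Desktop/FaceDetection/train.prototxt"
test_iter: 100          # test_iter * test batch_size should cover the validation set
test_interval: 500      # run a test pass every 500 training iterations
base_lr: 0.001          # per-layer rate = base_lr * lr_mult
lr_policy: "step"       # drop the rate by gamma every stepsize iterations
gamma: 0.1
stepsize: 10000
momentum: 0.9
weight_decay: 0.0005
display: 100
max_iter: 36000         # matches the final snapshot _iter_36000.caffemodel
snapshot: 4000
snapshot_prefix: "C:/Users/Administrator/Desktop/FaceDetection/model2/"
solver_mode: GPU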
3.2 Write a script to train the model and obtain the weights (_iter_36000.caffemodel):
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
caffe.exe train --solver=C:\Users\Administrator\Desktop\FaceDetection\solver.prototxt
Caffe then prints the training progress (the loss and, at each test interval, the accuracy) to the console while it runs.
4. The face detection algorithm framework:
4.1 Sliding window:
For the input image, generate 227*227 windows at different positions (up to this point, only a fixed input size is supported: a convolutional network's fully connected layers have a fixed number of parameters. The fully convolutional network discussed below can take input images of any size).
To detect faces of different sizes in an image, the image must be rescaled over multiple scales (an image pyramid).
An FCN (fully convolutional network) produces a heatmap: each point of the heatmap corresponds to a region of the original image, and its value is the probability that the region is a face. The heatmap is obtained with a forward pass via forward_all().
Set a threshold α and keep a window whenever its probability exceeds α (e.g. α = 0.9). This typically leaves multiple overlapping boxes; NMS (non-maximum suppression) reduces them to a single final box, as sketched below.
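For reference, a minimal sketch of classic score-ordered NMS; note that the full script in 4.3 actually uses a cv2.groupRectangles-based merging variant instead:

import numpy as np

def nms(boxes, overlap_thresh=0.3):
    """Classic NMS: boxes is an (N, 5) array of (x1, y1, x2, y2, score).
    Greedily keep the highest-scoring box and drop boxes that overlap it."""
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2, scores = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= overlap_thresh]
    return boxes[keep]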
4.2 Converting the trained fully connected AlexNet model into a fully convolutional network (FCN):
This follows Caffe's official net surgery example: https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
First, in the network's deploy.prototxt, replace each fully connected layer (InnerProduct) with a convolutional layer (Convolution) and work out the kernel size to set for it.
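As an example of the kernel-size calculation: for a 227*227 input, AlexNet's pool5 output is 256*6*6, so fc6's replacement convolution needs kernel_size 6, while fc7/fc8 see 1*1 spatial inputs and use kernel_size 1. A sketch of the replacement layer (the layer names match the conversion code below; treat the exact prototxt as illustrative):

layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6     # pool5 output is 256 x 6 x 6 for a 227 x 227 input
  }
}
# fc7-conv and fc8-conv use kernel_size: 1, with fc8-conv having
# num_output: 2 for the face / non-face classes of this project.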
Then run the following code to transplant the weights, producing the fully convolutional model (full_conv.caffemodel):
import caffe  # assumes pycaffe is on sys.path (see the detection script in 4.3)

# Load the original fully connected network with its trained weights.
net = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy.prototxt",
                r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8_flickr']
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

# Load the fully convolutional definition against the same weights file;
# the renamed layers (fc6-conv, ...) start out uninitialized.
net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                           r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                           caffe.TEST)
params_fully_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
conv_params = {pr: (net_fully_conv.params[pr][0].data, net_fully_conv.params[pr][1].data)
               for pr in params_fully_conv}
for conv in params_fully_conv:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

# Copy the fc weights into the conv layers: the parameters are identical,
# only their shape changes (flat copy). Then save the converted model.
for pr, pr_conv in zip(params, params_fully_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat
    conv_params[pr_conv][1][...] = fc_params[pr][1]
net_fully_conv.save(r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel")
4.3 Using the trained model to implement face detection:
import os
import sys
import math
import numpy as np
import cv2

# Make pycaffe importable and silence Caffe's logging below warnings.
caffe_root = r"C:\Program Files\caffe-windows"
sys.path.insert(0, caffe_root + r"\python")
os.environ['GLOG_minloglevel'] = '2'
import caffe

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Rect(object):
    def __init__(self, p1, p2):
        """Store the top, bottom, left, right values;
        p1, p2 are the left-top and right-bottom points of the rectangle."""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)

def calcDistance(x1, y1, x2, y2):
    """Euclidean distance between two points."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether two intervals intersect on one dimension."""
    return (a_min <= b_max) and (a_max >= b_min)

def rect_overlaps(r1, r2):
    """Judge whether two rectangles intersect."""
    return (range_overlap(r1.left, r1.right, r2.left, r2.right) and
            range_overlap(r1.bottom, r1.top, r2.bottom, r2.top))

def rect_merge(r1, r2, mergeThresh):
    """Return 1 if the overlap ratio of the two rectangles exceeds mergeThresh."""
    if rect_overlaps(r1, r2):
        SI = (abs(min(r1.right, r2.right) - max(r1.left, r2.left)) *
              abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom)))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        ratio = float(SI) / float(SA + SB - SI)
        if ratio > mergeThresh:
            return 1
    return 0

def generateBoundingBox(featureMap, scale):
    """Map each high-probability heatmap cell back to a 227*227 box in the
    original image. The stride (32) follows from the AlexNet architecture;
    227 is the network's training input size."""
    boundingBox = []
    stride = 32
    cellSize = 227
    for (y, x), prob in np.ndenumerate(featureMap):
        if prob >= 0.50:
            # Record (x1, y1, x2, y2, prob), rescaled back to the original image.
            boundingBox.append([float(stride * x) / scale,
                                float(stride * y) / scale,
                                float(stride * x + cellSize - 1) / scale,
                                float(stride * y + cellSize - 1) / scale,
                                prob])
    return boundingBox

def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    """Merge overlapping detections: first cluster the raw boxes with
    cv2.groupRectangles, then average rectangles that still overlap."""
    rects = []
    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # groupRectangles expects integer (x, y, w, h) rectangles.
            rects.append([int(boxes[i, 0]), int(boxes[i, 1]),
                          int(boxes[i, 2] - boxes[i, 0]),
                          int(boxes[i, 3] - boxes[i, 1])])
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)

    rectangles = []
    for i in range(len(rects)):
        rectangles.append(Rect(Point(rects[i, 0], rects[i, 1]),
                               Point(rects[i, 0] + rects[i, 2],
                                     rects[i, 1] + rects[i, 3])))
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if not matched:
            clusters.append(rect)
    return [[c.left, c.bottom, c.right, c.top, 1] for c in clusters]

def face_detection(imgFile):
    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel",
                               caffe.TEST)
    img = cv2.imread(imgFile)
    print(img.shape)

    # Build the image pyramid: keep shrinking by `factor` until the shorter
    # side would fall below the 227-pixel network input.
    scales = []
    factor = 0.793700526
    largest = min(2, 4000.0 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor

    total_boxes = []
    for scale in scales:
        # cv2.resize takes (width, height).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        cv2.imwrite(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg", scale_img)
        im = caffe.io.load_image(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg")

        # Resize the input blob (N, C, H, W) to the scaled image size.
        net_fully_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        transformer = caffe.io.Transformer({'data': net_fully_conv.blobs['data'].data.shape})
        transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
        transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
        transformer.set_raw_scale('data', 255.0)         # [0,1] -> [0,255]

        out = net_fully_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        print(out['prob'][0, 1].shape)
        # Take the heatmap of the channel treated as "face" (channel 1 here)
        # and map its cells back to boxes in the original image.
        boxes = generateBoundingBox(out['prob'][0, 1], scale)
        if boxes:
            total_boxes.extend(boxes)

    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    if not true_boxes == []:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
    cv2.namedWindow('face detection', flags=0)
    cv2.imshow('face detection', img)
    cv2.waitKey(0)

if __name__ == "__main__":
    img = r"C:\Users\Administrator\Desktop\FaceDetection\tmp9055.jpg"
    face_detection(img)
Because my computer is badly underpowered, training took very long; the machine ran for several days and still completed relatively few iterations, so the model is not trained especially well. For this example, after some tuning, generating bounding boxes with prob >= 0.5 gave the best results.
I have also written training code with TensorFlow, but on this machine the training was too slow and the accuracy too poor; something to study further later.
Note: I am still learning AI. This example implements face detection by following a video course together with my own hands-on practice, and is only a record for my own study.