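Because my deep learning environment is installed on Windows, everything below was done on Windows. This is only a record of my own learning.
To train a model with Caffe, you first need to prepare the data.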
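Positive samples: in a face detection project, the positive samples are face images. To make them, crop the faces out of the source images (the data source already annotates the coordinates of each face in its image). After cropping, check that the data was produced correctly.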
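As a minimal sketch of the cropping step, the helper below turns an annotated face box into a square crop window clamped to the image borders. The square shape (the network is later fed 227*227 inputs) and the margin value are assumptions for illustration, and square_crop_box is a hypothetical name, not part of the original project.

```python
def square_crop_box(x1, y1, x2, y2, img_w, img_h, margin=0.1):
    """Expand an annotated face box (x1, y1, x2, y2) into a square crop window.

    Hypothetical helper: `margin` is the fraction of the longer side added on
    each side (an illustrative assumption). Returns (left, top, right, bottom)
    clamped to the image, or None if the clamped window is degenerate.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1) * (1.0 + 2.0 * margin)
    half = side / 2.0
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(img_w, int(round(cx + half)))
    bottom = min(img_h, int(round(cy + half)))
    # "check" step: reject windows that collapsed after clamping
    if right - left < 2 or bottom - top < 2:
        return None
    return left, top, right, bottom
```

Near the image border the clamped window is no longer square; resizing it to 227*227 later slightly distorts the face, which is one of the things worth checking by eye.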
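Negative samples: crop randomly and use IoU (intersection over union) with the annotated faces to decide whether a crop is a positive or a negative sample, e.g. IoU < 0.3 means negative. Ideally, take negatives from images that contain no faces.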
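A minimal sketch of IoU-based negative sampling, assuming boxes are (x1, y1, x2, y2) pixel tuples and a fixed square crop size (both assumptions); iou and sample_negative_crops are hypothetical helpers. A random crop is kept only if its IoU with every annotated face is below 0.3, the threshold mentioned above.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0


def sample_negative_crops(img_w, img_h, face_boxes, n, size,
                          max_iou=0.3, seed=0, max_tries=1000):
    """Randomly pick up to n size*size windows whose IoU with every face box is < max_iou."""
    if size > img_w or size > img_h:
        return []
    rng = random.Random(seed)
    crops = []
    tries = 0
    while len(crops) < n and tries < max_tries:
        tries += 1
        x = rng.randint(0, img_w - size)  # randint is inclusive, so x + size <= img_w
        y = rng.randint(0, img_h - size)
        box = (x, y, x + size, y + size)
        if all(iou(box, face) < max_iou for face in face_boxes):
            crops.append(box)
    return crops
```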
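1. Preparing the Caffe data source:
Caffe supports LMDB data; before training a model, the training set and validation set must first be converted to LMDB.
First prepare two txt files, train.txt and test.txt (the validation list is called val.txt in the script and commands below). The format is:
/path/to/folder/image_x.jpg 0 (that is, the path of the image sample followed by its label. For binary classification the label is 0 or 1; in this example, 0 means face and 1 means non-face.)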
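The list format can be checked with a small parser. This sketch assumes, like recent versions of Caffe's convert_imageset, that the label is the text after the last space on each line; parse_list_line is a hypothetical helper.

```python
def parse_list_line(line):
    """Split one line of train.txt / val.txt into (relative_path, label).

    Splits at the LAST space, so the label must be the final token of the line.
    Raises ValueError for malformed lines or labels other than 0/1.
    """
    line = line.rstrip("\r\n")
    path, sep, label = line.rpartition(" ")
    if not sep or not path:
        raise ValueError("expected '<path> <label>', got %r" % line)
    label = int(label)  # also raises ValueError for non-numeric labels
    if label not in (0, 1):
        raise ValueError("binary task expects label 0 or 1, got %d" % label)
    return path, label
```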
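A script can generate the txt files. A simple script (producing train.txt and val.txt) is shown below:
(Update, July 28, 2019: the txt files should contain only relative paths, e.g. lines in train.txt look like "xxxx.jpg label"; the code below has some problems.)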
import os

full_train_path = r"C:\Users\Administrator\Desktop\FaceDetection\train.txt"
full_val_path = r"C:\Users\Administrator\Desktop\FaceDetection\val.txt"
train_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\train"
val_root = r"C:\Users\Administrator\Desktop\FaceDetection\train\val"

# get train.txt: every sub-folder of train_root is named after its label ("0" or "1"),
# so each line is "<label>/<image> <label>", a path relative to train_root
with open(full_train_path, 'w') as train_txt:
    for label_dir in os.listdir(train_root):
        for figure in os.listdir(os.path.join(train_root, label_dir)):
            # write "\n": text mode on Windows already turns it into "\r\n"
            train_txt.write(label_dir + "/" + figure + " " + label_dir + "\n")

# get val.txt: file names containing "faceimage" are faces (label 0), all others are non-faces (label 1)
with open(full_val_path, 'w') as val_txt:
    for val_file in os.listdir(val_root):
        label = "0" if "faceimage" in val_file else "1"
        val_txt.write(val_file + " " + label + "\n")
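To check the generated lists before building the LMDB, a sketch like the following counts samples per label and reports listed images that do not exist under the image root folder. check_list_lines is a hypothetical helper; the exists parameter is only there so the check can be tried without real files.

```python
import os
from collections import Counter


def check_list_lines(lines, root_dir, exists=os.path.isfile):
    """Count labels and collect listed images missing under root_dir.

    `lines` is any iterable of "relative/path.jpg label" strings, e.g. an open
    train.txt file; `exists` is injectable for testing.
    """
    counts = Counter()
    missing = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rel_path, label = line.rsplit(" ", 1)
        counts[int(label)] += 1
        if not exists(os.path.join(root_dir, rel_path)):
            missing.append(rel_path)
    return counts, missing
```

A heavily unbalanced count or a non-empty missing list usually means the list file and the root folder passed to convert_imageset do not match.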
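2. Creating the LMDB data source:
Classification problems use LMDB data, while regression problems use HDF5 data (a Caffe Datum stored in LMDB holds a single integer label, whereas HDF5 can store floating-point, multi-dimensional labels).
Use convert_imageset, a tool that ships with Caffe, to create the LMDB data source.
convert_imageset usage:
convert_imageset --flags (e.g. resize, shuffle) <image root folder> <list txt file> <output LMDB path>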
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\train\ C:\Users\Administrator\Desktop\FaceDetection\train\train\train.txt C:\Users\Administrator\Desktop\FaceDetection\train_lmdb
convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\val\ C:\Users\Administrator\Desktop\FaceDetection\train\val\val.txt C:\Users\Administrator\Desktop\FaceDetection\val_lmdb
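If you prefer to drive convert_imageset from Python, a sketch like this builds the same argument list as the commands above. It only uses the flags shown in this post (--resize_height, --resize_width, --shuffle), and the helper names are hypothetical. Because convert_imageset joins the root folder and each relative path by plain string concatenation, the sketch makes sure the root folder ends with a separator.

```python
import subprocess


def convert_imageset_cmd(exe, root_dir, list_file, db_path, size=227, shuffle=True):
    """Build the argument list for Caffe's convert_imageset tool (hypothetical helper)."""
    if not root_dir.endswith(("/", "\\")):
        # keep the separator style already used in root_dir
        root_dir += "\\" if "\\" in root_dir else "/"
    cmd = [exe, "--resize_height=%d" % size, "--resize_width=%d" % size]
    if shuffle:
        cmd.append("--shuffle")
    cmd += [root_dir, list_file, db_path]
    return cmd


def run_convert_imageset(cmd):
    """Run the tool; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)
```

As far as I know, LMDB creation fails if the output directory already exists, so delete an old train_lmdb before running it again.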
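3. Training the AlexNet network:
3.1 Configuring the Caffe files:
1. train.prototxt
Define the AlexNet network structure in Caffe's prototxt format.
2. solver.prototxt
① net: path of the network definition file.
② test_iter: the number of batches run in one test pass. Ideally test_iter * batch_size equals the total number of test samples.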
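③ base_lr: the base learning rate. The final learning rate of each layer is base_lr * lr_mult (lr_mult is set per layer in train.prototxt). The learning rate must not be too large; if it is, training may fail to converge.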
Note: in the Windows version, paths in the configuration files use "/", e.g. source: "C:/Users/Administrator/Desktop/FaceDetection/train_lmdb"
3.2 Write a script to train the model and obtain the model file (_iter_36000.caffemodel):
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
caffe.exe train --solver=C:\Users\Administrator\Desktop\FaceDetection\solver.prototxt
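The training process is shown below:
4. Face detection algorithm framework:
4.1 Sliding window:
Slide 227*227 windows over the input image (so far only fixed-size inputs are supported, because the fully connected layers at the end of the convolutional neural network have a fixed number of parameters; the fully convolutional network discussed later accepts input images of any size).
To detect faces of different sizes in an image, the image must be processed at multiple scales.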
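The scale list used by the detection code in 4.3 can be isolated into a small function. The factor 0.793700526 is about 2^(-1/3), i.e. three scales per halving of the image size, and a face about 227/s pixels wide in the original image fills the 227*227 window at scale s. image_pyramid_scales is a hypothetical name; the defaults mirror the values in the detection code.

```python
def image_pyramid_scales(height, width, window=227, factor=0.793700526,
                         max_side=4000, max_scale=2.0):
    """Scales at which the image is resized so faces of different sizes fit the fixed window."""
    largest = min(max_scale, max_side / float(max(height, width)))
    scales = []
    scale = largest
    min_side = largest * min(height, width)
    # stop once the shorter side of the resized image is smaller than the window
    while min_side >= window:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales
```

For a 480*640 image this gives seven scales from 2.0 down to about 0.5, so faces from roughly 113 to 454 pixels wide can be covered.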
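FCN (fully convolutional network): the network outputs a heatmap in which each point corresponds to a region of the original image, and its value is the probability that this region is a face. The heatmap is obtained with a forward pass, forward_all().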
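To see how heatmap cells relate to image regions, the sketch below computes the heatmap size for one input dimension and maps a cell back to a box in the original image. It assumes the standard BVLC AlexNet layer settings (conv1 11*11 stride 4, 3*3 stride-2 max pooling, which Caffe rounds up, and fc6 converted to a 6*6 convolution), since the deploy file itself is not shown; the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution output size rounds down
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Caffe pooling output size rounds up (ceil), no padding here
    return -(-(size - kernel) // stride) + 1


def alexnet_heatmap_size(size):
    """Heatmap size along one dimension for an input of `size` pixels (assumed AlexNet settings)."""
    s = conv_out(size, 11, 4)      # conv1
    s = pool_out(s, 3, 2)          # pool1
    s = conv_out(s, 5, 1, 2)       # conv2
    s = pool_out(s, 3, 2)          # pool2
    s = conv_out(s, 3, 1, 1)       # conv3
    s = conv_out(s, 3, 1, 1)       # conv4
    s = conv_out(s, 3, 1, 1)       # conv5
    s = pool_out(s, 3, 2)          # pool5
    return conv_out(s, 6)          # fc6-conv; fc7-conv and fc8-conv are 1x1 and keep the size


def cell_to_box(row, col, scale, stride=32, cell=227):
    """Map heatmap cell (row, col) at a pyramid scale to (x1, y1, x2, y2) in the original image."""
    x1 = stride * col / float(scale)
    y1 = stride * row / float(scale)
    return (x1, y1, x1 + (cell - 1) / float(scale), y1 + (cell - 1) / float(scale))
```

The total stride is 4*2*2*2 = 32 and each cell covers a 227*227 window, which is where stride = 32 and cellSize = 227 in generateBoundingBox come from; a 227-pixel input gives a single cell, a 451-pixel input an 8*8 heatmap.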
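Set a threshold α (for example α = 0.9) and keep a window as a box when its face probability exceeds α. This can yield several overlapping boxes; NMS (non-maximum suppression) is then used to obtain the final box.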
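For reference, this is plain greedy NMS, a simpler technique than the cv2.groupRectangles-plus-averaging approach used in nms_average in 4.3: keep boxes above the threshold, then repeatedly keep the highest-scoring box and drop the boxes that overlap it too much. Boxes are assumed to be (x1, y1, x2, y2, prob) tuples, and the 0.3 IoU overlap limit is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes; only the first four fields are used."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union) if union > 0 else 0.0


def nms(boxes, threshold=0.9, overlap=0.3):
    """Greedy NMS over (x1, y1, x2, y2, prob) boxes whose prob exceeds `threshold`."""
    candidates = sorted((b for b in boxes if b[4] > threshold),
                        key=lambda b: b[4], reverse=True)
    kept = []
    for box in candidates:
        # keep a box only if it does not overlap any higher-scoring kept box too much
        if all(iou(box, k) <= overlap for k in kept):
            kept.append(box)
    return kept
```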
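4.2 Converting the fully connected AlexNet used for training into a fully convolutional network (FCN) model:
You can follow the example on the official Caffe site: https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
First, in the deploy.prototxt of the original fully connected network, change the fully connected layers (InnerProduct) into convolution layers (Convolution) and compute and set their kernel_size: fc6 receives the 6*6 pool5 output of a 227*227 input, so its kernel_size is 6, while fc7 and fc8 become 1*1 convolutions.
Then use the following code to convert the weights into the fully convolutional model (full_conv.caffemodel):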
import sys
sys.path.insert(0, r"C:\Program Files\caffe-windows\python")  # folder that contains the pycaffe package
import caffe

# Original network with fully connected layers
net = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy.prototxt",
                r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8_flickr']
# fc_params = {layer name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

# Fully convolutional network: layers whose names did not change get their weights from the same caffemodel
net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                           r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                           caffe.TEST)
params_fully_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
conv_params = {pr: (net_fully_conv.params[pr][0].data, net_fully_conv.params[pr][1].data)
               for pr in params_fully_conv}
for conv in params_fully_conv:
    print("{} weights are {} dimensional and biases are {} dimensional".format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

# Copy the InnerProduct parameters into the convolution layers (same memory order, so a flat copy suffices)
for pr, pr_conv in zip(params, params_fully_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat  # weights
    conv_params[pr_conv][1][...] = fc_params[pr][1]        # biases
net_fully_conv.save(r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel")
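Why does a flat copy of the weights work? Caffe's InnerProduct layer flattens its C*H*W input in row-major order, which is the same order as the trailing (channels, height, width) dimensions of a convolution kernel. The toy check below uses small made-up sizes instead of AlexNet's fc6 (256 channels, 6*6, 4096 outputs) and confirms with NumPy that the copied kernel gives the same output as the fully connected layer at a single position; fc_equals_conv is a hypothetical helper.

```python
import numpy as np


def fc_equals_conv(C=4, K=6, N=3, seed=0):
    """Check that fc weights copied flat into a (N, C, K, K) kernel give identical outputs."""
    rng = np.random.RandomState(seed)
    fc_w = rng.randn(N, C * K * K)       # InnerProduct weights: (num_output, inputs)
    fc_b = rng.randn(N)
    conv_w = np.empty((N, C, K, K))      # Convolution weights: (num_output, channels, kh, kw)
    conv_w.flat = fc_w.flat              # the same flat copy as in the conversion code
    x = rng.randn(C, K, K)               # one pool5-like input feature map
    fc_out = fc_w.dot(x.ravel()) + fc_b
    # a K*K convolution over a K*K input has exactly one output position
    conv_out = np.einsum("nckl,ckl->n", conv_w, x) + fc_b
    return np.allclose(fc_out, conv_out)
```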
4.3 Implementing face detection with the trained model:
import os
import sys
import numpy as np
import cv2

# Folder that contains the pycaffe package
caffe_root = r"C:\Program Files\caffe-windows\python"
sys.path.insert(0, caffe_root)
os.environ['GLOG_minloglevel'] = '2'  # silence Caffe's log output; must be set before importing caffe
import caffe

# Channel of 'prob' that holds the face probability. It must equal the label given to
# face images in train.txt / val.txt, which is 0 in this post (0 = face, 1 = non-face).
FACE_CLASS = 0


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Rect(object):
    def __init__(self, p1, p2):
        """Store the left, right, bottom, top values of the rectangle spanned by p1 and p2.
        Here 'bottom' is the smaller y value and 'top' the larger one."""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)


def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether two intervals intersect on one dimension"""
    return (a_min <= b_max) and (a_max >= b_min)


def rect_overlaps(r1, r2):
    """Judge whether the two rectangles intersect"""
    return range_overlap(r1.left, r1.right, r2.left, r2.right) and \
        range_overlap(r1.bottom, r1.top, r2.bottom, r2.top)


def rect_merge(r1, r2, mergeThresh):
    """Return 1 if the IoU of the two rectangles is greater than mergeThresh, else 0"""
    if rect_overlaps(r1, r2):
        SI = abs(min(r1.right, r2.right) - max(r1.left, r2.left)) * \
            abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        S = SA + SB - SI
        ratio = float(SI) / float(S)
        if ratio > mergeThresh:
            return 1
    return 0


def generateBoundingBox(featureMap, scale):
    boundingBox = []
    # Total stride of AlexNet: conv1 (4) * pool1 (2) * pool2 (2) * pool5 (2) = 32
    stride = 32
    # Each heatmap cell corresponds to a 227 * 227 window, the input size used to train AlexNet in Caffe
    cellSize = 227
    for (x, y), prob in np.ndenumerate(featureMap):  # x = row, y = column
        if prob >= 0.50:
            # Record (x1, y1, x2, y2, prob) in original-image coordinates (top-left and bottom-right corners)
            boundingBox.append([float(stride * y) / scale,
                                float(stride * x) / scale,
                                float(stride * y + cellSize - 1) / scale,
                                float(stride * x + cellSize - 1) / scale,
                                prob])
    return boundingBox


def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    if len(boxes) == 0:
        return []
    rects = []
    for i in range(len(boxes)):
        if boxes[i, 4] > 0.2:
            # cv2.groupRectangles expects integer (x, y, width, height) rectangles
            rects.append([int(boxes[i, 0]), int(boxes[i, 1]),
                          int(boxes[i, 2] - boxes[i, 0]), int(boxes[i, 3] - boxes[i, 1])])
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)
    rectangles = []
    for i in range(len(rects)):
        testRect = Rect(Point(rects[i, 0], rects[i, 1]),
                        Point(rects[i, 0] + rects[i, 2], rects[i, 1] + rects[i, 3]))
        rectangles.append(testRect)
    # Merge overlapping rectangles by averaging their coordinates
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if not matched:
            clusters.append(rect)
    result_boxes = []
    for c in clusters:
        result_boxes.append([c.left, c.bottom, c.right, c.top, 1])
    return result_boxes


def face_detection(img_file):
    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel",
                               caffe.TEST)
    scales = []
    factor = 0.793700526  # about 2 ** (-1/3): three scales per halving of the image size
    img = cv2.imread(img_file)
    print(img.shape)
    largest = min(2, 4000 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor
    total_boxes = []
    for scale in scales:
        # cv2.resize takes the target size as (width, height)
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        # Same input format as caffe.io.load_image: RGB, float values in [0, 1]
        im = cv2.cvtColor(scale_img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        # Change the network input size to the scaled image size (N, C, H, W)
        net_fully_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        transformer = caffe.io.Transformer({'data': net_fully_conv.blobs['data'].data.shape})
        transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
        transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR (convert_imageset stores BGR images)
        transformer.set_raw_scale('data', 255.0)         # [0, 1] -> [0, 255]
        out = net_fully_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        heatmap = out['prob'][0, FACE_CLASS]
        print(heatmap.shape)
        boxes = generateBoundingBox(heatmap, scale)
        if boxes:
            total_boxes.extend(boxes)
    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    if true_boxes:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
    cv2.namedWindow('face detection', flags=0)
    cv2.imshow('face detection', img)
    cv2.waitKey(0)


if __name__ == "__main__":
    img = r"C:\Users\Administrator\Desktop\FaceDetection\tmp9055.jpg"
    face_detection(img)
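Because my computer's specs are really low, training took a long time: the machine ran for several days and still did not get through many iterations, so the model is not well trained. In this example, after tuning, I found that a threshold of prob >= 0.5 when generating the bounding boxes gave the best results. The results are as follows:
I have also written training code in TensorFlow, but because the computer is too weak, training was too slow and the accuracy too poor. I will study this further later.
Note: I am currently learning AI. This example implements face detection by following video tutorials and doing the work myself, and is only a record of my own learning.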