These notes are all based on: https://github.com/AITTSMD/MTCNN-Tensorflow
Datasets
WIDER Face for face detection and Celeba for landmark detection
WIDER Face


There are 62 scene folders in total, each containing multiple images.

The annotation file stores the positions of all face bounding boxes in each image; the fields are interpreted as follows:

Celeba


The two folders hold images from different sources. It contains 5,590 LFW images and 7,876 other images downloaded from the web. The training set and validation set are defined in trainImageList.txt and testImageList.txt.

Each image has a corresponding face bounding box and the coordinates of 5 facial landmarks.
Basic concepts
a. Samples. During MTCNN training, the original training images are cropped around the regions where the targets lie, producing three kinds of training samples: positive, negative, and part samples.
Cropping: the target region is translated, scaled, etc. to obtain crop regions. (Since the training data for landmarks is limited, the author uses translation, random rotation and random flipping for data augmentation.)
IoU: the degree of overlap between the target region and the crop region (a sketch of the computation follows the list below).
The three kinds of samples are then defined as follows:
Positive sample: IoU >= 0.65, labeled 1
Negative sample: IoU < 0.3, labeled 0
Part sample: 0.4 <= IoU < 0.65, labeled -1
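For reference, a minimal sketch of the IoU computation between one crop box and all ground-truth boxes of an image, written from the definition (the repo has its own IoU helper in prepare_data/utils.py, which may differ in detail):

import numpy as np

def iou(crop_box, gt_boxes):
    # crop_box: (x1, y1, x2, y2); gt_boxes: (N, 4) array of ground-truth boxes
    box_area = (crop_box[2] - crop_box[0] + 1) * (crop_box[3] - crop_box[1] + 1)
    gt_areas = (gt_boxes[:, 2] - gt_boxes[:, 0] + 1) * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1)
    # intersection rectangle between the crop box and every gt box
    xx1 = np.maximum(crop_box[0], gt_boxes[:, 0])
    yy1 = np.maximum(crop_box[1], gt_boxes[:, 1])
    xx2 = np.minimum(crop_box[2], gt_boxes[:, 2])
    yy2 = np.minimum(crop_box[3], gt_boxes[:, 3])
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    inter = w * h
    # one IoU value per ground-truth box
    return inter / (box_area + gt_areas - inter)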
Since MTCNN is a multi-task network, we should pay attention to the format of the training data. The format is:
[path to image][cls_label][bbox_label][landmark_label]
For pos samples, cls_label=1, bbox_label (calculated), landmark_label=[0,0,0,0,0,0,0,0,0,0].
For part samples, cls_label=-1, bbox_label (calculated), landmark_label=[0,0,0,0,0,0,0,0,0,0].
For landmark samples, cls_label=-2, bbox_label=[0,0,0,0], landmark_label (calculated).
For neg samples, cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].
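For example, a positive-sample line written by gen_12net_data.py has the form below; the four offsets are the bbox regression targets, and the numeric values here are purely illustrative:

DATA/12/positive/0.jpg 1 0.05 -0.12 0.10 0.02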
b. Networks. MTCNN consists of three small networks, PNet, RNet and ONet; newer versions add a separate landmark-regression net (not covered here).
PNet: 12 x 12. Performs the coarse pass that produces candidate boxes. Tasks: classification, bounding-box regression.
RNet: 24 x 24. Filters PNet's coarse candidates and refines the boxes for better accuracy. Tasks: classification, bounding-box regression.
ONet: 48 x 48. Makes the final decision, refines the boxes again, and regresses the keypoint positions. Tasks: classification, bounding-box regression, landmarks.
c. Network input size. During training, the input image size is fixed to what each network expects, e.g. 12 x 12 for PNet. Because PNet has no fully connected layers and is fully convolutional, there is no size constraint at prediction time: PNet can run on an input of arbitrary size and produce k bounding boxes with confidence scores, and thresholding those scores completes the candidate-extraction step. Since the network is tiny, this stage is very efficient. (A sketch of how the output map turns into boxes follows.)
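To make the fully-convolutional point concrete, here is a rough sketch of how each cell of PNet's output probability map can be mapped back to a 12 x 12 window of the (scaled) input; it mirrors the generate_bbox idea used in the repo's detector but is simplified and not the exact code (stride 2 comes from PNet's single pooling layer):

import numpy as np

def map_to_boxes(cls_map, scale, threshold=0.6, stride=2, cellsize=12):
    # cls_map: (H', W') face probabilities from PNet run on an image resized by `scale`
    t_index = np.where(cls_map > threshold)           # output cells that look like faces
    if t_index[0].size == 0:
        return np.empty((0, 5))
    scores = cls_map[t_index[0], t_index[1]]
    # each output cell corresponds to a 12x12 window with stride 2 in the scaled image;
    # dividing by `scale` maps the window back to the original image
    x1 = np.round(stride * t_index[1] / scale)
    y1 = np.round(stride * t_index[0] / scale)
    x2 = np.round((stride * t_index[1] + cellsize) / scale)
    y2 = np.round((stride * t_index[0] + cellsize) / scale)
    # k candidate boxes, each with a confidence score
    return np.vstack([x1, y1, x2, y2, scores]).T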
PNet
- Run prepare_data/gen_12net_data.py to generate training data (Face Detection Part) for PNet.
- Run gen_landmark_aug_12.py to generate training data (Face Landmark Detection Part) for PNet.
- Run gen_imglist_pnet.py to merge two parts of training data.
- Run gen_PNet_tfrecords.py to generate tfrecord for PNet.
Generating data (for Face Detection)
Run output:
12880 pics in total ... 12800 images done, pos: 458655 part: 1125289 neg: 995342

Taking one image as an example, the three kinds of samples are generated as follows:
1. On the original image, randomly take 50 crops and keep those whose IoU with every face box is < 0.3 as negative samples.
2. For each face bounding box in the image:
a. Loop 5 times: take crops near the face box whose IoU is < 0.3 as negative samples; if the crop's coordinates fall outside the original image, discard it.
b. Loop 20 times: take crops near the face box; those with IoU >= 0.65 become positive samples, and those with 0.4 <= IoU < 0.65 become part samples.
All of the above samples are resized to (12, 12) before being saved.
Part of the resulting txt file:

Excerpts from prepare_data/gen_12net_data.py:
1. Generate 50 negative samples
import numpy.random as npr

neg_num = 0
#1---->50
# keep crop random parts, until have 50 negative examples
# get 50 negative sample from every image
while neg_num < 50:
    #neg_num's size [40,min(width, height) / 2],min_size:40
    # size is a random number between 12 and min(width,height)
    size = npr.randint(12, min(width, height) / 2)
    #top_left coordinate
    nx = npr.randint(0, width - size)
    ny = npr.randint(0, height - size)
    #random crop
    crop_box = np.array([nx, ny, nx + size, ny + size])
    #calculate iou
    Iou = IoU(crop_box, boxes)
    #crop a part from inital image
    cropped_im = img[ny : ny + size, nx : nx + size, :]
    #resize the cropped image to size 12*12
    resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
    if np.max(Iou) < 0.3:
        # Iou with all gts must below 0.3
        save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx)
        f2.write("DATA/12/negative/%s.jpg"%n_idx + ' 0\n')
        cv2.imwrite(save_file, resized_im)
        n_idx += 1
        neg_num += 1
2. Generate the three kinds of samples for each box
#for every bounding boxes
for box in boxes:
    # box (x_left, y_top, x_right, y_bottom)
    x1, y1, x2, y2 = box
    #gt's width
    w = x2 - x1 + 1
    #gt's height
    h = y2 - y1 + 1
    # ignore small faces and those faces has left-top corner out of the image
    # in case the ground truth boxes of small faces are not accurate
    if max(w, h) < 20 or x1 < 0 or y1 < 0:
        continue
    # crop another 5 images near the bounding box if IoU less than 0.5, save as negative samples
    for i in range(5):
        #size of the image to be cropped
        size = npr.randint(12, min(width, height) / 2)
        # delta_x and delta_y are offsets of (x1, y1)
        # max can make sure if the delta is a negative number , x1+delta_x >0
        # parameter high of randint make sure there will be intersection between bbox and cropped_box
        delta_x = npr.randint(max(-size, -x1), w)
        delta_y = npr.randint(max(-size, -y1), h)
        # max here not really necessary
        nx1 = int(max(0, x1 + delta_x))
        ny1 = int(max(0, y1 + delta_y))
        # if the right bottom point is out of image then skip
        if nx1 + size > width or ny1 + size > height:
            continue
        crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size])
        Iou = IoU(crop_box, boxes)
        cropped_im = img[ny1: ny1 + size, nx1: nx1 + size, :]
        #resize cropped image to be 12 * 12
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
        if np.max(Iou) < 0.3:
            # Iou with all gts must below 0.3
            save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
            f2.write("DATA/12/negative/%s.jpg" % n_idx + ' 0\n')
            cv2.imwrite(save_file, resized_im)
            n_idx += 1
    #generate positive examples and part faces
    for i in range(20):
        # pos and part face size [minsize*0.8,maxsize*1.25]
        size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h)))
        # delta here is the offset of box center
        if w < 5:
            print (w)
            continue
        delta_x = npr.randint(-w * 0.2, w * 0.2)
        delta_y = npr.randint(-h * 0.2, h * 0.2)
        #show this way: nx1 = max(x1+w/2-size/2+delta_x)
        # x1+ w/2 is the central point, then add offset , then deduct size/2
        # deduct size/2 to make sure that the right bottom corner will be out of
        nx1 = int(max(x1 + w / 2 + delta_x - size / 2, 0))
        #show this way: ny1 = max(y1+h/2-size/2+delta_y)
        ny1 = int(max(y1 + h / 2 + delta_y - size / 2, 0))
        nx2 = nx1 + size
        ny2 = ny1 + size
        if nx2 > width or ny2 > height:
            continue
        crop_box = np.array([nx1, ny1, nx2, ny2])
        #offsets relative to the gt box
        offset_x1 = (x1 - nx1) / float(size)
        offset_y1 = (y1 - ny1) / float(size)
        offset_x2 = (x2 - nx2) / float(size)
        offset_y2 = (y2 - ny2) / float(size)
        #crop
        cropped_im = img[ny1 : ny2, nx1 : nx2, :]
        #resize
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
        box_ = box.reshape(1, -1)
        iou = IoU(crop_box, box_)
        if iou >= 0.65:
            save_file = os.path.join(pos_save_dir, "%s.jpg"%p_idx)
            f1.write("DATA/12/positive/%s.jpg"%p_idx + ' 1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2))
            cv2.imwrite(save_file, resized_im)
            p_idx += 1
        elif iou >= 0.4:
            save_file = os.path.join(part_save_dir, "%s.jpg"%d_idx)
            f3.write("DATA/12/part/%s.jpg"%d_idx + ' -1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2))
            cv2.imwrite(save_file, resized_im)
            d_idx += 1
Generating data (for Landmark)
Generate training data from the annotations provided by Celeba (about 10,000 samples in total).
1. Normalize the landmark coordinates
Normalization code:
#gt_box holds the bounding box coordinates
gt_box = np.array([bbox.left,bbox.top,bbox.right,bbox.bottom])
#initialize the landmark
landmark = np.zeros((5, 2))
for index, one in enumerate(landmarkGt):
    # landmark normalization: ((x - bbox.left) / width of bounding box, (y - bbox.top) / height of bounding box)
    rv = ((one[0]-gt_box[0])/(gt_box[2]-gt_box[0]), (one[1]-gt_box[1])/(gt_box[3]-gt_box[1]))
    # put the normalized value into the new list landmark
    landmark[index] = rv
2. Augment the data (rotation, flipping, etc.; see prepare_data/gen_landmark_aug_12.py for details). A sketch of the flip step is given below.
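For illustration, a minimal sketch of the horizontal-flip augmentation applied to a face crop plus its normalized landmarks; the repo has its own flip helper, and the landmark order assumed here (left eye, right eye, nose, left mouth corner, right mouth corner) is the usual 5-point convention:

import cv2
import numpy as np

def flip_face(face, landmark):
    # face: HxWx3 crop; landmark: (5, 2) normalized (x, y) coordinates in [0, 1]
    face_flipped = cv2.flip(face, 1)                        # mirror horizontally
    landmark_ = np.asarray([(1 - x, y) for (x, y) in landmark])
    # after mirroring, left/right eyes and left/right mouth corners swap places
    landmark_[[0, 1]] = landmark_[[1, 0]]
    landmark_[[3, 4]] = landmark_[[4, 3]]
    return face_flipped, landmark_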
Run output

which becomes:

Merging the data
Run output:


When training PNet, I merge four parts of data (pos, part, landmark, neg) into one tfrecord, since their total number ratio is almost 1:1:1:3. A rough sketch of this merge step is given below.
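For intuition only, a simplified sketch of what gen_imglist_pnet.py conceptually does; the file names and the sampling strategy here are my assumptions rather than the repo's exact code (the repo samples with its own base counts), but the idea is the same: keep every pos/part/landmark line and enough negatives to stay near a 1:1:1:3 ratio.

import numpy.random as npr

# assumed file names for the four annotation lists produced earlier
with open('DATA/12/pos_12.txt') as f:
    pos = f.readlines()
with open('DATA/12/part_12.txt') as f:
    part = f.readlines()
with open('DATA/12/neg_12.txt') as f:
    neg = f.readlines()
with open('DATA/12/landmark_12_aug.txt') as f:
    landmark = f.readlines()

with open('DATA/imglists/PNet/train_PNet_landmark.txt', 'w') as out:
    base = len(pos)                        # use the positive count as the unit
    keep = npr.choice(len(neg), size=min(3 * base, len(neg)), replace=False)
    for line in pos + part + landmark:     # keep all pos/part/landmark lines
        out.write(line)
    for i in keep:                         # keep roughly 3x as many negatives
        out.write(neg[i])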
Converting the data to tfrecord
Run output:

Generated file:

read_tfrecord_v2.py and tfrecord_utils.py under the prepare_data directory are used to read and parse the tfrecord data.
It is worth looking at how the tfrecord file gets written:
'''
dataset is a list read from the merged file; each line is parsed into a dict
tf_filename is the tfrecord file to write
'''
with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
    for i, image_example in enumerate(dataset):
        if (i+1) % 100 == 0:
            sys.stdout.write('\r>> %d/%d images has been converted' % (i+1, len(dataset)))
            #sys.stdout.write('\r>> Converting image %d/%d' % (i + 1, len(dataset)))
        sys.stdout.flush()
        filename = image_example['filename']
        _add_to_tfrecord(filename, image_example, tfrecord_writer)


def _add_to_tfrecord(filename, image_example, tfrecord_writer):
    """Loads data from image and annotations files and add them to a TFRecord.

    Args:
      filename: Dataset directory;
      name: Image name to add to the TFRecord;
      tfrecord_writer: The TFRecord writer to use for writing.
    """
    # _process_image_withoutcoder and _convert_to_example_simple are defined in tfrecord_utils.py
    image_data, height, width = _process_image_withoutcoder(filename)
    example = _convert_to_example_simple(image_example, image_data)
    tfrecord_writer.write(example.SerializeToString())
prepare_data/tfrecord_utils.py
def _process_image_withoutcoder(filename):
    #print(filename)
    image = cv2.imread(filename)
    #print(type(image))
    # transform data into string format
    image_data = image.tostring()
    assert len(image.shape) == 3
    height = image.shape[0]
    width = image.shape[1]
    assert image.shape[2] == 3
    # return string data and initial height and width of the image
    return image_data, height, width


def _convert_to_example_simple(image_example, image_buffer):
    """
    covert to tfrecord file
    :param image_example: dict, an image example
    :param image_buffer: string, JPEG encoding of RGB image
    :param colorspace:
    :param channels:
    :param image_format:
    :return: Example proto
    """
    # filename = str(image_example['filename'])

    # class label for the whole image
    class_label = image_example['label']
    bbox = image_example['bbox']
    roi = [bbox['xmin'],bbox['ymin'],bbox['xmax'],bbox['ymax']]
    landmark = [bbox['xlefteye'],bbox['ylefteye'],bbox['xrighteye'],bbox['yrighteye'],bbox['xnose'],bbox['ynose'],
                bbox['xleftmouth'],bbox['yleftmouth'],bbox['xrightmouth'],bbox['yrightmouth']]

    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(image_buffer),
        'image/label': _int64_feature(class_label),
        'image/roi': _float_feature(roi),
        'image/landmark': _float_feature(landmark)
    }))
    return example


def _int64_feature(value):
    """Wrapper for insert int64 feature into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def _float_feature(value):
    """Wrapper for insert float features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def _bytes_feature(value):
    """Wrapper for insert bytes features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
prepare_data/read_tfrecord_v2.py: the tfrecord file has to be parsed during training
def read_single_tfrecord(tfrecord_file, batch_size, net):
    # generate a input queue
    # each epoch shuffle
    filename_queue = tf.train.string_input_producer([tfrecord_file], shuffle=True)
    # read tfrecord
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    image_features = tf.parse_single_example(
        serialized_example,
        features={
            'image/encoded': tf.FixedLenFeature([], tf.string),  #one image one record
            'image/label': tf.FixedLenFeature([], tf.int64),
            'image/roi': tf.FixedLenFeature([4], tf.float32),
            'image/landmark': tf.FixedLenFeature([10], tf.float32)
        }
    )
    if net == 'PNet':
        image_size = 12
    elif net == 'RNet':
        image_size = 24
    else:
        image_size = 48
    image = tf.decode_raw(image_features['image/encoded'], tf.uint8)
    image = tf.reshape(image, [image_size, image_size, 3])
    image = (tf.cast(image, tf.float32)-127.5) / 128
    # image = tf.image.per_image_standardization(image)
    label = tf.cast(image_features['image/label'], tf.float32)
    roi = tf.cast(image_features['image/roi'], tf.float32)
    landmark = tf.cast(image_features['image/landmark'], tf.float32)
    image, label, roi, landmark = tf.train.batch(
        [image, label, roi, landmark],
        batch_size=batch_size,
        num_threads=2,
        capacity=1 * batch_size
    )
    label = tf.reshape(label, [batch_size])
    roi = tf.reshape(roi, [batch_size, 4])
    landmark = tf.reshape(landmark, [batch_size, 10])
    return image, label, roi, landmark
Training
The training code for the three networks lives in the train_models folder:
MTCNN_config.py: parameter configuration
mtcnn_model.py: model definitions, containing the network structures of PNet, RNet and ONet
train.py: trains a model; mtcnn_model.py holds the network structures and the loss computation, while this file adds the optimizer and the actual training code, and writes the results to TensorBoard
train_?net.py: the scripts that are actually executed to train each network (a sketch of the hard-example-mining classification loss used in mtcnn_model.py follows this list)
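mtcnn_model.py computes the classification loss with online hard example mining: only samples labeled 0 or 1 contribute, and only the hardest ~70% of them in each mini-batch are kept. The snippet below is my simplified sketch of that idea (the 0.7 keep ratio follows the repo's num_keep_radio), not the repo's exact code:

import tensorflow as tf

num_keep_radio = 0.7  # keep the hardest 70% of valid samples

def cls_ohem_sketch(cls_prob, label):
    # cls_prob: (batch, 2) softmax output; label: (batch,) in {1, 0, -1, -2}
    zeros = tf.zeros_like(label)
    # part (-1) and landmark (-2) samples do not take part in the cls loss
    label_filter = tf.where(tf.less(label, 0), zeros, label)
    # pick the probability assigned to each sample's true class
    idx = tf.stack([tf.range(tf.shape(cls_prob)[0]), tf.cast(label_filter, tf.int32)], axis=1)
    true_prob = tf.gather_nd(cls_prob, idx)
    loss = -tf.log(true_prob + 1e-10)
    # zero out part/landmark samples, then keep only the largest losses
    valid = tf.cast(tf.greater_equal(label, 0), tf.float32)
    loss = loss * valid
    keep_num = tf.cast(tf.reduce_sum(valid) * num_keep_radio, tf.int32)
    hard_loss, _ = tf.nn.top_k(loss, k=keep_num)
    return tf.reduce_mean(hard_loss)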
The run output is as follows
[root@node5 MTCNN-Tensorflow]# python train_models/train_PNet.py ['/ssd/yuansaijie/MTCNN-Tensorflow/train_models', '/ssd/yuansaijie/MTCNN-Tensorflow', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages/pika-0.9.14-py2.7.egg', '/usr/lib/python2.7/site-packages/elasticsearch-1.4.0-py2.7.egg', '../prepare_data'] DATA/imglists/PNet/train_PNet_landmark.txt ('Total size of the dataset is: ', 1260000) mymodel/MTCNN_model/PNet_landmark/PNet ('dataset dir is:', 'DATA/imglists/PNet/train_PNet_landmark.tfrecord_shuffle') (384, 12, 12, 3) ('load summary for : ', u'conv1/add') (384, 10, 10, 10) ('load summary for : ', u'pool1/MaxPool') (384, 5, 5, 10) ('load summary for : ', u'conv2/add') (384, 3, 3, 16) ('load summary for : ', u'conv3/add') (384, 1, 1, 32) ('load summary for : ', u'conv4_1/Reshape_1') (384, 1, 1, 2) ('load summary for : ', u'conv4_2/BiasAdd') (384, 1, 1, 4) ('load summary for : ', u'conv4_3/BiasAdd') (384, 1, 1, 10) WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:235: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.get_regularization_losses instead. 2018-10-19 11:44:15.160774: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled t....................................... 。。。。。。 2018-10-19 10:23:49.778847 : Step: 97900/98460, accuracy: 0.934169, cls loss: 0.223913, bbox loss: 0.065459,Landmark loss :0.018630,L2 loss: 0.016533, Total Loss: 0.282490 ,lr:0.000001 2018-10-19 10:23:52.010314 : Step: 98000/98460, accuracy: 0.916667, cls loss: 0.278652, bbox loss: 0.075655,Landmark loss :0.016387,L2 loss: 0.016533, Total Loss: 0.341207 ,lr:0.000001 2018-10-19 10:23:54.169109 : Step: 98100/98460, accuracy: 0.961039, cls loss: 0.175593, bbox loss: 0.071169,Landmark loss :0.032753,L2 loss: 0.016533, Total Loss: 0.244087 ,lr:0.000001 2018-10-19 10:23:56.376758 : Step: 98200/98460, accuracy: 0.890365, cls loss: 0.327316, bbox loss: 0.073061,Landmark loss :0.018354,L2 loss: 0.016533, Total Loss: 0.389556 ,lr:0.000001 2018-10-19 10:23:58.548301 : Step: 98300/98460, accuracy: 0.918919, cls loss: 0.286136, bbox loss: 0.072269,Landmark loss :0.030357,L2 loss: 0.016533, Total Loss: 0.353982 ,lr:0.000001 2018-10-19 10:24:00.754086 : Step: 98400/98460, accuracy: 0.920000, cls loss: 0.247473, bbox loss: 0.062291,Landmark loss :0.030228,L2 loss: 0.016533, Total Loss: 0.310266 ,lr:0.000001 ('path prefix is :', 'mymodel/MTCNN_model/PNet_landmark/PNet-30') #用tensorboard查看,具體使用方法可結合代碼和手冊 #https://www.tensorflow.org/guide/summaries_and_tensorboard [root@node5 MTCNN-Tensorflow]# tensorboard --logdir=logs/ TensorBoard 0.4.0rc3 at http://node5:6006 (Press CTRL+C to quit)
Understanding the key code
def train(net_factory, prefix, end_epoch, base_dir, display=200, base_lr=0.01):
    """
    train PNet/RNet/ONet
    :param net_factory: one of the three network definitions in mtcnn_model.py
    :param prefix: model path, where the model is saved
    :param end_epoch:
    :param dataset: base_dir is where the training data is located
    :param display:
    :param base_lr:
    :return:
    """
    net = prefix.split('/')[-1]
    #label file
    label_file = os.path.join(base_dir,'train_%s_landmark.txt' % net)
    #label_file = os.path.join(base_dir,'landmark_12_few.txt')
    print(label_file)
    f = open(label_file, 'r')
    # get number of training examples
    num = len(f.readlines())
    print("Total size of the dataset is: ", num)
    print(prefix)

    #PNet uses this method to read the training data
    if net == 'PNet':
        #dataset_dir = os.path.join(base_dir,'train_%s_ALL.tfrecord_shuffle' % net)
        dataset_dir = os.path.join(base_dir,'train_%s_landmark.tfrecord_shuffle' % net)
        print('dataset dir is:',dataset_dir)
        image_batch, label_batch, bbox_batch,landmark_batch = read_single_tfrecord(dataset_dir, config.BATCH_SIZE, net)
    #RNet use 3 tfrecords to get data
    else:
        pos_dir = os.path.join(base_dir,'pos_landmark.tfrecord_shuffle')
        part_dir = os.path.join(base_dir,'part_landmark.tfrecord_shuffle')
        neg_dir = os.path.join(base_dir,'neg_landmark.tfrecord_shuffle')
        #landmark_dir = os.path.join(base_dir,'landmark_landmark.tfrecord_shuffle')
        landmark_dir = os.path.join('DATA/imglists/RNet','landmark_landmark.tfrecord_shuffle')
        dataset_dirs = [pos_dir,part_dir,neg_dir,landmark_dir]
        pos_radio = 1.0/6;part_radio = 1.0/6;landmark_radio=1.0/6;neg_radio=3.0/6
        pos_batch_size = int(np.ceil(config.BATCH_SIZE*pos_radio))
        assert pos_batch_size != 0,"Batch Size Error "
        part_batch_size = int(np.ceil(config.BATCH_SIZE*part_radio))
        assert part_batch_size != 0,"Batch Size Error "
        neg_batch_size = int(np.ceil(config.BATCH_SIZE*neg_radio))
        assert neg_batch_size != 0,"Batch Size Error "
        landmark_batch_size = int(np.ceil(config.BATCH_SIZE*landmark_radio))
        assert landmark_batch_size != 0,"Batch Size Error "
        batch_sizes = [pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size]
        #print('batch_size is:', batch_sizes)
        image_batch, label_batch, bbox_batch,landmark_batch = read_multi_tfrecords(dataset_dirs,batch_sizes, net)

    #define the weight of each loss term; the total loss combines the three tasks
    if net == 'PNet':
        image_size = 12
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 0.5;
    elif net == 'RNet':
        image_size = 24
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 0.5;
    else:
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 1;
        image_size = 48

    #define placeholders for the input data and the labels
    input_image = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE, image_size, image_size, 3], name='input_image')
    label = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE], name='label')
    bbox_target = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE, 4], name='bbox_target')
    landmark_target = tf.placeholder(tf.float32,shape=[config.BATCH_SIZE,10],name='landmark_target')
    #get loss and accuracy
    input_image = image_color_distort(input_image)
    #net_factory here is e.g. PNet; it returns each component of the loss
    cls_loss_op,bbox_loss_op,landmark_loss_op,L2_loss_op,accuracy_op = net_factory(input_image, label, bbox_target,landmark_target,training=True)
    #train,update learning rate(3 loss)
    total_loss_op = radio_cls_loss*cls_loss_op + radio_bbox_loss*bbox_loss_op + radio_landmark_loss*landmark_loss_op + L2_loss_op
    #train the model; train_model defines the optimizer tf.train.MomentumOptimizer
    train_op, lr_op = train_model(base_lr, total_loss_op, num)
    # init
    init = tf.global_variables_initializer()
    sess = tf.Session()

    #save model
    saver = tf.train.Saver(max_to_keep=0)
    sess.run(init)

    #visualize some variables
    tf.summary.scalar("cls_loss",cls_loss_op)#cls_loss
    tf.summary.scalar("bbox_loss",bbox_loss_op)#bbox_loss
    tf.summary.scalar("landmark_loss",landmark_loss_op)#landmark_loss
    tf.summary.scalar("cls_accuracy",accuracy_op)#cls_acc
    tf.summary.scalar("total_loss",total_loss_op)#cls_loss, bbox loss, landmark loss and L2 loss add together
    summary_op = tf.summary.merge_all()
    logs_dir = "logs/%s" %(net)
    if os.path.exists(logs_dir) == False:
        os.mkdir(logs_dir)
    writer = tf.summary.FileWriter(logs_dir,sess.graph)
    projector_config = projector.ProjectorConfig()
    projector.visualize_embeddings(writer,projector_config)
    #begin
    coord = tf.train.Coordinator()
    #begin enqueue thread
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    #total steps
    MAX_STEP = int(num / config.BATCH_SIZE + 1) * end_epoch
    epoch = 0
    sess.graph.finalize()
    #the training loop proper
    try:
        for step in range(MAX_STEP):
            i = i + 1
            if coord.should_stop():
                break
            image_batch_array, label_batch_array, bbox_batch_array,landmark_batch_array = sess.run([image_batch, label_batch, bbox_batch,landmark_batch])
            #random flip
            image_batch_array,landmark_batch_array = random_flip_images(image_batch_array,label_batch_array,landmark_batch_array)
            '''
            print(image_batch_array.shape)
            print(label_batch_array.shape)
            print(bbox_batch_array.shape)
            print(landmark_batch_array.shape)
            print(label_batch_array[0])
            print(bbox_batch_array[0])
            print(landmark_batch_array[0])
            '''
            _,_,summary = sess.run([train_op, lr_op ,summary_op], feed_dict={input_image: image_batch_array, label: label_batch_array, bbox_target: bbox_batch_array,landmark_target:landmark_batch_array})

            if (step+1) % display == 0:
                #acc = accuracy(cls_pred, labels_batch)
                cls_loss, bbox_loss,landmark_loss,L2_loss,lr,acc = sess.run([cls_loss_op, bbox_loss_op,landmark_loss_op,L2_loss_op,lr_op,accuracy_op],
                    feed_dict={input_image: image_batch_array, label: label_batch_array, bbox_target: bbox_batch_array, landmark_target: landmark_batch_array})

                total_loss = radio_cls_loss*cls_loss + radio_bbox_loss*bbox_loss + radio_landmark_loss*landmark_loss + L2_loss
                # landmark loss: %4f,
                print("%s : Step: %d/%d, accuracy: %3f, cls loss: %4f, bbox loss: %4f,Landmark loss :%4f,L2 loss: %4f, Total Loss: %4f ,lr:%f " % (
                    datetime.now(), step+1,MAX_STEP, acc, cls_loss, bbox_loss,landmark_loss, L2_loss,total_loss, lr))

            #save every two epochs
            if i * config.BATCH_SIZE > num*2:
                epoch = epoch + 1
                i = 0
                path_prefix = saver.save(sess, prefix, global_step=epoch*2)
                print('path prefix is :', path_prefix)
            writer.add_summary(summary,global_step=step)
    except tf.errors.OutOfRangeError:
        print("完成!!!")
    finally:
        coord.request_stop()
        writer.close()
    coord.join(threads)
    sess.close()
RNet
- After training PNet, run gen_hard_example to generate training data (Face Detection Part) for RNet.
- Run gen_landmark_aug_24.py to generate training data (Face Landmark Detection Part) for RNet.
- Run gen_imglist_rnet.py to merge two parts of training data.
- Run gen_RNet_tfrecords.py to generate tfrecords for RNet. (You should run this script four times to generate the tfrecords of neg, pos, part and landmark respectively.)
Generating data (for Face Detection)
The run output is as follows
[root@node5 MTCNN-Tensorflow]# python prepare_data/gen_hard_example.py Called with argument: Namespace(batch_size=[2048, 256, 16], epoch=[18, 14, 16], min_face=20, prefix=['data/MTCNN_model/PNet_landmark/PNet', 'data/MTCNN_model/RNet_No_Landmark/RNet', 'data/MTCNN_model/ONet_No_Landmark/ONet'], shuffle=False, slide_window=False, stride=2, test_mode='PNet', thresh=[0.3, 0.1, 0.7], vis=False) ('Test model: ', 'PNet') data/MTCNN_model/PNet_landmark/PNet-18 (1, ?, ?, 3) ('load summary for : ', u'conv1/add') (1, ?, ?, 10) ('load summary for : ', u'pool1/MaxPool') (1, ?, ?, 10) ('load summary for : ', u'conv2/add') (1, ?, ?, 16) ('load summary for : ', u'conv3/add') (1, ?, ?, 32) ('load summary for : ', u'conv4_1/Reshape_1') (1, ?, ?, 2) ('load summary for : ', u'conv4_2/BiasAdd') (1, ?, ?, 4) ('load summary for : ', u'conv4_3/BiasAdd') (1, ?, ?, 10) 2018-10-19 14:55:32.129731: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA data/MTCNN_model/PNet_landmark/PNet-18 restore models' param ================================== load test data finish loading start detecting.... 100 out of 12880 images done 0.735359 seconds for each image 200 out of 12880 images done 0.703251 seconds for each image 300 out of 12880 images done ........ 12700 out of 12880 images done 0.733344 seconds for each image 12800 out of 12880 images done 0.669486 seconds for each image ('num of images', 12880) time cost in average0.637 pnet 0.637 rnet 0.000 onet 0.000 ('boxes length:', 12880) finish detecting ----------------------------------------以上都是在完成Pnet的預測,預測結果保存為detections.pkl save_path is : DATA/no_LM24/RNet 24測試完成開始OHEM processing 12880 images in total -----------------------對比預測和真實結果,生成Rnet的三類訓練樣本 12880 12880 0 images done 100 images done 200 images done ......
The Detection folder holds the inference code (not explained here; I will later study how prediction works using the facenet code). Generating RNet's training data requires running the PNet model from the previous step: the model's predictions are compared against the ground truth to produce the corresponding three kinds of samples. There is no randomness in the crops generated here; they come entirely from comparing the previous network's (PNet's) predictions with the ground truth.
Core code
# im_idx_list and gt_boxes_list hold the original training images and bounding boxes; det_boxes holds the previous network's detections
for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list):
    gts = np.array(gts, dtype=np.float32).reshape(-1, 4)

    if dets.shape[0] == 0:
        continue
    img = cv2.imread(im_idx)
    #change to square
    dets = convert_to_square(dets)
    dets[:, 0:4] = np.round(dets[:, 0:4])
    neg_num = 0
    for box in dets:
        x_left, y_top, x_right, y_bottom, _ = box.astype(int)
        width = x_right - x_left + 1
        height = y_bottom - y_top + 1

        # ignore box that is too small or beyond image border
        if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1:
            continue

        # compute intersection over union(IoU) between current box and all gt boxes
        Iou = IoU(box, gts)
        cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :]
        resized_im = cv2.resize(cropped_im, (image_size, image_size), interpolation=cv2.INTER_LINEAR)

        # save negative images and write label
        # Iou with all gts must below 0.3
        if np.max(Iou) < 0.3 and neg_num < 60:
            #save the examples
            save_file = get_path(neg_dir, "%s.jpg" % n_idx)
            # print(save_file)
            neg_file.write(save_file + ' 0\n')
            cv2.imwrite(save_file, resized_im)
            n_idx += 1
            neg_num += 1
        else:
            # find gt_box with the highest iou
            idx = np.argmax(Iou)
            assigned_gt = gts[idx]
            x1, y1, x2, y2 = assigned_gt

            # compute bbox reg label
            offset_x1 = (x1 - x_left) / float(width)
            offset_y1 = (y1 - y_top) / float(height)
            offset_x2 = (x2 - x_right) / float(width)
            offset_y2 = (y2 - y_bottom) / float(height)

            # save positive and part-face images and write labels
            if np.max(Iou) >= 0.65:
                save_file = get_path(pos_dir, "%s.jpg" % p_idx)
                pos_file.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                p_idx += 1

            elif np.max(Iou) >= 0.4:
                save_file = os.path.join(part_dir, "%s.jpg" % d_idx)
                part_file.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                d_idx += 1
Generating data (for Landmark)
Same as for PNet, except the resize target becomes 24. The run output is as follows:

The normalization is unchanged; only the resize target changes to 24.

Merging the data
Same as for PNet; the run output is as follows:

These are the sample counts for neg, pos, part and landmark respectively.
Converting the data to tfrecord
This has to be run four times: change name in the main function to pos, neg, part and landmark in turn.


Training
The run output is as follows
[root@node5 MTCNN-Tensorflow]# python train_models/train_RNet.py ['/ssd/yuansaijie/MTCNN-Tensorflow/train_models', '/ssd/yuansaijie/MTCNN-Tensorflow', '/usr/lib64/pyth on27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-package s', '/usr/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages/pika-0.9.14-py2.7.egg', '/us r/lib/python2.7/site-packages/elasticsearch-1.4.0-py2.7.egg', '../prepare_data'] DATA/imglists_noLM/RNet/train_RNet_landmark.txt ('Total size of the dataset is: ', 1895256) mymodel/MTCNN_model/RNet_landmark/RNet (64, 24, 24, 3) (64, 24, 24, 3) (192, 24, 24, 3) (64, 24, 24, 3) (384, 24, 24, 3) (384, 4) (384, 24, 24, 3) (384, 22, 22, 28) (384, 11, 11, 28) (384, 9, 9, 48) (384, 4, 4, 48) (384, 3, 3, 64) (384, 576) (384, 128) (384, 2) (384, 4) (384, 10) WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:282: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.get_regularization_losses instead. 2018-10-22 11:00:52.810807: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2018-10-22 11:01:05.694332 : Step: 100/108592, accuracy: 0.750000, cls loss: 0.657524, bbox loss: 0.112904,Landmark loss :0.150184,L2 loss: 0.023872, Total Loss: 0.812940 ,lr:0.001000 2018-10-22 11:01:17.431871 : Step: 200/108592, accuracy: 0.750000, cls loss: 0.648712, bbox loss: 0.093683,Landmark loss :0.141217,L2 loss: 0.023827, Total Loss: 0.789989 ,lr:0.001000 。。。 。。。 2018-10-22 14:33:03.275786 : Step: 108500/108592, accuracy: 0.976562, cls loss: 0.130488, bbox loss: 0.086588,Landmark loss :0.023444,L2 loss: 0.024208, Total Loss: 0.209711 ,lr:0.000001 ('path prefix is :', 'mymodel/MTCNN_model/RNet_landmark/RNet-22')
ONet
- After training RNet, run gen_hard_example to generate training data (Face Detection Part) for ONet.
- Run gen_landmark_aug_48.py to generate training data (Face Landmark Detection Part) for ONet.
- Run gen_imglist_onet.py to merge two parts of training data.
- Run gen_ONet_tfrecords.py to generate tfrecords for ONet. (You should run this script four times to generate the tfrecords of neg, pos, part and landmark respectively.)
Generating data (for Face Detection)
Run detection with the models trained in the previous two steps and compare against the ground truth to obtain ONet's training data. The run output is as follows:
[root@node5 MTCNN-Tensorflow]# python prepare_data/gen_hard_example.py Called with argument: Namespace(batch_size=[2048, 256, 16], epoch=[18, 14, 16], min_face=20, prefix=['data/MTCNN_model/PNet_landmark/PNet', 'data/MTCNN_model/RNet_landmark/RNet', 'data/MTCNN_model/ONet_No_Landmark/ONet'], shuf fle=False, slide_window=False, stride=2, test_mode='RNet', thresh=[0.3, 0.1, 0.7], vis=False) ('Test model: ', 'RNet') data/MTCNN_model/PNet_landmark/PNet-18 (1, ?, ?, 3) ('load summary for : ', u'conv1/add') (1, ?, ?, 10) ('load summary for : ', u'pool1/MaxPool') (1, ?, ?, 10) ('load summary for : ', u'conv2/add') (1, ?, ?, 16) ('load summary for : ', u'conv3/add') (1, ?, ?, 32) ('load summary for : ', u'conv4_1/Reshape_1') (1, ?, ?, 2) ('load summary for : ', u'conv4_2/BiasAdd') (1, ?, ?, 4) ('load summary for : ', u'conv4_3/BiasAdd') (1, ?, ?, 10) 2018-10-22 14:56:35.504447: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports ins tructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA data/MTCNN_model/PNet_landmark/PNet-18 restore models' param ('==================================', 'RNet') (256, 24, 24, 3) (256, 22, 22, 28) (256, 11, 11, 28) (256, 9, 9, 48) (256, 4, 4, 48) (256, 3, 3, 64) (256, 576) (256, 128) (256, 2) (256, 4) (256, 10) data/MTCNN_model/RNet_landmark/RNet-14 restore models' param ================================== load test data finish loading start detecting.... 100 out of 12880 images done 0.969146 seconds for each image 200 out of 12880 images done 0.954468 seconds for each image 300 out of 12880 images done 0.880505 seconds for each image 400 out of 12880 images done 。。。 。。。 12800 out of 12880 images done 0.826616 seconds for each image ('num of images', 12880) time cost in average0.839 pnet 0.598 rnet 0.240 onet 0.000 ('boxes length:', 12880) finish detecting save_path is : DATA/no_LM48/ONet 48測試完成開始OHEM processing 12880 images in total 12880 12880 0 images done 100 images done 200 images done 300 images done 400 images done 。。。
Generating data (for Landmark)
Same as for PNet and RNet, except the resize target becomes 48. The run output is as follows:

The normalization is unchanged; only the resize target changes to 48.

Merging the data

Converting the data to tfrecord

Training
[root@node5 MTCNN-Tensorflow]# python train_models/train_ONet.py ['/ssd/yuansaijie/MTCNN-Tensorflow/train_models', '/ssd/yuansaijie/MTCNN-Tensorflow', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk','/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages/pika-0.9.14-py2.7.egg', '/usr/lib/python2.7/site-packages/elasticsearch-1.4.0-py2.7.egg', '../prepare_data'] DATA/imglists/ONet/train_ONet_landmark.txt ('Total size of the dataset is: ', 1395806) mymodel/MTCNN_model/ONet_landmark/ONet (64, 48, 48, 3) (64, 48, 48, 3) (192, 48, 48, 3) (64, 48, 48, 3) (384, 48, 48, 3) (384, 4) (384, 48, 48, 3) (384, 46, 46, 32) (384, 23, 23, 32) (384, 21, 21, 64) (384, 10, 10, 64) (384, 8, 8, 64) (384, 4, 4, 64) (384, 3, 3, 128) (384, 1152) (384, 256) (384, 2) (384, 4) (384, 10) WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:328: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.get_regularization_losses instead. 2018-10-23 09:44:37.292322: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2018-10-23 09:44:44.016103 : Step: 10/79970, accuracy: 0.746094, cls loss: 0.683990, bbox loss: 0.171421,Landmark loss :0.382090,L2 loss: 0.049354, Total Loss: 1.201144 ,lr:0.001000 2018-10-23 09:44:50.052537 : Step: 20/79970, accuracy: 0.750000, cls loss: 0.663642, bbox loss: 0.098265,Landmark loss :0.368318,L2 loss: 0.049314, Total Loss: 1.130407 ,lr:0.001000 ... ... 2018-10-24 06:15:42.631526 : Step: 79970/79970, accuracy: 0.972656, cls loss: 0.115991, bbox loss: 0.059060,Landmark loss :0.017580,L2 loss: 0.043284, Total Loss: 0.206384 ,lr:0.000001 ('path prefix is :', 'mymodel/MTCNN_model/ONet_landmark/ONet-22') # 此處訓練時長已經不對了,因為是半夜重新跑的,大概是花了12h左右吧
