The previous post covered installing and configuring the TensorFlow Object Detection API; now let's try using the API to build our own object detection model.
1. Preparing the Dataset
This post is aimed at face recognition. I downloaded 120 pictures of Zhang Junning (張鈞甯) from Baidu Images and put them in a newly created images folder under /models/research/object_detection. Inside images I created two subfolders, train and test, and split the 120 pictures into 100 and 20, stored in train and test respectively.
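If you would rather not split the files by hand, a throwaway script does it. This is my own sketch, not part of the API; the path and the 100/20 split simply mirror the setup above:

# split_dataset.py -- hypothetical helper, not part of the Object Detection API
import os
import random
import shutil

IMAGE_DIR = '/home/zzf/tensorflow/models/research/object_detection/images'

random.seed(0)  # reproducible shuffle
images = sorted(f for f in os.listdir(IMAGE_DIR) if f.endswith('.jpg'))
random.shuffle(images)

for sub in ('train', 'test'):
    os.makedirs(os.path.join(IMAGE_DIR, sub), exist_ok=True)

# the first 100 shuffled images go to train/, the remaining 20 to test/
for i, name in enumerate(images):
    sub = 'train' if i < 100 else 'test'
    shutil.move(os.path.join(IMAGE_DIR, name), os.path.join(IMAGE_DIR, sub, name))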
Next, use the little tool LabelImg (see here for how to install it) to annotate the pictures in train and test by hand (the more annotations the better, if you have the time), as shown in the figure below.
When annotation is finished, each image's labels are saved as an XML file with the same name, in the same folder.
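For reference, the XML that LabelImg writes follows the Pascal VOC layout. Here is a trimmed sketch of one file (the file name and coordinates are made up); the conversion scripts below rely on exactly these fields and their order:

<annotation>
    <filename>image1.jpg</filename>
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <object>
        <name>ZhangJN</name>        <!-- the class label: member[0] in xml2csv.py -->
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>                    <!-- the box in pixels: member[4] in xml2csv.py -->
            <xmin>96</xmin>
            <ymin>71</ymin>
            <xmax>324</xmax>
            <ymax>301</ymax>
        </bndbox>
    </object>
</annotation>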
TensorFlow wants its input in the dedicated TFRecord format.
We write two small Python scripts: the first gathers the information from all the XML files in a folder into a single .csv table, and the second builds a TFRecord file from that .csv table.
Here is the code for both:

# xml2csv.py
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

# Point both of these at the folder holding your images + XML annotations.
os.chdir('/home/zzf/tensorflow/models/research/object_detection/images/test')
path = '/home/zzf/tensorflow/models/research/object_detection/images/test'


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    # Rename the output CSV for each run (train vs. test).
    xml_df.to_csv('zhangjn_train.csv', index=None)
    print('Successfully converted xml to csv.')


main()

# generate_tfrecord.py
# -*- coding: utf-8 -*-
"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv --output_path=train.record

  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=test.record
"""
import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

os.chdir('/home/zzf/tensorflow/models/research/object_detection')

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'ZhangJN':      # change this to your own label
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/test')    # change to images/train for the train set
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
For xml2csv.py, remember to change the os.chdir and path lines near the top to your own directories, as well as the name of the CSV file written in main(). The same goes for generate_tfrecord.py: the paths must be your own, and the label-mapping code in class_text_to_int has to be changed to your own labels; I have just one here.
Run the two scripts once for the training set and once for the test set, and you end up with train.record and test.record.
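Concretely, the TFRecord step looks something like this from object_detection/ (the CSV names here assume you renamed them per run as noted above; also remember to flip the hard-coded path in main() from images/test to images/train between the two runs):

python3 generate_tfrecord.py --csv_input=images/train/zhangjn_train.csv --output_path=train.record
python3 generate_tfrecord.py --csv_input=images/test/zhangjn_test.csv --output_path=test.record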
2. Config File and Model
For convenience, I moved the train and test CSV and record files from images/ into object_detection/data, so that under the object_detection folder we have the following file structure:

Object-Detection
-data/
--test_labels.csv
--test.record
--train_labels.csv
--train.record
-images/
--test/
---testingimages.jpg
--train/
---testingimages.jpg
--...yourimages.jpg
-training/    # newly created; the training run below will use it
Next we need to set up the config file. Look under object_detection/samples/configs for the config that matches your model.
We can also download a pre-trained model from the official model zoo. We will use ssd_mobilenet_v1_coco, so download it first,
then extract ssd_mobilenet_v1_coco_2017_11_17.tar.gz inside the object_detection folder.
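In shell terms, this step plus copying the sample config into training/ is roughly the following (samples/configs is where the repo keeps its example configs):

tar -xzvf ssd_mobilenet_v1_coco_2017_11_17.tar.gz
cp samples/configs/ssd_mobilenet_v1_coco.config training/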
With ssd_mobilenet_v1_coco.config in the training folder, open it with a text editor (I use Sublime Text 3) and make the following changes:
1. Search for PATH_TO_BE_CONFIGURED and change each occurrence to your own paths; be careful not to swap test and train. Note that label_map_path must be identical in the train input reader and the eval input reader (a combined excerpt follows this list).
2. Change num_classes to your actual number of classes; in my example it is 1.
3. batch_size defaults to 24. I ran out of GPU memory with that, so to be safe I changed it to 1; if 1 still produces similar errors, you probably need a better machine…
4. Point fine_tune_checkpoint at the folder you just extracted, and keep from_detection_checkpoint enabled:
fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
from_detection_checkpoint: true
This fine-tunes on top of the checkpoint the model was already trained to, which is much faster; training from scratch is painfully slow.
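Taken together, the edited parts of ssd_mobilenet_v1_coco.config look roughly like this. This is a sketch of just the changed fields (the "..." stands for everything left at its defaults), with paths matching my data/ layout above:

model {
  ssd {
    num_classes: 1    # was 90
    ...
  }
}
train_config: {
  batch_size: 1       # was 24; lowered to avoid running out of GPU memory
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
  from_detection_checkpoint: true
  ...
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/zhangjn.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/zhangjn.pbtxt"    # must match the one above
  shuffle: false
  num_readers: 1
}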
Now create a text file named zhangjn.pbtxt in the corresponding directory (data/); you can copy some other .pbtxt file and edit it in a text editor. Write our labels into it; in my example there is just one. Note that the id numbers must match the ones returned by class_text_to_int in generate_tfrecord.py, starting from 1.
item {
  id: 1
  name: 'ZhangJN'
}
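A quick way to sanity-check that the label map parses, using the same utilities the test script imports later, is a snippet like this (my own addition; run it from object_detection/):

# check_label_map.py -- optional sanity check, run from object_detection/
from object_detection.utils import label_map_util

label_map = label_map_util.load_labelmap('data/zhangjn.pbtxt')
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=1, use_display_name=True)
print(categories)    # expect: [{'id': 1, 'name': 'ZhangJN'}]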
Good, all the data is ready; we can start training.
3. Training the Model
I train on a local GPU (Ubuntu 16.04 LTS). Open a terminal in the object_detection directory. The latest version trains with model_main.py; the older train.py also works and is covered further down. When I first ran model_main.py the cooling fans were already roaring, but the terminal printed no per-step loss at all, which was a little unnerving, so a few things need changing first:
- Add tf.logging.set_verbosity(tf.logging.INFO) right after the import block of model_main.py. The loss is then printed every 100 steps; better than nothing, since at least you know it is running.
- If you train under Python 3, wrap category_index.values() (around line 390 of model_lib.py) in list() so it becomes list(category_index.values()); otherwise a "can't pickle dict_values" error appears.
- One more issue: because model_main.py merges the old train.py and eval.py, a badly chosen eval count produces warnings such as:
WARNING:tensorflow:Ignoring ground truth with image id 558212937 since it was previously added
The cause is that if either num_examples in the eval_config (below) or the --num_eval_steps run flag is larger than the number of evaluation images in your dataset, this warning appears, because there are not enough images to evaluate. So simply set both values to the number of images in your test set.
eval_config: {
  num_examples: 20
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
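For reference, the two source patches above boil down to the following sketch (line positions are approximate and shift between versions; model_main.py already imports tensorflow as tf):

# model_main.py: add right after the import block
tf.logging.set_verbosity(tf.logging.INFO)    # log the loss every 100 steps

# model_lib.py, around line 390: wrap the dict view so Python 3 can pickle it
#   before: category_index.values()
#   after:  list(category_index.values())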
Then, in the terminal, run:
python3 model_main.py \
    --pipeline_config_path=training/ssd_mobilenet_v1_coco.config \
    --model_dir=training \
    --num_train_steps=60000 \
    --num_eval_steps=20 \
    --alsologtostderr
If everything is normal, after a short wait, once you hear the fans speed up, training is proceeding in an orderly fashion. model_main.py also generates an export folder at the end, with a saved_model.pb already inside; I have not tested whether that is the same file we export below, so feel free to experiment with that .pb if you are curious.
If you would rather not patch anything, use the old train.py at legacy/train.py and run:
python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
and training starts.
Open another terminal, cd into the object_detection directory as well, and run:
tensorboard --logdir=training
Now we can watch training progress in the browser (TensorBoard serves on http://localhost:6006 by default); it keeps pulling in the newest training data as it arrives.
After it has run for a while, we can see checkpoint files being saved under our training folder,
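with contents along these lines (your step number will differ; 3737 here matches the export command below):

training/
  checkpoint
  graph.pbtxt
  model.ckpt-3737.data-00000-of-00001
  model.ckpt-3737.index
  model.ckpt-3737.meta
  events.out.tfevents...

Next we can generate the model file we need. With the terminal in the object_detection directory, run: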
python3 export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v1_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-3737 \
    --output_directory zhangjn_detction
Change trained_checkpoint_prefix to the step number your own training reached, and output_directory to wherever you want the model stored; I made a new folder, zhangjn_detction. When it finishes you will find a number of files under zhangjn_detction: saved_model/, checkpoint, frozen_inference_graph.pb, and so on. The file ending in .pb is the all-important frozen model; it is what the small demo in the previous post used, and it is what we will use for testing next.
4. Testing the Model
Open object_detection_tutorial.ipynb in the object_detection directory, or convert it into a Python file, object_detection_tutorial.py; with a few changes it can run our test.

# coding: utf-8

# # Object Detection Demo
# Welcome to the object detection inference walkthrough! This notebook will walk you step by step
# through the process of using a pre-trained model to detect objects in an image. Make sure to
# follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md)
# before you start.

from distutils.version import StrictVersion
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

# if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
#     raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')

# ## Env setup

# This is needed to display the images.
# get_ipython().magic(u'matplotlib inline')

# ## Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util
from utils import visualization_utils as vis_util

# # Model preparation

# ## Variables
#
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by
# changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
#
# By default we use an "SSD with Mobilenet" model here. See the
# [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
# for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# What model to download.
MODEL_NAME = 'zhangjn_detction'
# MODEL_FILE = MODEL_NAME + '.tar.gz'
# DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'zhangjn.pbtxt')

NUM_CLASSES = 1

# ## Download Model
# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
'''
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
'''

# ## Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we
# know that this corresponds to `airplane`. Here we use internal utility functions, but anything
# that returns a dictionary mapping integers to appropriate string labels would be fine.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# ## Helper code
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

# # Detection

# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8)]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates
                # and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: np.expand_dims(image, 0)})

            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict


for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)
    plt.show()
1. Since we are not downloading a model, the download-related code can be removed: change MODEL_NAME, PATH_TO_LABELS, and NUM_CLASSES to your own values, and delete (or comment out, as above) the whole Download Model section.
2. For test images, put a few into the test_images folder named image<number>.jpg; then no code change is needed beyond the number range in the line

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

which you adjust to your own numbering. My pictures are named image3.jpg through image7.jpg, hence range(3, 8).
If you use the Python file, one line has to be added at the end so the figures are actually displayed:
plt.show()
Then just run it:
python3 object_detection_tutorial.py
That is the whole training process; once you are familiar with it, it is really quite simple. You may run into all sorts of problems along the way, many of them caused by version differences. The most annoying thing about TensorFlow is how quickly it updates and how large the changes are, with some releases not even backward compatible. So don't panic when a problem appears: search Google or Baidu and you will usually find an answer. If it is a version problem and you cannot upgrade right away, compare your version with the latest to find the difference responsible, and rewrite the call in the code to match your version.

Back when I was on 1.4 I hit such problems constantly. For example, the newer tf.contrib.data.parallel_interleave() does not exist in 1.4's tf.contrib.data; and in 1.10 tf.keras.Model() can also be written tf.keras.models.Model(), while 1.4 supports only the latter, so a program using the former needs a small edit before it runs on 1.4, and so on. After a while I upgraded to 1.10 anyway; there was just too much to patch. Upgrading is its own hassle, since the NVIDIA driver, CUDA, and cuDNN all have to change with it, but having been through it before, it only took three or four hours this time.