Training your own detector with the TensorFlow object_detection API


References:

https://www.skcript.com/svr/realtime-object-and-face-detection-in-android-using-tensorflow-object-detection-api/

https://www.cnblogs.com/zongfa/p/9663649.html

Downloading the TensorFlow models

TensorFlow has since moved to 2.0, and many of the projects under https://github.com/tensorflow/models have been upgraded accordingly. Our company's machine-learning platform, however, is still on TensorFlow 1.12, and running code written for TensorFlow 2.0 on it fails in all sorts of ways, so I needed an older snapshot of the models repo. It is hard to find through the website — every release tag has dropped the research/tutorials/samples folders. The command below finally fetched the version I wanted; we mainly use the object_detection project under research:

git clone -b r1.5 https://github.com/tensorflow/models.git

Environment

Instructions for installing Anaconda and TensorFlow are easy to find, and since the company machines come with both pre-installed, I won't repeat them here.
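
Before going any further, it is worth confirming which TensorFlow the environment actually picks up; a minimal check (my own sketch, not from any setup guide):

# confirm the interpreter sees a TF 1.x installation
import tensorflow as tf
print(tf.__version__)    # expect something like 1.12.0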

Data collection and annotation

First I gathered the images I needed. Then, using LabelImg — a small annotation tool whose Windows build is available at https://tzutalin.github.io/labelImg/ (access may require a VPN in mainland China) — I hand-annotated the images in train and test (the more the better, time permitting), as in the screenshots below.

[LabelImg screenshots omitted.]

When annotation is done, save each result as an XML file with the same name as the image, in the folder the image lives in. That completes the training set.
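
For reference, LabelImg writes Pascal VOC-style XML. The sample below is hand-written and trimmed (the file name and coordinates are made up), but it shows the structure the conversion script relies on: name is the first child of each object element, and bndbox, holding xmin/ymin/xmax/ymax, is the fifth.

<annotation>
  <folder>test</folder>
  <filename>image3.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>ZhangJN</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>120</xmin>
      <ymin>80</ymin>
      <xmax>360</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>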

Converting to TFRecord format

object_detection expects TFRecord input, so the original .jpg images plus the .xml files from LabelImg must first be converted to .csv, and then to .record.

Two scripts handle this:

# xml2csv.py

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir('/home/zzf/tensorflow/models/research/object_detection/images/test')  # change to your own image directory (test or train)
path = '/home/zzf/tensorflow/models/research/object_detection/images/test'     # same directory as above

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),   # width
                     int(root.find('size')[1].text),   # height
                     member[0].text,                   # object name
                     int(member[4][0].text),           # bndbox xmin
                     int(member[4][1].text),           # bndbox ymin
                     int(member[4][2].text),           # bndbox xmax
                     int(member[4][3].text)            # bndbox ymax
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('zhangjn_train.csv', index=None)  # rename the output CSV for each split
    print('Successfully converted xml to csv.')


main()
# generate_tfrecord.py

# -*- coding: utf-8 -*-


"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""


import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

os.chdir('/home/zzf/tensorflow/models/research/object_detection')

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'ZhangJN':     # change this label; add an elif for each extra class
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/test')         # change this: use images/train when building the train record
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

For xml2csv.py, change the os.chdir and path lines to your own directories, and the name of the CSV file written in main(). The same goes for generate_tfrecord.py: adjust the paths, and edit the label-mapping code in class_text_to_int to match your own labels — I have just one.

Run the scripts once for the training set and once for the test set to produce train.record and test.record.
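
Before moving on, a quick sanity check can confirm that each record file actually contains examples. This is my own sketch, not part of the official tooling; it assumes the records end up under data/ as laid out below:

# check_records.py -- count the examples in each generated TFRecord (TF 1.x API)
import tensorflow as tf

for record in ['data/train.record', 'data/test.record']:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(record))
    print('{}: {} examples'.format(record, count))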

Directory layout

For convenience I moved the train and test CSV and record files out of images/ into object_detection/data, giving the following structure under the object_detection folder:

Object-Detection
-data/
--test_labels.csv
--test.record
--train_labels.csv
--train.record
-images/
--test/
---testingimages.jpg
--train/
---testingimages.jpg
--...yourimages.jpg
-training/   # newly created; used later for training the model

Config file and pre-trained model

Next comes the pipeline configuration. Under object_detection/samples, find the config file for the model you plan to use.

Pre-trained models can be downloaded from the official model zoo. We use ssd_mobilenet_v1_coco, so download that first. The matching config file, ssd_mobilenet_v1_coco.config, can likewise be found under the configs directory.

In the object_detection folder, extract ssd_mobilenet_v1_coco_2017_11_17.tar.gz.

Put ssd_mobilenet_v1_coco.config into the training folder, then create another file there, object_label.pbtxt, which defines the class labels:

item {
  id: 1
  name: 'ZhangJN'
}
# if there are more classes, add further items below
item {
  id: 2
  name: 'XXX'
}
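
Typos in the label map only show up later as training errors, so it can be parsed up front with the same utility the test script uses; a small sketch, assuming the object_detection package is already importable:

# print every (id, name) pair defined in the label map
from object_detection.utils import label_map_util

label_map = label_map_util.load_labelmap('training/object_label.pbtxt')
for item in label_map.item:
    print(item.id, item.name)    # ids must start at 1 and match class_text_to_int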

The following parts of ssd_mobilenet_v1_coco.config need to be changed:

ssd {
  num_classes: 1     # set to the real number of classes; a single class here
  box_coder {
    faster_rcnn_box_coder {
      y_scale: 10.0
      x_scale: 10.0
      height_scale: 5.0
      width_scale: 5.0
    }
  }
}
train_config: {
  batch_size: 15 #Change the Batch size
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.001
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"  # adjust to the actual name of the extracted folder
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 300   #Number of steps to train
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record" #path of our train record
  }
  label_map_path: "training/object_label.pbtxt"  # location of the label file created above
}

eval_config: {
  num_examples: 2000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record" #path of our test record
  }
  label_map_path: "training/object_label.pbtxt"  # location of the label file created above
  shuffle: false
  num_readers: 1
}
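
A malformed pipeline config otherwise only surfaces once training starts, so parsing it first can save a round trip. A sketch, assuming your checkout provides object_detection/utils/config_util.py and the protos have been compiled (see the protoc step below):

# parse the pipeline config and spot-check a couple of fields
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file(
    'training/ssd_mobilenet_v1_coco.config')
print(configs['model'].ssd.num_classes)              # expect 1
print(configs['train_input_config'].label_map_path)  # expect training/object_label.pbtxt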

All the data is now ready, and training can begin.

Training your own model

In the research directory, run:

$ python setup.py install

The directory structure now looks like this:

-object_detection
--training/
---ssd_mobilenet_v1_coco.config
---object_label.pbtxt
--data/
---train.record
---test.record
--ssd_mobilenet_v1_coco_11_06_2017/

Then, adjusted for your own TensorFlow environment, run:

  $ cd "PATH TO THE MODELS FOLDER"  
  $ sudo apt-get install protobuf-compiler python-pil python-lxml
  $ sudo pip install pillow 
  $ sudo pip install lxml
  $ sudo pip install jupyter
  $ sudo pip install matplotlib
  $ protoc object_detection/protos/*.proto --python_out=.
  $ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim 
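
If the protoc and PYTHONPATH steps above worked, both of the following imports succeed from the research directory; a quick check (my own sketch):

from object_detection.protos import pipeline_pb2   # fails if the protoc step was skipped
from nets import mobilenet_v1                      # resolved through the slim entry on PYTHONPATH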

Train the model:

 $ python train.py --logtostderr --train_dir=training/  
  --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

I then hit a rather annoying error:

ValueError: axis = 0 not in [0, 0)

In the end, the fix was to edit the loss section of ssd_mobilenet_v1_coco.config:

loss {
  classification_loss {
    weighted_sigmoid {
    anchorwise_output: true  #add this
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    anchorwise_output: true  #add this
    }
  }
  hard_example_miner {
    num_hard_examples: 3000
    iou_threshold: 0.99
    loss_type: CLASSIFICATION
    max_negatives_per_positive: 3
    min_negatives_per_image: 0
  }
  classification_weight: 1.0
  localization_weight: 1.0
}

After that change, the model trained normally.

Saving your trained model

python3 export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-3737 --output_directory zhangjn_detction

Here, trained_checkpoint_prefix must end with the step number you actually trained to, and output_directory is wherever you want the model saved; I created a new folder, zhangjn_detction. When the script finishes, zhangjn_detction contains a number of files: saved_model/, checkpoint, frozen_inference_graph.pb, and so on.
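
Before wiring the exported model into the test script, a quick check (my own sketch, using the same TF 1.x calls the test script uses) confirms that the frozen graph at least parses:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('zhangjn_detction/frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
print('{} nodes in the frozen graph'.format(len(graph_def.node)))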

Testing the model

Open object_detection_tutorial.ipynb in the object_detection directory, or convert it to a plain Python file, object_detection_tutorial.py (instructions for the conversion are easy to find online); a few small changes make it usable for testing.

# coding: utf-8

# # Object Detection Demo
# Welcome to the object detection inference walkthrough!  This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image. Make sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.


from distutils.version import StrictVersion
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

# if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
#   raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')


# ## Env setup

# In[2]:


# This is needed to display the images.
# get_ipython().magic(u'matplotlib inline')


# ## Object detection imports
# Here are the imports from the object detection module.



from utils import label_map_util

from utils import visualization_utils as vis_util


# # Model preparation 

# ## Variables
# 
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.  
# 
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# In[4]:


# What model to download.
MODEL_NAME = 'zhangjn_detction'
# MODEL_FILE = MODEL_NAME + '.tar.gz'
# DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'object_label.pbtxt')   # keep a copy of the label map here, or point this at training/object_label.pbtxt

NUM_CLASSES = 1


# ## Download Model



# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
'''
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
'''

# ## Load a (frozen) Tensorflow model into memory.



detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`.  Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine



label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code

# In[8]:


def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


# # Detection



# Put the images you want to test in the test_images folder, named
# image<N>.jpg, and adjust the range below to match their numbers.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


# In[10]:


def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# In[ ]:


for image_path in TEST_IMAGE_PATHS:
  save_path = image_path.split(".")[0] + '_result.jpg'
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  # plt.figure(figsize=IMAGE_SIZE)
  # plt.imshow(image_np)
  # plt.show()
  plt.imsave(save_path, image_np)

1. Since we are not downloading a model, the download-related code can be deleted: change MODEL_NAME, PATH_TO_LABELS, and NUM_CLASSES to your own values, and remove the whole Download Model section.

2. For test images, put a few into the test_images folder named image<number>.jpg; then no code changes are needed beyond the line

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

where you update the number range to match your own images; mine are numbered 3 through 7, hence range(3, 8).

Because this runs on a cloud server with no display, the result images have to be saved to disk, so the following is added at the end:

plt.imsave(save_path, image_np)

Run the script:

python3 object_detection_tutorial.py

 

