Dataset formats


1. XML

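Annotating an image with the labelImg tool produces an XML file. The following uses one image as an example to describe its contents:

After the image above is annotated, the resulting XML file looks like this:

Its contents fall into two parts:

  1. The first black box covers the image as a whole: its file name, its path, its width, height, and depth, and so on.
  2. The second black box holds the bounding-box information. Each red box is one object tag (one bounding box), containing the target's class name, its position, and so on.

The information in the XML is organized as nested tags. annotation is the top-level (root) tag and every other tag is a child tag beneath it; the name of the folder that holds the image is stored in the folder child tag.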
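The tag hierarchy described above can be explored with Python's standard library. The following is a minimal sketch, assuming the Pascal VOC layout that labelImg writes; the sample annotation is hypothetical and reuses the file name and box values that appear in the CSV example later in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout; values are for illustration only.
SAMPLE_XML = """<annotation>
    <folder>train</folder>
    <filename>000001.jpg</filename>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <object>
        <name>dog</name>
        <bndbox>
            <xmin>43</xmin><ymin>233</ymin><xmax>205</xmax><ymax>362</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>117</xmin><ymin>12</ymin><xmax>296</xmax><ymax>226</ymax>
        </bndbox>
    </object>
</annotation>"""


def summarize(xml_text):
    """Return the root tag name, the image-level info, and one dict per object tag."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    image_info = {
        'folder': root.find('folder').text,
        'filename': root.find('filename').text,
        'width': int(size.find('width').text),
        'height': int(size.find('height').text),
        'depth': int(size.find('depth').text),
    }
    boxes = []
    for obj in root.findall('object'):  # one object tag per bounding box
        bndbox = obj.find('bndbox')
        box = {'name': obj.find('name').text}
        for key in ('xmin', 'ymin', 'xmax', 'ymax'):
            box[key] = int(bndbox.find(key).text)
        boxes.append(box)
    return root.tag, image_info, boxes


root_tag, image_info, boxes = summarize(SAMPLE_XML)
print(root_tag)
print(image_info)
for box in boxes:
    print(box)
```

Looking tags up by name with find() keeps the code independent of the order in which the child tags happen to be written.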

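2. XML -> CSV

CSV stands for comma-separated values: each record is one line of text whose fields are separated by commas.

Each object tag represents one bounding box and produces one row in the CSV file. The columns of each row are: image file name, image width, image height, class, x coordinate of the box's top-left corner (xmin), y coordinate of the top-left corner (ymin), x coordinate of the bottom-right corner (xmax), and y coordinate of the bottom-right corner (ymax).

The code that converts XML to CSV is as follows: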
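To make the row layout concrete, here is a small sketch that writes three bounding-box rows as CSV text using only the standard library. The rows reuse the sample values shown later in this post; the helper name rows_to_csv_text is hypothetical and not part of the original scripts.

```python
import csv
import io

# Column order described above: one row per bounding box.
COLUMNS = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']

ROWS = [
    ('000001.jpg', 353, 500, 'dog', 43, 233, 205, 362),
    ('000001.jpg', 353, 500, 'person', 117, 12, 296, 226),  # same image, second box
    ('000002.jpg', 335, 500, 'train', 122, 188, 220, 299),
]


def rows_to_csv_text(rows):
    """Serialize the rows to CSV text with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv_text(ROWS)
print(csv_text)
```

An image with two boxes appears on two rows, which is why the rows have to be grouped by file name again before a TFRecord is written.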

# -*- coding: utf-8 -*-
"""
Record the information from every XML file in a folder into a CSV file.
"""

import os  
import glob  
import pandas as pd  
import xml.etree.ElementTree as ET  

  
def xml_to_csv(path):
    """Collect one row per bounding box from every XML file in path (the annotations folder)."""
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):  # every XML file under path
        tree = ET.parse(xml_file)  # parse the XML file into an element tree
        root = tree.getroot()  # the root tag, annotation
        filename = root.find('filename').text  # image file name, without its path
        print(filename)
        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)
        for member in root.findall('object'):  # one object tag per bounding box
            # Look tags up by name rather than by position, so the code does not
            # depend on the order of the child tags inside object.
            bndbox = member.find('bndbox')
            value = (filename,
                     width,
                     height,
                     member.find('name').text,  # class name
                     # float() first, in case a coordinate is stored as "43.0"
                     int(float(bndbox.find('xmin').text)),
                     int(float(bndbox.find('ymin').text)),
                     int(float(bndbox.find('xmax').text)),
                     int(float(bndbox.find('ymax').text)),
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
  
def main():
    for directory in ['train', 'test', 'validation']:  # one sub-folder of annotations per split
        # Run this script from the voc folder, at the same level as annotations/;
        # otherwise replace os.getcwd() with the correct base path.
        xml_path = os.path.join(os.getcwd(), 'annotations', directory)
        xml_df = xml_to_csv(xml_path)
        os.makedirs('data', exist_ok=True)  # make sure the output folder exists
        xml_df.to_csv('data/whsyxt_{}_labels.csv'.format(directory), index=False)  # one CSV per split
        print('Successfully converted xml to csv.')


if __name__ == '__main__':
    main()

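The corresponding XML file is shown in the figure below:

In the end, one CSV file is written for each directory processed:

Opened, a file looks something like this:

The filename column holds only the image file's name, not its path.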

3. Converting XML to TFRecord

 Each image has its own XML file; the XML files are converted to TFRecord format in bulk.
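The post gives no code for this direct route, so the following is only a sketch of its first half, under stated assumptions: it reads one annotation and collects the per-image values, with box coordinates normalized by the image size, that a TFRecord writer would then serialize. The function name annotation_to_record and the sample XML are hypothetical, and the TensorFlow serialization step itself is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; sizes chosen so the normalized values are easy to check.
SAMPLE_XML = """<annotation>
    <filename>example.jpg</filename>
    <size><width>400</width><height>200</height><depth>3</depth></size>
    <object>
        <name>face</name>
        <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
    </object>
</annotation>"""


def annotation_to_record(xml_text):
    """Gather the values for one image; coordinates are divided by width/height."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    record = {
        'filename': root.find('filename').text,
        'width': width,
        'height': height,
        'classes_text': [],
        'xmins': [], 'ymins': [], 'xmaxs': [], 'ymaxs': [],
    }
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        record['classes_text'].append(obj.find('name').text)
        # x coordinates are scaled by the width, y coordinates by the height
        record['xmins'].append(float(box.find('xmin').text) / width)
        record['xmaxs'].append(float(box.find('xmax').text) / width)
        record['ymins'].append(float(box.find('ymin').text) / height)
        record['ymaxs'].append(float(box.find('ymax').text) / height)
    return record


record = annotation_to_record(SAMPLE_XML)
print(record)
```

Applying the function to every XML file in a folder would give one record per image, which mirrors what the CSV route in the next section produces after grouping.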

 

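4. Converting CSV to TFRecord

The annotations from many XML files are written into a single CSV file, one row per bounding box, and that CSV file is then converted directly to TFRecord format, which is quick and convenient.

Because the images and their labels are stored separately, each image's encoded data has to be combined with its rows from the CSV file and written to disk in TFRecord format, ready to be used for training.

The code comes from tensorflow/object_dection/models-master/research/object_detection/test_generate_tfrecord.py: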
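Combining image and labels starts by regrouping the CSV rows so that all boxes of one image sit together. The script in this section does that with pandas groupby; the sketch below shows the same idea with only the standard library, and the helper name group_rows_by_image is hypothetical.

```python
import csv
import io
from collections import OrderedDict

# Sample CSV content, using the values shown in this post.
CSV_TEXT = """filename,width,height,class,xmin,ymin,xmax,ymax
000001.jpg,353,500,dog,43,233,205,362
000001.jpg,353,500,person,117,12,296,226
000002.jpg,335,500,train,122,188,220,299
"""


def group_rows_by_image(csv_text):
    """Map each image file name to the list of its bounding-box rows."""
    groups = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row['filename'], []).append(row)
    return groups


groups = group_rows_by_image(CSV_TEXT)
for filename, boxes in groups.items():
    print(filename, [box['class'] for box in boxes])
```

Each group then corresponds to exactly one image file and therefore to one serialized example in the output.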

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=data/train.record

  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
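"""
    DEFINE_string defines a command-line argument.

    flag_name: the argument's name, here csv_input
    default_value: the default value, here data/test_labels.csv
    docstring: a description of the argument

    The argument's value can be read through tf.app.flags.FLAGS:
    FLAGS = tf.app.flags.FLAGS
    print(FLAGS.csv_input)  # prints data/test_labels.csv unless the flag is overridden

    Note: tf.app.flags is a TensorFlow 1.x API (tf.compat.v1.app.flags in TensorFlow 2.x).
"""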
flags.DEFINE_string('csv_input', 'data/test_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/test.record', 'Path to output TFRecord')
FLAGS = flags.FLAGS


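# TO-DO replace this with label map
# Change these to your own labels
def class_text_to_int(row_label):
    # Ids start at 1; the Object Detection API reserves 0 for the background class.
    if row_label == 'face':
        return 1
    elif row_label == 'cat':
        return 2
    # ............
    else:
        # Fail loudly on a label that has no id instead of returning None
        raise ValueError('Unknown label: {}'.format(row_label))
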

def split(df, group):
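    """Group the CSV rows by image and return one `data` tuple per image.

    namedtuple is a factory function: it returns a class named `data`,
    which is assigned to the variable data.
    Definition:  Point = namedtuple('Point', ['x', 'y'])
    Creation:    p = Point(11, y=22)
                 p[0] + p[1]  gives 33
    Unpacking:   x, y = p
                 x, y  gives (11, 22)
    Attributes:  p.x + p.y  gives 33
    """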
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

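# Read each image and combine its data with the object bounding-box rows for that image (from the CSV).
# group: a `data` tuple (filename, object) produced by split()
# path: the image directory
def create_tf_example(group, path):
    # image directory + image file name = full path of the image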
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    # Combine the encoded image (encoded_jpg) and the object information into one Example
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/test')    # image directory; one CSV file yields one TFRecord file
    examples = pd.read_csv(FLAGS.csv_input)  # read the CSV file into a pandas DataFrame
    # examples looks like:
    #      filename  width  height   class  xmin  ymin  xmax  ymax
    # 0  000001.jpg    353     500     dog    43   233   205   362
    # 1  000001.jpg    353     500  person   117    12   296   226
    # 2  000002.jpg    335     500   train   122   188   220   299
    grouped = split(examples, 'filename')
    # grouped looks like:
    # [
    # data(filename='000002.jpg', object=     filename  width  height  class  xmin  ymin  xmax  ymax
    #         2  000002.jpg    335     500  train   122   188   220   299),
    # (000001.jpg has two rows because the image contains two objects)
    # data(filename='000001.jpg', object=     filename  width  height   class  xmin  ymin  xmax  ymax
    #         0  000001.jpg    353     500     dog    43   233   205   362
    #         1  000001.jpg    353     500  person   117    12   296   226)
    # ]
    for group in grouped:
        tf_example = create_tf_example(group, path)  # combine each image's annotations with its image data
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
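
The TO-DO comment in the script suggests replacing the if/elif chain with a label map. One way to do that, sketched here with a plain dictionary, is shown below. The LABEL_MAP contents are hypothetical, and the ids start at 1 on the assumption that id 0 is reserved for the background class, as is the convention in the Object Detection API.

```python
# Hypothetical label map: class name -> integer id, starting at 1.
LABEL_MAP = {
    'face': 1,
    'cat': 2,
}


def class_text_to_int(row_label, label_map=LABEL_MAP):
    """Look up the integer id of a class name, failing loudly on unknown names."""
    try:
        return label_map[row_label]
    except KeyError:
        raise ValueError('Unknown class label: {!r}'.format(row_label))


print(class_text_to_int('face'))
print(class_text_to_int('cat'))
```

Raising on an unknown name surfaces a misspelled class in the CSV at conversion time instead of writing a record with a missing label.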

 

Similarly, there is train_generate_tfrecord.py:

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=data/train.record

  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', 'data/train_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/train.record', 'Path to output TFRecord')
FLAGS = flags.FLAGS


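# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'face':
        return 1
    else:
        # Fail loudly on a label that has no id instead of returning None
        raise ValueError('Unknown label: {}'.format(row_label))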
        

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/train')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
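
The two scripts differ only in their flag defaults and in the hard-coded image directory (images/test versus images/train). As a sketch of how a single script could cover both cases, the image directory can be made a command-line option too. The example uses the standard argparse module rather than tf.app.flags, and the --image_dir option is a hypothetical addition that is not part of the original scripts.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the options for one conversion run; defaults match the train script."""
    parser = argparse.ArgumentParser(description='Convert a labels CSV to a TFRecord file.')
    parser.add_argument('--csv_input', default='data/train_labels.csv',
                        help='Path to the CSV input')
    parser.add_argument('--output_path', default='data/train.record',
                        help='Path to output TFRecord')
    # Hypothetical option replacing the hard-coded images/train and images/test paths.
    parser.add_argument('--image_dir', default=os.path.join('images', 'train'),
                        help='Directory that holds the images named in the CSV')
    return parser.parse_args(argv)


# Example: the arguments a test-split run would pass.
args = parse_args(['--csv_input', 'data/test_labels.csv',
                   '--output_path', 'data/test.record',
                   '--image_dir', 'images/test'])
print(args.csv_input, args.output_path, args.image_dir)
```

With such an option, main() would read the image directory from args.image_dir, so train, test, and validation data could all be generated by the same file.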

 

