1. XML

Annotating images with the labelImg tool produces one XML file per image. We use a single image as an example to walk through the content:

After annotating the image above, we obtain the following XML file. Its content falls into two parts:

- the first block is the image-level information: file name, path, width, height, depth, and so on;
- the second block is the annotation information: each object tag describes one bounding box, including the class name and its coordinates.

The XML file is made up of nested elements with a hierarchy between tags: annotation is the top-level tag, and all other tags (folder, filename, size, each object, ...) are its children.
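For reference, a minimal labelImg-style (Pascal VOC format) annotation might look like the snippet below; the file name, sizes, class name, and coordinates are placeholders chosen to match the sample rows used later in this post:

```xml
<annotation>
    <folder>images</folder>
    <filename>000001.jpg</filename>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>43</xmin>
            <ymin>233</ymin>
            <xmax>205</xmax>
            <ymax>362</ymax>
        </bndbox>
    </object>
</annotation>
```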
2. XML -> CSV

CSV stands for comma-separated values.

Each object tag (one bounding box) produces one row in the CSV file. The columns of each row are: image file name, width, height, class, xmin and ymin (the top-left corner of the box), and xmax and ymax (the bottom-right corner of the box).

The code to convert XML to CSV is as follows:
```python
# -*- coding: utf-8 -*-
"""Record the information from every XML file in a folder into a CSV file."""
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    # path: the annotations folder for one split
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):    # every xml file under path
        tree = ET.parse(xml_file)                  # parse tree of this xml file
        root = tree.getroot()                      # the root tag <annotation>
        print(root.find('filename').text)
        for member in root.findall('object'):      # one <object> tag per bounding box
            value = (
                root.find('filename').text,        # <filename>: the image file name
                int(root.find('size')[0].text),    # 0th child of <size>: width
                int(root.find('size')[1].text),    # 1st child of <size>: height
                member[0].text,                    # 0th child of <object>: <name>, the class
                int(float(member[4][0].text)),     # member[4] is <bndbox>; its 0th child: xmin
                int(float(member[4][1].text)),     # 1st child of <bndbox>: ymin
                int(float(member[4][2].text)),     # 2nd child of <bndbox>: xmax
                int(float(member[4][3].text)),     # 3rd child of <bndbox>: ymax
            )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    for directory in ['train', 'test', 'validation']:
        # One pass per split; the matching images live under images/train and images/test.
        # This script is meant to sit in the voc folder, at the same level as annotations/;
        # otherwise adjust the os.getcwd()-based path below.
        xml_path = os.path.join(os.getcwd(), 'annotations/{}'.format(directory))
        xml_df = xml_to_csv(xml_path)
        xml_df.to_csv('data/whsyxt_{}_labels.csv'.format(directory), index=None)
    print('Successfully converted xml to csv.')


main()
```
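Pieced together from the paths in the scripts (it is not spelled out in the original post), the folder layout they expect is roughly the following; "voc" is the parent folder mentioned in the comment above:

```
voc/
├── annotations/
│   ├── train/        # *.xml
│   ├── test/         # *.xml
│   └── validation/   # *.xml
├── images/
│   ├── train/        # *.jpg
│   └── test/         # *.jpg
├── data/             # the CSV and .record files are written here
└── xml_to_csv.py     # this script
```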
The corresponding XML files are shown in the figure below:

Running the script produces one CSV label file per split.

Opening one of them, the content looks roughly like this:
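A few illustrative rows (the values match the sample DataFrame shown further down in the TFRecord script):

```csv
filename,width,height,class,xmin,ymin,xmax,ymax
000001.jpg,353,500,dog,43,233,205,362
000001.jpg,353,500,person,117,12,296,226
000002.jpg,335,500,train,122,188,220,299
```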
The filename column contains only the image file name, not its path.
3. XML to TFRecord

Each image has its own XML file, so one option is to batch-convert the XML files into TFRecord format directly, without going through CSV.
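The original post does not include code for this route, but a rough sketch looks like the following. It reuses the same feature layout as the CSV-based converter in the next section; the function name xml_to_tf_example and its arguments are made up for illustration, and class_text_to_int is assumed to be the same label-mapping helper used there:

```python
import os
import io
import xml.etree.ElementTree as ET

import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util


def xml_to_tf_example(xml_path, image_dir, class_text_to_int):
    # parse one annotation file and read the matching image bytes
    root = ET.parse(xml_path).getroot()
    filename = root.find('filename').text
    with tf.gfile.GFile(os.path.join(image_dir, filename), 'rb') as fid:
        encoded_jpg = fid.read()
    width, height = Image.open(io.BytesIO(encoded_jpg)).size

    xmins, xmaxs, ymins, ymaxs, classes_text, classes = [], [], [], [], [], []
    for obj in root.findall('object'):          # one <object> per bounding box
        name = obj.find('name').text
        box = obj.find('bndbox')
        xmins.append(float(box.find('xmin').text) / width)
        xmaxs.append(float(box.find('xmax').text) / width)
        ymins.append(float(box.find('ymin').text) / height)
        ymaxs.append(float(box.find('ymax').text) / height)
        classes_text.append(name.encode('utf8'))
        classes.append(class_text_to_int(name))

    # same feature dictionary as the CSV-based converter below
    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(b'jpg'),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
```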
4. CSV to TFRecord

The information from all of the XML files has already been written into a single CSV file, one row per bounding box, so we only need to convert that CSV file into TFRecord format, which is simple and fast.

Since the image data and the label values are stored separately, each image's raw bytes have to be merged with its rows from the CSV file; the combined records are then written to disk as a TFRecord file and used for training.

The code comes from tensorflow/object_dection/models-master/research/object_detection/test_generate_tfrecord.py:
```python
"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
# DEFINE_string registers a command line flag:
#   flag name:  'csv_input'
#   default:    'data/test_labels.csv', used when the flag is not passed
#   docstring:  a short description of the flag
# The value can be read back through tf.app.flags.FLAGS:
#   FLAGS = tf.app.flags.FLAGS
#   print(FLAGS.csv_input)   # -> data/test_labels.csv
flags.DEFINE_string('csv_input', 'data/test_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/test.record', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
# change these to your own classes
def class_text_to_int(row_label):
    if row_label == 'face':
        return 0
    elif row_label == 'cat':
        return 1
    # ...


def split(df, group):
    """namedtuple is a factory function: it builds a class named 'data' and
    assigns it to the variable data.
        Point = namedtuple('Point', ['x', 'y'])
        p = Point(11, y=22)
        p[0] + p[1]   # 33
        x, y = p      # unpacking: (11, 22)
        p.x + p.y     # 33
    """
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x))
            for filename, x in zip(gb.groups.keys(), gb.groups)]


# Read each image and merge the image bytes with that image's bounding
# boxes taken from the CSV file.
# group: one image's rows from the CSV
# path:  the image directory
def create_tf_example(group, path):
    # image directory + image file name = absolute path of the image
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    # bundle the encoded image (encoded_jpg) and the object information together
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/test')
    # one CSV file ultimately produces one TFRecord file
    examples = pd.read_csv(FLAGS.csv_input)   # read the CSV into a pandas DataFrame, e.g.
    #      filename  width  height   class  xmin  ymin  xmax  ymax
    # 0  000001.jpg    353     500     dog    43   233   205   362
    # 1  000001.jpg    353     500  person   117    12   296   226
    # 2  000002.jpg    335     500   train   122   188   220   299
    grouped = split(examples, 'filename')
    # [
    #   data(filename='000002.jpg', object=
    #            filename  width  height  class  xmin  ymin  xmax  ymax
    #        2  000002.jpg    335     500  train   122   188   220   299),
    #   # 000001.jpg appears with two rows because that image contains two objects
    #   data(filename='000001.jpg', object=
    #            filename  width  height   class  xmin  ymin  xmax  ymax
    #        0  000001.jpg    353     500     dog    43   233   205   362
    #        1  000001.jpg    353     500  person   117    12   296   226)
    # ]
    for group in grouped:
        # merge each image's annotation rows with its image bytes
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
```
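Once the .record file is written, it can be sanity-checked without launching a training job. The snippet below is a small illustrative check, not part of the original scripts, using the same TensorFlow 1.x API as the code above:

```python
import tensorflow as tf

# Count the records in the generated file and print the stored filename
# of the first example, to confirm the TFRecord was written as expected.
count = 0
for record in tf.python_io.tf_record_iterator('data/test.record'):
    example = tf.train.Example()
    example.ParseFromString(record)
    if count == 0:
        print(example.features.feature['image/filename'].bytes_list.value)
    count += 1
print('total examples:', count)
```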
The train version, train_generate_tfrecord.py, works the same way; only the flag defaults, the label mapping, and the image directory differ:

```python
"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', 'data/train_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/train.record', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'face':
        return 1
    else:
        return None  # classes that are not listed map to None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x))
            for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/train')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
```