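Data conversion is a real chore. After being tormented by it for long enough, I decided to set aside some time and write things down. For reference only.
In one project I needed to convert existing VOC XML annotation files into the COCO data format. To make it easier to follow, this article covers the following in order:
- What a VOC XML file looks like
- What the COCO data format looks like
- How to convert XML to COCO format
What does VOC XML look like?
Below I have extracted only the important parts: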
<annotation>
  <folder>folder_name</folder>
  <filename>image_name.jpg</filename>
  <path>path_to\at002eg001.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>550</width>
    <height>518</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>Apple</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>292</xmin>
      <ymin>218</ymin>
      <xmax>410</xmax>
      <ymax>331</ymax>
    </bndbox>
  </object>
  <object>
    ...
  </object>
</annotation>
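As you can see, a single XML file contains the following information:
- folder: the folder
- filename: the file name
- path: the file path
- source: not used in my project
- size: the image size
- segmented: used for image segmentation; this article only covers object detection (bounding boxes)
- object: one XML file can contain several objects; each object represents one box, which consists of:
  - name: the class of the object enclosed by this box, e.g. Apple
  - bndbox: the coordinates of the top-left and bottom-right corners
  - truncated: omitted here
  - difficult: omitted here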
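To make the field list above concrete, here is a minimal sketch of reading those fields with Python's standard-library `xml.etree.ElementTree`. The `parse_voc` helper and the trimmed `SAMPLE_XML` string are my own illustration (not part of any VOC tooling), and the sketch assumes every object has a `name` and a complete `bndbox`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample mirroring the structure shown above (illustrative values).
SAMPLE_XML = """
<annotation>
  <filename>image_name.jpg</filename>
  <size><width>550</width><height>518</height><depth>3</depth></size>
  <object>
    <name>Apple</name>
    <bndbox><xmin>292</xmin><ymin>218</ymin><xmax>410</xmax><ymax>331</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Hypothetical helper: pull file name, image size and boxes out of one VOC XML."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    # One <object> element per box
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        corners = tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        record["objects"].append({"name": obj.findtext("name"), "box": corners})
    return record


record = parse_voc(SAMPLE_XML)
print(record)
```

Each entry in `record["objects"]` keeps the box as the corner tuple `(xmin, ymin, xmax, ymax)`, exactly as VOC stores it.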
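What does COCO look like?
What does the COCO directory look like?
Unlike VOC, where each image has its own XML file, COCO writes all images and their box information into a single JSON file (one per split in the layout below). A typical COCO directory looks like this: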
coco
|______annotations   # annotation files
|        |__train.json
|        |__val.json
|        |__test.json
|______trainset      # training images
|______valset        # validation images
|______testset       # test images
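Because everything lives in one JSON file, boxes are tied to their image through an `image_id` field on each annotation rather than through a per-image file. The sketch below groups annotations back by file name to show that link. The `coco` dictionary is a made-up in-memory stand-in for a loaded annotation file, and `boxes_by_file` is a hypothetical helper name.

```python
from collections import defaultdict

# Tiny in-memory stand-in for a loaded COCO JSON file (all values are made up).
coco = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [50, 60, 70, 80]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 15, 25]},
    ],
}


def boxes_by_file(coco_dict):
    """Hypothetical helper: map each image file name to the list of its boxes."""
    id_to_name = {img["id"]: img["file_name"] for img in coco_dict["images"]}
    grouped = defaultdict(list)
    for ann in coco_dict["annotations"]:
        # image_id on the annotation points at the id of an entry in "images"
        grouped[id_to_name[ann["image_id"]]].append(ann["bbox"])
    return dict(grouped)


per_image = boxes_by_file(coco)
print(per_image)
```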
What does the COCO JSON file look like?
A standard JSON file contains the following:
{
  "info": info,
  "images": [image],
  "annotations": [annotation],
  "licenses": [license],
  "categories": [category]
}
info{
  "year": int,
  "version": str,
  "description": str,
  "contributor": str,
  "url": str,
  "date_created": datetime,
}
image{
  "id": int,
  "width": int,
  "height": int,
  "file_name": str,
  "license": int,
  "flickr_url": str,
  "coco_url": str,
  "date_captured": datetime,
}
license{
  "id": int,
  "name": str,
  "url": str,
}
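A bit abstract, isn't it? That is exactly how the official site presents it. Honestly, I was completely lost when I read it... maybe I just need more practice.
So what is each part of the JSON actually for? Let me go through them one by one.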
- info: records information about your dataset, for example:
  "info": {                                  # dataset description
    "description": "COCO 2017 Dataset",      # description of the dataset
    "url": "http://cocodataset.org",         # download URL
    "version": "1.0",                        # version
    "year": 2017,                            # year
    "contributor": "COCO Consortium",        # contributor
    "date_created": "2017/09/01"             # creation date
  }
- licenses: records the licenses. There can be more than one, because your data may come from several sources, for example:
  "licenses": [
    {
      "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
      "id": 1,
      "name": "Attribution-NonCommercial-ShareAlike License"
    },
    ...
    ...
  ],
- images: records the information of every image. The main fields are file name, width, height and id; the rest are optional, for example:
  "images": [
    {
      "file_name": "000000397133.jpg",       # image file name
      "id": 397133,                          # image ID (unique for every image)
      "height": 427,                         # height
      "width": 640,                          # width
      "license": 4,
      "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",   # web URL
      "date_captured": "2013-11-14 17:02:52",   # capture date
      "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",   # flickr URL
    },
    ...,
    ...
  ]
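- categories: this one is easy to understand; it is your class information. Two things to note:
  - There is a key called "supercategory". It exists because some classes in the COCO dataset can be grouped under a common parent class; for example, cat and dog both belong to Animal.
  - The id numbering starts from 1; 0 is reserved for the background by default.
  An example:
  "categories": [
    {
      "supercategory": "person",   # parent class
      "id": 1,                     # class id (0 is the background by default)
      "name": "person"             # class name
    },
    {
      "supercategory": "vehicle",
      "id": 2,
      "name": "bicycle"
    },
    {
      "supercategory": "vehicle",
      "id": 3,
      "name": "car"
    },
    ...
    ...
  ],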
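As a rough illustration of how a categories list like this could be assembled from VOC annotations, the sketch below collects every class name found in a few XML strings, counts them, and numbers the classes from 1. The `build_categories` helper, the fixed `"none"` supercategory and the sample strings are all my own assumptions. The per-class counts are there because a mistyped name (here `Apple_`) silently becomes its own class, and a count of one or two is usually the giveaway.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def build_categories(xml_texts, supercategory="none"):
    """Hypothetical helper: derive COCO categories (ids from 1) plus per-class counts."""
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for obj in root.findall("object"):
            name = obj.findtext("name")
            if name:  # ignore objects without a usable class name
                counts[name.strip()] += 1
    # Sort names so the id assignment is reproducible between runs
    categories = [
        {"supercategory": supercategory, "id": idx, "name": name}
        for idx, name in enumerate(sorted(counts), start=1)
    ]
    return categories, counts


# Made-up annotations; "Apple_" imitates an accidental extra underscore.
samples = [
    "<annotation><object><name>Apple</name></object>"
    "<object><name>Pear</name></object></annotation>",
    "<annotation><object><name>Apple</name></object>"
    "<object><name>Apple_</name></object></annotation>",
]
categories, counts = build_categories(samples)
for cat in categories:
    print(cat["id"], cat["name"], counts[cat["name"]])
```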
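How to convert XML to COCO format
Below I simply reuse a script that someone else has already written; I have tested it and it works. Usage notes: the script's comment says to install lxml first, although the version shown here only imports the standard-library xml.etree.ElementTree, so it should run without it. Also make sure the class names in your XML files are correct: in my own dataset some class names had an extra underscore or other stray letters typed by accident, and each of those ended up being treated as a new class. Good luck.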
#!/usr/bin/python
# The original script suggests `pip install lxml`, but only the standard
# library is imported here.
import sys
import os
import json
import xml.etree.ElementTree as ET

START_BOUNDING_BOX_ID = 1
PRE_DEFINE_CATEGORIES = {}
# If necessary, pre-define category and its id
# PRE_DEFINE_CATEGORIES = {"aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4,
#                          "bottle": 5, "bus": 6, "car": 7, "cat": 8, "chair": 9,
#                          "cow": 10, "diningtable": 11, "dog": 12, "horse": 13,
#                          "motorbike": 14, "person": 15, "pottedplant": 16,
#                          "sheep": 17, "sofa": 18, "train": 19, "tvmonitor": 20}


def get(root, name):
    """Return all direct children of `root` with tag `name` (possibly empty)."""
    return root.findall(name)


def get_and_check(root, name, length):
    """Return the children named `name`, checking how many there are.

    With length == 1 the single element itself is returned; with length == 0
    any non-zero number of children is accepted and returned as a list.
    """
    elems = root.findall(name)
    if len(elems) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(elems) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'
                                  % (name, length, len(elems)))
    if length == 1:
        elems = elems[0]
    return elems


def get_filename_as_int(filename):
    """Use the file name without its extension as the integer image id."""
    try:
        return int(os.path.splitext(filename)[0])
    except ValueError:
        raise NotImplementedError('Filename %s is supposed to be an integer.' % filename)


def convert(xml_list, xml_dir, json_file):
    """Convert the VOC XML files listed in `xml_list` into one COCO JSON file."""
    json_dict = {"images": [], "type": "instances", "annotations": [],
                 "categories": []}
    categories = PRE_DEFINE_CATEGORIES
    bnd_id = START_BOUNDING_BOX_ID
    with open(xml_list, 'r') as list_fp:
        for line in list_fp:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the list file
            print("Processing %s" % line)
            xml_f = os.path.join(xml_dir, line)
            tree = ET.parse(xml_f)
            root = tree.getroot()
            path = get(root, 'path')
            if len(path) == 1:
                # Normalise Windows separators so basename also works on POSIX
                filename = os.path.basename(path[0].text.replace('\\', '/'))
            elif len(path) == 0:
                filename = get_and_check(root, 'filename', 1).text
            else:
                raise NotImplementedError('%d paths found in %s' % (len(path), line))
            # The file name (without extension) must be a number
            image_id = get_filename_as_int(filename)
            size = get_and_check(root, 'size', 1)
            width = int(get_and_check(size, 'width', 1).text)
            height = int(get_and_check(size, 'height', 1).text)
            image = {'file_name': filename, 'height': height, 'width': width,
                     'id': image_id}
            json_dict['images'].append(image)
            # Currently we do not support segmentation
            # segmented = get_and_check(root, 'segmented', 1).text
            # assert segmented == '0'
            for obj in get(root, 'object'):
                category = get_and_check(obj, 'name', 1).text
                if category not in categories:
                    # Ids start at 1 (0 is the background) and must not collide
                    # with any pre-defined id
                    new_id = max(categories.values(), default=0) + 1
                    categories[category] = new_id
                category_id = categories[category]
                bndbox = get_and_check(obj, 'bndbox', 1)
                # VOC coordinates are 1-based; shift the top-left corner to 0-based
                xmin = int(get_and_check(bndbox, 'xmin', 1).text) - 1
                ymin = int(get_and_check(bndbox, 'ymin', 1).text) - 1
                xmax = int(get_and_check(bndbox, 'xmax', 1).text)
                ymax = int(get_and_check(bndbox, 'ymax', 1).text)
                assert xmax > xmin
                assert ymax > ymin
                o_width = abs(xmax - xmin)
                o_height = abs(ymax - ymin)
                # COCO bbox is [x, y, width, height]
                ann = {'area': o_width * o_height, 'iscrowd': 0,
                       'image_id': image_id,
                       'bbox': [xmin, ymin, o_width, o_height],
                       'category_id': category_id, 'id': bnd_id, 'ignore': 0,
                       'segmentation': []}
                json_dict['annotations'].append(ann)
                bnd_id = bnd_id + 1

    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    with open(json_file, 'w') as json_fp:
        json_fp.write(json.dumps(json_dict))


if __name__ == '__main__':
    if len(sys.argv) != 4:
        print('3 arguments are needed.')
        print('Usage: %s XML_LIST.txt XML_DIR OUTPUT_JSON.json' % sys.argv[0])
        sys.exit(1)
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
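The one piece of arithmetic in the script that is easy to get wrong is the box conversion: VOC stores two corners, while the script writes `bbox` as `[x, y, width, height]` after shifting the top-left corner by one pixel. The sketch below isolates that step using the sample box from the XML at the top of this article. `voc_box_to_coco_bbox` is a hypothetical helper name, and it raises `ValueError` where the script uses `assert`.

```python
def voc_box_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Hypothetical helper mirroring the script's arithmetic for one box."""
    # Shift the top-left corner from 1-based to 0-based, as the script does
    x = xmin - 1
    y = ymin - 1
    width = xmax - x
    height = ymax - y
    if width <= 0 or height <= 0:
        raise ValueError("degenerate box: %r" % ((xmin, ymin, xmax, ymax),))
    # COCO bbox is [x, y, width, height]; area is simply width * height here
    return [x, y, width, height], width * height


# Sample box from the XML shown earlier: (292, 218) to (410, 331)
bbox, area = voc_box_to_coco_bbox(292, 218, 410, 331)
print(bbox, area)  # [291, 217, 119, 114] 13566
```

If your annotations are already 0-based, the `- 1` shift would move every box by one pixel, so it is worth checking a few boxes visually before converting the whole dataset.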
References
- https://github.com/shiyemin/voc2coco/blob/master/voc2coco.py
- http://cocodataset.org/#format-data
- https://blog.csdn.net/wc781708249/article/details/79603522