Object Detection – Parsing the VOC and COCO Formats and Building Your Own Dataset

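Compared with other computer vision tasks, object detection uses a more complex data format. To process data in a uniform way, detection datasets are usually stored in either the VOC or the COCO format.
Both VOC and COCO support detection as well as segmentation. This article focuses on the object detection parts of the PASCAL VOC and COCO datasets and shows how to build your own dataset in each format.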


VOC format

Directory structure

A VOC-format dataset usually has the following directory structure:

VOC_ROOT                # root directory
├── JPEGImages          # source images
│   ├── aaaa.jpg
│   ├── bbbb.jpg
│   └── cccc.jpg
├── Annotations         # XML files, one per image in JPEGImages, describing the image contents
│   ├── aaaa.xml
│   ├── bbbb.xml
│   └── cccc.xml
└── ImageSets
    └── Main
        ├── train.txt   # each line holds the name of one image
        └── val.txt

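JPEGImages holds the source images (they do not have to be .jpg files; only the folder name JPEGImages is fixed).
Annotations holds the annotation data. VOC annotations are XML files whose names match the images in JPEGImages one to one.
ImageSets/Main holds the file lists used for training and validation, one file name per line without the extension. For example, train.txt looks like this: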

# train.txt
aaaa
bbbb
cccc
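A list file is only useful if every name in it has a matching annotation. The following is a minimal sketch of such a check, assuming the directory layout described above; the helper names `read_image_ids` and `find_missing_annotations` are hypothetical and not part of any VOC tooling.

```python
from pathlib import Path


def read_image_ids(list_file):
    """Return the non-empty, stripped lines of an ImageSets/Main list file."""
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def find_missing_annotations(voc_root, split="train"):
    """Return the image names listed in <split>.txt that have no XML file."""
    root = Path(voc_root)
    ids = read_image_ids(root / "ImageSets" / "Main" / f"{split}.txt")
    return [i for i in ids if not (root / "Annotations" / f"{i}.xml").is_file()]
```

Calling `find_missing_annotations("VOC_ROOT", "val")` should return an empty list for a consistent dataset.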

XML annotation format

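An XML annotation file has the following format (values that did not survive in this listing are shown as ...):

<annotation>
    <filename>aaaa.jpg</filename>        # file name
    <size>                               # image size (width, height and number of channels)
        <width>...</width>
        <height>...</height>
        <depth>...</depth>
    </size>
    <segmented>1</segmented>             # whether the image is used for segmentation (irrelevant for detection)
    <object>
        <name>horse</name>               # object class
        <pose>Unspecified</pose>         # viewpoint; use Unspecified for your own dataset
        <truncated>0</truncated>         # whether the object is truncated (0 means complete)
        <difficult>0</difficult>         # whether the object is hard to recognise (0 means easy)
        <bndbox>                         # bounding box (xy coordinates of the top-left and bottom-right corners)
            <xmin>...</xmin>
            <ymin>...</ymin>
            <xmax>...</xmax>
            <ymax>...</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        ...
        <bndbox>
            ...
        </bndbox>
    </object>
</annotation>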
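To make the field layout concrete, here is a minimal sketch of reading such a file with Python's standard `xml.etree.ElementTree` module. The function name `parse_voc_xml` and the shape of the returned dictionary are illustrative choices, not part of the VOC tooling; the sketch assumes every object has a `bndbox` with `xmin`, `ymin`, `xmax` and `ymax`.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Parse one VOC annotation document into a plain dictionary."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            # Some tools write coordinates as floats, so go through float() first.
            "bbox": [int(float(box.findtext(tag)))
                     for tag in ("xmin", "ymin", "xmax", "ymax")],
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }
```

For a file on disk, read its text first, for example `parse_voc_xml(open("Annotations/aaaa.xml", encoding="utf-8").read())`.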

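Building your own VOC dataset

The steps for building your own dataset are:

① Create a folder named JPEGImages and put all images in it (or use ln -s to symlink your image folder as JPEGImages).

② Generate the XML files from your original annotation format. When pose, truncated and difficult are not specified, use the default values. A bounding box has the form [x1, y1, x2, y2], that is, [top-left corner, bottom-right corner], where x runs along the width and y along the height.

③ Randomly split the data into training and validation sets. Store the training file names in ImageSets/Main/train.txt and the validation file names in ImageSets/Main/val.txt.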
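Step ② can be sketched with the standard `xml.etree.ElementTree` module, which also takes care of escaping special characters in names. This is only an illustration: the function name `build_voc_xml`, the `(name, x1, y1, x2, y2)` tuple layout and the default `depth=3` (an RGB image) are assumptions, not something the VOC format prescribes.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, boxes, depth=3):
    """Build a VOC annotation string; boxes are (name, x1, y1, x2, y2) tuples."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    ET.SubElement(root, "segmented").text = "0"
    for name, x1, y1, x2, y2 in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        # Defaults for fields that the original annotations do not provide.
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", x1), ("ymin", y1), ("xmax", x2), ("ymax", y2)):
            ET.SubElement(bndbox, tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written straight to `Annotations/<name>.xml`.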

Reference code

Here is a script that converts CSV annotations to the VOC format:

# encoding=utf-8
import csv
import json
import os
from collections import defaultdict

import misc_utils as utils  # pip3 install utils-misc==0.0.5 -i https://pypi.douban.com/simple/


def strip_jpg(name):
    # str.rstrip('.jpg') strips characters, not a suffix, so cut the extension explicitly
    return name[:-4] if name.lower().endswith('.jpg') else name


os.makedirs('JPEGImages', exist_ok=True)
utils.color_print('Created the JPEGImages directory', 3)
os.makedirs('Annotations', exist_ok=True)
utils.color_print('Created the Annotations directory', 3)
os.makedirs('ImageSets/Main', exist_ok=True)
utils.color_print('Created the ImageSets/Main directory', 3)

confirm = input('About to generate annotations. Start (y/n)? ')
if confirm.lower() != 'y':
    utils.color_print('Aborted.', 3)
    exit()

boxes = defaultdict(list)  # image name -> list of [x1, y1, x2, y2]
sizes = {}                 # image name -> (width, height)

with open('train.csv', 'r', newline='') as f:
    # csv.reader returns an iterator; each element is one row
    csv_file = csv.reader(f)
    for i, line in enumerate(csv_file):
        if i == 0:  # skip the header row
            continue
        filename, width, height, bbox, _ = line
        # The bbox column is a JSON list [x, y, w, h]; convert it to corner form
        x1, y1, w, h = json.loads(bbox)
        x1, y1, w, h = int(x1), int(y1), int(w), int(h)
        boxes[filename].append([x1, y1, x1 + w, y1 + h])
        # If the CSV had no size columns, the size could be read from the image
        # instead: height, width, _ = cv2.imread(path).shape
        sizes[filename] = (int(width), int(height))

# <depth> is assumed to be 3 (RGB images)
for i, filename in enumerate(boxes):
    utils.progress_bar(i, len(boxes), 'handling...')
    stem = strip_jpg(filename)
    width, height = sizes[filename]
    with open(os.path.join('Annotations', stem + '.xml'), 'w') as f:
        f.write(f"""<annotation>
    <folder>train</folder>
    <filename>{stem}.jpg</filename>
    <size>
        <width>{width}</width>
        <height>{height}</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>\n""")
        for x1, y1, x2, y2 in boxes[filename]:
            f.write(f"""    <object>
        <name>wheat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>{x1}</xmin>
            <ymin>{y1}</ymin>
            <xmax>{x2}</xmax>
            <ymax>{y2}</ymax>
        </bndbox>
    </object>\n""")
        f.write("</annotation>")

files = sorted(strip_jpg(name) for name in boxes)
train_count = 0
val_count = 0
with open('ImageSets/Main/all.txt', 'w') as f_all, \
        open('ImageSets/Main/train.txt', 'w') as f_train, \
        open('ImageSets/Main/val.txt', 'w') as f_val:
    for name in files:
        f_all.write(name + '\n')
        if utils.gambling(0.1):  # about 10% goes to the validation set
            f_val.write(name + '\n')
            val_count += 1
        else:
            f_train.write(name + '\n')
            train_count += 1

utils.color_print(f'Random split: {train_count} training images, {val_count} validation images', 3)
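The script above assigns each image to the validation set independently with probability 0.1, so the validation set is only roughly 10% of the data and differs from run to run. When a reproducible split with an exact ratio is preferred, a seeded shuffle is one alternative. This is a sketch using only the standard library; `split_train_val` is a hypothetical helper, not part of the script above.

```python
import random


def split_train_val(names, val_ratio=0.1, seed=0):
    """Shuffle names with a fixed seed and return (train, val) lists."""
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    return sorted(shuffled[n_val:]), sorted(shuffled[:n_val])
```

The two returned lists can then be written to ImageSets/Main/train.txt and ImageSets/Main/val.txt, one name per line.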

COCO format

Directory structure

A COCO-format dataset has the following directory structure:

COCO_ROOT                          # root directory
├── annotations                    # annotations in JSON format
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017                      # image files
│   ├── 000000000001.jpg
│   ├── 000000000002.jpg
│   └── 000000000003.jpg
└── val2017
    ├── 000000000004.jpg
    └── 000000000005.jpg

Here train2017 and val2017 are called set names (set_name). Each JSON annotation file in the annotations folder must match a set name and start with instances_, that is, instances_{set_name}.json.
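The naming rule can be written down as a small path helper. This is just an illustration of the convention; `coco_paths` is a hypothetical name.

```python
from pathlib import Path


def coco_paths(coco_root, set_name):
    """Return (image_dir, annotation_file) for one set name."""
    root = Path(coco_root)
    return root / set_name, root / "annotations" / f"instances_{set_name}.json"
```

For example, `coco_paths("COCO_ROOT", "val2017")` points at the `val2017` image folder and at `annotations/instances_val2017.json`.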

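JSON annotation format

Unlike VOC, which uses one XML annotation file per image, COCO stores all bounding-box annotations in a single JSON file.
Parsing that file yields a dictionary with the following structure: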

{
    "info": info,
    "images": [image],
    "annotations": [annotation],
    "categories": [category],
    "licenses": [license],
}

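When you build your own dataset, info and licenses are not needed; the three fields in the middle are enough.

images is a list of dictionaries, one per image, in the following format: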

# json['images'][0]
{
    'license': 4,
    'file_name': '000000397133.jpg',
    'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg',
    'height': 427,
    'width': 640,
    'date_captured': '2013-11-14 17:02:52',
    'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg',
    'id': 397133
}

For your own dataset you only need file_name, height, width and id. id is the image number; it is also referenced from annotations and must be unique for each image.
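As a minimal sketch of building these entries, assuming the image sizes are already known, the helper below numbers the images consecutively. `make_image_entries` and its `(file_name, width, height)` input layout are hypothetical.

```python
def make_image_entries(images, start_id=1):
    """Turn (file_name, width, height) tuples into COCO image dictionaries."""
    entries = []
    for offset, (file_name, width, height) in enumerate(images):
        entries.append({
            "file_name": file_name,
            "width": int(width),
            "height": int(height),
            # Unique per image; annotations refer to it as image_id.
            "id": start_id + offset,
        })
    return entries
```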

categories lists all classes, in the following format:

[
    {'supercategory': 'person', 'id': 1, 'name': 'person'},
    {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
    {'supercategory': 'vehicle', 'id': 3, 'name': 'car'},
    {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'},
    {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'},
    {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'},
    {'supercategory': 'vehicle', 'id': 7, 'name': 'train'},
    {'supercategory': 'vehicle', 'id': 8, 'name': 'truck'},
    {'supercategory': 'vehicle', 'id': 9, 'name': 'boat'}
    # ....
]
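For a custom dataset the category list can be generated from the class names. The sketch below assigns ids starting at 1 and, as a simplifying assumption, reuses each name as its own supercategory; both helper names are hypothetical.

```python
def make_categories(class_names):
    """Assign ids 1..N to class names, in the order given."""
    return [
        {"supercategory": name, "id": index, "name": name}
        for index, name in enumerate(class_names, start=1)
    ]


def name_to_id(categories):
    """Build a lookup from class name to category id."""
    return {category["name"]: category["id"] for category in categories}
```

The lookup is convenient when converting annotations that store class names, such as VOC XML files, into category_id values.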

annotations holds the bounding-box annotations. One bounding box has the following format:

{'segmentation': [[0, 0, 60, 0, 60, 40, 0, 40]],
 'area': 2400.0,
 'iscrowd': 0,
 'image_id': 289343,
 'bbox': [0., 0., 60., 40.],
 'category_id': 18,
 'id': 1768}

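segmentation is the segmentation polygon; if it is unknown, simply fill in [[x1, y1, x2, y1, x2, y2, x1, y2]]. area is the area of the segmentation. bbox is the box in [x, y, w, h] form, that is, the top-left corner followed by the width and height. category_id is the class id and matches an id in categories. image_id is the id of the image, and id is the id of the bounding box, which must be unique for each box.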
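The relationship between a corner-format box and these fields can be sketched as follows. `make_annotation` is a hypothetical helper; it falls back to the rectangular polygon described above because no real segmentation mask is assumed to be available.

```python
def make_annotation(ann_id, image_id, category_id, x1, y1, x2, y2):
    """Build one COCO annotation from a corner-format box [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = float(x1), float(y1), float(x2), float(y2)
    width = max(0.0, x2 - x1)
    height = max(0.0, y2 - y1)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # Rectangle polygon used in place of a real segmentation mask.
        "segmentation": [[x1, y1, x2, y1, x2, y2, x1, y2]],
        "area": width * height,
        "iscrowd": 0,
        "bbox": [x1, y1, width, height],
    }
```

Calling `make_annotation(1768, 289343, 18, 0, 0, 60, 40)` gives a 60 by 40 box with an area of 2400.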

Reference code

Here is reference code that converts a VOC dataset to the COCO format:

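import json
import os

import misc_utils as utils

# VOCTrainValDataset, voc_root, class_names, train_split, img_format and
# preview_transform come from the surrounding project and are not shown here.
voc_dataset = VOCTrainValDataset(voc_root,
                                 class_names,
                                 split=train_split,
                                 format=img_format,
                                 transforms=preview_transform)

# Drops the last four characters of train_split (presumably a '.txt' suffix)
output_file = f'instances_{train_split[:-4]}.json'

# Assumed initialisation (not part of the original snippet): class_names is taken
# to be a list of strings, and category ids start at 1 to match `labels + 1` below.
coco_dataset = {
    'images': [],
    'annotations': [],
    'categories': [{'id': k + 1, 'name': name, 'supercategory': name}
                   for k, name in enumerate(class_names)],
}
global_image_id = 0
global_annotation_id = 0

for i, sample in enumerate(voc_dataset):
    utils.progress_bar(i, len(voc_dataset), 'Drawing...')

    image = sample['image']
    bboxes = sample['bboxes'].cpu().numpy()
    labels = sample['labels'].cpu().numpy()
    image_path = sample['path']

    h, w, _ = image.shape

    global_image_id += 1
    coco_dataset['images'].append({
        'file_name': os.path.basename(image_path),
        'id': global_image_id,
        'width': int(w),
        'height': int(h)
    })

    for j in range(len(labels)):
        x1, y1, x2, y2 = bboxes[j]
        x1, y1, x2, y2 = float(x1), float(y1), float(x2), float(y2)
        category_id = int(labels[j].item()) + 1  # labels are 0-based, COCO ids start at 1

        # Convert the corner format to COCO's [x, y, w, h]
        width = max(0, x2 - x1)
        height = max(0, y2 - y1)
        area = width * height

        global_annotation_id += 1
        coco_dataset['annotations'].append({
            'id': global_annotation_id,
            'image_id': global_image_id,
            'category_id': category_id,
            'segmentation': [[x1, y1, x2, y1, x2, y2, x1, y2]],
            'area': float(area),
            'iscrowd': 0,
            'bbox': [x1, y1, width, height],
        })

with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(coco_dataset, f, ensure_ascii=False)

print(f'Done. coco json file has been saved to `{output_file}`')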
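After a conversion it is worth checking that the ids in the generated file are consistent with the rules described earlier: unique image and annotation ids, and annotations that only refer to known images and categories. The following is a minimal sketch using plain dictionaries; `check_coco` is a hypothetical helper and covers only these basic rules.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO dictionary."""
    problems = []
    image_ids = [image["id"] for image in coco["images"]]
    category_ids = {category["id"] for category in coco["categories"]}
    annotation_ids = [ann["id"] for ann in coco["annotations"]]
    if len(image_ids) != len(set(image_ids)):
        problems.append("duplicate image ids")
    if len(annotation_ids) != len(set(annotation_ids)):
        problems.append("duplicate annotation ids")
    known_images = set(image_ids)
    for ann in coco["annotations"]:
        if ann["image_id"] not in known_images:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann['id']}: empty bbox")
    return problems
```

To use it, load the generated file with `json.load` and pass the resulting dictionary to `check_coco`; an empty list means none of these problems were found.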

References

https://cocodataset.org/#format-data