Converting XML annotations to a VOC dataset (with a shared dataset)

This article is a repost. 2021-10-31 22:20

The original images and .xml annotation files are laid out as follows:

.
├── data
│   ├── 003002_0.jpg
│   ├── 003002_0.xml
│   ├── 003002_1.jpg
│   ├── 003002_1.xml
│   ├── 003008_1.jpg
│   ├── 003008_1.xml
│   └── .......
└── xml2voc2007.py

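The data directory holds your dataset's original images together with their .xml annotation files.
The source of xml2voc2007.py is listed at the end of this article.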
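
Since the layout above pairs every image with an annotation of the same base name, it can help to confirm that pairing before converting. The following is only a minimal sketch under that assumption; the helper name find_unpaired is hypothetical and is not part of xml2voc2007.py.

```python
import glob
import os.path as osp


def find_unpaired(input_dir):
    """Return (images_without_xml, xmls_without_image) as sorted lists of base names.

    Hypothetical helper: assumes images end in .jpg and annotations in .xml,
    and that a pair shares the same base name, as in the layout above.
    """
    img_stems = {osp.splitext(osp.basename(p))[0]
                 for p in glob.glob(osp.join(input_dir, "*.jpg"))}
    xml_stems = {osp.splitext(osp.basename(p))[0]
                 for p in glob.glob(osp.join(input_dir, "*.xml"))}
    # Set differences give the names present on one side only
    return sorted(img_stems - xml_stems), sorted(xml_stems - img_stems)
```

Calling find_unpaired("data") before running the conversion lists any image or annotation that has no counterpart, so the mismatch can be fixed at the source.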

Open a command line in the directory that contains xml2voc2007.py and run:

python xml2voc2007.py --input_dir data --output_dir VOCdevkit

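--input_dir: the folder holding the images and annotations; defaults to the data folder next to xml2voc2007.py.
--output_dir: the output folder; defaults to the VOCdevkit folder next to xml2voc2007.py (created if it does not exist). Note that the script as listed exits with an error if the output folder already exists.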

The result of running the command is shown in the figure below:

[Figure: screenshot of the command output]

The generated VOC dataset has the following directory structure:

.
└── VOCdevkit
    └── VOC2007
        ├── Annotations
        │   ├── 003002_0.xml
        │   ├── 003002_1.xml
        │   ├── 003008_1.xml
        │   └── .......
        ├── ImageSets
        │   └── Main
        │       ├── test.txt
        │       ├── train.txt
        │       ├── trainval.txt
        │       └── val.txt
        └── JPEGImages
            ├── 003002_0.jpg
            ├── 003002_1.jpg
            ├── 003008_1.jpg
            └── .......
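
To confirm that a generated split list matches the copied files, something like the following could be used. It is a sketch only, assuming the directory structure shown above; check_split is a hypothetical helper, not part of the script.

```python
import os.path as osp


def check_split(voc_root, split):
    """Return names listed in ImageSets/Main/<split>.txt that lack an image or annotation.

    Hypothetical helper: voc_root is the VOC2007 folder, split is e.g. "train" or "val".
    """
    list_path = osp.join(voc_root, "ImageSets", "Main", split + ".txt")
    with open(list_path) as f:
        names = [line.strip() for line in f if line.strip()]
    missing = []
    for name in names:
        has_xml = osp.isfile(osp.join(voc_root, "Annotations", name + ".xml"))
        has_jpg = osp.isfile(osp.join(voc_root, "JPEGImages", name + ".jpg"))
        if not (has_xml and has_jpg):
            missing.append(name)
    return missing
```

For example, check_split("VOCdevkit/VOC2007", "train") should return an empty list when every entry in train.txt has both files in place.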

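To adjust the train/validation ratio, search the xml2voc2007.py source for percent_trainval (the share of the whole dataset used for training plus validation) and percent_train (the share of that trainval portion used for training). Note that the listing below defines only percent_train, applies it to the whole dataset, and writes only train.txt and val.txt.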
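
The two-level split described by those two ratios could be sketched as follows. This is an illustration, not the script's own code: the function name split_names, the seed parameter, and the default of 0.9 for percent_trainval are assumptions (only percent_train = 0.9 appears in the listing).

```python
import random


def split_names(names, percent_trainval=0.9, percent_train=0.9, seed=0):
    """Split base names into trainval/train/val/test lists.

    percent_trainval: share of all names that go to trainval (assumed default).
    percent_train: share of trainval that goes to train.
    seed: assumed parameter so the shuffle is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    num_trainval = int(len(shuffled) * percent_trainval)
    num_train = int(num_trainval * percent_train)
    trainval = shuffled[:num_trainval]
    return {
        "trainval": sorted(trainval),
        "train": sorted(trainval[:num_train]),
        "val": sorted(trainval[num_train:]),
        "test": sorted(shuffled[num_trainval:]),
    }
```

Each list can then be written, one name per line, to the matching .txt file under ImageSets/Main.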

Source of xml2voc2007.py:

# Command-line usage:  python xml2voc2007.py --input_dir data --output_dir VOCdevkit
import argparse
import glob
import os
import random
import os.path as osp
import sys
import shutil

percent_train = 0.9

# Main program
def main():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("--input_dir", default="data", help="input annotated directory")
    parser.add_argument("--output_dir", default="VOCdevkit", help="output dataset directory")
    args = parser.parse_args()

    if osp.exists(args.output_dir):
        print("Output directory already exists:", args.output_dir)
        sys.exit(1)
    os.makedirs(args.output_dir)
    print("| Creating dataset dir:", osp.join(args.output_dir, "VOC2007"))

    # Create the output folders (makedirs also creates the intermediate ImageSets directory)
    for sub_dir in ("Annotations", osp.join("ImageSets", "Main"), "JPEGImages"):
        os.makedirs(osp.join(args.output_dir, "VOC2007", sub_dir), exist_ok=True)

    # Collect all .jpg files in the input directory
    total_img = glob.glob(osp.join(args.input_dir, "*.jpg"))
    print('| Image number: ', len(total_img))

    # Collect all .xml annotation files in the input directory (sorted for a stable order)
    total_xml = sorted(glob.glob(osp.join(args.input_dir, "*.xml")))
    print('| Xml number: ', len(total_xml))

    num_total = len(total_xml)
    data_list = range(num_total)

    num_tr = int(num_total * percent_train)
    # Indices chosen for training; a set makes the membership test in the loop below fast
    num_train = set(random.sample(data_list, num_tr))

    print('| Train number: ', num_tr)
    print('| Val number: ', num_total - num_tr)

    file_train = open(
        osp.join(args.output_dir, "VOC2007", "ImageSets", "Main", "train.txt"), 'w')
    file_val = open(
        osp.join(args.output_dir, "VOC2007", "ImageSets", "Main", "val.txt"), 'w')

    for i in data_list:
        # Base name without directory or extension, so any --input_dir value works
        name = osp.splitext(osp.basename(total_xml[i]))[0] + '\n'
        if i in num_train:
            file_train.write(name)
        else:
            file_val.write(name)

    file_train.close()
    file_val.close()

    if os.path.exists(args.input_dir):
        # root is the path of the directory currently being walked
        # dirs is a list of the sub-directory names directly inside root
        # files is a list of the file names directly inside root
        for root, dirs, files in os.walk(args.input_dir):
            for file in files:
                src_file = osp.join(root, file)
                if src_file.endswith(".jpg"):
                    shutil.copy(src_file, osp.join(args.output_dir, "VOC2007", "JPEGImages"))
                elif src_file.endswith(".xml"):
                    # Copy only annotation files; other file types are skipped
                    shutil.copy(src_file, osp.join(args.output_dir, "VOC2007", "Annotations"))

    print('| Done!')


if __name__ == "__main__":
    print("-" * 50)
    main()
    print("-" * 50)
