Training YOLOv2 on the ICDAR 2011 dataset

Reposted article, 2017-03-21 20:00. Category: Machine learning

First, download the dataset train-textloc.zip.

Its ground truth files look like this:

158,128,412,182,"Footpath"
442,128,501,170,"To"
393,198,488,240,"and"
63,200,363,242,"Colchester"
71,271,383,313,"Greenstead"

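Each line of a ground truth file has the format: xmin, ymin, xmax, ymax, "transcription" (the left, top, right and bottom pixel coordinates of the text box, followed by the word it contains).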
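As a quick check of that format, a ground truth line such as the samples above could be parsed as sketched here. The helper name `parse_gt_line` is made up for this illustration, and the field order is inferred from how the conversion script in this post uses the first four values.

```python
import csv


def parse_gt_line(line):
    """Parse one ICDAR ground truth line into (xmin, ymin, xmax, ymax, text)."""
    # csv handles the quoted transcription, including commas inside the quotes.
    fields = next(csv.reader([line], skipinitialspace=True))
    xmin, ymin, xmax, ymax = (int(v) for v in fields[:4])
    return xmin, ymin, xmax, ymax, fields[4]


print(parse_gt_line('158,128,412,182,"Footpath"'))
```

Using the csv module rather than a plain split keeps a transcription that itself contains a comma in one piece.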

The following code converts these txt files into VOC XML files:

icdar2voc.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Convert ICDAR 2011 ground truth txt files to Pascal VOC XML annotations."""

import glob
import os

from PIL import Image

# Directory holding the ICDAR images
src_img_dir = "train-textloc"
# Directory holding the ICDAR ground truth txt files (gt_<name>.txt)
src_txt_dir = "train-textloc"

for img_path in glob.glob(os.path.join(src_img_dir, '*.jpg')):
    # e.g. train-textloc/100.jpg -> 100
    img = os.path.splitext(os.path.basename(img_path))[0]

    im = Image.open(img_path)
    width, height = im.size

    # Read the corresponding ground truth file
    with open(os.path.join(src_txt_dir, 'gt_' + img + '.txt')) as gt_file:
        gt = gt_file.read().splitlines()

    # Write the VOC XML file next to the txt file
    with open(os.path.join(src_txt_dir, img + '.xml'), 'w') as xml_file:
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + img + '.jpg' + '</filename>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')

        # Write one <object> per text region
        for img_each_label in gt:
            if not img_each_label.strip():
                continue  # skip blank lines
            spt = [s.strip() for s in img_each_label.split(',')]
            xml_file.write('    <object>\n')
            xml_file.write('        <name>text</name>\n')
            xml_file.write('        <pose>Unspecified</pose>\n')
            xml_file.write('        <truncated>0</truncated>\n')
            xml_file.write('        <difficult>0</difficult>\n')
            xml_file.write('        <bndbox>\n')
            xml_file.write('            <xmin>' + spt[0] + '</xmin>\n')
            xml_file.write('            <ymin>' + spt[1] + '</ymin>\n')
            xml_file.write('            <xmax>' + spt[2] + '</xmax>\n')
            xml_file.write('            <ymax>' + spt[3] + '</ymax>\n')
            xml_file.write('        </bndbox>\n')
            xml_file.write('    </object>\n')

        xml_file.write('</annotation>')

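The script above assembles the XML by concatenating strings. As an alternative sketch (not what the original author used), the same `<object>` entry can be built with the standard library's xml.etree.ElementTree, which takes care of escaping. The helper name `make_object` is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_object(name, xmin, ymin, xmax, ymax):
    """Build one VOC <object> element for a bounding box."""
    obj = ET.Element('object')
    ET.SubElement(obj, 'name').text = name
    ET.SubElement(obj, 'pose').text = 'Unspecified'
    ET.SubElement(obj, 'truncated').text = '0'
    ET.SubElement(obj, 'difficult').text = '0'
    bndbox = ET.SubElement(obj, 'bndbox')
    for tag, value in (('xmin', xmin), ('ymin', ymin),
                       ('xmax', xmax), ('ymax', ymax)):
        ET.SubElement(bndbox, tag).text = str(value)
    return obj


obj = make_object('text', 158, 128, 412, 182)
print(ET.tostring(obj).decode('utf-8'))
```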

Next, convert the XML files into YOLO's txt format:

voc_label.py

import xml.etree.ElementTree as ET

classes = ["text"]


def convert(size, box):
    """Convert a pixel box (xmin, xmax, ymin, ymax) to normalized (x, y, w, h)."""
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0  # box center, in pixels
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]          # box size, in pixels
    h = box[3] - box[2]
    # Scale everything by the image size so the values fall in [0, 1]
    return (x * dw, y * dh, w * dw, h * dh)


# Annotation files 100.xml to 328.xml
for i in range(100, 329):
    tree = ET.parse('train-textloc/%d.xml' % i)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('train-textloc/%d.txt' % i, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects marked as difficult
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            # One line per object: class_id x y w h
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

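To make the conversion concrete, here is a small worked example of the same arithmetic on the "Footpath" box from the ground truth sample, together with the inverse mapping. The 640x480 image size is an assumption chosen only for illustration, and `convert_back` is a hypothetical helper that is not part of voc_label.py.

```python
def convert(size, box):
    """(xmin, xmax, ymin, ymax) in pixels -> normalized (x, y, w, h)."""
    img_w, img_h = size
    xmin, xmax, ymin, ymax = box
    return ((xmin + xmax) / 2.0 / img_w, (ymin + ymax) / 2.0 / img_h,
            (xmax - xmin) / float(img_w), (ymax - ymin) / float(img_h))


def convert_back(size, yolo_box):
    """Normalized (x, y, w, h) -> (xmin, xmax, ymin, ymax) in pixels."""
    img_w, img_h = size
    x, y, w, h = yolo_box
    return ((x - w / 2.0) * img_w, (x + w / 2.0) * img_w,
            (y - h / 2.0) * img_h, (y + h / 2.0) * img_h)


size = (640, 480)                   # assumed image size, for illustration only
box = (158.0, 412.0, 128.0, 182.0)  # the "Footpath" box as (xmin, xmax, ymin, ymax)
yolo_box = convert(size, box)
print(yolo_box)
print(convert_back(size, yolo_box))
```

Running the round trip on a few labels is an easy way to catch a swapped xmax/ymin before training.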

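Next, modify the YOLO configuration:

Change the 20 classes to 1 class.

1. In cfg/voc.data:
- Set classes to 1.
- Set names=data/voc.names.
- voc.names only needs a single line: text.
2. In cfg/yolo-voc.cfg:
- In the [region] layer, set classes to 1.
- In the first [convolutional] layer above the [region] layer, change filters to (classes + coords + 1) * num, where num is the number of anchor boxes. In my case (1 + 4 + 1) * 5 = 30, so I set filters to 30.
- The advice about filters comes from (https://groups.google.com/forum/#!topic/darknet/B4rSpOo84yg); everything worked after I made the change.
3. In src/yolo.c (as a reader pointed out, steps 3 and 4 apply to YOLO v1 and can be skipped when using YOLO v2):
- Around line 14, change to: char *voc_names[] = {"text"}; The array originally held the names of the 20 classes; I replaced them with the name of my single class.
- Around line 328, change the last argument of draw_detections from 20 to 1. This function draws the detected boxes onto the image passed as its first argument, im, which is then saved and displayed.
- Around line 361, in the demo function, I changed the third-to-last argument from 20 to 1. I am not sure it matters; it made no difference to the result.
4. In src/yolo_kernels.cu (as a reader pointed out, steps 3 and 4 apply to YOLO v1 and can be skipped when using YOLO v2):
- On line 62, change the last argument of draw_detections from 20 to 1.
5. In scripts/voc_label.py (this probably has no effect):
- On line 9, change to: classes=["text"], because I have only one class.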
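The filters rule from the cfg step can be written as a one-line function. This is only a sketch of the arithmetic quoted above; the function name `region_filters` is made up, and the defaults of 4 coordinates and 5 anchor boxes are the values used in this post.

```python
def region_filters(classes, coords=4, num=5):
    """Filters for the convolutional layer that feeds the [region] layer."""
    return (classes + coords + 1) * num


print(region_filters(1))   # the single "text" class used here
print(region_filters(20))  # the 20 VOC classes
```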

Create a folder with two subfolders: put all the images in JPEGImages and all the label files in labels. darknet finds the labels automatically.
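Before training, it can be worth checking that every image has a label file where darknet is expected to look for it. The sketch below assumes the label path is obtained by swapping the JPEGImages directory for labels and the image extension for .txt; both helper names are hypothetical.

```python
import os


def label_path_for(image_path):
    """Guess the label file expected for an image path."""
    # Assumption: JPEGImages is swapped for labels, and the extension for .txt.
    directory, filename = os.path.split(image_path)
    parent, leaf = os.path.split(directory)
    if leaf == 'JPEGImages':
        directory = os.path.join(parent, 'labels')
    stem = os.path.splitext(filename)[0]
    return os.path.join(directory, stem + '.txt')


def missing_labels(image_paths):
    """Return the images whose expected label file does not exist."""
    return [p for p in image_paths if not os.path.isfile(label_path_for(p))]


print(label_path_for(os.path.join('voc', 'Table', 'JPEGImages', '100.jpg')))
```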

Then generate train.txt:

list.py

# -*- coding: utf-8 -*-
import os

img_dir = '/home/mingyu_ding/darknet/voc/Table/JPEGImages'

# Write one absolute image path per line
with open('train.txt', 'w') as fw:
    for f in os.listdir(img_dir):
        fw.write(os.path.join(img_dir, f) + '\n')


Now training can start. By default it runs for 45000 iterations.

nohup ./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23 > log.txt &

The number of iterations can of course be changed; editing the value of max_batches in cfg/yolo-voc.cfg should be enough.
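If the cfg file is edited from a script rather than by hand, a small text substitution is enough for a key that appears only once, such as max_batches. This is a minimal sketch: `set_cfg_option` is a hypothetical helper, the sample text is a made-up excerpt rather than the real cfg, and the function rewrites every line with the given key, so it is not suitable for keys like filters that occur in many layers.

```python
import re


def set_cfg_option(cfg_text, key, value):
    """Return cfg_text with every `key=...` line set to the new value."""
    pattern = re.compile(r'^([ \t]*%s[ \t]*=[ \t]*).*$' % re.escape(key),
                         re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), cfg_text)
    if count == 0:
        raise KeyError(key)
    return new_text


sample = "[net]\nbatch=64\nmax_batches = 45000\n"  # made-up excerpt
print(set_cfg_option(sample, 'max_batches', 20000))
```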

You can already test with an intermediate checkpoint before training has finished:

./darknet detector test cfg/voc.data cfg/yolo-voc.cfg backup/yolo-voc_5000.weights ../Downloads/1.jpg

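Reference: http://blog.csdn.net/hysteric314/article/details/54097845

[Figure: detection result]

Later I also trained on product parameter images from JD (京東).

I downloaded the images myself, renamed them, annotated them with labelImg, and then used voc_label.py to convert the annotations into label files.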

rename.py

import os

img_dir = os.path.join(os.getcwd(), 'imageset')

# Rename every file in imageset to 1.jpg, 2.jpg, ...
for i, f in enumerate(os.listdir(img_dir), 1):
    src = os.path.join(img_dir, f)
    dst = os.path.join(img_dir, '%d.jpg' % i)
    print(src)
    print(dst)
    os.rename(src, dst)


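The training procedure is exactly the same as above.

[Figure: detection result on the JD parameter images]

To print the positions of the predicted boxes, add the following line to image.c and run make again: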

printf("left:%d right:%d top:%d bot:%d\n", left, right, top, bot);

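Once darknet prints the boxes in that format, its output can be collected from a script. The sketch below parses lines matching the printf format above; the helper name `parse_boxes` and the sample text are made up for illustration.

```python
import re

# Mirrors the printf format "left:%d right:%d top:%d bot:%d"
BOX_LINE = re.compile(r'left:(-?\d+) right:(-?\d+) top:(-?\d+) bot:(-?\d+)')


def parse_boxes(output):
    """Extract (left, right, top, bot) tuples from darknet's printed output."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_LINE.finditer(output)]


sample = "some other line\nleft:63 right:363 top:200 bot:242\n"
print(parse_boxes(sample))
```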

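Modifying detector.c and running make again lets you crop out the detected regions.

[Figure: cropped regions]

Add the following code just before the call to draw_detections():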

int i;
int j = 1;
for(i = 0; i < l.w*l.h*l.n; ++i){
    int class = max_index(probs[i], l.classes);
    float prob = probs[i][class];
    if(prob > thresh){
        /* Convert the normalized box to pixel coordinates, clamped to the image */
        box b = boxes[i];
        int left  = (b.x-b.w/2.)*im.w;
        int right = (b.x+b.w/2.)*im.w;
        int top   = (b.y-b.h/2.)*im.h;
        int bot   = (b.y+b.h/2.)*im.h;
        if(left < 0) left = 0;
        if(right > im.w-1) right = im.w-1;
        if(top < 0) top = 0;
        if(bot > im.h-1) bot = im.h-1;
        int width = right - left;
        int height = bot - top;
        if(width <= 0 || height <= 0) continue;

        /* Reload the input image and copy the region of interest out of it */
        IplImage* src = cvLoadImage(input, -1);
        if(!src) continue;
        IplImage* roi = cvCreateImage(cvSize(width, height), src->depth, src->nChannels);
        cvSetImageROI(src, cvRect(left, top, width, height));
        cvCopy(src, roi, NULL);

        /* Save as cut<index>_<confidence in percent>.jpg, e.g. cut1_93.jpg */
        char newname[100];
        snprintf(newname, sizeof(newname), "cut%d_%.0f.jpg", j, 100*prob);
        j++;
        cvSaveImage(newname, roi, 0);

        cvReleaseImage(&src);
        cvReleaseImage(&roi);
        //printf("left:%d right:%d top:%d bot:%d\n",left,right,top,bot);
    }
}

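The coordinate arithmetic in the C snippet can be checked separately in Python. This sketch reproduces only the box-to-pixel conversion and clamping; `crop_rect` is a hypothetical helper, and the 640x480 image size is an assumption for the example.

```python
def crop_rect(box, img_w, img_h):
    """Normalized (x, y, w, h) box -> clamped pixel (left, top, right, bot)."""
    x, y, w, h = box
    # int() truncates toward zero, like the implicit float-to-int cast in C
    left = max(int((x - w / 2.0) * img_w), 0)
    right = min(int((x + w / 2.0) * img_w), img_w - 1)
    top = max(int((y - h / 2.0) * img_h), 0)
    bot = min(int((y + h / 2.0) * img_h), img_h - 1)
    return left, top, right, bot


print(crop_rect((0.5, 0.5, 0.5, 0.5), 640, 480))    # box fully inside the image
print(crop_rect((0.25, 0.25, 1.0, 1.0), 640, 480))  # box clipped at the top left
```

The returned tuple is in (left, upper, right, lower) order, so it can be passed directly to PIL's Image.crop.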

The next step is to recognize the text inside the parameter boxes. Contact me if you need the dataset and labels.

sudo apt-get install tesseract-ocr

sudo apt-get install tesseract-ocr-chi-sim

tesseract cut1_93.jpg out -l eng+chi_sim

The recognized text is then in out.txt.
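To run the same OCR command over many cropped images, it can be wrapped in a short script. This is a sketch that only shells out to the tesseract command shown above; the helper names are hypothetical, and it assumes tesseract writes its result to the output base name plus .txt, as in the out.txt example.

```python
import subprocess


def tesseract_command(image_path, out_base, langs=('eng', 'chi_sim')):
    """Build the tesseract command line used above as an argument list."""
    return ['tesseract', image_path, out_base, '-l', '+'.join(langs)]


def run_ocr(image_path, out_base):
    """Run tesseract and return the text it wrote to <out_base>.txt."""
    subprocess.check_call(tesseract_command(image_path, out_base))
    with open(out_base + '.txt', 'rb') as f:
        return f.read().decode('utf-8')


print(tesseract_command('cut1_93.jpg', 'out'))
```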
