Training YOLOv3 on your own dataset in Colab
The dataset comes from the Tianchi competition 零基礎入門CV - 街景字符編碼識別 (introductory CV: street-view character recognition).
The darknet project actually ships configs for YOLOv3, YOLOv4 and others, but only YOLOv3 is tested here; YOLOv4 is used in much the same way.
For that competition, another author has already written up how to get good results with YOLO: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12586969.1002.108.2ce879de4cKZcz&postId=118780
My focus here, however, is simply getting the pipeline to run end to end; best practices can come afterwards.
References:
- https://www.cnblogs.com/monologuesmw/p/13035442.html
- https://blog.csdn.net/weixin_38353277/article/details/105841023
Since this post is a markdown export of an ipynb notebook, it may not read well; the source file can be downloaded here: https://files.cnblogs.com/files/jiading/yolo_in_colab.zip. Note that the source file does not include the conclusion section at the end of this post.
# Check the GPU Colab has allocated
!/opt/bin/nvidia-smi
Mon Sep 28 05:14:09 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P8 9W / 70W | 0MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# Clone the project
!git clone https://github.com/AlexeyAB/darknet
Cloning into 'darknet'...
remote: Enumerating objects: 14321, done.
remote: Total 14321 (delta 0), reused 0 (delta 0), pack-reused 14321
Receiving objects: 100% (14321/14321), 12.87 MiB | 22.64 MiB/s, done.
Resolving deltas: 100% (9772/9772), done.
# Edit the Makefile to enable OpenCV and GPU support
%cd darknet
'''
The Linux sed command processes and edits text files according to a script of commands.
s/old/new/ : substitute, i.e. replace the first match of "old" on each line with "new";
-i : edit the file in place instead of printing the result to stdout.
'''
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
/content/darknet
# Verify the CUDA version
!/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Run the demo and show the predicted bounding boxes
This step only checks that the environment and the build work.
# Download the pretrained COCO weights into the darknet folder
#!wget https://pjreddie.com/media/files/yolov3.weights
# The weights can be cached on Google Drive and pulled from there later, which is faster than re-downloading
# save first:
#!cp /content/darknet/yolov3.weights '/content/drive/My Drive/cvComp1Realted/yolov3.weights'
# then pull:
!cp '/content/drive/My Drive/cvComp1Realted/yolov3.weights' /content/darknet/yolov3.weights
# Build the project to produce the darknet executable
!make
# Define imShow, which uses OpenCV and matplotlib to display an image
def imShow(path):
    import cv2
    import matplotlib.pyplot as plt
    %matplotlib inline
    image = cv2.imread(path)
    height, width = image.shape[:2]
    resized_image = cv2.resize(image, (3 * width, 3 * height), interpolation=cv2.INTER_CUBIC)
    fig = plt.gcf()
    fig.set_size_inches(18, 10)
    plt.axis("off")
    plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR; matplotlib expects RGB
    plt.show()
# Run the demo
!./darknet detect cfg/yolov3.cfg yolov3.weights data/person.jpg
imShow('predictions.jpg')
CUDA-version: 10010 (10010), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
0 : compute_capability = 750, cudnn_half = 0, GPU: Tesla T4
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF
5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
25 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
56 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 255 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 255 0.088 BF
82 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
83 route 79 -> 13 x 13 x 512
84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61 -> 26 x 26 x 768
87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 255 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 255 0.177 BF
94 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
95 route 91 -> 26 x 26 x 256
96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36 -> 52 x 52 x 384
99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF
100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
105 conv 255 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BF
106 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
Detection layer: 82 - type = 28
Detection layer: 94 - type = 28
Detection layer: 106 - type = 28
data/person.jpg: Predicted in 41.478000 milli-seconds.
dog: 99%
person: 100%
horse: 100%
Unable to init server: Could not connect: Connection refused
(predictions:1181): Gtk-WARNING **: 05:16:13.455: cannot open display:
(The Gtk warning is expected: Colab is headless, so darknet cannot pop up a result window; that is why imShow is used to display predictions.jpg instead.)
Unzip the competition data into the working directory
# Check the current location first
!pwd
/content/darknet
# Pull the raw data down from Google Drive
!mkdir /content/input
!mkdir /content/input/train
!mkdir /content/input/val
!mkdir /content/input/test
# Paths containing spaces must be quoted
!unzip '/content/drive/My Drive/cvComp1Realted/mchar_train.zip' -d /content/input/train
!unzip '/content/drive/My Drive/cvComp1Realted/mchar_val.zip' -d /content/input/val
!unzip '/content/drive/My Drive/cvComp1Realted/mchar_test_a.zip' -d /content/input/test
!cp '/content/drive/My Drive/cvComp1Realted/mchar_train.json' /content/input
!cp '/content/drive/My Drive/cvComp1Realted/mchar_val.json' /content/input
Build the dataset in the layout YOLO expects
Annotations holds the XML label files
JPEGImages holds the images
The Main folder inside ImageSets holds the generated image lists, for example:
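(In this notebook these lists hold full image paths, one per line, as written by the copy loops below:)
/content/VOCdevkit/VOC2007/JPEGImages/train_000000.png
/content/VOCdevkit/VOC2007/JPEGImages/train_000001.png
...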
# Create the folders following the VOC dataset layout
!mkdir /content/VOCdevkit/
!mkdir /content/VOCdevkit/VOC2007
!mkdir /content/VOCdevkit/VOC2007/Annotations
!mkdir /content/VOCdevkit/VOC2007/ImageSets
!mkdir /content/VOCdevkit/VOC2007/JPEGImages
!mkdir /content/VOCdevkit/VOC2007/ImageSets/Main
!mkdir /content/VOCdevkit/VOC2007/labels
import glob
# Gather the image paths; the generated name lists will go into ImageSets
test_path = glob.glob('/content/input/test/mchar_test_a/*.png')
train_path=glob.glob('/content/input/train/mchar_train/*.png')
val_path=glob.glob('/content/input/val/mchar_val/*.png')
# a path looks like this
train_path[0]
'/content/input/train/mchar_train/000653.png'
# Rename the original images (train/val/test each start from 000000.png, so merging them directly would cause name collisions), copy them all into JPEGImages, and write the new full paths into txt files
from shutil import copyfile
import os
path='/content/VOCdevkit/VOC2007/JPEGImages'
# training part
with open('/content/VOCdevkit/VOC2007/ImageSets/Main/train.txt', 'w') as f:
    for item in train_path:
        splited = item.split('/')
        filename = 'train_' + splited[5].split('.')[0]
        topath = os.path.join(path, filename + '.png')
        f.write(os.path.join(path, filename) + '.png\n')
        copyfile(item, topath)
# val part
with open('/content/VOCdevkit/VOC2007/ImageSets/Main/val.txt', 'w') as f:
    for item in val_path:
        splited = item.split('/')
        filename = 'val_' + splited[5].split('.')[0]
        topath = os.path.join(path, filename + '.png')
        f.write(os.path.join(path, filename) + '.png\n')
        copyfile(item, topath)
# test part
with open('/content/VOCdevkit/VOC2007/ImageSets/Main/test.txt', 'w') as f:
    for item in test_path:
        splited = item.split('/')
        filename = 'test_' + splited[5].split('.')[0]
        topath = os.path.join(path, filename + '.png')
        f.write(os.path.join(path, filename) + '.png\n')
        copyfile(item, topath)
Build the label files in XML format
The files follow the PASCAL VOC annotation layout; create_xml below writes one field at a time.
# Load our JSON-format label files
import json
train_labels=json.load(open('/content/input/mchar_train.json'))
val_labels=json.load(open('/content/input/mchar_val.json'))
# Inspect one label
train_labels['000000.png']
{'height': [219, 219],
'label': [1, 9],
'left': [246, 323],
'top': [77, 81],
'width': [81, 96]}
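Each key maps to parallel per-character arrays: box i runs from (left[i], top[i]) to (left[i] + width[i], top[i] + height[i]). A minimal sketch of that mapping (the helper name is mine, purely illustrative):
def to_corner_boxes(label):
    # one (xmin, ymin, xmax, ymax) tuple per character in the image
    return [(l, t, l + w, t + h)
            for l, t, w, h in zip(label['left'], label['top'],
                                  label['width'], label['height'])]
# to_corner_boxes(train_labels['000000.png']) -> [(246, 77, 327, 296), (323, 81, 419, 300)]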
# key is the file name without the prefix; type_name is train/test/val
# pic_path is the folder the renamed images were copied into
# This looks like a lot of code, but it just writes one XML file node by node; a simpler approach would be an XML template filled with values (see the sketch after the function)
def create_xml(key, value, type_name, xml_path, pic_path):
    from PIL import Image
    import os
    import xml.dom.minidom as minidom
    filename = type_name + '_' + key.split('.')[0]
    with open(os.path.join(xml_path, filename + '.xml'), 'w') as f:
        dom = minidom.Document()
        annotation_node = dom.createElement('annotation')
        # <folder>
        folder_node = dom.createElement('folder')
        name_text_value = dom.createTextNode("VOC2007")
        folder_node.appendChild(name_text_value)
        annotation_node.appendChild(folder_node)
        # <filename>
        filename_node = dom.createElement('filename')
        name_text_value = dom.createTextNode(filename + '.png')
        filename_node.appendChild(name_text_value)
        annotation_node.appendChild(filename_node)
        # <source>
        source_node = dom.createElement('source')
        database_node = dom.createElement('database')
        name_text_value = dom.createTextNode("My Database")
        database_node.appendChild(name_text_value)
        source_node.appendChild(database_node)
        annotation_node_2 = dom.createElement('annotation')
        name_text_value = dom.createTextNode("PASCAL VOC2007")
        annotation_node_2.appendChild(name_text_value)
        source_node.appendChild(annotation_node_2)
        image_node = dom.createElement('image')
        name_text_value = dom.createTextNode("flickr")
        image_node.appendChild(name_text_value)
        source_node.appendChild(image_node)
        flickrid_node = dom.createElement('flickrid')
        name_text_value = dom.createTextNode("NULL")
        flickrid_node.appendChild(name_text_value)
        source_node.appendChild(flickrid_node)
        annotation_node.appendChild(source_node)
        # <owner>
        owner_node = dom.createElement('owner')
        flickrid_node_2 = dom.createElement('flickrid')
        name_text_value = dom.createTextNode("NULL")
        flickrid_node_2.appendChild(name_text_value)
        owner_node.appendChild(flickrid_node_2)
        name_node = dom.createElement('name')
        name_text_value = dom.createTextNode("company")
        name_node.appendChild(name_text_value)
        owner_node.appendChild(name_node)
        annotation_node.appendChild(owner_node)
        # <size>: read width/height from the image itself
        size_node = dom.createElement('size')
        img = Image.open(os.path.join(pic_path, filename + '.png'))
        width_node = dom.createElement('width')
        name_text_value = dom.createTextNode(str(img.width))
        width_node.appendChild(name_text_value)
        height_node = dom.createElement('height')
        name_text_value = dom.createTextNode(str(img.height))
        height_node.appendChild(name_text_value)
        depth_node = dom.createElement('depth')
        name_text_value = dom.createTextNode(str(3))
        depth_node.appendChild(name_text_value)
        size_node.appendChild(width_node)
        size_node.appendChild(height_node)
        size_node.appendChild(depth_node)
        annotation_node.appendChild(size_node)
        # <segmented>
        segmented_node = dom.createElement('segmented')
        name_text_value = dom.createTextNode(str(0))
        segmented_node.appendChild(name_text_value)
        annotation_node.appendChild(segmented_node)
        # one <object> per labelled character (test images pass value=None)
        if value is not None:
            labels = value['label']
            index = 0
            for label in labels:
                object_node = dom.createElement('object')
                name_node_2 = dom.createElement('name')
                name_text_value = dom.createTextNode(str(label))
                name_node_2.appendChild(name_text_value)
                object_node.appendChild(name_node_2)
                pose_node = dom.createElement('pose')
                name_text_value = dom.createTextNode('Unspecified')
                pose_node.appendChild(name_text_value)
                object_node.appendChild(pose_node)
                truncated_node = dom.createElement('truncated')
                name_text_value = dom.createTextNode(str(0))
                truncated_node.appendChild(name_text_value)
                object_node.appendChild(truncated_node)
                difficult_node = dom.createElement('difficult')
                name_text_value = dom.createTextNode(str(0))
                difficult_node.appendChild(name_text_value)
                object_node.appendChild(difficult_node)
                # <bndbox>: convert left/top/width/height to corner coordinates
                bndbox_node = dom.createElement('bndbox')
                xmin_node = dom.createElement('xmin')
                name_text_value = dom.createTextNode(str(value['left'][index]))
                xmin_node.appendChild(name_text_value)
                bndbox_node.appendChild(xmin_node)
                ymin_node = dom.createElement('ymin')
                name_text_value = dom.createTextNode(str(value['top'][index]))
                ymin_node.appendChild(name_text_value)
                bndbox_node.appendChild(ymin_node)
                xmax_node = dom.createElement('xmax')
                name_text_value = dom.createTextNode(str(value['left'][index] + value['width'][index]))
                xmax_node.appendChild(name_text_value)
                bndbox_node.appendChild(xmax_node)
                ymax_node = dom.createElement('ymax')
                name_text_value = dom.createTextNode(str(value['top'][index] + value['height'][index]))
                ymax_node.appendChild(name_text_value)
                bndbox_node.appendChild(ymax_node)
                object_node.appendChild(bndbox_node)
                annotation_node.appendChild(object_node)
                index += 1
        dom.appendChild(annotation_node)
        dom.writexml(f, addindent='\n', encoding='utf-8')
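As noted in the comments above, a template-based version would be far shorter. A hedged sketch of that alternative, covering the fields the converter below needs (size, and per object: name, difficult, bndbox) plus a few standard VOC fields; the template names are mine:
# Fill a VOC XML skeleton with str.format instead of building a DOM node by node
VOC_TEMPLATE = ('<annotation><folder>VOC2007</folder>'
                '<filename>{filename}</filename>'
                '<size><width>{w}</width><height>{h}</height><depth>3</depth></size>'
                '<segmented>0</segmented>{objects}</annotation>')
OBJECT_TEMPLATE = ('<object><name>{name}</name><pose>Unspecified</pose>'
                   '<truncated>0</truncated><difficult>0</difficult>'
                   '<bndbox><xmin>{xmin}</xmin><ymin>{ymin}</ymin>'
                   '<xmax>{xmax}</xmax><ymax>{ymax}</ymax></bndbox></object>')

def create_xml_from_template(filename, w, h, boxes):
    # boxes: list of (name, xmin, ymin, xmax, ymax) tuples
    objects = ''.join(OBJECT_TEMPLATE.format(name=n, xmin=x0, ymin=y0, xmax=x1, ymax=y1)
                      for n, x0, y0, x1, y1 in boxes)
    return VOC_TEMPLATE.format(filename=filename, w=w, h=h, objects=objects)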
# Create one XML file as a sanity check
#def create_xml(key,value,type_name,xml_path,pic_path):
xml_path='/content/VOCdevkit/VOC2007/Annotations'
pic_path='/content/VOCdevkit/VOC2007/JPEGImages'
create_xml('000000.png',train_labels['000000.png'],'train',xml_path=xml_path,pic_path=pic_path)
# If that looks right, convert everything
for (key, value) in train_labels.items():
    try:
        create_xml(key, value, 'train', xml_path=xml_path, pic_path=pic_path)
    except FileNotFoundError:
        print(key)
        continue
for (key, value) in val_labels.items():
    try:
        create_xml(key, value, 'val', xml_path=xml_path, pic_path=pic_path)
    except FileNotFoundError:
        print(key)
        continue
# XML files for the test set are optional; I removed the test part from voc_label.py
'''
test_path = glob.glob('/content/input/test/mchar_test_a/*.png')
for i in test_path:
    name = i.split('/')
    try:
        create_xml(name[len(name)-1], None, 'test', xml_path=xml_path, pic_path=pic_path)
    except FileNotFoundError:
        print(name[len(name)-1])
        continue
'''
"\ntest_path = glob.glob('/content/input/test/mchar_test_a/*.png')\nfor i in test_path:\n name=i.split('/')\n try:\n create_xml(name[len(name)-1],None,'test',xml_path=xml_path,pic_path=pic_path)\n except (FileNotFoundError):\n print(name[len(name)-1])\n continue\n"
# opencv-python is already available in the Colab environment
!pip install opencv-python
Requirement already satisfied: opencv-python in /usr/local/lib/python3.6/dist-packages (4.1.2.30)
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.6/dist-packages (from opencv-python) (1.18.5)
Modify the parameters:
Go into the cfg folder and edit yolov3.cfg:
- batch and subdivisions at the top of the file (Colab provides a Tesla T4 with 16 GB of VRAM, so batch=128 is a reasonable setting)
- classes in each of the three [yolo] blocks near the end of the file, plus the filters of the convolutional layer directly before each [yolo] block, which must equal (5 + number of classes) * 3
These edits can also be scripted with sed, as sketched below.
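A sketch using the same sed approach as for the Makefile, assuming the stock yolov3.cfg values (classes=80 and filters=255 before each [yolo] block; with 10 digit classes, filters = (5 + 10) * 3 = 45):
!sed -i 's/classes=80/classes=10/g' cfg/yolov3.cfg
!sed -i 's/filters=255/filters=45/g' cfg/yolov3.cfg
# batch and subdivisions sit near the top of the file and can be edited the same way once the current values are known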
!pwd
/content/darknet
# Write the train.data file
with open('train.data', 'w') as f:
    f.write('classes=10\n'
            'train=/content/VOCdevkit/VOC2007/ImageSets/Main/train.txt\n'
            'valid=/content/VOCdevkit/VOC2007/ImageSets/Main/val.txt\n'
            'names=train.names\n'
            'backup=/content/drive/My Drive/cvComp1Realted/backup')
# Write the train.names file (one class name per line; the line order defines the class ids)
with open('train.names', 'w') as f:
    f.write('0\n1\n2\n3\n4\n5\n6\n7\n8\n9')
#!cp /content/darknet/scripts/voc_label.py /content/
voc_label.py was modified for this project; the modified file is as follows:
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

#sets=[('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
sets = [('2007', 'train'), ('2007', 'val')]
#classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
classes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def convert(size, box):
    # VOC corner box (xmin, xmax, ymin, ymax) -> YOLO (x_center, y_center, w, h), normalized to [0, 1]
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x, y, w, h)

def convert_annotation(year, image_id):
    # image_id arrives as a full path; reduce it to the bare file name
    image_ids = image_id.split('/')
    image_id = image_ids[len(image_ids)-1]
    image_id = image_id.split('.')[0]
    in_file = open('/content/VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('/content/VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()
for year, image_set in sets:
    if not os.path.exists('/content/VOCdevkit/VOC%s/labels/' % (year)):
        os.makedirs('/content/VOCdevkit/VOC%s/labels/' % (year))
    image_ids = open('/content/VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write(image_id + '\n')
        convert_annotation(year, image_id)
    list_file.close()

os.system("cat 2007_train.txt 2007_val.txt > train.txt")
# 2007_test.txt is no longer generated, so this line prints a harmless "No such file" warning
os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")
# Save the modified voc_label.py
#!cp /content/voc_label.py '/content/drive/My Drive/cvComp1Realted/voc_label.py'
# load it back
!cp '/content/drive/My Drive/cvComp1Realted/voc_label.py' /content/voc_label.py
!python /content/voc_label.py
cat: 2007_test.txt: No such file or directory
# Start training
!./darknet detector train /content/darknet/train.data cfg/yolov3.cfg yolov3.weights -dont_show -map
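If Colab drops the session mid-training, darknet can resume from the latest checkpoint by passing it as the weights argument (standard darknet usage; the checkpoint path matches the backup copy handled below):
#!./darknet detector train /content/darknet/train.data cfg/yolov3.cfg backup/yolov3_last.weights -dont_show -map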
# save the last checkpoint to Drive
#!cp /content/darknet/backup/yolov3_last.weights '/content/drive/My Drive/cvComp1Realted/backup/yolov3_last.weights'
# load it back
!cp '/content/drive/My Drive/cvComp1Realted/backup/yolov3_last.weights' /content/darknet/backup/yolov3_last.weights
# Test one image
!./darknet detector test /content/darknet/train.data /content/darknet/cfg/yolov3.cfg /content/darknet/backup/yolov3_last.weights /content/input/val/mchar_val/000001.png -i 0 -thresh 0.05
imShow('predictions.jpg')
CUDA-version: 10010 (10010), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
0 : compute_capability = 750, cudnn_half = 0, GPU: Tesla T4
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF
5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
25 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
56 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 45 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 45 0.016 BF
82 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
83 route 79 -> 13 x 13 x 512
84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61 -> 26 x 26 x 768
87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 45 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 45 0.031 BF
94 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
95 route 91 -> 26 x 26 x 256
96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36 -> 52 x 52 x 384
99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF
100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
105 conv 45 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 45 0.062 BF
106 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.370
avg_outputs = 518514
Allocate additional workspace_size = 52.43 MB
Loading weights from /content/darknet/backup/yolov3_last.weights...
seen 64, trained: 32038 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
Detection layer: 82 - type = 28
Detection layer: 94 - type = 28
Detection layer: 106 - type = 28
/content/input/val/mchar_val/000001.png: Predicted in 40.634000 milli-seconds.
1: 6%
Unable to init server: Could not connect: Connection refused
(predictions:1358): Gtk-WARNING **: 05:21:04.194: cannot open display:
Conclusion
When I ran the final prediction with the default confidence threshold (0.25), no boxes came out at all, so I lowered the threshold to 0.05 with -thresh 0.05. That is obviously not usable in practice (far too many errors); the point was only to prove the pipeline itself works. Possible reasons for the poor accuracy:
- Insufficient training: in my tests, Colab disconnected every time after the training shell command had run for a while (I tried three times, same result each time), so what got saved is an undertrained model. Also, by default training only seems to save the weights on completion; I have not found a setting for auto-saving every N epochs, which still needs investigating. A workaround sketch for the disconnects follows this list.
- Input size mismatch: in theory images of any size can be fed in, but according to one blog post's discussion, YOLO performs best when the input images match the size set in its config file (cfg/yolov3.cfg). The images in this dataset all have different sizes, and none match the config. I suspect this is also why, during training, only the small-scale detection head could find objects while the larger-scale ones could not (each training step prints three values, one per YOLO output head, each covering a different scale). Resizing for detection also means transforming the bounding-box coordinates in the labels, which is a bit involved, so I skipped it here; it would not actually be hard: scale the image and the boxes in its labels by the same ratio (see the sketch after this list).
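A minimal sketch of that proportional rescaling, under the assumption that boxes are VOC-style corner tuples (the helper name is mine, not part of the project):
import cv2

def resize_with_boxes(image, boxes, to_size=(416, 416)):
    # scale the image to the network input size, and the boxes by the same factors
    h, w = image.shape[:2]
    sx, sy = to_size[0] / w, to_size[1] / h
    resized = cv2.resize(image, to_size, interpolation=cv2.INTER_CUBIC)
    scaled = [(xmin * sx, ymin * sy, xmax * sx, ymax * sy)
              for (xmin, ymin, xmax, ymax) in boxes]
    return resized, scaled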
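And for the disconnects, a hedged sketch of a workaround: mirror darknet's newest checkpoint to Google Drive from a background thread while training runs (paths are the ones used earlier in this notebook; adjust as needed):
import os, shutil, threading, time

SRC = '/content/darknet/backup/yolov3_last.weights'
DST = '/content/drive/My Drive/cvComp1Realted/backup/yolov3_last.weights'

def mirror_checkpoint(interval_sec=600):
    # copy the latest checkpoint to Drive every 10 minutes; a copy could
    # occasionally catch a partial write, but it works as a safety net
    while True:
        time.sleep(interval_sec)
        if os.path.exists(SRC):
            shutil.copyfile(SRC, DST)

threading.Thread(target=mirror_checkpoint, daemon=True).start()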
One more caveat: according to other posts (e.g. https://blog.csdn.net/qq_44166805/article/details/105876028), the txt files under the dataset's ImageSets/Main folder are supposed to contain only file names without extensions, while the model's configuration file (shown below) points at two other txt files, generated by voc_label.py along with the per-image label txts, which contain the full path of every image. Since this was my first time using the tool, I wrote full paths into my Main txt files directly, and then edited voc_label.py so that the list file it outputs is identical to mine. In practice the two setups are interchangeable; both work.
classes = 2                # total number of classes in the training set
train = scripts/train.txt  # path to the training list generated earlier
valid = scripts/test.txt   # path to the validation list generated earlier
names = data/safe.names    # path to the .names file
backup = backup/
Whatever the accuracy, the pipeline at least runs end to end.