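This semester I took a course called Computational Intelligence, and one of the assignments is object detection in foggy scenes. A Baidu search turned up hardly any blog posts on the topic, so I am writing down how I did the assignment.
I don't have a GPU of my own, and Google Colab already comes with many deep learning frameworks such as PyTorch and TensorFlow preinstalled, so I ran the model directly on Colab. There are plenty of tutorials on how to use Colab, so I won't go over that here. This post focuses on how to get the model running on Colab with mmdetection.
The dataset is RTTS, which is in VOC format. It can be used in mmdetection after modifying a small amount of code. Below are the steps I ran on Colab. Code is Python by default; lines starting with ! are Linux shell commands and lines starting with % are notebook magic commands such as %cd. First, let's see which GPU we got for free.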
!nvidia-smi
The output is:
Fri Dec 20 00:57:04 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
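I keep the dataset on Google Drive, so Drive has to be mounted first. Colab also supports uploading files directly, so you can upload the dataset instead of using Drive.
# Mount Google Drive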
from google.colab import drive
drive.mount('/content/drive')
Clone mmdetection from GitHub:
!git clone https://github.com/open-mmlab/mmdetection.git
Next, install mmdetection:
%cd /content/mmdetection/
!pip install mmcv
!python setup.py develop
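Once the installation finishes, mmdetection is ready to use. I have to say Colab is really nice: setting up the environment on my own machine would take quite a while, whereas on Colab one install step is enough. Next, copy the dataset over from Google Drive and unzip it.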
%cd /content/mmdetection/
!mkdir data
%cd data
# Copy the dataset over from Google Drive
!cp '/content/drive/My Drive/VOC2007/RTTS_.zip' RTTS_.zip
# Unzip it
!unzip RTTS_.zip
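Before touching the config, it can help to confirm that the unzipped folder has the VOC-style layout the rest of this post relies on. The following is a minimal sketch, not part of mmdetection: the helper name missing_voc_parts is hypothetical, and the three sub-directories are assumed from the paths used in the config and code further down.

```python
from pathlib import Path

# Sub-directories that the config and code in this post refer to (assumed layout).
REQUIRED = ("Annotations", "JPEGImages", "ImageSets/Main")


def missing_voc_parts(root):
    """Return the required sub-directories that do not exist under root."""
    root = Path(root)
    return [part for part in REQUIRED if not (root / part).is_dir()]


if __name__ == "__main__":
    # Path assumed from the data_root and img_prefix used later in the post.
    missing = missing_voc_parts("/content/mmdetection/data/RTTS")
    print("missing:", missing if missing else "nothing")
```

If anything is reported as missing, the zip probably unpacked into a different folder name, and the paths in the config need to be adjusted to match.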
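The dataset is ready. Next, some code has to be modified to make this dataset work. The model is Faster R-CNN with a ResNet-101 backbone, so the config file to edit is ./configs/faster_rcnn_r101_fpn_1x.py. mmdetection's default dataset is COCO, so first change the dataset type and path: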
dataset_type = 'VOCDataset'
data_root = '/content/mmdetection/data/'
Then change the paths of the training set and the validation set:
data = dict(
    imgs_per_gpu=5,
    workers_per_gpu=5,
    train=dict(
        type=dataset_type,
        # training set
        ann_file=data_root + 'RTTS/ImageSets/Main/train.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        # validation set
        ann_file=data_root + 'RTTS/ImageSets/Main/val.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        # test set (unused here, so it simply points at val.txt)
        ann_file=data_root + 'RTTS/ImageSets/Main/val.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=test_pipeline))
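Because --validate is passed during training, the validation set effectively serves as the test set, so the test entry is never used and it does not matter what it points to. The GPU Colab provides has 16 GB of memory, which makes out-of-memory errors unlikely, so I raised imgs_per_gpu and workers_per_gpu to 5. This depends on the GPU model Colab assigns you each time; if the GPU has less memory, I would not recommend changing these values.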
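The config expects train.txt and val.txt under RTTS/ImageSets/Main, each listing one image id per line. This post does not show how those files were made; the training log further down (779 iterations per epoch at 5 images each, 433 validation images) suggests a split of roughly 90/10. Below is a sketch of one way to generate such files from the annotation file names. The helper names split_ids and write_split_files, the 0.1 validation fraction, and the fixed seed are all assumptions for illustration. Note that it overwrites any existing train.txt and val.txt.

```python
import random
from pathlib import Path


def split_ids(ids, val_fraction=0.1, seed=0):
    """Shuffle image ids reproducibly and split them into (train, val)."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    return sorted(ids[n_val:]), sorted(ids[:n_val])


def write_split_files(voc_root, val_fraction=0.1, seed=0):
    """Write ImageSets/Main/train.txt and val.txt from the XML file names."""
    voc_root = Path(voc_root)
    # One image id per annotation file, e.g. Annotations/abc.xml -> "abc".
    ids = [p.stem for p in (voc_root / "Annotations").glob("*.xml")]
    train, val = split_ids(ids, val_fraction, seed)
    main_dir = voc_root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)
    (main_dir / "train.txt").write_text("\n".join(train) + "\n")
    (main_dir / "val.txt").write_text("\n".join(val) + "\n")
    return len(train), len(val)
```

Calling write_split_files('/content/mmdetection/data/RTTS') would return the number of training and validation ids it wrote.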
Then change the logging interval to 100; printing a line every 50 iterations is too frequent.
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
    ])
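Finally, change the number of epochs, the number of classes and the working directory. In this version of mmdetection num_classes includes the background class, so the 5 object classes give 6:
num_classes=6
total_epochs = 20
work_dir = './work_dirs/faster_rcnn_r101_fpn_1x/hzdtc'
That completes the changes to the training config. However, our dataset still differs from the standard VOC2007 in a few ways, so some of the library code has to be modified as well.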
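- Edit /mmdetection/mmdet/datasets/voc.py and change CLASSES and year. Without the year change an error is raised: the original code infers the year from whether img_prefix contains VOC2007 or VOC2012, and our prefix contains neither (probably because I changed the file structure of the dataset; it really depends on how your copy is laid out).
class VOCDataset(XMLDataset):

    CLASSES = ('bicycle', 'bus', 'car', 'motorbike', 'person')

    def __init__(self, **kwargs):
        super(VOCDataset, self).__init__(**kwargs)
        # Hard-code the year instead of inferring it from img_prefix.
        self.year = 2007
        # if 'VOC2007' in self.img_prefix:
        #     self.year = 2007
        # elif 'VOC2012' in self.img_prefix:
        #     self.year = 2012
        # else:
        #     raise ValueError('Cannot infer dataset year from img_prefix')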
- Edit /mmdetection/mmdet/core/evaluation/class_names.py so that the evaluation uses the same five class names:
def voc_classes():
    return [
        'bicycle', 'bus', 'car', 'motorbike', 'person'
    ]
- Edit /mmdetection/mmdet/datasets/xml_style.py. The images in this dataset are .png files, whereas the standard VOC dataset uses .jpg. Without this change the data cannot be loaded.
# Method of XMLDataset; mmcv, osp and ET are already imported at the top of xml_style.py.
def load_annotations(self, ann_file):
    img_infos = []
    img_ids = mmcv.list_from_file(ann_file)
    for img_id in img_ids:
        # Changed .jpg to .png here
        filename = 'JPEGImages/{}.png'.format(img_id)
        xml_path = osp.join(self.img_prefix, 'Annotations',
                            '{}.xml'.format(img_id))
        tree = ET.parse(xml_path)
        root = tree.getroot()
        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)
        img_infos.append(
            dict(id=img_id, filename=filename, width=width, height=height))
    return img_infos
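The three edits above all depend on facts about the dataset: which class names appear in the annotations and which file extension the images use. A quick scan can confirm both before you hard-code them. This is a sketch under the assumption that the annotations follow the usual VOC XML layout (object elements with a name child) and that images sit in JPEGImages; the function name summarize_voc is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def summarize_voc(voc_root):
    """Count object class names in the XML files and image file extensions."""
    voc_root = Path(voc_root)
    class_counts = Counter()
    for xml_path in sorted((voc_root / "Annotations").glob("*.xml")):
        for obj in ET.parse(str(xml_path)).getroot().findall("object"):
            class_counts[obj.find("name").text.strip()] += 1
    ext_counts = Counter(
        p.suffix.lower()
        for p in (voc_root / "JPEGImages").iterdir()
        if p.is_file())
    return class_counts, ext_counts


if __name__ == "__main__":
    root = Path("/content/mmdetection/data/RTTS")  # path assumed from the config
    if root.is_dir():
        classes, exts = summarize_voc(root)
        print("classes:", sorted(classes))
        print("extensions:", dict(exts))
        # In the mmdetection version used in this post, num_classes counts the background.
        print("num_classes =", len(classes) + 1)
```

The sorted class names should match the CLASSES tuple, and the dominant extension should match the one used in load_annotations.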
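With the changes above in place, training can start. Even with a single GPU, I still recommend the distributed launcher, because in the mmdetection version used here only the distributed training mode supports the --validate flag, which reports the model's mAP after every epoch.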
%cd /content/mmdetection
!CUDA_VISIBLE_DEVICES=0 ./tools/dist_train.sh configs/faster_rcnn_r101_fpn_1x.py 1 --validate
When training starts, it first prints the configuration from faster_rcnn_r101_fpn_1x.py, and then prints the mAP once after each epoch. It looks like this:
2019-12-19 05:01:42,257 - INFO - load model from: torchvision://resnet101
2019-12-19 05:01:42,782 - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
2019-12-19 05:01:50,571 - INFO - Start running, host: root@ad882785deec, work_dir: /content/mmdetection/work_dirs/faster_rcnn_r101_fpn_1x/hzdtc
2019-12-19 05:01:50,571 - INFO - workflow: [('train', 1)], max: 20 epochs
2019-12-19 05:04:06,393 - INFO - Epoch [1][100/779] lr: 0.00931, eta: 5:50:24, time: 1.358, data_time: 0.035, memory: 13119, loss_rpn_cls: 0.1735, loss_rpn_bbox: 0.0350, loss_cls: 0.3752, acc: 90.8566, loss_bbox: 0.1861, loss: 0.7698
2019-12-19 05:06:19,575 - INFO - Epoch [1][200/779] lr: 0.01197, eta: 5:44:45, time: 1.332, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0913, loss_rpn_bbox: 0.0369, loss_cls: 0.3376, acc: 89.8496, loss_bbox: 0.2308, loss: 0.6967
2019-12-19 05:08:32,287 - INFO - Epoch [1][300/779] lr: 0.01464, eta: 5:41:00, time: 1.327, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0504, loss_rpn_bbox: 0.0313, loss_cls: 0.3012, acc: 89.9437, loss_bbox: 0.2278, loss: 0.6106
2019-12-19 05:10:44,460 - INFO - Epoch [1][400/779] lr: 0.01731, eta: 5:37:40, time: 1.322, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0474, loss_rpn_bbox: 0.0313, loss_cls: 0.2860, acc: 90.3688, loss_bbox: 0.2042, loss: 0.5689
2019-12-19 05:12:56,712 - INFO - Epoch [1][500/779] lr: 0.01997, eta: 5:34:50, time: 1.323, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0533, loss_rpn_bbox: 0.0311, loss_cls: 0.2851, acc: 90.4473, loss_bbox: 0.1882, loss: 0.5577
2019-12-19 05:15:09,968 - INFO - Epoch [1][600/779] lr: 0.02000, eta: 5:32:38, time: 1.333, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0441, loss_rpn_bbox: 0.0287, loss_cls: 0.2779, acc: 90.5734, loss_bbox: 0.1895, loss: 0.5403
2019-12-19 05:17:22,536 - INFO - Epoch [1][700/779] lr: 0.02000, eta: 5:30:10, time: 1.326, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0372, loss_rpn_bbox: 0.0275, loss_cls: 0.2568, acc: 91.1914, loss_bbox: 0.1738, loss: 0.4953
terminal width is too small (0), please consider widen the terminal for better progressbar visualization
[>>>>>>>>>>] 433/433, 6.8 task/s, elapsed: 64s, ETA: 0s
+-----------+------+-------+--------+-----------+-------+
| class     | gts  | dets  | recall | precision | ap    |
+-----------+------+-------+--------+-----------+-------+
| bicycle   | 52   | 1109  | 0.673  | 0.032     | 0.222 |
| bus       | 175  | 3312  | 0.731  | 0.039     | 0.249 |
| car       | 1820 | 12465 | 0.902  | 0.136     | 0.755 |
| motorbike | 101  | 2383  | 0.901  | 0.039     | 0.463 |
| person    | 853  | 10286 | 0.884  | 0.075     | 0.617 |
+-----------+------+-------+--------+-----------+-------+
| mAP       |      |       |        |           | 0.461 |
+-----------+------+-------+--------+-----------+-------+
2019-12-19 05:20:13,279 - INFO - Epoch [1][779/779] lr: 0.02000, mAP: 0.4612
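As a quick sanity check on how to read this table, the mAP in the last row is simply the unweighted mean of the five per-class AP values. The snippet below recomputes it from the rounded numbers printed in the log, so it can only be expected to agree to about three decimals.

```python
# Per-class AP values copied from the epoch-1 table (rounded to 3 decimals).
ap = {
    "bicycle": 0.222,
    "bus": 0.249,
    "car": 0.755,
    "motorbike": 0.463,
    "person": 0.617,
}

# Unweighted mean over classes, regardless of how many ground truths each has.
mean_ap = sum(ap.values()) / len(ap)
print(round(mean_ap, 4))
```

Because the mean is unweighted, the rare classes (bicycle with 52 ground truths, bus with 175) pull the mAP down as much as the common ones pull it up.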
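Once training finishes, the built-in log analysis tool can be used to visualize the training process. In this experiment I only look at how the mAP and the loss change: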
%cd /content/mmdetection
!python tools/analyze_logs.py plot_curve ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json --keys mAP --legend mAP --out mAP.jpg
!python tools/analyze_logs.py plot_curve ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json --keys loss --legend loss --out loss.jpg
The output is:
/content/mmdetection
plot curve of ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json, metric is mAP
save curve to: mAP.jpg
plot curve of ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json, metric is loss
save curve to: loss.jpg
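If you want the raw numbers rather than a picture, the .log.json file can also be read directly. The sketch below assumes the file holds one JSON object per line and that validation records carry the keys epoch and mAP, which matches the key passed to analyze_logs.py above; the function name map_per_epoch is hypothetical.

```python
import json


def map_per_epoch(lines):
    """Collect (epoch, mAP) pairs from JSON-lines log records that contain an mAP."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Training records have no mAP key, so they are skipped.
        if "mAP" in record and "epoch" in record:
            points.append((record["epoch"], record["mAP"]))
    return points
```

For example, opening ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json and passing the file object to map_per_epoch would give one pair per finished epoch.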
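Colab has no graphical interface, so the plots produced by analyze_logs.py are not displayed directly. My workaround is to write each plot to a .jpg file and then display it with the PIL module. There may be a better way, but I don't know one.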
from PIL import Image
mAP = Image.open('mAP.jpg')
mAP
loss = Image.open('loss.jpg')
loss
I won't include the plots here; you will see them if you run this yourself. That's all. I hope this is helpful.
