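1. Problem
Following the bug that hung training last time, I've now hit a bug where AP at evaluation is extremely low. How low? The Pelee paper reports an AP@0.5:0.95 of 22.4 after training for 70K iterations on the COCO dataset with a batch size of 128, whereas after repeated fine-tuning with a batch size of 32, my AP only rose from 2.9 to 3.7... The figure below shows the training process: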

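2. Solution
The loss and accuracy actually looked fine, but AP just wouldn't go up. Roughly, I could think of four places where something might be wrong:
- Training is wrong
I tried all kinds of learning rates, and at each one I trained until both AP and loss stopped changing before switching, so I can't think of anything else to try on the training side... Ruled out.
- Data is wrong
I converted the COCO data myself, so honestly I'm still a bit uneasy about this part. That said, I already checked the data while chasing the previous training bug, and it looked fine. Re-checking it now would be a lot of work, so this stays an open question for the moment.
- Model is wrong
The model reproduces successfully on the VOC dataset. Ruled out.
- Computation is wrong
The COCO evaluation script was adapted from the VOC one: it generates a JSON file, which is then scored with the official cocoapi. So it's quite likely that the script generating the JSON file has a mistake somewhere.
In summary, the first thing to check is whether the evaluation is computed incorrectly.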
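But I couldn't spot anything... So I figured I'd modify the original voc_eval.py to work with COCO and evaluate the VOC way. The computation differs, but the result shouldn't be far off. If AP barely changes, the computation is fine and the data needs checking (the case I dread most, since it's a lot of work and data is error-prone; luckily I've already fallen into the mislabeled-gt pit once, scary just to think about). Otherwise, the evaluation computation is at fault.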
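To make "the computation differs, but not by much" concrete, here is a minimal pure-Python sketch of the two AP flavours that `voc_ap` usually offers: the VOC2007 11-point average and the all-point area under the precision envelope. It is a re-implementation for illustration only, not the project's actual file, and the recall/precision values are made up (three detections, two ground-truth boxes: TP, FP, TP).

```python
def voc_ap(rec, prec, use_07_metric=False):
    """AP from recall/precision lists ordered by descending detection score."""
    if use_07_metric:
        # VOC2007: average the best precision at recall >= t, t = 0.0, 0.1, ..., 1.0
        ap = 0.0
        for i in range(11):
            t = i / 10.0
            candidates = [p for r, p in zip(rec, prec) if r >= t]
            ap += (max(candidates) if candidates else 0.0) / 11.0
        return ap
    # All-point: pad the curve, make precision monotonically non-increasing,
    # then sum the area of the rectangles under it.
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Made-up curve: TP, FP, TP against two ground-truth boxes.
rec = [0.5, 0.5, 1.0]
prec = [1.0, 0.5, 2.0 / 3.0]
ap_07 = voc_ap(rec, prec, use_07_metric=True)
ap_all = voc_ap(rec, prec, use_07_metric=False)
print(round(ap_07, 4), round(ap_all, 4))
```

Note that a VOC-style AP uses a single IoU threshold (0.5 here), so the number to compare it against is COCO's AP at IoU=0.50 rather than AP@0.5:0.95.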
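# Structure of voc_eval.py
def do_python_eval(dataset_path, pred_path, use_07=True):
    aps = []
    # For each class, run:
    rec, prec, ap = voc_eval(filename,  # per-class prediction file result_x.txt
                             os.path.join(dataset_path, anno_files),  # per-image annotation files
                             os.path.join(dataset_path, all_images_file),  # file listing all image names
                             cls_name,  # class name
                             cache_path,  # caches the annotations of all images
                             ovthresh=0.5,
                             use_07_metric=use_07)
    aps += [ap]
    # Save rec, prec and ap to that class's xx_pr.pkl file
    # Print AP and mAP

def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=True):
    # 1. Read all image file names from imagesetfile
    # 2. If annots.pkl does not exist in cachedir, pack the annotations by file name
    #    and write them to annots.pkl; otherwise load it into the recs dict (keyed by file name)
    # 3. Extract this class's annotations and store them, grouped by file name, in the class_recs dict
    # 4. Read this class's detections from the txt file (detpath)
    # 5. Compute rec and prec, then compute ap with voc_ap
    # 6. Return rec, prec, ap

# To adapt this to COCO, I need to:
#   - get all image_ids
#   - extract the annotations from minival.json by image_id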
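The two COCO changes listed above could look roughly like the following. This is only a sketch under assumptions: `load_coco_recs` and `class_recs` are hypothetical helper names, and the small `coco` dict stands in for the parsed contents of minival.json (it follows the standard COCO layout of `images`, `annotations` and `categories`; image id 1 is invented to show an image with no objects).

```python
def load_coco_recs(coco):
    """Group COCO-format annotations by image_id, in the style of voc_eval's recs."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Start every image with an empty list so images without objects still appear.
    recs = {img["id"]: [] for img in coco["images"]}
    for ann in coco["annotations"]:
        recs[ann["image_id"]].append({
            "name": id_to_name[ann["category_id"]],
            "bbox": ann["bbox"],  # COCO boxes are [x, y, width, height]
        })
    return recs


def class_recs(recs, classname):
    """Keep only the objects of one class per image (step 3 of voc_eval)."""
    return {image_id: [obj for obj in objs if obj["name"] == classname]
            for image_id, objs in recs.items()}


# Tiny stand-in for the parsed minival.json; values are illustrative.
coco = {
    "images": [{"id": 213035}, {"id": 1}],
    "categories": [{"id": 1, "name": "person"}, {"id": 87, "name": "scissors"}],
    "annotations": [
        {"image_id": 213035, "category_id": 87, "bbox": [314.25, 168.05, 79.57, 53.75]},
        {"image_id": 213035, "category_id": 1, "bbox": [0.0, 110.12, 311.59, 235.62]},
    ],
}
recs = load_coco_recs(coco)
person_recs = class_recs(recs, "person")
print(person_recs[213035])
```

Looking names up through `categories` avoids hard-coding any assumption about how the category ids are numbered.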
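After some analysis I changed the code, and it ran (very slowly). The killer was that after finally getting through all the files, it crashed with a KeyError: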
{213035: [{'name': 'scissors', 'bbox': [314.25, 168.05, 79.57, 53.75]}, {'name': 'scissors', 'bbox': [238.06, 170.62, 89.75, 64.11]}, {'name': 'person', 'bbox': [0.0, 110.12, 311.59, 235.62]}, {'name': 'person', 'bbox': [195.75, 2.88, 260.04, 109.39]}, {'name': 'bowl', 'bbox': [177.04, 0.0, 66.69, 135.89]}, {'name': 'person', 'bbox': [305.0, 53.24, 330.51, 373.76]}]}
r = [obj for obj in dic[213035] if obj['name'] == 'person']
# Roughly: some obj has no 'name' key
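I printed that obj's image_id and looked it up in minival.json, and found that the annotation's category was 82 or 83... Doesn't COCO have 80 classes? I was completely baffled...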
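Rather than waiting for the list comprehension to blow up, the offending entries can be collected up front. A minimal sketch, assuming the per-image dict has the same shape as the one printed above; `find_unnamed` is a hypothetical helper and the sample data (image id 1, category 82) is invented for illustration.

```python
def find_unnamed(recs):
    """Return (image_id, obj) pairs for annotation dicts that lack a 'name' key."""
    return [(image_id, obj)
            for image_id, objs in recs.items()
            for obj in objs
            if "name" not in obj]


# Invented sample: the second object never got a name assigned.
sample = {
    1: [
        {"name": "person", "bbox": [0.0, 110.12, 311.59, 235.62]},
        {"category_id": 82, "bbox": [10.0, 20.0, 30.0, 40.0]},
    ],
}
bad = find_unnamed(sample)
print(bad)

# obj.get('name') returns None instead of raising, so the filter itself survives:
persons = [obj for obj in sample[1] if obj.get("name") == "person"]
```

Using `.get` only hides the symptom, though; the list returned by `find_unnamed` is what points at the category ids worth investigating.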
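But then it clicked, and I seemed to have found the problem. COCO does have 80 classes, but their ids are not consecutive: some numbers are skipped, so the real ids run from 1 to 90. This project had applied a conversion earlier: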
labelmap = {
"none_of_the_above": 0,
"1": 1,
"2": 2,
"3": 3,
"4": 4,
"5": 5,
"6": 6,
"7": 7,
"8": 8,
"9": 9,
"10": 10,
"11": 11,
"13": 12,
"14": 13,
"15": 14,
"16": 15,
"17": 16,
"18": 17,
"19": 18,
"20": 19,
"21": 20,
"22": 21,
"23": 22,
"24": 23,
"25": 24,
"27": 25,
"28": 26,
"31": 27,
"32": 28,
"33": 29,
"34": 30,
"35": 31,
"36": 32,
"37": 33,
"38": 34,
"39": 35,
"40": 36,
"41": 37,
"42": 38,
"43": 39,
"44": 40,
"46": 41,
"47": 42,
"48": 43,
"49": 44,
"50": 45,
"51": 46,
"52": 47,
"53": 48,
"54": 49,
"55": 50,
"56": 51,
"57": 52,
"58": 53,
"59": 54,
"60": 55,
"61": 56,
"62": 57,
"63": 58,
"64": 59,
"65": 60,
"67": 61,
"70": 62,
"72": 63,
"73": 64,
"74": 65,
"75": 66,
"76": 67,
"77": 68,
"78": 69,
"79": 70,
"80": 71,
"81": 72,
"82": 73,
"84": 74,
"85": 75,
"86": 76,
"87": 77,
"88": 78,
"89": 79,
"90": 80
}
COCO_LABELS = {
"bench": (14, 'outdoor'),
"skateboard": (37, 'sports'),
"toothbrush": (80, 'indoor'),
"person": (1, 'person'),
"donut": (55, 'food'),
"none": (0, 'background'),
"refrigerator": (73, 'appliance'),
"horse": (18, 'animal'),
"elephant": (21, 'animal'),
"book": (74, 'indoor'),
"car": (3, 'vehicle'),
"keyboard": (67, 'electronic'),
"cow": (20, 'animal'),
"microwave": (69, 'appliance'),
"traffic light": (10, 'outdoor'),
"tie": (28, 'accessory'),
"dining table": (61, 'furniture'),
"toaster": (71, 'appliance'),
"baseball glove": (36, 'sports'),
"giraffe": (24, 'animal'),
"cake": (56, 'food'),
"handbag": (27, 'accessory'),
"scissors": (77, 'indoor'),
"bowl": (46, 'kitchen'),
"couch": (58, 'furniture'),
"chair": (57, 'furniture'),
"boat": (9, 'vehicle'),
"hair drier": (79, 'indoor'),
"airplane": (5, 'vehicle'),
"pizza": (54, 'food'),
"backpack": (25, 'accessory'),
"kite": (34, 'sports'),
"sheep": (19, 'animal'),
"umbrella": (26, 'accessory'),
"stop sign": (12, 'outdoor'),
"truck": (8, 'vehicle'),
"skis": (31, 'sports'),
"sandwich": (49, 'food'),
"broccoli": (51, 'food'),
"wine glass": (41, 'kitchen'),
"surfboard": (38, 'sports'),
"sports ball": (33, 'sports'),
"cell phone": (68, 'electronic'),
"dog": (17, 'animal'),
"bed": (60, 'furniture'),
"toilet": (62, 'furniture'),
"fire hydrant": (11, 'outdoor'),
"oven": (70, 'appliance'),
"zebra": (23, 'animal'),
"tv": (63, 'electronic'),
"potted plant": (59, 'furniture'),
"parking meter": (13, 'outdoor'),
"spoon": (45, 'kitchen'),
"bus": (6, 'vehicle'),
"laptop": (64, 'electronic'),
"cup": (42, 'kitchen'),
"bird": (15, 'animal'),
"sink": (72, 'appliance'),
"remote": (66, 'electronic'),
"bicycle": (2, 'vehicle'),
"tennis racket": (39, 'sports'),
"baseball bat": (35, 'sports'),
"cat": (16, 'animal'),
"fork": (43, 'kitchen'),
"suitcase": (29, 'accessory'),
"snowboard": (32, 'sports'),
"clock": (75, 'indoor'),
"apple": (48, 'food'),
"mouse": (65, 'electronic'),
"bottle": (40, 'kitchen'),
"frisbee": (30, 'sports'),
"carrot": (52, 'food'),
"bear": (22, 'animal'),
"hot dog": (53, 'food'),
"teddy bear": (78, 'indoor'),
"knife": (44, 'kitchen'),
"train": (7, 'vehicle'),
"vase": (76, 'indoor'),
"banana": (47, 'food'),
"motorcycle": (4, 'vehicle'),
"orange": (50, 'food')
}
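Damn, I forgot to convert the ids back when generating the JSON file (thinking about it, the round trip may not even be necessary)... so only the classes with ids 1 through 11 matched. After a simple fix, I got the real prediction results: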
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.199
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.201
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.030
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.200
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.365
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.201
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.295
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.054
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.548
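There's still a gap to the paper's 22.4 and 38.3, but at least it's no longer as scary as at the start, and fine-tuning against the correct AP should still squeeze out a bit more.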
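For reference, the fix amounts to inverting the labelmap before writing the results JSON. The sketch below uses only an excerpt of the labelmap; `to_coco_result` is a hypothetical helper name, and it assumes the detector outputs corner-format boxes `[x1, y1, x2, y2]`, which may not match the actual project. The output dict follows the COCO detection results format (`image_id`, `category_id`, `bbox` as `[x, y, width, height]`, `score`).

```python
# Excerpt of the labelmap: COCO category id (as a string) -> contiguous training index.
labelmap = {"none_of_the_above": 0, "10": 10, "11": 11, "13": 12, "14": 13, "90": 80}

# Inverse direction, needed when writing results for cocoapi.
index_to_coco_id = {idx: int(cid) for cid, idx in labelmap.items()
                    if cid != "none_of_the_above"}


def to_coco_result(image_id, class_index, box_xyxy, score):
    """Convert one detection to a COCO results entry (assumes corner-format boxes)."""
    x1, y1, x2, y2 = box_xyxy
    return {
        "image_id": image_id,
        "category_id": index_to_coco_id[class_index],  # contiguous index -> real COCO id
        "bbox": [x1, y1, x2 - x1, y2 - y1],            # COCO wants [x, y, width, height]
        "score": score,
    }


result = to_coco_result(213035, 12, [10.0, 20.0, 50.0, 80.0], 0.9)
print(result)

# Indices that happen to equal their COCO id are the only ones that score
# correctly when the conversion is forgotten.
unchanged = sorted(idx for idx, cid in index_to_coco_id.items() if idx == cid)
```

In the full labelmap the first skipped id is 12, which is why indices 1 through 11 are the ones that map to themselves.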
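I guess I stumbled into the fix by accident?
This bug also took a week to track down, but I was clearly a lot less panicked than last time.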