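Project file structure
There are a lot of messy directories, and the data format imposes requirements on paths, so here is the directory layout first. (My directory structure is not very well organized.)
1. models in the root directory is the cloned project.
2. The paths and folder names under pedestrian_data must match exactly; this is required by the VOC2012 data format.
3. In project, data holds the video data, images holds the frames extracted for training, test_images is the test image dataset, and pedestrian_train is the training directory.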
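Since the paths matter, the VOC2012 skeleton can be created up front. The following is a minimal sketch and not part of the original project: make_voc_skeleton is a hypothetical helper, and it only creates the three subfolders that this post refers to later (JPEGImages, Annotations, and ImageSets/Main).

```python
import os


def make_voc_skeleton(root):
    """Create the VOC2012-style folders used in this post under `root`.

    `root` plays the role of the pedestrian_data directory. Returns the
    list of directories that exist after the call.
    """
    subdirs = [
        os.path.join("VOC2012", "JPEGImages"),         # extracted frames
        os.path.join("VOC2012", "Annotations"),        # one xml file per frame
        os.path.join("VOC2012", "ImageSets", "Main"),  # train/val list files
    ]
    created = []
    for sub in subdirs:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```

Calling make_voc_skeleton("pedestrian_data") from the project root would give the folders that the later steps copy files into.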
Building the training dataset from public data
Data download
Video data download: http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/Datasets/TownCentreXVID.avi
Annotation data download: http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/Datasets/TownCentre-groundtruth.top
Official site: http://www.robots.ox.ac.uk/ActiveVision/index.html
Generating image data with OpenCV
import os
import cv2 as cv


def video2ims(src, train_path="images", test_path="test_images", factor=2):
    # Create the output folders (no error if they already exist).
    os.makedirs(train_path, exist_ok=True)
    os.makedirs(test_path, exist_ok=True)
    frame = 0
    cap = cv.VideoCapture(src)
    counts = int(cap.get(cv.CAP_PROP_FRAME_COUNT))
    w = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    print("number of frames : %d" % counts)
    while True:
        ret, im = cap.read()
        if not ret:  # end of the video, or the frame could not be read
            break
        # The first 3600 frames are training data; the rest are test data.
        path = train_path if frame < 3600 else test_path
        im = cv.resize(im, (w // factor, h // factor))  # reduce the resolution
        cv.imwrite(os.path.join(path, str(frame) + ".jpg"), im)
        frame += 1
    cap.release()


if __name__ == "__main__":
    video2ims("Your TownCentreXVID.avi path")
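Copy the generated images into ./pedestrian_data/VOC2012/JPEGImages. The images are 960*540.
Creating the Pascal VOC2012 dataset format
Use the annotation file TownCentre-groundtruth.top to generate the corresponding xml files (mind the file paths):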
import os
import pandas as pd

if __name__ == '__main__':
    # The ground-truth file has no header row, so columns are addressed by index.
    GT = pd.read_csv('D:/Study/dl/Pedestrian_Detection/TownCentre-groundtruth.top', header=None)
    indent = lambda x, y: ' ' * y + x  # prefix x with y spaces
    factor = 2         # same downscale factor as used for the images
    train_size = 3600  # annotate only the first 3600 frames (the training set)
    os.makedirs('xmls', exist_ok=True)
    name = 'pedestrian'
    width, height = 1920 // factor, 1080 // factor

    for frame_number in range(train_size):
        Frame = GT.loc[GT[1] == frame_number]  # rows belonging to this frame
        x1 = list(Frame[8])
        y1 = list(Frame[11])
        x2 = list(Frame[10])
        y2 = list(Frame[9])
        points = [[(round(x1_), round(y1_)), (round(x2_), round(y2_))]
                  for x1_, y1_, x2_, y2_ in zip(x1, y1, x2, y2)]
        with open(os.path.join('xmls', str(frame_number) + '.xml'), 'w') as file:
            file.write('<annotation>\n')
            file.write(indent('<folder>voc2012</folder>\n', 1))
            file.write(indent('<filename>' + str(frame_number) + '.jpg' + '</filename>\n', 1))
            file.write(indent('<path>D:/Study/dl/Pedestrian_Detection/pedestrian_data/VOC2012/JPEGImages/' + str(frame_number) + '.jpg' + '</path>\n', 1))
            file.write(indent('<size>\n', 1))
            file.write(indent('<width>' + str(width) + '</width>\n', 2))
            file.write(indent('<height>' + str(height) + '</height>\n', 2))
            file.write(indent('<depth>3</depth>\n', 2))
            file.write(indent('</size>\n', 1))
            for point in points:
                top_left = point[0]
                bottom_right = point[1]
                # The two corners may come in either order, so sort each axis into (min, max).
                xmin, xmax = sorted((top_left[0] // factor, bottom_right[0] // factor))
                ymin, ymax = sorted((top_left[1] // factor, bottom_right[1] // factor))
                file.write(indent('<object>\n', 1))
                file.write(indent('<name>' + name + '</name>\n', 2))
                file.write(indent('<pose>Unspecified</pose>\n', 2))
                file.write(indent('<truncated>' + str(0) + '</truncated>\n', 2))
                file.write(indent('<difficult>' + str(0) + '</difficult>\n', 2))
                file.write(indent('<bndbox>\n', 2))
                file.write(indent('<xmin>' + str(xmin) + '</xmin>\n', 3))
                file.write(indent('<ymin>' + str(ymin) + '</ymin>\n', 3))
                file.write(indent('<xmax>' + str(xmax) + '</xmax>\n', 3))
                file.write(indent('<ymax>' + str(ymax) + '</ymax>\n', 3))
                file.write(indent('</bndbox>\n', 2))
                file.write(indent('</object>\n', 1))
            file.write('</annotation>\n')
        print('File:', frame_number, end='\r')
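The coordinate handling in the script above is easy to lose among the file.write calls, so here it is in isolation. This is only a sketch: row_to_box is a hypothetical helper, and it reads columns 8 to 11 in the same way as the script without assuming anything more about what the other columns mean.

```python
def row_to_box(row, factor=2):
    """Map one ground-truth row to (xmin, ymin, xmax, ymax) in resized pixels.

    `row` is any indexable sequence holding the values of one line of the
    .top file. As in the script, columns 8 and 10 are used as x coordinates
    and columns 11 and 9 as y coordinates; each axis is sorted so that the
    smaller value always ends up as the minimum.
    """
    x1, y1 = round(row[8]), round(row[11])
    x2, y2 = round(row[10]), round(row[9])
    xmin, xmax = sorted((x1 // factor, x2 // factor))
    ymin, ymax = sorted((y1 // factor, y2 // factor))
    return xmin, ymin, xmax, ymax
```

The factor argument has to match the one used when the frames were resized; otherwise the boxes will not line up with the 960*540 images.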
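Copy all the generated xml files into ./pedestrian_data/VOC2012/Annotations.
The information recorded in each xml file is: the image name, the object, the image size, the number of channels, the object's location, and the sample difficulty.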
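Before moving on, it can be worth checking that the boxes in the generated files are sensible. The sketch below is an optional check and not part of the original workflow; check_annotation is a hypothetical helper that relies only on the tags written by the script above (size, object, bndbox) and uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_path):
    """Return a list of human-readable problems found in one annotation file."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    problems = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # A box with no area cannot describe a pedestrian.
        if xmin >= xmax or ymin >= ymax:
            problems.append("empty box: %s" % ((xmin, ymin, xmax, ymax),))
        # Coordinates should stay inside the image described by <size>.
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append("box outside image: %s" % ((xmin, ymin, xmax, ymax),))
    return problems
```

Running it over every file in the xmls folder and printing the non-empty results shows whether any boxes need clipping. Whether out-of-range boxes actually cause trouble later is not something this post verifies.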
Generating the image description file
The images are described in a txt file, generated with the following code:
import os


def generate_classes_text():
    print("start to generate classes text...")
    # One line per training image: "<image number> 1".
    with open("D:/Study/dl/Pedestrian_Detection/pedestrian_data/trainval.txt", 'w') as m_text:
        for i in range(3600):
            m_text.write(str(i) + " 1 \n")


if __name__ == '__main__':
    generate_classes_text()
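The generated file looks like this (first few lines):
0 1
1 1
2 1
The first column is the image number, matching the images in the JPEGImages directory; the second column is 1, meaning that the image contains pedestrian data.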
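Make two copies of this txt file, rename them to pedestrian_train.txt and pedestrian_val.txt, and put them in ./pedestrian_data/VOC2012/ImageSets/Main. If the next step (generating the records) fails with an error saying that aeroplane_train.txt is missing, replace pedestrian with aeroplane in both filenames. (In theory it shouldn't fail, but it did for me! The likely reason is that create_pascal_tf_record.py reads its image list from aeroplane_<set>.txt.)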
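The copy-and-rename step can also be scripted. This is a minimal sketch under the assumptions stated in the text: make_image_sets is a hypothetical helper, and its prefix argument exists only so that the aeroplane workaround is a one-word change.

```python
import os
import shutil


def make_image_sets(trainval_path, main_dir, prefix="pedestrian"):
    """Copy trainval.txt to <prefix>_train.txt and <prefix>_val.txt in main_dir.

    Returns the paths of the two files that were written.
    """
    os.makedirs(main_dir, exist_ok=True)
    written = []
    for split in ("train", "val"):
        target = os.path.join(main_dir, "%s_%s.txt" % (prefix, split))
        shutil.copyfile(trainval_path, target)  # identical content for both splits
        written.append(target)
    return written
```

For example, make_image_sets("pedestrian_data/trainval.txt", "pedestrian_data/VOC2012/ImageSets/Main") writes the two pedestrian files; passing prefix="aeroplane" writes the renamed versions instead.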
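With that, the Pascal VOC2012 data format is ready. The directory structure looks like this:
Generating the TF records
Create the label_map.pbtxt file: type the following content into a txt file and change the extension to pbtxt. Then put label_map.pbtxt in ./Pedestrian_Detection/project/pedestrian_train/data.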
item {
  id: 1
  name: 'pedestrian'
}
In a command prompt at (dl) D:\Study\dl\Pedestrian_Detection\models\research>, run:
python object_detection/dataset_tools/create_pascal_tf_record.py --label_map_path=D:\Study\dl\Pedestrian_Detection\project\pedestrian_train\data\label_map.pbtxt --data_dir=D:\Study\dl\Pedestrian_Detection\pedestrian_data --year=VOC2012 --set=train --output_path=D:\Study\dl\Pedestrian_Detection\pascal_train.record
and
python object_detection/dataset_tools/create_pascal_tf_record.py --label_map_path=D:\Study\dl\Pedestrian_Detection\project\pedestrian_train\data\label_map.pbtxt --data_dir=D:\Study\dl\Pedestrian_Detection\pedestrian_data --year=VOC2012 --set=val --output_path=D:\Study\dl\Pedestrian_Detection\pascal_val.record
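This produces two record files: pascal_train.record and pascal_val.record.
Then copy both files into /Pedestrian_Detection/project/pedestrian_train/data. Both files are 574 MB. Be sure to check this, and do not put a record file from a failed run into the directory. (This mistake cost me a whole afternoon later, during training!!!)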
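Besides looking at the file size, a record file can be checked by counting its records without loading TensorFlow. The sketch below is not part of the original workflow: count_tfrecords is a hypothetical helper that walks the TFRecord framing (an 8-byte little-endian length, a 4-byte length CRC, the payload, then a 4-byte payload CRC). It does not verify the CRCs, so it only detects truncated or empty files.

```python
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file by walking its framing.

    Raises ValueError if the file ends in the middle of a record.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # length (8 bytes) + length CRC (4 bytes)
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header after %d records" % count)
            (length,) = struct.unpack("<Q", header[:8])
            body = f.read(length + 4)  # payload + payload CRC (4 bytes)
            if len(body) < length + 4:
                raise ValueError("truncated record after %d records" % count)
            count += 1
```

If the conversion succeeded, one would expect count_tfrecords("pascal_train.record") to equal the number of lines in the list file (3600 here). That expectation is an assumption about the converter rather than something verified in this post.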
The dataset and the record files are now ready. The next post covers model training.