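I skimmed YOLOv1 and YOLOv2 and then read YOLOv3 in detail. It was confusing at first, so after some digging I am recording the key points step by step:
v2 and v3 introduce anchors, and they differ somewhat from the anchors in Faster R-CNN. So how should these anchors be understood?
My understanding, in plain terms:
(1) Start with a set of annotated bounding boxes, each labeled by its top-left and bottom-right corner coordinates. Cluster the boxes into a few groups and use each group's width and height as a preset anchor. The annotations only need to follow the VOC dataset XML format.
The code below extracts the width and height of every annotated box and normalizes them by the image width and height: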
import glob
import xml.etree.ElementTree as ET

import numpy as np


def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)

        # Image size, used to normalize every box to the range [0, 1]
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            # Keep only the normalized width and height of the box
            dataset.append([xmax - xmin, ymax - ymin])

    return np.array(dataset)
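(2) How exactly is the grouping done? K-means clusters all annotated boxes by their width and height. The VOC data is split into 9 clusters, and the distance used is distance = 1 - IoU.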
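To make the distance concrete, here is a minimal sketch (not the post's code; the helper name `wh_iou` and the sample sizes are made up for illustration). Both boxes are assumed to share the same top-left corner, so only width and height matter. The two pairs below differ by the same absolute amount in width, yet the small pair ends up much farther apart under 1 - IoU.

```python
def wh_iou(box_a, box_b):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box_a[0], box_b[0]) * min(box_a[1], box_b[1])
    union = box_a[0] * box_a[1] + box_b[0] * box_b[1] - inter
    return inter / union


# Hypothetical normalized (width, height) pairs; each pair differs by 0.1 in width.
small, small_anchor = (0.10, 0.20), (0.20, 0.20)
large, large_anchor = (0.50, 0.60), (0.60, 0.60)

d_small = 1 - wh_iou(small, small_anchor)
d_large = 1 - wh_iou(large, large_anchor)

print("small pair distance:", round(d_small, 3))
print("large pair distance:", round(d_large, 3))
```

A plain Euclidean distance on width and height would treat both pairs as equally close, which is one reason an IoU-based distance is the usual choice for anchor clustering.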
import numpy as np

'''
(1) K-means takes all N ground-truth boxes in the data, collects their widths and heights, and randomly picks 9 of them as the initial centers.
(2) Every box is then compared against these 9 width/height pairs using IoU (as the distance), which gives an N-row by 9-column distance matrix.
(3) The smallest value in each row assigns that box to one of the 9 clusters; the median of all boxes in each cluster becomes the updated center.
(4) Repeat until the assignments (and therefore the 9 centers) stop changing. The two coordinates of each center are the width and height of one of the 9 anchors suited to the whole dataset.
'''


def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where k is the number of clusters
    """
    # Compute the IoU between one box and each of the k clusters.
    # box      : a single [width, height]
    # clusters : the k current centers, each [width, height]
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    # Area of the box and of every cluster
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function used to aggregate a cluster's boxes into its new center
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]

    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    # Initialize the k centers by picking k boxes at random from the dataset.
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            # Distance metric: d(box, centroid) = 1 - IoU(box, centroid).
            # A smaller distance is better while a larger IoU is better,
            # so 1 - IoU makes a small distance correspond to a large IoU.
            # After the loop, distances has shape (rows, k).
            distances[row] = 1 - iou(boxes[row], clusters)

        # Assign every box to its nearest center (smallest distance).
        nearest_clusters = np.argmin(distances, axis=1)

        # Stop once the assignments no longer change, so the centers are stable.
        if (last_clusters == nearest_clusters).all():
            break

        # Update each center to the median of the boxes assigned to it.
        for cluster in range(k):
            members = boxes[nearest_clusters == cluster]
            # Guard: an empty cluster keeps its previous center instead of becoming NaN.
            if len(members) > 0:
                clusters[cluster] = dist(members, axis=0)

        last_clusters = nearest_clusters

    return clusters
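The assignment step (build the N x k distance matrix, then take the row-wise minimum) can also be written without the per-row loop. The following is only a sketch using NumPy broadcasting, not the code used in this post; `iou_matrix` and the sample arrays are hypothetical names and values chosen for illustration.

```python
import numpy as np


def iou_matrix(boxes, clusters):
    """IoU between every (w, h) box and every (w, h) cluster, shape (N, k)."""
    # Broadcasting (N, 1) against (1, k) yields (N, k) intersections.
    inter_w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = inter_w * inter_h
    box_area = (boxes[:, 0] * boxes[:, 1])[:, None]
    cluster_area = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (box_area + cluster_area - inter)


# Hypothetical normalized boxes and two centers, for illustration only.
boxes = np.array([[0.10, 0.20], [0.12, 0.22], [0.50, 0.60], [0.55, 0.58]])
clusters = np.array([[0.10, 0.20], [0.50, 0.60]])

distances = 1 - iou_matrix(boxes, clusters)   # shape (4, 2)
nearest = np.argmin(distances, axis=1)        # index of the closest center per box

print(distances.shape)
print(nearest.tolist())
```

On a dataset the size of VOC2007 the loop version is already fast enough; the broadcast form mainly makes the "N rows by k columns" structure explicit.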
Running the code:
import glob
import xml.etree.ElementTree as ET

import numpy as np

# Assumes the k-means code above is saved as kmeans.py
from kmeans import kmeans, avg_iou

# ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 9


def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)

        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            dataset.append([xmax - xmin, ymax - ymin])

    return np.array(dataset)


ANNOTATIONS_PATH = "path to your own data"  # replace with your annotation directory
data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)

# "Accuracy" here is the average best IoU between each box and the anchors.
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
# print("Boxes:\n {}".format(out))
# Scale the normalized anchors to a 416 x 416 input.
print("Boxes:\n {}-{}".format(out[:, 0] * 416, out[:, 1] * 416))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))
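I ran this myself on the VOC2007 dataset, which has 9963 labeled samples in total. The result differs slightly from the anchors given in the paper, possibly because of the difference between COCO and VOC2007.
The results are as follows: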
Accuracy:
67.22%
Boxes (I reformatted the output and rounded every value, so the ratios do not match exactly):
[347,327 40,40 76,77 184,277 89,207 162,134 14,27 44,128 23,72]
Ratios:
[0.32, 0.35, 0.43, 0.55, 0.67, 0.99, 1.02, 1.06, 1.21]
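As a quick check on the rounding remark, the sketch below recomputes the width/height ratios from the rounded pixel anchors listed above and also orders the anchors by area. It is illustrative only; the variable names are my own, and the anchor values are simply copied from the output.

```python
import numpy as np

# Rounded anchors (width, height) in pixels, copied from the Boxes output above.
anchors = np.array([
    [347, 327], [40, 40], [76, 77], [184, 277], [89, 207],
    [162, 134], [14, 27], [44, 128], [23, 72],
])

# Ratios recomputed from the rounded values, sorted like the script's output.
ratios = sorted(np.around(anchors[:, 0] / anchors[:, 1], decimals=2).tolist())

# Anchors ordered from smallest to largest area.
by_area = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

print(ratios)
print(by_area.tolist())
```

The recomputed ratios land within a few hundredths of the printed ones, which is consistent with the gap coming from rounding the widths and heights to whole pixels. Ordering by area is a common convention when writing anchors into a YOLOv3 config, where each of the three detection scales takes three consecutive anchors.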