YOLOv3: Using K-means to Obtain Anchor Sizes

Reposted article. 2020-01-03 16:43

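I skimmed YOLOv1 and YOLOv2 and read YOLOv3 in detail. It was confusing at first; after some study, here are the key points, recorded step by step:

v2 and v3 introduce anchors, which differ somewhat from those in Faster R-CNN. How should these anchors be understood?

My understanding, in plain terms:

(1) Start from a set of annotated bboxes, each labeled with its top-left and bottom-right coordinates. Cluster the bboxes into several groups and use the cluster centers as the preset anchor widths and heights. The annotations only need to be in the VOC dataset XML format.

The code below extracts the box widths and heights from the annotations and normalizes them by the image width and height: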

import glob
import xml.etree.ElementTree as ET

import numpy as np


def load_dataset(path):
    dataset = []
    # one VOC-style XML annotation file per image
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)

        # image size, used to normalize the box coordinates to [0, 1]
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            # keep only the normalized (width, height) of each box
            dataset.append([xmax - xmin, ymax - ymin])

    return np.array(dataset)

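(2) How exactly are the boxes grouped? K-means clusters all annotated bboxes by width and height. The VOC data is split into 9 clusters, using distance = 1 - IoU as the metric.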

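import numpy as np

'''
(1) k-means takes all N ground-truth boxes in the data, extracts their widths and heights, and randomly picks 9 of them as the initial centers.
(2) Every bbox is then compared against these 9 (width, height) centers using IoU as the distance, which gives an N x 9 matrix of distances.
(3) The minimum of each row assigns every bbox to one of the 9 clusters; each cluster center is then updated to the median of the bboxes assigned to it.
(4) Repeat until the 9 centers stop changing (the code checks that the assignments no longer change). The width and height of these 9 centers are the 9 anchors suited to the dataset.
'''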
def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where k is the number of clusters
    """
    # Compute the IoU between one box and each of the k cluster centers.
    # box      : a single [width, height]
    # clusters : the k current centers, each [width, height]
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    # area of the box and of each cluster center
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function used to aggregate the boxes of a cluster into its new center (median by default)
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))

    last_clusters = np.zeros((rows,))
    np.random.seed()
    # the Forgy method will fail if the whole array contains the same rows
    # initialize the k cluster centers by randomly picking k boxes from the dataset
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            # Distance metric: d(box, centroid) = 1 - IoU(box, centroid).
            # A smaller distance to the center is better while a larger IoU is better,
            # so 1 - IoU makes the distance shrink as the IoU grows.
            # Fills one row of the (rows, k) distance matrix.
            distances[row] = 1 - iou(boxes[row], clusters)
            #print(distances)
        # assign each annotated box to the cluster center with the smallest distance
        nearest_clusters = np.argmin(distances, axis=1)
        # stop once the assignments no longer change (the cluster centers are then fixed)
        if (last_clusters == nearest_clusters).all():
            break
        # recompute each cluster center (here the median of the cluster becomes the new center)
        for cluster in range(k):
            # select the boxes assigned to this cluster and take their median as the new center
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
        last_clusters = nearest_clusters
    return clusters
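
To make the 1 - IoU distance concrete, here is a minimal worked sketch of the assignment step for a single box. The helper name wh_iou and the sample sizes are made up for illustration and are not part of the original code; it assumes, as above, that boxes and centers are compared by width and height only, with their top-left corners aligned at the origin.

```python
import numpy as np


def wh_iou(box, centers):
    """IoU between one (w, h) box and several (w, h) centers, all anchored at the origin."""
    box = np.asarray(box, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # with aligned corners, the overlap is min(width) * min(height)
    inter = np.minimum(centers[:, 0], box[0]) * np.minimum(centers[:, 1], box[1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union


# hypothetical normalized sizes, chosen only for illustration
box = (0.2, 0.4)
centers = [(0.2, 0.4), (0.4, 0.2), (0.1, 0.2)]

distances = 1 - wh_iou(box, centers)   # one row of the distance matrix
nearest = int(np.argmin(distances))    # index of the closest center
print(distances, nearest)
```

The identical center gets distance 0, the transposed box (same area, different shape) gets 1 - 1/3, and the half-size box gets 1 - 1/4, so the box is assigned to center 0. Shape matters as well as area, which is the point of using IoU here.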

Running the code:

import glob
import xml.etree.ElementTree as ET

import numpy as np

from kmeans import kmeans, avg_iou

#ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 9

def load_dataset(path):
	dataset = []
	for xml_file in glob.glob("{}/*xml".format(path)):
		tree = ET.parse(xml_file)

		height = int(tree.findtext("./size/height"))
		width = int(tree.findtext("./size/width"))

		for obj in tree.iter("object"):
			xmin = int(obj.findtext("bndbox/xmin")) / width
			ymin = int(obj.findtext("bndbox/ymin")) / height
			xmax = int(obj.findtext("bndbox/xmax")) / width
			ymax = int(obj.findtext("bndbox/ymax")) / height

			dataset.append([xmax - xmin, ymax - ymin])

	return np.array(dataset)

ANNOTATIONS_PATH = "path/to/your/annotations"
data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
#print("Boxes:\n {}".format(out))
# scale the normalized widths and heights to a 416-pixel input size
print("Boxes:\n {}-{}".format(out[:, 0]*416, out[:, 1]*416))
ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

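I ran this on the VOC2007 dataset, 9963 labeled samples in total. The result differs somewhat from the anchors given in the paper, probably because of the difference between COCO and VOC2007.

The results are as follows: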

Accuracy:

67.22%

Boxes (I reformatted these by hand and rounded them, so the ratios do not match exactly):
[347,327 40,40 76,77 184,277 89,207 162,134 14,27 44,128 23,72]

Ratios:
[0.32, 0.35, 0.43, 0.55, 0.67, 0.99, 1.02, 1.06, 1.21]
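
The small mismatch between the rounded boxes and the printed ratios can be checked directly. The sketch below only recomputes width / height from the rounded pixel values listed above; it does not rerun the clustering.

```python
# rounded (width, height) pairs copied from the Boxes line above
boxes = [(347, 327), (40, 40), (76, 77), (184, 277), (89, 207),
         (162, 134), (14, 27), (44, 128), (23, 72)]

# aspect ratio of each rounded box, sorted like the script's output
ratios = sorted(round(w / h, 2) for w, h in boxes)
print(ratios)
```

This prints [0.32, 0.34, 0.43, 0.52, 0.66, 0.99, 1.0, 1.06, 1.21], which is close to the ratios reported by the script but not identical, since those were computed before rounding.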
