Papers: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Code: https://giou.stanford.edu/
https://github.com/Zzh-tju/CIoU
IoU
Intersection over Union (IoU) is an important evaluation metric in object detection. Given a ground-truth (gt) box A and a predicted box B, IoU is the ratio of their Intersection Area $I$ to their Union Area $U$:
\begin{equation}
\label{IoU}
IoU = \frac{|A \cap B|}{|A \cup B|} = \frac{|I|}{|U|}
\end{equation}
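However, most existing algorithms optimize this metric indirectly through distance losses (for example, the smooth_L1 loss in SSD). As the GIoU paper argues, the optimal objective for a metric is the metric itself, so IoU can be used directly as the regression loss. Unfortunately, IoU cannot be optimized for non-overlapping bboxes.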
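Using IoU as the loss ($\mathcal{L}_{IoU} = 1 - IoU$) has two advantages and one drawback:
1. (Advantage) IoU effectively compares the similarity of two arbitrary shapes.
2. (Advantage) IoU is scale-invariant.
3. (Drawback) If two shapes A and B do not overlap, IoU is always 0, so it cannot tell whether A and B are very close or very far apart.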
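The scale invariance and the zero-overlap drawback can be checked with a few lines of Python. This is only a minimal sketch: the helper name `iou_xyxy` and the sample boxes are illustrative assumptions, and boxes are taken to be axis-aligned in (xmin, ymin, xmax, ymax) form.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


gt = (0.0, 0.0, 2.0, 2.0)
pred = (1.0, 1.0, 3.0, 3.0)

# Scale invariance: scaling both boxes by the same factor keeps IoU unchanged.
scale = 10.0
gt_big = tuple(scale * v for v in gt)
pred_big = tuple(scale * v for v in pred)
print(iou_xyxy(gt, pred), iou_xyxy(gt_big, pred_big))

# Drawback: without overlap, IoU is 0 whether the boxes are near or far.
near = (2.5, 0.0, 4.5, 2.0)
far = (100.0, 0.0, 102.0, 2.0)
print(iou_xyxy(gt, near), iou_xyxy(gt, far))
```

Both non-overlapping cases give the same value, so an IoU-based loss has no gradient signal telling the prediction which way to move.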
GIoU
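GIoU is an upgraded version of IoU: it keeps both advantages of IoU while fixing its inability to measure the distance between non-overlapping boxes. On top of the IoU computation, it finds the smallest convex shape $C$ that encloses both A and B (for axis-aligned boxes, the smallest enclosing box). The formula is: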
\begin{equation}
\label{GIoU}
GIoU = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|} = IoU - \frac{|C \setminus (A \cup B)|}{|C|}
\end{equation}
Comparing a bad and a better detection result, it is easy to see that the farther the predicted box is from the gt box, the larger $C$ becomes.
The loss can then be written as $\mathcal{L}_{GIoU} = 1 - GIoU$. Since $GIoU \in (-1, 1]$, it is easy to see that $\mathcal{L}_{GIoU}$ takes values in $[0, 2)$.
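A quick numerical check of this range, as a sketch only: the helper `giou_xyxy` and the unit boxes are illustrative assumptions, with boxes in (xmin, ymin, xmax, ymax) form and $C$ taken as the smallest enclosing axis-aligned box.

```python
def giou_xyxy(a, b):
    """GIoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box C enclosing both a and b.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


gt = (0.0, 0.0, 1.0, 1.0)
print("identical boxes, loss =", 1.0 - giou_xyxy(gt, gt))

# Slide a unit box away from gt: the loss grows toward 2 but never reaches it.
for shift in (2.0, 10.0, 1000.0):
    pred = (shift, 0.0, shift + 1.0, 1.0)
    print("shift =", shift, "loss =", 1.0 - giou_xyxy(gt, pred))
```

Unlike the plain IoU loss, the value keeps changing as the non-overlapping box moves, which is what gives the optimizer a direction.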
In summary, this generalization keeps the major properties of IoU while rectifying its weakness.
DIoU & CIoU
The DIoU paper points out that the GIoU loss still suffers from slow convergence and inaccurate regression.
In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement.
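When analyzing the GIoU loss, the authors found that GIoU first tries to enlarge the predicted box until it overlaps the target bbox, and only then relies on the IoU loss term to maximize the overlap area with the target bbox.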
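In addition, when one box contains the other, the GIoU loss degenerates to the IoU loss: $C$ coincides with the union of the two boxes, so the penalty term vanishes. Aligning the boxes then becomes harder and convergence is slow.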
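The containment case can be reproduced numerically. The following is a minimal sketch, with the helper `iou_giou_xyxy` and the sample boxes chosen purely for illustration (boxes in (xmin, ymin, xmax, ymax) form):

```python
def iou_giou_xyxy(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    return iou, iou - (c_area - union) / c_area


gt = (0.0, 0.0, 10.0, 10.0)
# Same-sized predictions at different positions, all fully inside the gt box.
inside = [(0.0, 0.0, 4.0, 4.0), (3.0, 3.0, 7.0, 7.0), (6.0, 6.0, 10.0, 10.0)]
for pred in inside:
    iou, giou = iou_giou_xyxy(gt, pred)
    print(pred, "IoU =", iou, "GIoU =", giou)
```

All three predictions get identical IoU and GIoU values even though only the middle one is centered on the gt box, so neither loss can distinguish them.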
In Distance-IoU (DIoU) loss, we simply add a penalty term on IoU loss to directly minimize the normalized distance between central points of two bounding boxes, leading to much faster convergence than GIoU loss.
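The authors argue that a good bbox regression loss should take three important geometric factors into account: overlap area, central point distance, and aspect ratio. Combining these, they further propose a Complete IoU (CIoU) loss. DIoU can also replace the IoU criterion inside NMS, which makes detection more robust when objects occlude each other.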
DIoU
The DIoU loss is defined as:
\begin{equation}
\label{DIoU}
\begin{split}
& \mathcal{R}_{DIoU} = \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} \\
& \mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} \\
& \mathcal{L}_{DIoU} = 1 - IoU + \frac{d^2}{c^2}
\end{split}
\end{equation}
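Here $\bf{b}$ and $\bf{b^{gt}}$ denote the center points of the predicted box and the gt box, $\rho(\cdot)$ is the Euclidean distance, so $d = \rho(\bf{b}, \bf{b^{gt}})$ is the distance between the two centers, and $c$ is the diagonal length of the smallest enclosing shape $C$ introduced in GIoU.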
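As a worked example of the formula, here is a small sketch that evaluates each term for one box pair. The function name `diou_loss_xyxy` is an assumption made for illustration; the boxes are the second gt/prediction pair used in the example script of this post, in (xmin, ymin, xmax, ymax) form.

```python
def diou_loss_xyxy(pred, gt):
    """Return (IoU, d^2 / c^2, DIoU loss) for boxes (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    iou = inter / union
    # d^2: squared Euclidean distance between the two box centers.
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0) ** 2 \
        + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0) ** 2
    # c^2: squared diagonal of the smallest enclosing box.
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
        + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    penalty = d2 / c2
    return iou, penalty, 1.0 - iou + penalty


gt = (270.0, 70.0, 400.0, 180.0)
pred = (400.0, 180.0, 460.0, 300.0)
iou, penalty, loss = diou_loss_xyxy(pred, gt)
print(iou, penalty, loss)  # 0.0 0.25 1.25
```

The two boxes only touch at a corner, so IoU is 0, yet the penalty $d^2 / c^2 = 22250 / 89000 = 0.25$ still reflects how far apart the centers are. This agrees with the DIoU value of -0.25 reported for this pair in the output further down.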
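Advantages:
- Like the GIoU loss, the DIoU loss still provides a moving direction for the bounding box when it does not overlap the target box.
- The DIoU loss directly minimizes the distance between the two boxes, so it converges much faster than the GIoU loss.
- When one box contains the other, or when the two boxes are aligned horizontally or vertically, the DIoU loss makes regression very fast, whereas the GIoU loss almost degenerates to the IoU loss.
- DIoU can also replace the plain IoU criterion in NMS, making the NMS results more reasonable and effective.
Like $\mathcal{L}_{GIoU}$, $\mathcal{L}_{DIoU}$ also takes values in $[0, 2)$.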
CIoU
$\mathcal{L}_{CIoU}$ builds on $\mathcal{L}_{DIoU}$ and additionally takes aspect ratios into account:
\begin{equation}
\label{CIoU}
\begin{split}
& \mathcal{R}_{CIoU} = \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} + \alpha v \\
& v = \frac{4}{\pi^2}\left(\arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h}\right)^2 \\
& \alpha = \frac{v}{(1 - IoU) + v} \\
& \mathcal{L}_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v
\end{split}
\end{equation}
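Well, this one... looks pretty complicated.
Here $v$ measures the consistency of the aspect ratios, and $\alpha$ is a positive trade-off parameter that is treated as a constant during backpropagation (no gradient flows through it).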
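To see how $v$ and $\alpha$ behave, here is a plain-Python sketch that evaluates the CIoU formula term by term. The function name `ciou_loss_xyxy`, the guard for identical boxes, and the sample boxes are assumptions made for illustration; there is no autograd here, so the "no gradient through $\alpha$" detail does not come into play.

```python
import math


def ciou_loss_xyxy(pred, gt):
    """Return (IoU, v, alpha, CIoU loss) for boxes (xmin, ymin, xmax, ymax)."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    iou = inter / (w * h + w_gt * h_gt - inter)
    # Normalized center distance d^2 / c^2 (the DIoU penalty).
    d2 = ((pred[0] + pred[2] - gt[0] - gt[2]) / 2.0) ** 2 \
        + ((pred[1] + pred[3] - gt[1] - gt[3]) / 2.0) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
        + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    # v: aspect-ratio consistency; alpha: positive trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    denom = (1.0 - iou) + v
    alpha = v / denom if denom > 0.0 else 0.0  # guard for identical boxes
    return iou, v, alpha, 1.0 - iou + d2 / c2 + alpha * v


gt = (0.0, 0.0, 4.0, 2.0)            # aspect ratio 2:1
same_ratio = (1.0, 1.0, 3.0, 2.0)    # also 2:1, so v is 0
other_ratio = (1.0, 0.0, 3.0, 4.0)   # 1:2, so v is positive
print(ciou_loss_xyxy(same_ratio, gt))
print(ciou_loss_xyxy(other_ratio, gt))
```

When the aspect ratios match, $v = 0$ and the CIoU loss reduces to the DIoU loss; otherwise the extra term $\alpha v$ is added on top.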
DIoU-NMS
Haven't tried this one yet, stay tuned...
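For reference, the DIoU paper describes DIoU-NMS as ordinary greedy NMS in which the suppression test compares $IoU - \mathcal{R}_{DIoU}$, rather than IoU, against a threshold. Below is a minimal NumPy sketch under that description. The function names `diou_one_to_many` and `diou_nms`, the threshold of 0.5, and the sample boxes are illustrative assumptions; this is not the authors' implementation and has not been benchmarked here.

```python
import numpy as np


def diou_one_to_many(box, boxes):
    """DIoU between one box (4,) and many boxes (k, 4), all (xmin, ymin, xmax, ymax)."""
    inter_wh = np.clip(np.minimum(box[2:], boxes[:, 2:])
                       - np.maximum(box[:2], boxes[:, :2]), 0.0, None)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter)
    # Squared distance between centers, and squared diagonal of the enclosing box.
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    d2 = np.sum((centers - (box[:2] + box[2:]) / 2.0) ** 2, axis=1)
    enclose_wh = np.maximum(box[2:], boxes[:, 2:]) - np.minimum(box[:2], boxes[:, :2])
    c2 = np.sum(enclose_wh ** 2, axis=1)
    return iou - d2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box >= threshold."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        diou = diou_one_to_many(boxes[best], boxes[rest])
        order = rest[diou < threshold]
    return keep


boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [50.0, 50.0, 60.0, 60.0]])
scores = np.array([0.9, 0.8, 0.7])
print(diou_nms(boxes, scores, threshold=0.5))  # [0, 2]
```

Because the center-distance penalty lowers the score used for suppression, two heavily overlapping boxes whose centers are far apart are more likely to both survive than with plain IoU-based NMS.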
Example

    import math

    import matplotlib.pyplot as plt
    import numpy as np
    # NOTE: the TensorFlow parts below use the TF 1.x graph API (tf.placeholder, tf.Session).
    # On TF 2.x the same API lives under tensorflow.compat.v1 (with tf.disable_v2_behavior()).
    import tensorflow as tf
    import torch

    epsilon = 1e-5


    def IoU(box1, box2, wh=False):
        if wh:
            # (cx, cy, w, h) -> corner coordinates
            xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
            xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
            xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
            xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
        else:
            xmin1, ymin1, xmax1, ymax1 = box1
            xmin2, ymin2, xmax2, ymax2 = box2
        # width and height of the intersection
        W = min(xmax1, xmax2) - max(xmin1, xmin2)
        H = min(ymax1, ymax2) - max(ymin1, ymin2)
        # areas of the two boxes
        SA = (xmax1 - xmin1) * (ymax1 - ymin1)
        SB = (xmax2 - xmin2) * (ymax2 - ymin2)
        cross = max(0, W) * max(0, H)  # intersection area
        iou = float(cross) / (SA + SB - cross)
        return iou


    def GIoU(box1, box2, wh=False):
        if wh:
            xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
            xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
            xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
            xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
        else:
            xmin1, ymin1, xmax1, ymax1 = box1
            xmin2, ymin2, xmax2, ymax2 = box2
        iou = IoU(box1, box2, wh)
        # area of the smallest enclosing box C
        SC = (max(xmax1, xmax2) - min(xmin1, xmin2)) * (max(ymax1, ymax2) - min(ymin1, ymin2))
        # width and height of the intersection
        W = min(xmax1, xmax2) - max(xmin1, xmin2)
        H = min(ymax1, ymax2) - max(ymin1, ymin2)
        # areas of the two boxes
        SA = (xmax1 - xmin1) * (ymax1 - ymin1)
        SB = (xmax2 - xmin2) * (ymax2 - ymin2)
        cross = max(0, W) * max(0, H)  # intersection area
        add_area = SA + SB - cross  # union area of the two boxes
        end_area = (SC - add_area) / SC  # fraction of C not covered by either box
        giou = iou - end_area
        return giou


    def DIoU(box1, box2, wh=False):
        if wh:
            inter_diag = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
            xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
            xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
            xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
            xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
        else:
            xmin1, ymin1, xmax1, ymax1 = box1
            xmin2, ymin2, xmax2, ymax2 = box2
            center_x1 = (xmax1 + xmin1) / 2
            center_y1 = (ymax1 + ymin1) / 2
            center_x2 = (xmax2 + xmin2) / 2
            center_y2 = (ymax2 + ymin2) / 2
            # squared distance between the two centers
            # (fixed: the original wrote "(center_x1 - center_x2)/2 ** 2")
            inter_diag = (center_x1 - center_x2) ** 2 + (center_y1 - center_y2) ** 2
        iou = IoU(box1, box2, wh)
        enclose1 = max(max(xmax1, xmax2) - min(xmin1, xmin2), 0.0)
        enclose2 = max(max(ymax1, ymax2) - min(ymin1, ymin2), 0.0)
        outer_diag = (enclose1 ** 2) + (enclose2 ** 2)
        diou = iou - 1.0 * inter_diag / outer_diag
        return diou


    def CIoU(box1, box2, wh=False, normaled=False):
        if wh:
            w1, h1 = box1[2], box1[3]
            w2, h2 = box2[2], box2[3]
            inter_diag = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
            xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
            xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
            xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
            xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
        else:
            xmin1, ymin1, xmax1, ymax1 = box1
            xmin2, ymin2, xmax2, ymax2 = box2
            w1, h1 = xmax1 - xmin1, ymax1 - ymin1
            w2, h2 = xmax2 - xmin2, ymax2 - ymin2
            center_x1 = (xmax1 + xmin1) / 2
            center_y1 = (ymax1 + ymin1) / 2
            center_x2 = (xmax2 + xmin2) / 2
            center_y2 = (ymax2 + ymin2) / 2
            # squared distance between the two centers (same fix as in DIoU)
            inter_diag = (center_x1 - center_x2) ** 2 + (center_y1 - center_y2) ** 2
        iou = IoU(box1, box2, wh)
        enclose1 = max(max(xmax1, xmax2) - min(xmin1, xmin2), 0.0)
        enclose2 = max(max(ymax1, ymax2) - min(ymin1, ymin2), 0.0)
        outer_diag = (enclose1 ** 2) + (enclose2 ** 2)
        u = (inter_diag) / outer_diag
        arctan = math.atan(w2 / h2) - math.atan(w1 / h1)
        v = (4 / (math.pi ** 2)) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
        S = 1 - iou
        alpha = v / (S + v)
        w_temp = 2 * w1
        distance = w1 ** 2 + h1 ** 2
        ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
        if not normaled:
            cious = iou - (u + alpha * ar / distance)
        else:
            cious = iou - (u + alpha * ar)
        cious = np.clip(cious, a_min=-1.0, a_max=1.0)
        return cious


    def bbox_giou_np(boxes1, boxes2):
        # xywh -> xyxy
        boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                                 boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                                 boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]),
                                 np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]),
                                 np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = np.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
        # area of the smallest enclosing box C
        enclose_area = enclose[..., 0] * enclose[..., 1]
        # GIoU according to the GIoU formula
        giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
        return giou


    # https://github.com/YunYang1994/TensorFlow2.0-Examples/blob/4d4a403d00e6e887ecb7229719b1407d2e132811/4-Object_Detection/YOLOV3/core/yolov3.py#L121
    def bbox_giou_tf(boxes1, boxes2):
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]),
                            tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]),
                            tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = tf.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
        # area of the smallest enclosing box C
        enclose_area = enclose[..., 0] * enclose[..., 1]
        # GIoU according to the GIoU formula
        giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
        return giou


    def bbox_giou_torch(boxes1, boxes2):
        # boxes1, boxes2 = torch.tensor(boxes1, dtype=torch.float32), torch.tensor(boxes2, dtype=torch.float32)
        boxes1, boxes2 = torch.from_numpy(boxes1).float(), torch.from_numpy(boxes2).float()
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = torch.cat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], dim=-1)
        boxes2 = torch.cat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], dim=-1)
        boxes1 = torch.cat([torch.min(boxes1[..., :2], boxes1[..., 2:]),
                            torch.max(boxes1[..., :2], boxes1[..., 2:])], dim=-1)
        boxes2 = torch.cat([torch.min(boxes2[..., :2], boxes2[..., 2:]),
                            torch.max(boxes2[..., :2], boxes2[..., 2:])], dim=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = torch.max(boxes1[..., :2], boxes2[..., :2])
        right_down = torch.min(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = torch.max(right_down - left_up, torch.tensor(0.0))
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = torch.min(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = torch.max(boxes1[..., 2:], boxes2[..., 2:])
        enclose = torch.max(enclose_right_down - enclose_left_up, torch.tensor(0.0))
        # area of the smallest enclosing box C
        enclose_area = enclose[..., 0] * enclose[..., 1]
        # GIoU according to the GIoU formula
        giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
        return giou


    # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/65b68b53f73173397937d4950ff916a41545c960/utils/box/box_utils.py#L5
    def bbox_diou_torch(bboxes1, bboxes2):
        bboxes1, bboxes2 = torch.from_numpy(bboxes1).float(), torch.from_numpy(bboxes2).float()
        rows = bboxes1.shape[0]
        cols = bboxes2.shape[0]
        dious = torch.zeros((rows, cols))
        if rows * cols == 0:
            return dious
        exchange = False
        if bboxes1.shape[0] > bboxes2.shape[0]:
            bboxes1, bboxes2 = bboxes2, bboxes1
            dious = torch.zeros((cols, rows))
            exchange = True
        w1 = bboxes1[:, 2] - bboxes1[:, 0]
        h1 = bboxes1[:, 3] - bboxes1[:, 1]
        w2 = bboxes2[:, 2] - bboxes2[:, 0]
        h2 = bboxes2[:, 3] - bboxes2[:, 1]
        area1 = w1 * h1
        area2 = w2 * h2
        center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
        center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
        center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
        center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
        inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
        inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
        out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
        out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
        inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
        inter_area = inter[:, 0] * inter[:, 1]  # intersection
        inter_diag = (center_x2 - center_x1) ** 2 + (center_y2 - center_y1) ** 2
        outer = torch.clamp((out_max_xy - out_min_xy), min=0)
        outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
        union = area1 + area2 - inter_area  # union
        dious = inter_area / union - (inter_diag) / outer_diag
        dious = torch.clamp(dious, min=-1.0, max=1.0)
        if exchange:
            dious = dious.T
        return dious


    def bbox_diou_np(boxes1, boxes2, normaled=False):
        inter_diag = np.sum(np.square(boxes1[..., :2] - boxes2[..., :2]), axis=1)
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                                 boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                                 boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]),
                                 np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]),
                                 np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = np.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
        outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
        # DIoU according to the DIoU formula
        diou = iou - 1.0 * inter_diag / outer_diag
        diou = np.clip(diou, a_min=-1.0, a_max=1.0)
        return diou


    def bbox_diou_tf(boxes1, boxes2):
        inter_diag = tf.reduce_sum(tf.square(boxes1[..., :2] - boxes2[..., :2]), axis=1)
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]),
                            tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]),
                            tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = tf.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
        outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
        # DIoU according to the DIoU formula
        diou = iou - 1.0 * inter_diag / outer_diag
        diou = tf.clip_by_value(diou, clip_value_min=-1.0, clip_value_max=1.0)
        return diou


    # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/65b68b53f73173397937d4950ff916a41545c960/utils/box/box_utils.py#L47
    def bbox_ciou_torch(bboxes1, bboxes2, normaled=False):
        bboxes1, bboxes2 = torch.from_numpy(bboxes1).float(), torch.from_numpy(bboxes2).float()
        rows = bboxes1.shape[0]
        cols = bboxes2.shape[0]
        cious = torch.zeros((rows, cols))
        if rows * cols == 0:
            return cious
        exchange = False
        if bboxes1.shape[0] > bboxes2.shape[0]:
            bboxes1, bboxes2 = bboxes2, bboxes1
            cious = torch.zeros((cols, rows))
            exchange = True
        w1 = bboxes1[:, 2] - bboxes1[:, 0]
        h1 = bboxes1[:, 3] - bboxes1[:, 1]
        w2 = bboxes2[:, 2] - bboxes2[:, 0]
        h2 = bboxes2[:, 3] - bboxes2[:, 1]
        area1 = w1 * h1
        area2 = w2 * h2
        center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
        center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
        center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
        center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
        inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
        inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
        out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
        out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
        inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
        inter_area = inter[:, 0] * inter[:, 1]
        inter_diag = (center_x2 - center_x1) ** 2 + (center_y2 - center_y1) ** 2
        outer = torch.clamp((out_max_xy - out_min_xy), min=0)
        outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
        union = area1 + area2 - inter_area
        u = (inter_diag) / outer_diag
        iou = inter_area / union
        # these quantities are treated as constants (no gradient)
        with torch.no_grad():
            arctan = torch.atan(w2 / h2) - torch.atan(w1 / h1)
            v = (4 / (math.pi ** 2)) * torch.pow((torch.atan(w2 / h2) - torch.atan(w1 / h1)), 2)
            S = 1 - iou
            alpha = v / (S + v)
            w_temp = 2 * w1
            distance = w1 ** 2 + h1 ** 2
        ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
        if not normaled:
            cious = iou - (u + alpha * ar / distance)
        else:
            cious = iou - (u + alpha * ar)
        cious = torch.clamp(cious, min=-1.0, max=1.0)
        if exchange:
            cious = cious.T
        return cious


    def bbox_ciou_np(boxes1, boxes2, normaled=False):
        w1, h1 = boxes1[..., 2], boxes1[..., 3]
        w2, h2 = boxes2[..., 2], boxes2[..., 3]
        inter_diag = np.sum(np.square(boxes1[..., :2] - boxes2[..., :2]), axis=-1)
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                                 boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                                 boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]),
                                 np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]),
                                 np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = np.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
        outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
        u = (inter_diag) / outer_diag
        # CIoU according to the CIoU formula
        arctan = np.arctan(w2 / h2) - np.arctan(w1 / h1)
        v = (4 / (math.pi ** 2)) * np.square(np.arctan(w2 / h2) - np.arctan(w1 / h1))
        S = 1 - iou
        alpha = v / (S + v)
        w_temp = 2 * w1
        distance = w1 ** 2 + h1 ** 2
        ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
        if not normaled:
            cious = iou - (u + alpha * ar / distance)
        else:
            cious = iou - (u + alpha * ar)
        cious = np.clip(cious, a_min=-1.0, a_max=1.0)
        return cious


    def bbox_ciou_tf(boxes1, boxes2, normaled=False):
        w1, h1 = boxes1[..., 2], boxes1[..., 3]
        w2, h2 = boxes2[..., 2], boxes2[..., 3]
        inter_diag = tf.reduce_sum(tf.square(boxes1[..., :2] - boxes2[..., :2]), axis=-1)
        # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
        boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
        boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]),
                            tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]),
                            tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
        left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
        inter_section = tf.maximum(right_down - left_up, 0.0)
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        union_area = boxes1_area + boxes2_area - inter_area
        # IoU between the two sets of boxes
        iou = inter_area / union_area
        # top-left and bottom-right corners of the smallest enclosing box C
        enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
        enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
        enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
        outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
        u = (inter_diag) / outer_diag
        # CIoU according to the CIoU formula (epsilon avoids division by zero)
        arctan = tf.atan(w2 / (h2 + epsilon)) - tf.atan(w1 / (h1 + epsilon))
        v = (4 / (math.pi ** 2)) * tf.square(tf.atan(w2 / (h2 + epsilon)) - tf.atan(w1 / (h1 + epsilon)))
        S = 1 - iou
        alpha = tf.stop_gradient(v / (S + v))
        w_temp = tf.stop_gradient(2 * w1)
        distance = tf.stop_gradient(w1 ** 2 + h1 ** 2 + epsilon)
        ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
        if not normaled:
            cious = iou - (u + alpha * ar / distance)
        else:
            cious = iou - (u + alpha * ar)
        cious = tf.clip_by_value(cious, clip_value_min=-1.0, clip_value_max=1.0)
        return cious


    img_width = 480.0
    img_height = 320.0

    gt_bboxes_xyxy = np.array([[50, 40, 200, 200], [270, 70, 400, 180]])  # xyxy
    pre_bboxes_xyxy = np.array([[100, 100, 250, 300], [400, 180, 460, 300]])  # xyxy
    # np.float was removed in recent NumPy releases; the builtin float is equivalent
    gt_bboxes_xyxy_nomal = np.zeros(shape=gt_bboxes_xyxy.shape, dtype=float)
    pre_bboxes_xyxy_nomal = np.zeros(shape=pre_bboxes_xyxy.shape, dtype=float)
    gt_bboxes_xyxy_nomal[..., 0::2] = gt_bboxes_xyxy[..., 0::2] / img_width
    gt_bboxes_xyxy_nomal[..., 1::2] = gt_bboxes_xyxy[..., 1::2] / img_height
    pre_bboxes_xyxy_nomal[..., 0::2] = pre_bboxes_xyxy[..., 0::2] / img_width
    pre_bboxes_xyxy_nomal[..., 1::2] = pre_bboxes_xyxy[..., 1::2] / img_height

    gt_bboxes_xywh = np.array([[125, 120, 150, 160], [335, 125, 130, 110]])  # xywh
    pre_bboxes_xywh = np.array([[175, 200, 150, 200], [430, 240, 60, 120]])  # xywh
    gt_bboxes_xywh_nomal = np.zeros(shape=gt_bboxes_xywh.shape, dtype=float)
    pre_bboxes_xywh_nomal = np.zeros(shape=pre_bboxes_xywh.shape, dtype=float)
    gt_bboxes_xywh_nomal[..., 0::2] = gt_bboxes_xywh[..., 0::2] / img_width
    gt_bboxes_xywh_nomal[..., 1::2] = gt_bboxes_xywh[..., 1::2] / img_height
    pre_bboxes_xywh_nomal[..., 0::2] = pre_bboxes_xywh[..., 0::2] / img_width
    pre_bboxes_xywh_nomal[..., 1::2] = pre_bboxes_xywh[..., 1::2] / img_height

    # ================================================================ #
    # Draw the gt boxes (green) and predicted boxes (red) with their scores
    fig = plt.figure()
    ax = fig.add_subplot(111)
    currentAxis = plt.gca()
    for idx, (gt, pt) in enumerate(zip(gt_bboxes_xywh, pre_bboxes_xywh)):
        iou = IoU(gt, pt, True)
        giou = GIoU(gt, pt, True)
        diou = DIoU(gt, pt, True)
        ciou = CIoU(gt, pt, True)
        currentAxis.text(gt[0] - gt[2] / 2, 20, 'iou={:.4f}, giou={:.4f}'.format(iou, giou),
                         bbox={'facecolor': 'yellow', 'alpha': 0.5})
        currentAxis.text(gt[0] - gt[2] / 2, gt[1] + gt[3] / 2 + 20, 'diou={:.4f}, ciou={:.4f}'.format(diou, ciou),
                         bbox={'facecolor': 'yellow', 'alpha': 0.5})
        currentAxis.add_patch(plt.Rectangle((gt[0] - gt[2] / 2, gt[1] - gt[3] / 2), gt[2], gt[3],
                                            fill=False, edgecolor='green', linewidth=2))
        currentAxis.text(gt[0] - gt[2] / 2, gt[1] - gt[3] / 2, 'g{}'.format(idx),
                         bbox={'facecolor': 'green', 'alpha': 0.5})
        currentAxis.add_patch(plt.Rectangle((pt[0] - pt[2] / 2, pt[1] - pt[3] / 2), pt[2], pt[3],
                                            fill=False, edgecolor='red', linewidth=2))
        currentAxis.text(pt[0] - pt[2] / 2, pt[1] - pt[3] / 2, 'p{}'.format(idx),
                         bbox={'facecolor': 'red', 'alpha': 0.5})
    plt.xticks(np.arange(0, img_width + 1, 40))
    plt.yticks(np.arange(0, img_height + 1, 40))
    currentAxis.invert_yaxis()
    plt.show()
    # ================================================================ #
    label_bbox = tf.placeholder(dtype=tf.float32, name='label_bbox')
    predic_bbox = tf.placeholder(dtype=tf.float32, name='predic_bbox')
    label_bbox_normal = tf.placeholder(dtype=tf.float32, name='label_bbox_normal')
    predic_bbox_normal = tf.placeholder(dtype=tf.float32, name='predic_bbox_normal')
    # ================================================================ #
    #                              GIoU                                #
    # ================================================================ #
    gious = np.expand_dims(bbox_giou_np(gt_bboxes_xywh, pre_bboxes_xywh), axis=-1)
    print('numpy publish giou: ', gious)
    # ================================================================ #
    gious = tf.expand_dims(bbox_giou_tf(predic_bbox, label_bbox), axis=-1)
    with tf.Session() as sess:
        result = sess.run(gious, feed_dict={label_bbox: gt_bboxes_xywh,
                                            predic_bbox: pre_bboxes_xywh})
        print('tensorflow publish giou: ', result)
    # ================================================================ #
    gious = bbox_giou_torch(gt_bboxes_xywh, pre_bboxes_xywh).unsqueeze(-1)
    print('pytorch publish goiu: ', gious.numpy())
    # ================================================================ #
    #                              DIoU                                #
    # ================================================================ #
    dious = np.expand_dims(bbox_diou_np(gt_bboxes_xywh, pre_bboxes_xywh), axis=-1)
    print('numpy publish diou : ', dious)
    # ================================================================ #
    dious = bbox_diou_torch(gt_bboxes_xyxy, pre_bboxes_xyxy).unsqueeze(-1)
    print('pytorch publish diou: ', dious.numpy())
    # ================================================================ #
    label_bbox = tf.placeholder(dtype=tf.float32, name='label_bbox')
    predic_bbox = tf.placeholder(dtype=tf.float32, name='predic_bbox')
    dious = tf.expand_dims(bbox_diou_tf(label_bbox, predic_bbox), axis=-1)
    with tf.Session() as sess:
        result = sess.run(dious, feed_dict={label_bbox: gt_bboxes_xywh,
                                            predic_bbox: pre_bboxes_xywh})
        print('tensorflow publish diou: ', result)
    # ================================================================ #
    #                              CIoU                                #
    # ================================================================ #
    cious = bbox_ciou_torch(gt_bboxes_xyxy, pre_bboxes_xyxy, False).unsqueeze(-1)
    print('pytorch publish ciou unnormaled: ', cious.numpy())
    cious = bbox_ciou_torch(gt_bboxes_xyxy_nomal, pre_bboxes_xyxy_nomal, True).unsqueeze(-1)
    print('pytorch publish ciou normaled: ', cious.numpy())
    # ================================================================ #
    cious = np.expand_dims(bbox_ciou_np(gt_bboxes_xywh, pre_bboxes_xywh, False), axis=-1)
    print('numpy publish ciou unnormaled: ', cious)
    cious = np.expand_dims(bbox_ciou_np(gt_bboxes_xywh_nomal, pre_bboxes_xywh_nomal, True), axis=-1)
    print('numpy publish ciou normaled: ', cious)
    # ================================================================ #
    cious = tf.expand_dims(bbox_ciou_tf(label_bbox, predic_bbox, False), axis=-1)
    cious_normal = tf.expand_dims(bbox_ciou_tf(label_bbox_normal, predic_bbox_normal, True), axis=-1)
    with tf.Session() as sess:
        cious_tf, cious_tf_normal = sess.run([cious, cious_normal],
                                             feed_dict={label_bbox_normal: gt_bboxes_xywh_nomal,
                                                        predic_bbox_normal: pre_bboxes_xywh_nomal,
                                                        label_bbox: gt_bboxes_xywh,
                                                        predic_bbox: pre_bboxes_xywh})
        print('tensorflow publish ciou unnormaled:', cious_tf)
        print('tensorflow publish ciou normaled: ', cious_tf_normal)
    # ================================================================ #
Output:

    numpy publish giou: [[ 0.07342657] [-0.50800915]]
    tensorflow publish giou: [[ 0.07342657] [-0.50800914]]
    pytorch publish goiu: [[ 0.07342657] [-0.50800914]]
    numpy publish diou : [[ 0.14455897] [-0.25 ]]
    pytorch publish diou: [[ 0.14455898] [-0.25 ]]
    tensorflow publish diou: [[ 0.14455898] [-0.25 ]]
    pytorch publish ciou unnormaled: [[ 0.14428109] [-0.2600825 ]]
    pytorch publish ciou normaled: [[ 0.1392411 ] [-0.25120372]]
    numpy publish ciou unnormaled: [[ 0.14428107] [-0.26008251]]
    numpy publish ciou normaled: [[ 0.13924112] [-0.25120372]]
    tensorflow publish ciou unnormaled: [[ 0.14428109] [-0.2600825 ]]
    tensorflow publish ciou normaled: [[ 0.13924108] [-0.25120363]]

Results from a colleague's experiments:
| method | GIoU | DIoU | CIoU |
| --- | --- | --- | --- |
| mAP | 81.37% | 81.46% | 82.36% |