mmdetection: various anchor generation methods and their label assigners (1)


anchor_generator.py integrates quite a few anchor generation methods. While reading the mmdetection source code, I summarize them here.

(Faster R-CNN, YOLOv3 and SSD are covered first; more will be added later.)

一、Anchor generation

         The overall idea of anchor generation is: first generate the base anchors, then use a mesh grid (meshgrid) to replicate them over all locations and obtain the remaining anchors.

        1、Faster R-CNN

               Faster R-CNN's anchor generation is the classic case, and the other methods differ from it only in small details. In anchor_generator.py you can see how the anchors are generated without any for loop, purely through broadcasting.

              First, the base anchors. The anchor center sits at the origin (0, 0) by default (controlled by center_offset), (stride, stride) is taken as the basic (w, h), and multiple anchors are obtained by combining it with the scales and ratios. For example, with scales = [8, 16, 32] (multipliers on w and h) and ratios = [0.5, 1.0, 2.0] (the ratio of h to w), 9 base anchors are generated:

def gen_single_level_base_anchors(self,
                                  base_size,
                                  scales,
                                  ratios,
                                  center=None):
    """Generate base anchors of a single level.

    Args:
        base_size (int | float): Basic size of an anchor.
        scales (torch.Tensor): Scales of the anchor.
        ratios (torch.Tensor): The ratio between the height
            and width of anchors in a single level.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    w = base_size
    h = base_size
    if center is None:
        x_center = self.center_offset * w
        y_center = self.center_offset * h
    else:
        x_center, y_center = center

    h_ratios = torch.sqrt(ratios)
    w_ratios = 1 / h_ratios
    if self.scale_major:
        ws = (w * w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h * h_ratios[:, None] * scales[None, :]).view(-1)
    else:
        ws = (w * scales[:, None] * w_ratios[None, :]).view(-1)
        hs = (h * scales[:, None] * h_ratios[None, :]).view(-1)

    # use float anchor and the anchor's center is aligned with the
    # pixel center
    base_anchors = [
        x_center - 0.5 * ws, y_center - 0.5 * hs, x_center + 0.5 * ws,
        y_center + 0.5 * hs
    ]
    base_anchors = torch.stack(base_anchors, dim=-1)

    return base_anchors
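
          To see the broadcasting concretely, here is a minimal standalone sketch (plain PyTorch, not the AnchorGenerator class itself) that reproduces the computation above for base_size = 16, scales = [8, 16, 32] and ratios = [0.5, 1.0, 2.0], giving the 9 base anchors centered at (0, 0):

import torch

# standalone reproduction of gen_single_level_base_anchors (scale_major=True, center_offset=0)
base_size = 16
scales = torch.tensor([8., 16., 32.])
ratios = torch.tensor([0.5, 1.0, 2.0])
x_center, y_center = 0., 0.

h_ratios = torch.sqrt(ratios)      # sqrt(ratio) so that h/w == ratio while w*h stays constant
w_ratios = 1 / h_ratios
ws = (base_size * w_ratios[:, None] * scales[None, :]).view(-1)   # 3 ratios x 3 scales = 9 widths
hs = (base_size * h_ratios[:, None] * scales[None, :]).view(-1)   # 9 heights

base_anchors = torch.stack([
    x_center - 0.5 * ws, y_center - 0.5 * hs,
    x_center + 0.5 * ws, y_center + 0.5 * hs
], dim=-1)                         # (9, 4) in (x1, y1, x2, y2) format
print(base_anchors.shape)          # torch.Size([9, 4])
print(hs / ws)                     # h/w ratios: approximately 0.5, 0.5, 0.5, 1, 1, 1, 2, 2, 2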

          With the base anchors in hand, all that remains is to shift them to every other location. So meshgrid is first used to generate every grid position (scaled by the stride), and the base anchors are then added to each position.

def _meshgrid(self, x, y, row_major=True):
    """Generate mesh grid of x and y.

    Args:
        x (torch.Tensor): Grids of x dimension.
        y (torch.Tensor): Grids of y dimension.
        row_major (bool, optional): Whether to return y grids first.
            Defaults to True.

    Returns:
        tuple[torch.Tensor]: The mesh grids of x and y.
    """
    xx = x.repeat(len(y))
    yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
    if row_major:
        return xx, yy
    else:
        return yy, xx

def single_level_grid_anchors(self,
                              base_anchors,
                              featmap_size,
                              stride=(16, 16),
                              device='cuda'):
    """Generate grid anchors of a single level.

    Note:
        This function is usually called by method ``self.grid_anchors``.

    Args:
        base_anchors (torch.Tensor): The base anchors of a feature grid.
        featmap_size (tuple[int]): Size of the feature maps.
        stride (tuple[int], optional): Stride of the feature map in order
            (w, h). Defaults to (16, 16).
        device (str, optional): Device the tensor will be put on.
            Defaults to 'cuda'.

    Returns:
        torch.Tensor: Anchors in the overall feature maps.
    """
    feat_h, feat_w = featmap_size
    # convert Tensor to int, so that we can convert to ONNX correctly
    feat_h = int(feat_h)
    feat_w = int(feat_w)
    shift_x = torch.arange(0, feat_w, device=device) * stride[0]
    shift_y = torch.arange(0, feat_h, device=device) * stride[1]

    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
    shifts = shifts.type_as(base_anchors)
    # first feat_w elements correspond to the first row of shifts
    # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
    # shifted anchors (K, A, 4), reshape to (K*A, 4)

    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    all_anchors = all_anchors.view(-1, 4)
    # first A rows correspond to A anchors of (0, 0) in feature map,
    # then (0, 1), (0, 2), ...
    return all_anchors
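
          The shift-and-broadcast step can be checked on a toy case. The sketch below (standalone code, not a call into mmdetection) shifts a single 32x32 base anchor over a 2x2 feature map with stride 16, following the same repeat/meshgrid logic as above:

import torch

# one base anchor (A=1) of size 32x32 centered at (0, 0)
base_anchors = torch.tensor([[-16., -16., 16., 16.]])

feat_h, feat_w = 2, 2
stride_w, stride_h = 16, 16
shift_x = torch.arange(0, feat_w) * stride_w          # tensor([ 0, 16])
shift_y = torch.arange(0, feat_h) * stride_h          # tensor([ 0, 16])

# _meshgrid: xx varies fastest (row-major), so grid points come out as
# (0,0), (16,0), (0,16), (16,16)
xx = shift_x.repeat(len(shift_y))
yy = shift_y.view(-1, 1).repeat(1, len(shift_x)).view(-1)
shifts = torch.stack([xx, yy, xx, yy], dim=-1).float()   # (K, 4), K = feat_h * feat_w

# (1, A, 4) + (K, 1, 4) -> (K, A, 4) -> (K*A, 4)
all_anchors = (base_anchors[None, :, :] + shifts[:, None, :]).view(-1, 4)
print(all_anchors)
# tensor([[-16., -16.,  16.,  16.],
#         [  0., -16.,  32.,  16.],
#         [-16.,   0.,  16.,  32.],
#         [  0.,   0.,  32.,  32.]])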

             2、YOLOv2 & YOLOv3

                  The only difference from Faster R-CNN lies in the base anchors: YOLO's base anchors are obtained by clustering the boxes of the dataset. As shown below, there is no need to compute scales and ratios again here; the rest is the same grid shifting to generate the remaining anchors.

def gen_single_level_base_anchors(self, base_sizes_per_level, center=None):
    """Generate base anchors of a single level.

    Args:
        base_sizes_per_level (list[tuple[int, int]]): Basic sizes of
            anchors.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    x_center, y_center = center
    base_anchors = []
    for base_size in base_sizes_per_level:
        w, h = base_size

        # use float anchor and the anchor's center is aligned with the
        # pixel center
        base_anchor = torch.Tensor([
            x_center - 0.5 * w, y_center - 0.5 * h, x_center + 0.5 * w,
            y_center + 0.5 * h
        ])
        base_anchors.append(base_anchor)
    base_anchors = torch.stack(base_anchors, dim=0)

    return base_anchors
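
          The clustering itself is not part of anchor_generator.py; the base_sizes are simply passed in from the config (for COCO, the familiar 9 YOLOv3 anchors). As a rough illustration of how such anchors can be obtained, here is a minimal k-means sketch on GT (w, h) pairs using 1 - IoU as the distance, the standard YOLOv2/YOLOv3 recipe; the helper names and the tiny dataset are made up for the example:

import numpy as np

def wh_iou(wh, centers):
    """IoU between (w, h) boxes and cluster centers, assuming a shared top-left corner."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = (wh[:, None, 0] * wh[:, None, 1]
             + centers[None, :, 0] * centers[None, :, 1] - inter)
    return inter / union

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """k-means on GT box sizes with distance = 1 - IoU (hypothetical helper, not mmdet API)."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(wh_iou(wh, centers), axis=1)    # nearest center = highest IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)  # update center as cluster mean
    return centers[np.argsort(centers.prod(axis=1))]       # sort by area, small to large

# toy data: (w, h) of GT boxes in pixels; a real run would use the whole training set
wh = np.array([[30, 40], [35, 45], [120, 80], [110, 90], [300, 250], [280, 260]], dtype=float)
print(kmeans_anchors(wh, k=3, iters=20))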

            3、SSD

                         SSD is similar; the difference is that the anchor scale is no longer fixed but varies across levels (see the scale formula given in the paper): as the feature map gets smaller, the scale grows (a larger receptive field calls for larger anchors).

           The rest is the same as Faster R-CNN. See https://zhuanlan.zhihu.com/p/33544892 for reference.

# compute the anchor sizes on the original image (side lengths 60, 111, 162, 213, 264)
min_sizes = []
max_sizes = []
for ratio in range(int(min_ratio), int(max_ratio) + 1, step):
    min_sizes.append(int(self.input_size * ratio / 100))
    max_sizes.append(int(self.input_size * (ratio + step) / 100))

# add one extra, smaller anchor size (e.g. 30)
if self.input_size == 300:
    if basesize_ratio_range[0] == 0.15:  # SSD300 COCO
        min_sizes.insert(0, int(self.input_size * 7 / 100))
        max_sizes.insert(0, int(self.input_size * 15 / 100))

# convert the sizes into scales and ratios
anchor_ratios = []
anchor_scales = []
for k in range(len(self.strides)):
    scales = [1., np.sqrt(max_sizes[k] / min_sizes[k])]
    anchor_ratio = [1.]
    for r in ratios[k]:
        anchor_ratio += [1 / r, r]  # 4 or 6 ratios
    anchor_ratios.append(torch.Tensor(anchor_ratio))
    anchor_scales.append(torch.Tensor(scales))
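
          Plugging in the numbers for SSD300 on VOC (input_size = 300, basesize_ratio_range = (0.2, 0.9), 6 feature levels) reproduces the sizes mentioned in the comments. The quick check below assumes the step is computed as (max_ratio - min_ratio) // (num_levels - 2), which is not shown in the quoted snippet:

# quick check of the sizes quoted in the comments, for SSD300 on VOC
# (the step value is an assumption: (max_ratio - min_ratio) // (num_levels - 2) = 17)
input_size = 300
basesize_ratio_range = (0.2, 0.9)
num_levels = 6

min_ratio = int(basesize_ratio_range[0] * 100)        # 20
max_ratio = int(basesize_ratio_range[1] * 100)        # 90
step = (max_ratio - min_ratio) // (num_levels - 2)    # 17

min_sizes, max_sizes = [], []
for ratio in range(min_ratio, max_ratio + 1, step):
    min_sizes.append(int(input_size * ratio / 100))
    max_sizes.append(int(input_size * (ratio + step) / 100))

# extra small size for the first level (VOC branch: 10% of the input size)
min_sizes.insert(0, int(input_size * 10 / 100))
max_sizes.insert(0, int(input_size * 20 / 100))

print(min_sizes)   # [30, 60, 111, 162, 213, 264]
print(max_sizes)   # [60, 111, 162, 213, 264, 315]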

二、Anchor assigner

          Once the anchors have been generated, they need to be labeled, i.e. we decide which of them are positive samples and which are negative samples.

          1、MaxIoUAssigner

                 This is the strategy used by SSD and Faster R-CNN: compute the IoU between every anchor and the GT boxes. For each anchor, if its maximum IoU > pos_iou_thr it becomes a positive sample; if its maximum IoU < neg_iou_thr it is assigned to the background class. There is one more detail in the code: under the strategy above, some GTs may fail to match any anchor, so a low-quality matching step is added to enlarge the positive set. It traverses every GT, looks at the anchor with the highest IoU for that GT, and if that IoU >= min_pos_iou, marks that anchor as positive. Note that this still cannot guarantee that every GT gets an anchor (it depends on the traversal order of the GTs), and it may introduce some poor-quality positives, so it does not necessarily improve results. The following post gives a nice example of this:

         目标检测(MMdetection)——Retina(Anchor、Focal Loss) - 知乎 (zhihu.com)

if self.match_low_quality:
    # Low-quality matching will overwrite the assigned_gt_inds assigned
    # in Step 3. Thus, the assigned gt might not be the best one for
    # prediction.
    # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
    # bbox 1 will be assigned as the best target for bbox A in step 3.
    # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
    # assigned_gt_inds will be overwritten to be GT bbox 2.
    # This might be the reason that it is not used in ROI Heads.
    for i in range(num_gts):
        if gt_max_overlaps[i] >= self.min_pos_iou:
            if self.gt_max_assign_all:
                max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                assigned_gt_inds[max_iou_inds] = i + 1
            else:
                assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1
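
        A toy example makes both the thresholding and the low-quality overwrite visible. The sketch below is a simplified standalone reproduction of the assignment steps (not the real MaxIoUAssigner class) with 2 GTs, 4 anchors and made-up IoU values:

import torch

# overlaps[i, j] = IoU between GT i and anchor j (made-up numbers)
overlaps = torch.tensor([
    [0.9, 0.20, 0.1, 0.0],   # GT 1
    [0.8, 0.25, 0.1, 0.0],   # GT 2
])
pos_iou_thr, neg_iou_thr, min_pos_iou = 0.7, 0.3, 0.3

num_gts, num_anchors = overlaps.shape
assigned_gt_inds = overlaps.new_full((num_anchors,), -1, dtype=torch.long)  # -1: ignore

# step 2: negatives (max IoU over all GTs below neg_iou_thr -> background 0)
max_overlaps, argmax_overlaps = overlaps.max(dim=0)
assigned_gt_inds[max_overlaps < neg_iou_thr] = 0

# step 3: positives (max IoU above pos_iou_thr -> 1-based GT index)
pos_inds = max_overlaps >= pos_iou_thr
assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1

# step 4: low-quality matching, may overwrite step 3
gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)
for i in range(num_gts):
    if gt_max_overlaps[i] >= min_pos_iou:
        assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1

print(assigned_gt_inds)
# tensor([2, 0, 0, 0]): anchor 0 ends up matched to GT 2 instead of its best GT 1,
# and GT 1 is left without any positive anchor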


       2、GridAssigner

              The strategy used by YOLO. Again, the IoU between anchors and GT boxes is computed first. Negatives are labeled in the same way; the difference is in the positives. An anchor can only be matched to a GT whose center falls into the anchor's grid cell, and it becomes positive when its IoU with such a GT exceeds pos_iou_thr; in addition, for every GT, the anchor in its responsible cell with the highest IoU is assigned to that GT. In effect, each GT ends up with essentially one positive anchor.
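
      Below is a minimal sketch of the "responsible cell" idea (a simplified illustration, not the actual GridAssigner code; the anchor boxes and the GT are made up): the GT center selects one grid cell, and among the anchors of that cell only the one with the highest IoU becomes the positive sample.

import torch
from torchvision.ops import box_iou

stride = 32
gt_bbox = torch.tensor([[40., 50., 120., 150.]])       # one GT box, (x1, y1, x2, y2)
cx, cy = (40 + 120) / 2, (50 + 150) / 2                # GT center = (80, 100)
cell = (int(cx // stride), int(cy // stride))          # responsible cell (2, 3)

# three anchors whose centers lie in cell (2, 3), e.g. from three clustered base sizes
anchors_in_cell = torch.tensor([
    [72., 88., 104., 120.],    # small anchor
    [48., 60., 128., 148.],    # medium anchor
    [16., 20., 160., 188.],    # large anchor
])
ious = box_iou(anchors_in_cell, gt_bbox).squeeze(1)
pos_idx = ious.argmax()        # only the best-matching anchor in the responsible cell is positive
print(cell, ious, int(pos_idx))   # the medium anchor wins (IoU ~= 0.73)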

三、Box encoding

         During training, the GT and anchor coordinates are not regressed directly; to speed up convergence they are first encoded, and the encoding schemes differ slightly between detectors.

            For this part, see 史上最详细的Yolov3边框预测分析 (逍遥王的博客, CSDN).
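
             For a concrete reference point, the Faster R-CNN/SSD style encoding (what mmdetection's DeltaXYWHBBoxCoder implements, usually followed by mean/std normalization) regresses offsets of the GT relative to the matched anchor instead of raw coordinates, while YOLO predicts sigmoid-bounded center offsets within the responsible grid cell and log-scale factors of its clustered anchors. A minimal sketch of the delta encoding, without the mean/std step:

import torch

def bbox2delta(anchors, gts):
    """Encode GT boxes as (dx, dy, dw, dh) relative to anchors (simplified, no mean/std)."""
    ax = (anchors[:, 0] + anchors[:, 2]) * 0.5
    ay = (anchors[:, 1] + anchors[:, 3]) * 0.5
    aw = anchors[:, 2] - anchors[:, 0]
    ah = anchors[:, 3] - anchors[:, 1]

    gx = (gts[:, 0] + gts[:, 2]) * 0.5
    gy = (gts[:, 1] + gts[:, 3]) * 0.5
    gw = gts[:, 2] - gts[:, 0]
    gh = gts[:, 3] - gts[:, 1]

    dx = (gx - ax) / aw            # center offset, normalized by the anchor size
    dy = (gy - ay) / ah
    dw = torch.log(gw / aw)        # log scale, so size changes are symmetric
    dh = torch.log(gh / ah)
    return torch.stack([dx, dy, dw, dh], dim=-1)

anchors = torch.tensor([[0., 0., 64., 64.]])
gts = torch.tensor([[8., 4., 72., 84.]])
print(bbox2delta(anchors, gts))    # tensor([[0.1250, 0.1875, 0.0000, 0.2231]])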
