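FCOS Official Code Explained (1): Architecture [backbone]
There are already plenty of write-ups of the FCOS paper online, but few that approach it from the code. I read through the code recently and want to write down my understanding as a record I can come back to once I have forgotten the details. If you can follow the code, you can certainly follow the paper. Personally I like this kind of one-stage, anchor-free method: it is simple and easy to understand 🤭. No need to rush, this post is fairly long; if you are new to the topic, getting through it in a day is already good going.
Paper: FCOS: Fully Convolutional One-Stage Object Detection [mirror in China]
Official source: https://github.com/tianzhi0549/FCOS [based on maskrcnn-benchmark]
A related blog post: FCOS: the latest one-stage per-pixel object detection algorithm
As the paper shows, the whole architecture consists of three parts: Backbone, FPN, and Head (which is further split into three branches: Classification, Center-ness, and Regression).
Now let's see how the source code builds these three parts: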
tools/train_net.py
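We follow the program's execution flow and work through it piece by piece, so that we end up with a picture of the whole pipeline. This is the training command from the README.md of the official repo: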
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=$((RANDOM + 10000)) \
tools/train_net.py \
--config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml \
DATALOADER.NUM_WORKERS 2 \
OUTPUT_DIR training_dir/fcos_imprv_R_50_FPN_1x
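- At first glance none of this made sense to me and half my confidence was gone, but let's push through. The first three arguments all concern distributed training. I only have a single GPU, so I set them aside for now; if you have several GPUs, see this high-quality tutorial on a similar setup: A Guide to Multi-GPU Training in PyTorch. For how python -m works, see: What is python -m for?
- A brief note anyway: python -m runs torch.distributed.launch as a module. Distributed training here uses DistributedDataParallel, and torch.distributed.launch spawns n train_net.py processes for us; nproc_per_node and master_port are both command-line arguments of torch.distributed.launch. On my machine the file is at miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py
- The training entry point is tools/train_net.py. Under configs/fcos/ there are many configuration files with the .yaml suffix. They are similar to JSON or XML files, except that here they are read with the yacs package. The trailing DATALOADER.NUM_WORKERS and OUTPUT_DIR are entries of the configuration file that get overridden from the command line; more on this shortly. I suggest first looking at the yacs documentation written by rbg (project page). The README is enough, since configuration files are not our focus.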
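To make the override order concrete, here is a minimal sketch in plain Python of how a flat KEY VALUE list, such as the trailing DATALOADER.NUM_WORKERS 2 in the command above, can be applied on top of nested defaults. It only imitates the idea; the helper merge_from_list_sketch and the default values are made up for illustration and are not the yacs implementation.

```python
import copy


def merge_from_list_sketch(defaults, opts):
    """Apply a flat [key1, value1, key2, value2, ...] list to a nested dict.

    Hypothetical helper for illustration only; yacs does this differently.
    """
    if len(opts) % 2 != 0:
        raise ValueError("opts must contain KEY VALUE pairs")
    cfg = copy.deepcopy(defaults)  # leave the defaults untouched
    for dotted_key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:  # walk down to the innermost dict
            node = node[name]
        if leaf not in node:
            raise KeyError(dotted_key)
        # Command-line values arrive as strings; cast to the default's type.
        node[leaf] = type(node[leaf])(value)
    return cfg


# Made-up defaults standing in for fcos_core/config/defaults.py
defaults = {"DATALOADER": {"NUM_WORKERS": 4}, "OUTPUT_DIR": "."}
opts = [
    "DATALOADER.NUM_WORKERS", "2",
    "OUTPUT_DIR", "training_dir/fcos_imprv_R_50_FPN_1x",
]
merged = merge_from_list_sketch(defaults, opts)
print(merged)
```

The point of the sketch is the order of precedence: defaults first, then overrides, with unknown keys rejected rather than silently added.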
main()
Following the call chain, the first stop is the main() method:
def main():
    # Parse the command-line arguments, e.g. the
    # --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml shown above
    parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
    parser.add_argument(
        "--config-file",
        default="",
        metavar="FILE",
        help="path to config file",
        type=str,
    )
    # This argument is passed in by torch.distributed.launch; we declare it here to receive it.
    # local_rank is the index of the GPU used by the current process.
    parser.add_argument(
        "--local_rank",
        type=int,
        default=0,
        help="local_rank is used by torch.distributed.launch to leverage multiple GPUs",
    )
    parser.add_argument(
        "--skip-test",
        dest="skip_test",
        help="Do not test the final model",
        action="store_true",
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )

    args = parser.parse_args()

    # Read the number of processes (one per GPU); distributed training is used when it is > 1.
    # WORLD_SIZE is set by torch.distributed.launch;
    # its value is nproc_per_node * nnodes (nnodes is the number of machines).
    num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
    args.distributed = num_gpus > 1

    if args.distributed:  # I have no multi-GPU machine, so I skipped this branch
        torch.cuda.set_device(args.local_rank)
        # Everything here is needed for distributed training; this is where local_rank is used
        torch.distributed.init_process_group(
            backend="nccl", init_method="env://"
        )
        synchronize()

    # Defaults live in fcos_core/config/defaults.py; config_file and opts override them
    cfg.merge_from_file(args.config_file)  # read parameters from the yaml file
    cfg.merge_from_list(args.opts)  # command-line arguments can override them too
    cfg.freeze()  # freeze the config so it is not changed by accident later; cfg is passed to train()
    # You can print cfg here to inspect it; I use fcos_R_50_FPN_1x.yaml as the example

    output_dir = cfg.OUTPUT_DIR  # create the output folder, which stores logs
    if output_dir:
        mkdir(output_dir)

    # Write the log: number of GPUs, system environment, config parameters, etc.
    logger = setup_logger("fcos_core", output_dir, get_rank())
    logger.info("Using {} GPUs".format(num_gpus))
    logger.info(args)

    logger.info("Collecting env info (might take some time)")
    logger.info("\n" + collect_env_info())

    logger.info("Loaded configuration file {}".format(args.config_file))
    with open(args.config_file, "r") as cf:
        config_str = "\n" + cf.read()
        logger.info(config_str)
    logger.info("Running with config:\n{}".format(cfg))

    # This line is the next entry point: train(), whose first step is to build the model
    model = train(cfg, args.local_rank, args.distributed)  # on a single GPU: cfg, 0, False

    if not args.skip_test:  # unless testing is skipped, run the test
        run_test(cfg, model, args.distributed)
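Note that config-file and local_rank are both prefixed with --, which marks them as optional arguments, whereas opts has no prefix, which makes it a positional argument. nargs=argparse.REMAINDER means that all remaining command-line arguments are gathered into a list (this is commonly used by command-line tools that dispatch to other command-line tools), so args.opts is what gets passed to cfg.merge_from_list().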
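The behaviour of nargs=argparse.REMAINDER is easy to check in isolation. The following is a small stand-alone sketch that reuses the argument names from main() and feeds the parser an explicit argument list mirroring the README command; the description string and the argv list are only for this demo.

```python
import argparse

parser = argparse.ArgumentParser(description="argparse.REMAINDER demo")
parser.add_argument("--config-file", default="", metavar="FILE", type=str)
parser.add_argument("--local_rank", type=int, default=0)
# Everything left over after the optional arguments lands in this list.
parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)

# Explicit argv so the demo does not depend on how the script is launched.
argv = [
    "--config-file", "configs/fcos/fcos_imprv_R_50_FPN_1x.yaml",
    "DATALOADER.NUM_WORKERS", "2",
    "OUTPUT_DIR", "training_dir/fcos_imprv_R_50_FPN_1x",
]
args = parser.parse_args(argv)

print(args.config_file)  # dashes in the option name become underscores
print(args.local_rank)   # not given, so the default is used
print(args.opts)         # flat list of strings: key, value, key, value, ...
```

Note that the values in args.opts are still strings ("2", not 2); turning them into typed config values is left to the config library.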
train()
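Next we step into the train() method, which main calls like this:
model = train(cfg, args.local_rank, args.distributed)  # on a single GPU: cfg, 0, False; returns a model
model = build_detection_model(cfg)  # inside train(); this post only covers building the model, so this is the only line we need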
fcos_core/modeling/detector/detectors.py
The build_detection_model(cfg) call in train points to this file:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from .generalized_rcnn import GeneralizedRCNN

_DETECTION_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNN}


def build_detection_model(cfg):
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    return meta_arch(cfg)
The printed cfg shows:
MODEL:
  ...
  META_ARCHITECTURE: GeneralizedRCNN
So build_detection_model returns GeneralizedRCNN(cfg).
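The lookup is plain dictionary dispatch: a string from the config selects a class, and the class is then called with cfg. A minimal sketch of the same pattern, using a stub class and a plain dict in place of the real GeneralizedRCNN and cfg objects (both stand-ins are hypothetical):

```python
class GeneralizedRCNNStub:
    """Stand-in for the real model class; it only remembers its config."""

    def __init__(self, cfg):
        self.cfg = cfg


# Maps the META_ARCHITECTURE string from the config to a class.
_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNNStub}


def build_detection_model_sketch(cfg):
    # Look up the class by name, then instantiate it with the config.
    meta_arch = _META_ARCHITECTURES[cfg["MODEL"]["META_ARCHITECTURE"]]
    return meta_arch(cfg)


cfg = {"MODEL": {"META_ARCHITECTURE": "GeneralizedRCNN"}}
model = build_detection_model_sketch(cfg)
print(type(model).__name__)
```

Supporting another architecture would only require a new entry in the dictionary; the builder function itself stays unchanged.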
fcos_core/modeling/detector/generalized_rcnn.py
The generalized_rcnn.py module "Implements the Generalized R-CNN framework". The authors did not change much of the maskrcnn-benchmark code. At first I was puzzled that FCOS still contained roi code, but it turns out to be unused. Likewise, the rpn here is not the RPN of the R-CNN framework but the FCOS Head. So let's look at build_backbone and build_rpn.
class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN. Currently supports boxes and masks.
    It consists of three main parts:
    - backbone
    - rpn
    - heads: takes the features + the proposals from the RPN and computes
        detections / masks from it.
    """

    def __init__(self, cfg):
        super(GeneralizedRCNN, self).__init__()

        # Returns an nn.Sequential model; the FPN outputs the multi-level features
        self.backbone = build_backbone(cfg)
        # This is the FCOS head
        self.rpn = build_rpn(cfg, self.backbone.out_channels)
        # roi_heads is an empty list here
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

    def forward(self, images, targets=None):
        pass  # not needed for now, so I replaced the body with pass
build_backbone()
build_backbone is a function in fcos_core/modeling/backbone/backbone.py. I use fcos_R_50_FPN_1x.yaml as the example (the values noted in comments all come from this file; I will not repeat this below). Its CONV_BODY is R-50-FPN-RETINANET, so I only look at how R-50-FPN-RETINANET is registered; once that is clear, the registration of the others, such as R-50-C4, poses no problem:
from collections import OrderedDict

from torch import nn

from fcos_core.modeling import registry
from fcos_core.modeling.make_layers import conv_with_kaiming_uniform
from . import fpn as fpn_module
from . import resnet
from . import mobilenet


@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    # To save space, the function body is omitted here; it is covered further down
    return model


def build_backbone(cfg):
    # Fail if CONV_BODY has not been registered in registry.BACKBONES
    assert cfg.MODEL.BACKBONE.CONV_BODY in registry.BACKBONES, \
        "cfg.MODEL.BACKBONE.CONV_BODY: {} are not registered in registry".format(
            cfg.MODEL.BACKBONE.CONV_BODY
        )
    # Usage of the decorator:
    # registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY] refers to build_resnet_fpn_p3p7_backbone,
    # which is why it is then called with the argument cfg
    return registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY](cfg)
- build_backbone starts with an assert statement. CONV_BODY (here R-50-FPN-RETINANET) is a string. registry.BACKBONES is an object created in fcos_core/modeling/registry.py by instantiating the Registry class; that class is defined in fcos_core/utils/registry.py and inherits from dict, so registry.BACKBONES can be used like a dictionary. The lines directly above build_resnet_fpn_p3p7_backbone use decorator syntax: @registry.BACKBONES.register("R-50-FPN-RETINANET"). Read on:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.


def _register_generic(module_dict, module_name, module):
    assert module_name not in module_dict
    module_dict[module_name] = module


class Registry(dict):
    '''
    A helper class for managing registering modules, it extends a dictionary
    and provides a register functions.

    Eg. creating a registry:
        some_registry = Registry({"default": default_module})

    There're two ways of registering new modules:
    1): normal way is just calling register function:
        def foo():
            ...
        some_registry.register("foo_module", foo)
    2): used as decorator when declaring the module:
        @some_registry.register("foo_module")
        @some_registry.register("foo_module_nickname")
        def foo():
            ...

    Access of module is just like using a dictionary, eg:
        f = some_registry["foo_module"]
    '''
    def __init__(self, *args, **kwargs):
        super(Registry, self).__init__(*args, **kwargs)

    def register(self, module_name, module=None):
        # used as function call
        if module is not None:
            _register_generic(self, module_name, module)
            return

        # used as decorator
        def register_fn(fn):
            _register_generic(self, module_name, fn)
            return fn

        return register_fn
- The Registry class defines a method register that returns the inner function register_fn(fn), so it can be used as a decorator (for decorator syntax, see: Decorators). When @registry.BACKBONES.register("R-50-FPN-RETINANET") decorates build_resnet_fpn_p3p7_backbone, it writes a key-value pair: registry.BACKBONES["R-50-FPN-RETINANET"] = build_resnet_fpn_p3p7_backbone. build_backbone(cfg) therefore returns build_resnet_fpn_p3p7_backbone(cfg). Let's see how this function constructs the backbone:
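The registration mechanism can be tried out without the rest of the code base. Below is a reduced sketch of the same idea: MiniRegistry is a cut-down copy of the Registry class for illustration, and build_backbone_stub is a made-up function that stands in for build_resnet_fpn_p3p7_backbone.

```python
class MiniRegistry(dict):
    """Cut-down version of Registry, for illustration only."""

    def register(self, name, module=None):
        # Used as a plain function call: registry.register("name", obj)
        if module is not None:
            assert name not in self
            self[name] = module
            return

        # Used as a decorator: @registry.register("name")
        def register_fn(fn):
            assert name not in self
            self[name] = fn
            return fn  # return fn unchanged so decorators can be stacked

        return register_fn


BACKBONES = MiniRegistry()


@BACKBONES.register("R-50-FPN-RETINANET")
@BACKBONES.register("R-101-FPN-RETINANET")
def build_backbone_stub(cfg):
    # Hypothetical builder; the real one returns an nn.Sequential model.
    return "backbone for " + cfg["CONV_BODY"]


cfg = {"CONV_BODY": "R-50-FPN-RETINANET"}
builder = BACKBONES[cfg["CONV_BODY"]]  # dictionary lookup by name
print(builder is build_backbone_stub)
print(sorted(BACKBONES))
print(builder(cfg))
```

Because register_fn returns the function unchanged, stacking two decorators stores the same function under two keys, which is how one builder can serve both the R-50 and the R-101 variants.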
@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    body = resnet.ResNet(cfg)
    # Channel parameters needed by the FPN
    in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS  # 256
    out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS  # 256
    in_channels_p6p7 = in_channels_stage2 * 8 if cfg.MODEL.RETINANET.USE_C5 \
        else out_channels
    fpn = fpn_module.FPN(
        in_channels_list=[
            0,
            in_channels_stage2 * 2,
            in_channels_stage2 * 4,
            in_channels_stage2 * 8,
        ],
        out_channels=out_channels,
        # conv_with_kaiming_uniform returns a function that builds the conv;
        # with stride=1 the conv keeps the spatial size
        conv_block=conv_with_kaiming_uniform(
            cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU
        ),
        top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),
    )
    # Feed body and fpn into nn.Sequential through an ordered dict,
    # so that the output of body becomes the input of fpn
    model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
    # Stored on the model because it is needed later (build_rpn reads it)
    model.out_channels = out_channels
    return model
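- As the name suggests, build_resnet_fpn_p3p7_backbone builds the backbone part shown in the figure from the paper, with the FPN levels running from P3 to P7. resnet.ResNet produces the body and fpn_module.FPN produces the fpn. Let's look at the two in turn: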
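Before going into the two modules, it helps to work through the channel arithmetic of the builder by hand. The sketch below repeats that arithmetic with plain numbers. The two 256 values are the ones noted in the comments above; USE_C5 = False, the P3 to P7 strides of 8 to 128, and the rounding-up of feature-map sizes are assumptions added here for illustration, so check them against your own config.

```python
import math

# Values noted in the comments of build_resnet_fpn_p3p7_backbone
RES2_OUT_CHANNELS = 256
BACKBONE_OUT_CHANNELS = 256
USE_C5 = False  # assumption: P6 is computed from P5 rather than from C5

# Channels of C2..C5; the leading 0 tells the FPN to skip C2
in_channels_list = [
    0,
    RES2_OUT_CHANNELS * 2,
    RES2_OUT_CHANNELS * 4,
    RES2_OUT_CHANNELS * 8,
]
in_channels_p6p7 = RES2_OUT_CHANNELS * 8 if USE_C5 else BACKBONE_OUT_CHANNELS

# Assumed total downsampling factor of each pyramid level
STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}


def feature_map_sizes(height, width):
    """Approximate (H, W) per level, assuming every stride-2 layer rounds up."""
    return {
        level: (math.ceil(height / s), math.ceil(width / s))
        for level, s in STRIDES.items()
    }


print(in_channels_list)
print(in_channels_p6p7)
for level, size in feature_map_sizes(800, 1024).items():
    print(level, size)
```

Every level leaves the FPN with the same number of channels (out_channels), so only the spatial size changes from P3 to P7.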
resnet.py
Now we construct a ResNet. torchvision has an implementation too, and this one is quite similar; if you have read the ResNet code in torchvision, this will be easy to follow. First, some configuration and the more straightforward parts:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""
Variant of the resnet module that takes cfg as an argument.
Example usage. Strings may be specified in the config file.
    model = ResNet(
        "StemWithFixedBatchNorm",
        "BottleneckWithFixedBatchNorm",
        "ResNet50StagesTo4",
    )
OR:
    model = ResNet(
        "StemWithGN",
        "BottleneckWithGN",
        "ResNet50StagesTo4",
    )
Custom implementations may be written in user code and hooked in via the
`register_*` functions.
"""
# The docstring above is a usage note; next come the required imports
from collections import namedtuple

import torch
import torch.nn.functional as F
from torch import nn

from fcos_core.layers import FrozenBatchNorm2d
from fcos_core.layers import Conv2d
from fcos_core.layers import DFConv2d
from fcos_core.modeling.make_layers import group_norm
from fcos_core.utils.registry import Registry


# ResNet stage specification: a named tuple holds the parameters of each ResNet stage
StageSpec = namedtuple(
    "StageSpec",
    [
        "index",  # Index of the stage, eg 1, 2, ..,. 5
        "block_count",  # Number of residual blocks in the stage
        "return_features",  # True => return the last feature map from this stage
    ],
)

# -----------------------------------------------------------------------------
# Standard ResNet models
# -----------------------------------------------------------------------------
# The tuples below are selected via _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY];
# I only show the ResNet-50 ones
# ResNet-50 (including all stages)
ResNet50StagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, False), (4, 3, True))
)
# ResNet-50 up to stage 4 (excludes stage 5): only the feature map of stage 4 is used
ResNet50StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, True))
)
# ResNet-50-FPN (including all stages): the FPN needs the feature map of every stage,
# so return_features is True throughout
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)

# Selects whether the ResNet Bottleneck uses FixedBatchNorm or GroupNorm
_TRANSFORMATION_MODULES = Registry({
    "BottleneckWithFixedBatchNorm": BottleneckWithFixedBatchNorm,
    "BottleneckWithGN": BottleneckWithGN,
})

# Selects whether the ResNet Stem uses FixedBatchNorm or GroupNorm
_STEM_MODULES = Registry({
    "StemWithFixedBatchNorm": StemWithFixedBatchNorm,
    "StemWithGN": StemWithGN,
})

# Selects which ResNet depth to build and up to which stage
_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-101-C4": ResNet101StagesTo4,
    "R-101-C5": ResNet101StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-50-FPN-RETINANET": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
    "R-101-FPN-RETINANET": ResNet101FPNStagesTo5