FCOS Official Code Walkthrough (Part 1): Architecture [backbone]
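There are plenty of write-ups of the FCOS paper online, but few that approach it from the code. I read through the code recently and want to write down my understanding as a record I can come back to once I forget it. If you can follow the code, you can certainly follow the paper. Personally I like this kind of one-stage, anchor-free method: simple and easy to understand 🤭. No rush, this post is a bit long; if you are new to the topic, getting through it in one day is already good.
Paper: FCOS: Fully Convolutional One-Stage Object Detection [mirror in China]
Official source: https://github.com/tianzhi0549/FCOS [based on maskrcnn-benchmark]
A blog post worth reading: "FCOS: the latest one-stage per-pixel object detection algorithm"
As the paper shows, the whole architecture consists of three parts: Backbone, FPN, and Head (which is itself split into three branches: Classification, Center-ness, and Regression).
Let's see how the source code builds these three parts: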
tools/train_net.py
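We follow the program's execution flow and go through it piece by piece, so that we end up with a picture of the whole pipeline. This is the training command from the README.md of the official repo: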
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=$((RANDOM + 10000)) \
tools/train_net.py \
--config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml \
DATALOADER.NUM_WORKERS 2 \
OUTPUT_DIR training_dir/fcos_imprv_R_50_FPN_1x
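- At first glance none of this makes sense and half your confidence is gone, but let's push through. The first three arguments all concern distributed training. I only have a single GPU, so I set them aside for now. If you have multiple GPUs, see this similar high-quality tutorial: "A Guide to Multi-GPU Training in PyTorch". For the usage of python -m, see: "What is python -m for?"
- A few more words here: python -m lets torch.distributed.launch (the file launch.py) run as a module. Because distributed training uses DistributedDataParallel, torch.distributed.launch starts n train_net.py processes for us; nproc_per_node and master_port are command-line arguments of torch.distributed.launch. The file lives at miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py
- The training entry point is tools/train_net.py. Under configs/fcos/ there are many configuration files with the .yaml suffix, similar to json or xml files, except that they are read with the yacs package. The trailing DATALOADER.NUM_WORKERS and OUTPUT_DIR are entries of the configuration file; more on this later. I suggest first looking at the documentation of yacs, written by rbg: project page. Reading the README is enough, since configuration files are not our focus.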
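To make the trailing KEY VALUE pairs concrete, here is a minimal sketch of how dotted keys such as DATALOADER.NUM_WORKERS could be applied on top of nested defaults. It is a simplified stand-in built on plain dicts, not the yacs implementation; apply_overrides and the default values are hypothetical and chosen only for illustration.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply ["KEY", "VALUE", ...] pairs to a nested dict (hypothetical helper, not yacs)."""
    assert len(opts) % 2 == 0, "opts must be KEY VALUE pairs"
    for dotted_key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError if a parent key is not in the defaults
        if leaf not in node:
            raise KeyError(dotted_key)  # only existing options may be overridden
        try:
            value = ast.literal_eval(raw)  # "2" -> 2
        except (ValueError, SyntaxError):
            value = raw  # anything that is not a Python literal stays a string
        node[leaf] = value
    return cfg


# Assumed defaults for the demo; the real ones live in the project's defaults file.
defaults = {"DATALOADER": {"NUM_WORKERS": 4}, "OUTPUT_DIR": "."}
opts = ["DATALOADER.NUM_WORKERS", "2", "OUTPUT_DIR", "training_dir/fcos_imprv_R_50_FPN_1x"]
cfg_demo = apply_overrides(defaults, opts)
print(cfg_demo)
```

The point of the sketch is the shape of the data: the command line contributes a flat list of alternating keys and values, and each dotted key addresses one leaf of the nested configuration.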
main()
Following the call order, we first arrive at main():
def main():
    # Parse the command-line arguments, e.g. the
    # --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml shown above
    parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
    parser.add_argument(
        "--config-file",
        default="",
        metavar="FILE",
        help="path to config file",
        type=str,
    )
    # This argument is passed in by torch.distributed.launch; we declare an
    # optional argument to receive it.
    # local_rank is the index of the GPU used by the current process.
    parser.add_argument(
        "--local_rank",
        type=int,
        default=0,
        help="local_rank is used by torch.distributed.launch to leverage multiple GPUs",
    )
    parser.add_argument(
        "--skip-test",
        dest="skip_test",
        help="Do not test the final model",
        action="store_true",
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )

    args = parser.parse_args()

    # WORLD_SIZE is set by torch.distributed.launch; its value is
    # nproc_per_node * nnodes (nnodes is the number of machines).
    # Distributed training is used automatically when it is greater than 1.
    num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
    args.distributed = num_gpus > 1

    if args.distributed:  # I have no multi-GPU machine, so I skipped this branch
        torch.cuda.set_device(args.local_rank)
        # Required for distributed training; this is where local_rank is used
        torch.distributed.init_process_group(
            backend="nccl", init_method="env://"
        )
        synchronize()

    # Defaults live in fcos_core/config/defaults.py; config_file and opts override them
    cfg.merge_from_file(args.config_file)  # read parameters from the yaml file
    cfg.merge_from_list(args.opts)  # command-line arguments can override them too
    cfg.freeze()  # freeze the parameters to prevent accidental changes; cfg is passed to train()
    # You can print cfg here to inspect it; I use fcos_R_50_FPN_1x.yaml as the example

    output_dir = cfg.OUTPUT_DIR  # create the output directory that stores the logs
    if output_dir:
        mkdir(output_dir)

    # Write the log: number of GPUs, system environment, configuration parameters, etc.
    logger = setup_logger("fcos_core", output_dir, get_rank())
    logger.info("Using {} GPUs".format(num_gpus))
    logger.info(args)

    logger.info("Collecting env info (might take some time)")
    logger.info("\n" + collect_env_info())

    logger.info("Loaded configuration file {}".format(args.config_file))
    with open(args.config_file, "r") as cf:
        config_str = "\n" + cf.read()
        logger.info(config_str)
    logger.info("Running with config:\n{}".format(cfg))

    # This line is the next entry point. Focus on train(); its first step is building the model
    model = train(cfg, args.local_rank, args.distributed)  # cfg, 0, False

    if not args.skip_test:  # unless testing is skipped, run the test
        run_test(cfg, model, args.distributed)
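The WORLD_SIZE check in main() can be tried in isolation. The sketch below takes the environment as an argument so that it runs on a machine without any GPU; detect_distributed is a hypothetical helper name and is not part of the FCOS code.

```python
import os


def detect_distributed(environ=None):
    """Return (process_count, is_distributed), mirroring the check in main()."""
    environ = os.environ if environ is None else environ
    num_gpus = int(environ["WORLD_SIZE"]) if "WORLD_SIZE" in environ else 1
    return num_gpus, num_gpus > 1


# Plain `python tools/train_net.py`: the variable is absent, so single-process training.
single = detect_distributed({})
# Launched with --nproc_per_node=8 on one machine: WORLD_SIZE would be 8.
multi = detect_distributed({"WORLD_SIZE": "8"})
print(single, multi)
```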
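We can see that config-file and local_rank are both prefixed with --, which marks them as optional arguments, while opts has no such prefix, which means it is a positional argument. nargs=argparse.REMAINDER means that all remaining command-line arguments are gathered into a list (this is commonly used by command-line tools that dispatch to other command-line tools), so args.opts is what gets passed to cfg.merge_from_list().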
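A small self-contained sketch of this parsing behaviour, using only the standard library. The parser is a cut-down copy of the one in main(), and the argument list stands in for what the shell would pass.

```python
import argparse

parser = argparse.ArgumentParser(description="optional vs. positional arguments")
parser.add_argument("--config-file", default="", metavar="FILE", type=str)
parser.add_argument("--local_rank", type=int, default=0)
# Positional argument that swallows everything left on the command line
parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)

args = parser.parse_args([
    "--config-file", "configs/fcos/fcos_imprv_R_50_FPN_1x.yaml",
    "DATALOADER.NUM_WORKERS", "2",
    "OUTPUT_DIR", "training_dir/fcos_imprv_R_50_FPN_1x",
])

print(args.config_file)  # "--config-file" is stored as args.config_file
print(args.local_rank)   # not given on the command line, so the default is used
print(args.opts)         # flat list of alternating keys and values
```

Note that every value in args.opts is still a string at this point; converting "2" to an integer is left to the configuration library.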
train()
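Next we step into train(), which main() calls like this:
model = train(cfg, args.local_rank, args.distributed)  # cfg, 0, False; returns a model
model = build_detection_model(cfg)  # inside train(); this post only covers model construction, so this line is all we need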
fcos_core/modeling/detector/detectors.py
The build_detection_model(cfg) call in train() points to this file:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from .generalized_rcnn import GeneralizedRCNN


_DETECTION_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNN}


def build_detection_model(cfg):
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    return meta_arch(cfg)
The printed cfg shows:
MODEL:
  ...
  META_ARCHITECTURE: GeneralizedRCNN
So what build_detection_model returns is GeneralizedRCNN(cfg).
fcos_core/modeling/detector/generalized_rcnn.py
The generalized_rcnn.py module "Implements the Generalized R-CNN framework". The FCOS authors changed little of the maskrcnn-benchmark code. At first I was puzzled why FCOS still had roi in it, and later found that it is simply unused. Likewise, the rpn here is not the RPN of the R-CNN framework but the FCOS Head. So let's look at build_backbone and build_rpn:
class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN. Currently supports boxes and masks.
    It consists of three main parts:
    - backbone
    - rpn
    - heads: takes the features + the proposals from the RPN and computes
        detections / masks from it.
    """

    def __init__(self, cfg):
        super(GeneralizedRCNN, self).__init__()

        # Returns an nn.Sequential model; the FPN part outputs the multi-level features
        self.backbone = build_backbone(cfg)
        # This is the FCOS head
        self.rpn = build_rpn(cfg, self.backbone.out_channels)
        # Here roi_heads is an empty list
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

    def forward(self, images, targets=None):
        pass  # not needed yet, so I replaced the body with pass
build_backbone()
build_backbone is a function in fcos_core/modeling/backbone/backbone.py. I use fcos_R_50_FPN_1x.yaml as the example (the values noted in the comments all come from this file; I won't repeat this below). Since its CONV_BODY is R-50-FPN-RETINANET, I only follow the registration of R-50-FPN-RETINANET. Once you understand this one, the registration of others such as R-50-C4 poses no problem:
from collections import OrderedDict

from torch import nn

from fcos_core.modeling import registry
from fcos_core.modeling.make_layers import conv_with_kaiming_uniform
from . import fpn as fpn_module
from . import resnet
from . import mobilenet


@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    # Body omitted here to save space and keep this readable; it is covered later
    ...
    return model


def build_backbone(cfg):
    # Raise an error if CONV_BODY is not in registry.BACKBONES
    assert cfg.MODEL.BACKBONE.CONV_BODY in registry.BACKBONES, \
        "cfg.MODEL.BACKBONE.CONV_BODY: {} are not registered in registry".format(
            cfg.MODEL.BACKBONE.CONV_BODY
        )
    # Usage of the decorator:
    # registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY] refers to
    # build_resnet_fpn_p3p7_backbone, which is why it is then called with cfg
    return registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY](cfg)
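- build_backbone starts with an assert statement. CONV_BODY is the string R-50-FPN-RETINANET. registry.BACKBONES is an object created in fcos_core/modeling/registry.py by instantiating the Registry class. That class is defined in fcos_core/utils/registry.py and inherits from dict, so registry.BACKBONES supports dictionary usage as well. The line directly above build_resnet_fpn_p3p7_backbone uses decorator syntax: @registry.BACKBONES.register("R-50-FPN-RETINANET"). Read on: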
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.


def _register_generic(module_dict, module_name, module):
    assert module_name not in module_dict
    module_dict[module_name] = module


class Registry(dict):
    '''
    A helper class for managing registering modules, it extends a dictionary
    and provides a register functions.

    Eg. creating a registry:
        some_registry = Registry({"default": default_module})

    There're two ways of registering new modules:
    1): normal way is just calling register function:
        def foo():
            ...
        some_registry.register("foo_module", foo)
    2): used as decorator when declaring the module:
        @some_registry.register("foo_module")
        @some_registry.register("foo_module_nickname")
        def foo():
            ...

    Access of module is just like using a dictionary, eg:
        f = some_registry["foo_module"]
    '''
    def __init__(self, *args, **kwargs):
        super(Registry, self).__init__(*args, **kwargs)

    def register(self, module_name, module=None):
        # used as function call
        if module is not None:
            _register_generic(self, module_name, module)
            return

        # used as decorator
        def register_fn(fn):
            _register_generic(self, module_name, fn)
            return fn

        return register_fn
- The Registry class defines a method register that returns the function register_fn(fn), so it can be used as a decorator (for decorator syntax, see: "Decorators"). When @registry.BACKBONES.register("R-50-FPN-RETINANET") decorates build_resnet_fpn_p3p7_backbone, a key-value pair is written: registry.BACKBONES["R-50-FPN-RETINANET"] = build_resnet_fpn_p3p7_backbone. build_backbone(cfg) therefore returns the result of build_resnet_fpn_p3p7_backbone(cfg). Let's see how this function builds the backbone:
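The registration mechanism is easy to try without FCOS installed. Below is a trimmed-down sketch of the same idea; the Registry here is a shortened re-implementation for illustration, and build_demo_backbone is a hypothetical stand-in for build_resnet_fpn_p3p7_backbone.

```python
class Registry(dict):
    """Shortened version of the Registry idea, for illustration only."""

    def register(self, module_name, module=None):
        if module is not None:  # used as a plain function call
            assert module_name not in self
            self[module_name] = module
            return

        def register_fn(fn):  # used as a decorator
            assert module_name not in self
            self[module_name] = fn
            return fn  # hand the function back unchanged so decorators can be stacked

        return register_fn


BACKBONES = Registry()


@BACKBONES.register("R-50-FPN-RETINANET")
@BACKBONES.register("R-101-FPN-RETINANET")
def build_demo_backbone(cfg):
    # Hypothetical builder; the real one assembles ResNet + FPN
    return "backbone built from {}".format(cfg)


# Dictionary-style lookup by the CONV_BODY string, then call the builder
builder = BACKBONES["R-50-FPN-RETINANET"]
result = builder("demo-cfg")
print(builder is build_demo_backbone)
print(sorted(BACKBONES))
print(result)
```

Because register_fn returns fn itself, the two stacked decorators register the same function under two names, which is how one builder serves both the R-50 and R-101 variants.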
@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    body = resnet.ResNet(cfg)
    # Channel parameters required by the FPN
    in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS  # 256
    out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS  # 256
    in_channels_p6p7 = in_channels_stage2 * 8 if cfg.MODEL.RETINANET.USE_C5 \
        else out_channels
    fpn = fpn_module.FPN(
        in_channels_list=[
            0,
            in_channels_stage2 * 2,
            in_channels_stage2 * 4,
            in_channels_stage2 * 8,
        ],
        out_channels=out_channels,
        # conv_with_kaiming_uniform returns a function that builds the conv layer;
        # with stride=1 the conv keeps the spatial size unchanged
        conv_block=conv_with_kaiming_uniform(
            cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU
        ),
        top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),
    )
    # Feed body and fpn into nn.Sequential through an ordered dict, so they form
    # one model in which the output of body is the input of fpn
    model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
    # Store out_channels on the model because later code needs it
    model.out_channels = out_channels
    return model
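- As its name suggests, build_resnet_fpn_p3p7_backbone builds the backbone part of the architecture figure in the paper, with the FPN levels running from P3 to P7. resnet.ResNet produces the body and fpn_module.FPN produces the fpn part. We look at the two in turn: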
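The channel arithmetic in the function can be checked by hand. The sketch below plugs in the two values noted in the code comments (RES2_OUT_CHANNELS = 256 and BACKBONE_OUT_CHANNELS = 256); fpn_channel_plan is a hypothetical helper, and both USE_C5 settings are shown because the value used by the config is not stated in this section.

```python
def fpn_channel_plan(res2_out_channels, backbone_out_channels, use_c5):
    """Hypothetical helper that repeats the channel arithmetic of the builder."""
    in_channels_list = [
        0,                      # placeholder entry for the first ResNet stage
        res2_out_channels * 2,  # second stage output
        res2_out_channels * 4,  # third stage output
        res2_out_channels * 8,  # fourth stage output
    ]
    # P6/P7 are computed either from the last ResNet stage or from an FPN output
    in_channels_p6p7 = res2_out_channels * 8 if use_c5 else backbone_out_channels
    return in_channels_list, in_channels_p6p7


plan_without_c5 = fpn_channel_plan(256, 256, use_c5=False)
plan_with_c5 = fpn_channel_plan(256, 256, use_c5=True)
print(plan_without_c5)
print(plan_with_c5)
```

With these inputs the three real FPN inputs have 512, 1024 and 2048 channels, and the extra P6/P7 block takes either 256 or 2048 input channels depending on USE_C5.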
resnet.py
Next we build a ResNet. torchvision already implements one, and this version is quite similar; if you have read torchvision's resnet code, this will be easy to follow. First, some configuration and the easier parts:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""
Variant of the resnet module that takes cfg as an argument.
Example usage. Strings may be specified in the config file.
    model = ResNet(
        "StemWithFixedBatchNorm",
        "BottleneckWithFixedBatchNorm",
        "ResNet50StagesTo4",
    )
OR:
    model = ResNet(
        "StemWithGN",
        "BottleneckWithGN",
        "ResNet50StagesTo4",
    )
Custom implementations may be written in user code and hooked in via the
`register_*` functions.
"""
# The docstring above is a usage note; below we first import the required packages
from collections import namedtuple

import torch
import torch.nn.functional as F
from torch import nn

from fcos_core.layers import FrozenBatchNorm2d
from fcos_core.layers import Conv2d
from fcos_core.layers import DFConv2d
from fcos_core.modeling.make_layers import group_norm
from fcos_core.utils.registry import Registry


# ResNet stage specification: a namedtuple holds the parameters of each ResNet stage
StageSpec = namedtuple(
    "StageSpec",
    [
        "index",  # Index of the stage, eg 1, 2, ..,. 5
        "block_count",  # Number of residual blocks in the stage
        "return_features",  # True => return the last feature map from this stage
    ],
)

# -----------------------------------------------------------------------------
# Standard ResNet models
# -----------------------------------------------------------------------------
# The tuples below are selected via _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY];
# I only show the ResNet-50 ones
# ResNet-50 (including all stages)
ResNet50StagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, False), (4, 3, True))
)
# ResNet-50 up to stage 4 (excludes stage 5): only the feature map of stage 4 is used
ResNet50StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, True))
)
# ResNet-50-FPN (including all stages): the FPN needs the feature map of every
# stage, so return_features is True everywhere
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)

# Selects whether the ResNet Bottleneck uses FixedBatchNorm or GroupNorm
_TRANSFORMATION_MODULES = Registry({
    "BottleneckWithFixedBatchNorm": BottleneckWithFixedBatchNorm,
    "BottleneckWithGN": BottleneckWithGN,
})

# Selects whether the ResNet Stem uses FixedBatchNorm or GroupNorm
_STEM_MODULES = Registry({
    "StemWithFixedBatchNorm": StemWithFixedBatchNorm,
    "StemWithGN": StemWithGN,
})

# Selects which ResNet depth to build and up to which stage
_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-101-C4": ResNet101StagesTo4,
    "R-101-C5": ResNet101StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-50-FPN-RETINANET": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
    "R-101-FPN-RETINANET": ResNet101FPNStagesTo5