FCOS Official Code Explained (Part 1): Architecture (backbone)


 

There are already plenty of explanations of the FCOS paper online, but few that approach it from the code side. Having recently worked through the code, I want to write down my understanding so I can come back for a refresher whenever I forget; once you understand the code, the paper is easy. Personally I really like this kind of one-stage, anchor-free method: simple and easy to grasp 🤭. Don't rush, though; this post is long, and getting through it in a day on first contact is already good going.
Paper: FCOS: Fully Convolutional One-Stage Object Detection [mirror]
Official source code: https://github.com/tianzhi0549/FCOS (built on maskrcnn-benchmark)
One more blog post for reference: "FCOS: 最新的one-stage逐像素目標檢測算法" (FCOS: a recent one-stage per-pixel object detection algorithm)

From the paper we can see that the whole architecture consists of three parts: the Backbone, the FPN, and the Head (which itself splits into Classification, Center-ness, and Regression branches).
The network architecture of FCOS
Now let's look at how the source code constructs these three parts:

tools/train_net.py

We will follow the program's execution flow step by step until we form a picture of the whole pipeline. This is the training command from the README.md of the official repo:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_imprv_R_50_FPN_1x

 

  • At first glance this looks incomprehensible and half your confidence drains away; push through anyway. The first three arguments all concern distributed training. Since I only have a single GPU, I ignored them for now; if you have multiple GPUs, see this similar high-quality tutorial: "Pytorch中多GPU訓練指北" (a guide to multi-GPU training in PyTorch). For what python -m does, see: "python -m是拿來干啥用的?" (what is python -m for?).
  • One more aside: python -m lets torch.distributed.launch run as a module. Distributed training uses DistributedDataParallel, and torch.distributed.launch spawns n train_net.py processes for us; nproc_per_node and master_port are both command-line arguments of torch.distributed.launch.py, which lives at miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py.
  • The training entry point is tools/train_net.py. Under configs/fcos/ there are many config files with the .yaml suffix; like json or xml files they are just data, except they are read in with the yacs package. The trailing DATALOADER.NUM_WORKERS and OUTPUT_DIR are entries of the config file, which we will come back to shortly. I suggest first skimming the README of yacs (rbg's project: project page); the config system is not our focus, and a minimal sketch of how yacs works follows below.
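
To make the config mechanism concrete, here is a minimal, hypothetical yacs sketch. The keys mirror the command above, but this is not FCOS's actual defaults file, just the same pattern:

from yacs.config import CfgNode as CN

# Hypothetical defaults, mirroring fcos_core/config/defaults.py in spirit only
_C = CN()
_C.DATALOADER = CN()
_C.DATALOADER.NUM_WORKERS = 4
_C.OUTPUT_DIR = "."
cfg = _C

# A yaml file overrides the defaults...
# cfg.merge_from_file("configs/fcos/fcos_imprv_R_50_FPN_1x.yaml")
# ...and leftover command-line tokens override both:
cfg.merge_from_list(["DATALOADER.NUM_WORKERS", "2", "OUTPUT_DIR", "training_dir"])
cfg.freeze()  # any later mutation raises an error
print(cfg.DATALOADER.NUM_WORKERS)  # 2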

main()

Following the call chain, the first thing we reach is the main() function:

def main():
    # Parse command-line arguments, e.g.
    # --config-file configs/fcos/fcos_imprv_R_50_FPN_1x.yaml
    parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
    parser.add_argument(
        "--config-file",
        default="",
        metavar="FILE",
        help="path to config file",
        type=str,
    )
    # This argument is passed in by torch.distributed.launch;
    # local_rank is the index of the GPU used by the current process
    parser.add_argument(
        "--local_rank",
        type=int,
        default=0,
        help="local_rank is used by torch.distributed.launch to leverage multiple GPUs",
    )
    parser.add_argument(
        "--skip-test",
        dest="skip_test",
        help="Do not test the final model",
        action="store_true",
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )
    args = parser.parse_args()

    # Count the GPUs on the machine; distributed training kicks in when > 1.
    # WORLD_SIZE is set by torch.distributed.launch.py;
    # its value is nproc_per_node * node (node = number of hosts)
    num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
    args.distributed = num_gpus > 1

    if args.distributed:  # I don't have multiple GPUs, so I skipped this branch
        torch.cuda.set_device(args.local_rank)
        # Distributed-training setup; this is where local_rank is used
        torch.distributed.init_process_group(
            backend="nccl", init_method="env://"
        )
        synchronize()

    # Defaults live in fcos_core/config/defaults.py; the rest is
    # overridden by config_file and opts
    cfg.merge_from_file(args.config_file)  # read parameters from the yaml file
    cfg.merge_from_list(args.opts)         # they can also be overridden from the command line
    cfg.freeze()  # freeze the config against accidental changes; cfg is passed into train()
    # You can print cfg here to inspect it; I use fcos_R_50_FPN_1x.yaml as the example

    output_dir = cfg.OUTPUT_DIR  # create the output folder for logs and such
    if output_dir:
        mkdir(output_dir)

    # Write the log file: GPU count, system environment, config parameters, etc.
    logger = setup_logger("fcos_core", output_dir, get_rank())
    logger.info("Using {} GPUs".format(num_gpus))
    logger.info(args)
    logger.info("Collecting env info (might take some time)")
    logger.info("\n" + collect_env_info())
    logger.info("Loaded configuration file {}".format(args.config_file))
    with open(args.config_file, "r") as cf:
        config_str = "\n" + cf.read()
        logger.info(config_str)
    logger.info("Running with config:\n{}".format(cfg))

    # This is the next entry point; train() builds the model as its first step
    model = train(cfg, args.local_rank, args.distributed)  # cfg, 0, 0

    if not args.skip_test:  # unless testing is skipped, run it
        run_test(cfg, model, args.distributed)

Note that config-file and local_rank both carry the -- prefix, marking them as optional arguments, while opts does not, making it a positional argument. nargs=argparse.REMAINDER means that all remaining command-line tokens are collected into a list (typically used by command-line utilities that dispatch to other utilities), so args.opts is exactly what gets passed into cfg.merge_from_list().
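
A tiny demo of that behavior (a standalone sketch mimicking the arguments above):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--config-file", type=str, default="")
parser.add_argument("opts", nargs=argparse.REMAINDER)
args = parser.parse_args(
    ["--config-file", "x.yaml", "DATALOADER.NUM_WORKERS", "2", "OUTPUT_DIR", "out"]
)
print(args.opts)  # ['DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'out']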

train()

Next up is the train() function, which is called from main like this:

model = train(cfg, args.local_rank, args.distributed)  # cfg, 0, 0; returns a model
model = build_detection_model(cfg)  # this post only covers model construction, so this is the only line of train() we need

fcos_core/modeling/detector/detectors.py

build_detection_model(cfg) inside train points to this file:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from .generalized_rcnn import GeneralizedRCNN

_DETECTION_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNN}


def build_detection_model(cfg):
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    return meta_arch(cfg)

From the printed cfg we can see:

MODEL:
  ...
  META_ARCHITECTURE: GeneralizedRCNN

So build_detection_model returns GeneralizedRCNN(cfg).

fcos_core/modeling/detector/generalized_rcnn.py

The generalized_rcnn.py module "Implements the Generalized R-CNN framework". The author kept most of the maskrcnn-benchmark code unchanged; at first I wondered why FCOS would still contain roi code, but it turns out to be unused. Likewise, the rpn here is not the RPN of the R-CNN framework but the FCOS Head. So let's look at build_backbone and build_rpn:

class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN. Currently supports boxes and masks.
    It consists of three main parts:
    - backbone
    - rpn
    - heads: takes the features + the proposals from the RPN and computes
        detections / masks from it.
    """

    def __init__(self, cfg):
        super(GeneralizedRCNN, self).__init__()
        # returns an nn.Sequential model; the FPN outputs the multi-level features
        self.backbone = build_backbone(cfg)
        # this is in fact the FCOS head
        self.rpn = build_rpn(cfg, self.backbone.out_channels)
        # roi_heads is an empty list here
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

    def forward(self, images, targets=None):
        pass  # not needed yet, so I pass it here

build_backbone()

build_backbone is a function in fcos_core/modeling/backbone/backbone.py. I use fcos_R_50_FPN_1x.yaml as the running example (the parameter values noted beside the code all come from it; the same holds below). Its CONV_BODY is R-50-FPN-RETINANET, so I only follow the registration (register) of R-50-FPN-RETINANET; once this is clear, registering the others, such as R-50-C4, poses no problem:

from collections import OrderedDict

from torch import nn

from fcos_core.modeling import registry
from fcos_core.modeling.make_layers import conv_with_kaiming_uniform
from . import fpn as fpn_module
from . import resnet
from . import mobilenet


@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    # to save space and ease reading, the body of this function is omitted here;
    # it is discussed below
    return model


def build_backbone(cfg):
    # raise an exception if CONV_BODY is not in registry.BACKBONES
    assert cfg.MODEL.BACKBONE.CONV_BODY in registry.BACKBONES, \
        "cfg.MODEL.BACKBONE.CONV_BODY: {} are not registered in registry".format(
            cfg.MODEL.BACKBONE.CONV_BODY
        )
    return registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY](cfg)
    # usage of decorator:
    # registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY] --> build_resnet_fpn_p3p7_backbone
    # hence the trailing (cfg) argument
build_backbone opens with an assert. CONV_BODY, here the string R-50-FPN-RETINANET, must be registered. registry.BACKBONES is an object instantiated from the Registry class exposed via fcos_core/modeling/registry.py; the class itself is defined in fcos_core/utils/registry.py and subclasses dict, so registry.BACKBONES can also be used like a dictionary. The @registry.BACKBONES.register("R-50-FPN-RETINANET") line above uses decorator syntax; read on:
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

def _register_generic(module_dict, module_name, module):
    assert module_name not in module_dict
    module_dict[module_name] = module


class Registry(dict):
    '''
    A helper class for managing registering modules, it extends a dictionary
    and provides a register functions.

    Eg. creeting a registry:
        some_registry = Registry({"default": default_module})

    There're two ways of registering new modules:
    1): normal way is just calling register function:
        def foo():
            ...
        some_registry.register("foo_module", foo)
    2): used as decorator when declaring the module:
        @some_registry.register("foo_module")
        @some_registry.register("foo_modeul_nickname")
        def foo():
            ...

    Access of module is just like using a dictionary, eg:
        f = some_registry["foo_modeul"]
    '''
    def __init__(self, *args, **kwargs):
        super(Registry, self).__init__(*args, **kwargs)

    def register(self, module_name, module=None):
        # used as function call
        if module is not None:
            _register_generic(self, module_name, module)
            return

        # used as decorator
        def register_fn(fn):
            _register_generic(self, module_name, fn)
            return fn

        return register_fn
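
Here is a small standalone sketch of this Registry in action (the function and cfg value are made up for illustration); the paragraph below unpacks the mechanics:

demo_registry = Registry()

@demo_registry.register("R-50-FPN-RETINANET")
@demo_registry.register("R-101-FPN-RETINANET")
def build_demo_backbone(cfg):
    return "backbone built with {}".format(cfg)

# each @register(...) wrote one key/value pair into the dict,
# so lookup is ordinary dictionary access:
print(demo_registry["R-50-FPN-RETINANET"]("some-cfg"))
# -> backbone built with some-cfg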
As you can see, the Registry class defines register, a method that returns the function register_fn(fn), which is what allows it to be used as a decorator (for the syntax itself, see any decorator tutorial). When @registry.BACKBONES.register("R-50-FPN-RETINANET") "decorates" build_resnet_fpn_p3p7_backbone, one key/value pair gets written: registry.BACKBONES["R-50-FPN-RETINANET"] = build_resnet_fpn_p3p7_backbone. Hence build_backbone(cfg) returns build_resnet_fpn_p3p7_backbone(cfg). Now let's see how that function constructs the backbone:
@registry.BACKBONES.register("R-50-FPN-RETINANET")
@registry.BACKBONES.register("R-101-FPN-RETINANET")
def build_resnet_fpn_p3p7_backbone(cfg):
    body = resnet.ResNet(cfg)
    # fetch the channel parameters the fpn needs
    in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS   # 256
    out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS     # 256
    in_channels_p6p7 = in_channels_stage2 * 8 if cfg.MODEL.RETINANET.USE_C5 \
        else out_channels
    fpn = fpn_module.FPN(
        in_channels_list=[
            0,
            in_channels_stage2 * 2,
            in_channels_stage2 * 4,
            in_channels_stage2 * 8,
        ],
        out_channels=out_channels,
        conv_block=conv_with_kaiming_uniform(
            # this conv keeps the spatial size when stride=1; returns a function
            cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU
        ),
        top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),
    )
    # feed body and fpn into nn.Sequential via an OrderedDict to build the model;
    # chained together, the body's output becomes the fpn's input
    model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
    # assign out_channels once more for later use
    model.out_channels = out_channels
    return model
As its name suggests, build_resnet_fpn_p3p7_backbone constructs the backbone part of the architecture figure above, with the FPN covering P3 through P7. The body comes from resnet.ResNet and the fpn part from fpn_module.FPN; a quick check of the channel arithmetic follows, and then we look at the two parts in turn.
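
As a quick sanity check on that channel arithmetic (my own sketch, assuming the default R-50 config values shown above):

in_channels_stage2 = 256      # cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
out_channels = 256            # cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
in_channels_list = [
    0,                        # C2 is skipped: FCOS has no P2
    in_channels_stage2 * 2,   # C3: 512
    in_channels_stage2 * 4,   # C4: 1024
    in_channels_stage2 * 8,   # C5: 2048
]
print(in_channels_list)       # [0, 512, 1024, 2048]
# USE_C5=True  -> P6/P7 are computed from C5, so in_channels_p6p7 = 2048
# USE_C5=False -> P6/P7 are computed from P5, so in_channels_p6p7 = 256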

resnet.py

Next we construct the resnet itself. torchvision ships an implementation and this one is quite similar; if you have read torchvision's resnet code before, this will be very easy to follow. First, some configuration material and the easier parts:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""
Variant of the resnet module that takes cfg as an argument.
Example usage. Strings may be specified in the config file.
    model = ResNet(
        "StemWithFixedBatchNorm",
        "BottleneckWithFixedBatchNorm",
        "ResNet50StagesTo4",
    )
OR:
    model = ResNet(
        "StemWithGN",
        "BottleneckWithGN",
        "ResNet50StagesTo4",
    )
Custom implementations may be written in user code and hooked in via
the `register_*` functions.
"""
# the above is a usage note; first import the necessary packages
from collections import namedtuple

import torch
import torch.nn.functional as F
from torch import nn

from fcos_core.layers import FrozenBatchNorm2d
from fcos_core.layers import Conv2d
from fcos_core.layers import DFConv2d
from fcos_core.modeling.make_layers import group_norm
from fcos_core.utils.registry import Registry


# ResNet stage specification: a namedtuple configures each resnet stage
StageSpec = namedtuple(
    "StageSpec",
    [
        "index",  # Index of the stage, eg 1, 2, ..,. 5
        "block_count",  # Number of residual blocks in the stage
        "return_features",  # True => return the last feature map from this stage
    ],
)

# -----------------------------------------------------------------------------
# Standard ResNet models
# -----------------------------------------------------------------------------
# The tuples below are selected via _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY];
# I only kept the resnet50 ones (specs for the other depths are omitted here).
# ResNet-50 (including all stages)
ResNet50StagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, False), (4, 3, True))
)
# ResNet-50 up to stage 4 (excludes stage 5): only the stage-4 feature map is used
ResNet50StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, True))
)
# ResNet-50-FPN (including all stages): the fpn needs the feature map of every
# stage, so return_features is True everywhere
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)

# selects whether the resnet Bottleneck uses FixedBatchNorm or GroupNorm
# (the classes referenced here are defined in the next code block)
_TRANSFORMATION_MODULES = Registry({
    "BottleneckWithFixedBatchNorm": BottleneckWithFixedBatchNorm,
    "BottleneckWithGN": BottleneckWithGN,
})

# selects whether the resnet Stem uses FixedBatchNorm or GroupNorm
_STEM_MODULES = Registry({
    "StemWithFixedBatchNorm": StemWithFixedBatchNorm,
    "StemWithGN": StemWithGN,
})

# selects which depth of resnet to build, and up to which stage
_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-101-C4": ResNet101StagesTo4,
    "R-101-C5": ResNet101StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-50-FPN-RETINANET": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
    "R-101-FPN-RETINANET": ResNet101FPNStagesTo5,
    "R-152-FPN": ResNet152FPNStagesTo5,
})

If you want to get familiar with the syntax of namedtuple, have a look at a tutorial on it; printing ResNet50FPNStagesTo5 shows what these specs look like (a tiny namedtuple sketch follows the printout):

(StageSpec(index=1, block_count=3, return_features=True),
StageSpec(index=2, block_count=4, return_features=True),
StageSpec(index=3, block_count=6, return_features=True),
StageSpec(index=4, block_count=3, return_features=True))
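
If namedtuple is unfamiliar, here is the same pattern in miniature (my own sketch, not repo code):

from collections import namedtuple

StageSpec = namedtuple("StageSpec", ["index", "block_count", "return_features"])
spec = StageSpec(index=1, block_count=3, return_features=True)
print(spec.block_count)  # fields are accessed by name: 3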
This is easy to understand alongside the figure below: block_count is the number of blocks in each stage, and return_features marks whether that stage outputs its feature map.

ResNet model architecture diagram
Before the overall resnet assembly, let's look at the basic units it is stacked from: the stem and the Bottleneck. The stem is the 7×7 convolution of conv1 plus the 3×3 max pool of conv2_x in the figure above; a quick spatial-size check of it follows below. As for the Bottleneck, resnet50, resnet101 and resnet152 all share the same structure and differ only in the number of blocks per stage, so a loop constructs them cleanly:
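
As a sanity check on the stem's downsampling (my own sketch using plain torch modules rather than the repo's Conv2d wrapper), the 7×7/stride-2 conv followed by the 3×3/stride-2 max pool shrinks the input by 4× overall:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 224, 224)
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
y = F.max_pool2d(conv1(x), kernel_size=3, stride=2, padding=1)
print(y.shape)  # torch.Size([1, 64, 56, 56])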

# I grouped related content together for easier reading
class BaseStem(nn.Module):
    def __init__(self, cfg, norm_func):
        super(BaseStem, self).__init__()
        out_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS  # 64 -> stem output channels
        self.conv1 = Conv2d(
            3, out_channels, kernel_size=7, stride=2, padding=3, bias=False
        )
        self.bn1 = norm_func(out_channels)  # the corresponding normalization layer
        for l in [self.conv1,]:  # Kaiming initialization
            nn.init.kaiming_uniform_(l.weight, a=1)

    def forward(self, x):  # the forward pass
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu_(x)
        # the stem also includes the max pool; it has no parameters,
        # so it is written directly in forward
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        return x


# StemWithFixedBatchNorm and StemWithGN below inherit from BaseStem;
# the only difference is whether norm_func is FrozenBatchNorm2d or group_norm
class StemWithFixedBatchNorm(BaseStem):
    def __init__(self, cfg):
        super(StemWithFixedBatchNorm, self).__init__(
            cfg, norm_func=FrozenBatchNorm2d
        )


class StemWithGN(BaseStem):
    def __init__(self, cfg):
        super(StemWithGN, self).__init__(cfg, norm_func=group_norm)


class Bottleneck(nn.Module):
    def __init__(
        self,
        in_channels,           # input channels of the bottleneck
        bottleneck_channels,   # channels after the bottleneck squeeze
        out_channels,          # output channels of the bottleneck
        num_groups,            # number of groups in the bottleneck
        stride_in_1x1,         # whether the stride sits in the 1x1 conv at the start of each stage
        stride,                # convolution stride
        dilation,              # dilation of the convolution
        norm_func,             # which normalization function to use
        dcn_config             # Deformable Convolutional Networks configuration
    ):
        super(Bottleneck, self).__init__()

        # downsample: needed when the bottleneck's input and output channels differ.
        # The ResNet paper proposes strategies A, B and C; strategy B is used here
        # (also the paper's recommendation): projection shortcuts only when the
        # channel counts differ, i.e. a parameter matrix maps the input channels
        # onto the output channels.
        self.downsample = None
        # when input/output channels differ, add an extra 1x1 conv layer to map
        # the input channel count to the output channel count
        if in_channels != out_channels:
            down_stride = stride if dilation == 1 else 1
            self.downsample = nn.Sequential(
                Conv2d(
                    in_channels, out_channels,
                    kernel_size=1, stride=down_stride, bias=False
                ),
                norm_func(out_channels),
            )
            for modules in [self.downsample,]:
                for l in modules.modules():
                    if isinstance(l, Conv2d):
                        nn.init.kaiming_uniform_(l.weight, a=1)

        if dilation > 1:
            stride = 1  # reset to be 1

        # The original MSRA ResNet models have stride in the first 1x1 conv
        # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have
        # stride in the 3x3 conv
        # i.e. the stride-2 conv that the paper puts on the first 1x1 conv of
        # stages 3-5 would move to the 3x3 conv; but since the framework is
        # inherited, my printout still shows it on the 1x1 conv -- the comment
        # just wasn't deleted, because every call below uses stride_in_1x1=True
        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
        )
        self.bn1 = norm_func(bottleneck_channels)
        # TODO: specify init for the above
        # if dcn_config has the key "stage_with_dcn", take its value, else False
        with_dcn = dcn_config.get("stage_with_dcn", False)
        # decide whether the bottleneck's second conv layer uses deformable convolution
        if with_dcn:
            deformable_groups = dcn_config.get("deformable_groups", 1)
            with_modulated_dcn = dcn_config.get("with_modulated_dcn", False)
            self.conv2 = DFConv2d(
                bottleneck_channels,
                bottleneck_channels,
                with_modulated_dcn=with_modulated_dcn,
                kernel_size=3,
                stride=stride_3x3,
                groups=num_groups,
                dilation=dilation,
                deformable_groups=deformable_groups,
                bias=False
            )
        else:
            self.conv2 = Conv2d(
                bottleneck_channels,
                bottleneck_channels,
                kernel_size=3,
                stride=stride_3x3,
                padding=dilation,
                bias=False,
                groups=num_groups,
                dilation=dilation
            )
            nn.init.kaiming_uniform_(self.conv2.weight, a=1)

        self.bn2 = norm_func(bottleneck_channels)

        # the third conv layer of the bottleneck
        self.conv3 = Conv2d(
            bottleneck_channels, out_channels, kernel_size=1, bias=False
        )
        self.bn3 = norm_func(out_channels)

        for l in [self.conv1, self.conv3,]:
            nn.init.kaiming_uniform_(l.weight, a=1)

    def forward(self, x):  # the forward pass
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu_(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu_(out)

        out0 = self.conv3(out)
        out = self.bn3(out0)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # the skip connection: add here
        out = F.relu_(out)  # in-place relu

        return out


# Once the Bottleneck class is in place, BottleneckWithFixedBatchNorm and
# BottleneckWithGN simply inherit from it and initialize their own parameters;
# the only difference is whether norm_func is FrozenBatchNorm2d or group_norm
class BottleneckWithFixedBatchNorm(Bottleneck):
    def __init__(
        self,
        in_channels,
        bottleneck_channels,
        out_channels,
        num_groups=1,
        stride_in_1x1=True,
        stride=1,
        dilation=1,
        dcn_config=None
    ):
        super(BottleneckWithFixedBatchNorm, self).__init__(
            in_channels=in_channels,
            bottleneck_channels=bottleneck_channels,
            out_channels=out_channels,
            num_groups=num_groups,
            stride_in_1x1=stride_in_1x1,
            stride=stride,
            dilation=dilation,
            norm_func=FrozenBatchNorm2d,
            dcn_config=dcn_config
        )


class BottleneckWithGN(Bottleneck):
    def __init__(
        self,
        in_channels,
        bottleneck_channels,
        out_channels,
        num_groups=1,
        stride_in_1x1=True,
        stride=1,
        dilation=1,
        dcn_config=None
    ):
        super(BottleneckWithGN, self).__init__(
            in_channels=in_channels,
            bottleneck_channels=bottleneck_channels,
            out_channels=out_channels,
            num_groups=num_groups,
            stride_in_1x1=stride_in_1x1,
            stride=stride,
            dilation=dilation,
            norm_func=group_norm,
            dcn_config=dcn_config
        )

When reading how the resnet main body below calls these basic units, I strongly recommend printing the result of build_backbone(cfg) and following along; it makes the code much easier to understand. A minimal way to get that printout is sketched next.
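
This is my own sketch and assumes a working FCOS install (build_backbone is re-exported from fcos_core.modeling.backbone):

from fcos_core.config import cfg
from fcos_core.modeling.backbone import build_backbone

cfg.merge_from_file("configs/fcos/fcos_R_50_FPN_1x.yaml")
backbone = build_backbone(cfg)
print(backbone)  # the (body) ResNet stages followed by the (fpn) module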

class ResNet(nn.Module):
    def __init__(self, cfg):
        super(ResNet, self).__init__()

        # If we want to use the cfg in forward(), then we should make a copy
        # of it and store it for later use:
        # self.cfg = cfg.clone()

        # Translate string names to implementations, chosen according to cfg
        stem_module = _STEM_MODULES[cfg.MODEL.RESNETS.STEM_FUNC]  # eg: "StemWithFixedBatchNorm"
        stage_specs = _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY]  # eg: "R-50-FPN-RETINANET"
        transformation_module = _TRANSFORMATION_MODULES[cfg.MODEL.RESNETS.TRANS_FUNC]

        # Construct the stem module: resnet's first stage, conv1
        self.stem = stem_module(cfg)

        # Construct the specified ResNet stages: conv2_x ~ conv5_x
        num_groups = cfg.MODEL.RESNETS.NUM_GROUPS  # eg: 1 (1 means ResNet, >1 means ResNeXt)
        width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP  # eg: 64
        in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS  # eg: 64
        stage2_bottleneck_channels = num_groups * width_per_group  # eg: 64
        stage2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS  # eg: 256
        self.stages = []
        self.return_features = {}
        for stage_spec in stage_specs:
            name = "layer" + str(stage_spec.index)
            stage2_relative_factor = 2 ** (stage_spec.index - 1)
            # bottleneck_channels and out_channels double with each stage
            bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor
            out_channels = stage2_out_channels * stage2_relative_factor
            stage_with_dcn = cfg.MODEL.RESNETS.STAGE_WITH_DCN[stage_spec.index - 1]
            # call _make_stage in a loop to build conv2_x ~ conv5_x in turn
            module = _make_stage(
                transformation_module,  # BottleneckWithFixedBatchNorm
                in_channels,
                bottleneck_channels,
                out_channels,
                stage_spec.block_count,
                num_groups,
                cfg.MODEL.RESNETS.STRIDE_IN_1X1,
                # stages 3~5 use stride=2 to downsize
                first_stride=int(stage_spec.index > 1) + 1,
                dcn_config={
                    "stage_with_dcn": stage_with_dcn,
                    "with_modulated_dcn": cfg.MODEL.RESNETS.WITH_MODULATED_DCN,
                    "deformable_groups": cfg.MODEL.RESNETS.DEFORMABLE_GROUPS,
                }
            )
            in_channels = out_channels
            self.add_module(name, module)
            self.stages.append(name)
            self.return_features[name] = stage_spec.return_features

        # Optionally freeze (requires_grad=False) parts of the backbone
        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_CONV_BODY_AT)

    def _freeze_backbone(self, freeze_at):
        # freeze parameter updates of the corresponding layers per freeze_at
        if freeze_at < 0:
            return
        for stage_index in range(freeze_at):
            if stage_index == 0:
                m = self.stem  # stage 0 is the stem
            else:
                m = getattr(self, "layer" + str(stage_index))
            for p in m.parameters():
                p.requires_grad = False

    def forward(self, x):
        outputs = []
        x = self.stem(x)
        for stage_name in self.stages:
            x = getattr(self, stage_name)(x)
            # collect the feature maps of the required stages (of stage2~5)
            # in a list, to serve as the input of the FPN
            if self.return_features[stage_name]:
                outputs.append(x)
        return outputs


def _make_stage(
    transformation_module,
    in_channels,
    bottleneck_channels,
    out_channels,
    block_count,
    num_groups,
    stride_in_1x1,
    first_stride,
    dilation=1,
    dcn_config=None
):
    blocks = []
    stride = first_stride
    # call the Bottleneck class once per iteration, each call building one bottleneck
    for _ in range(block_count):
        blocks.append(
            transformation_module(
                in_channels,
                bottleneck_channels,
                out_channels,
                num_groups,
                stride_in_1x1,
                stride,
                dilation=dilation,
                dcn_config=dcn_config
            )
        )
        # note: only the first block uses stride=first_stride; afterwards stride=1
        stride = 1
        in_channels = out_channels
    return nn.Sequential(*blocks)

After this long stretch the resnet is finally constructed. Don't forget: the fpn and the fcos_head are still to come, so keep at it 💪!

fpn.py

The second part of build_resnet_fpn_p3p7_backbone constructs the fpn; the relevant code is:

from . import fpn as fpn_module

fpn = fpn_module.FPN(
    in_channels_list=[
        0,  # P3 starts from C3, so stage2 is skipped by setting 0
        in_channels_stage2 * 2,
        in_channels_stage2 * 4,
        in_channels_stage2 * 8,
    ],
    out_channels=out_channels,  # cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
    conv_block=conv_with_kaiming_uniform(
        # this conv keeps the spatial size when stride=1; returns an nn.Conv2d factory
        cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU  # eg: False, False
    ),
    top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),  # eg: 256, 256
)

Here fpn_module is fcos_core/modeling/backbone/fpn.py, which contains the FPN class. Let's look at the code:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch
import torch.nn.functional as F
from torch import nn


class FPN(nn.Module):
    """
    Module that adds FPN on top of a list of feature maps.
    The feature maps are currently supposed to be in increasing depth
    order, and must be consecutive
    """

    def __init__(
        self, in_channels_list, out_channels, conv_block, top_blocks=None
    ):
        """
        Arguments:
            in_channels_list (list[int]): number of channels for each feature map that
                will be fed
            out_channels (int): number of channels of the FPN representation
            top_blocks (nn.Module or None): if provided, an extra operation will
                be performed on the output of the last (smallest resolution)
                FPN output, and the result will extend the result list
        """
        super(FPN, self).__init__()
        self.inner_blocks = []
        self.layer_blocks = []
        for idx, in_channels in enumerate(in_channels_list, 1):  # idx counts from 1
            inner_block = "fpn_inner{}".format(idx)
            layer_block = "fpn_layer{}".format(idx)

            if in_channels == 0:
                continue
            inner_block_module = conv_block(in_channels, out_channels, 1)
            # note: every layer_block_module outputs out_channels channels (eg 256),
            # i.e. every fpn level's output feature map has the same channel count
            layer_block_module = conv_block(out_channels, out_channels, 3, 1)
            self.add_module(inner_block, inner_block_module)
            self.add_module(layer_block, layer_block_module)
            self.inner_blocks.append(inner_block)
            self.layer_blocks.append(layer_block)
        self.top_blocks = top_blocks

    def forward(self, x):
        """
        The per-level data flow can be followed on my figure below.
        Arguments:
            x (list[Tensor]): feature maps for each feature level.
                The C series, i.e. [C3, C4, C5]; in fact the output of the resnet (body)
        Returns:
            results (tuple[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
                The P series, i.e. [P3, P4, P5, P6, P7]
        """
        last_inner = getattr(self, self.inner_blocks[-1])(x[-1])
        results = []
        results.append(getattr(self, self.layer_blocks[-1])(last_inner))
        for feature, inner_block, layer_block in zip(
            x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1]
        ):
            if not inner_block:
                continue
            # inner_top_down = F.interpolate(last_inner, scale_factor=2, mode="nearest")
            inner_lateral = getattr(self, inner_block)(feature)
            inner_top_down = F.interpolate(
                last_inner,
                size=(int(inner_lateral.shape[-2]), int(inner_lateral.shape[-1])),
                mode='nearest'
            )
            last_inner = inner_lateral + inner_top_down
            results.insert(0, getattr(self, layer_block)(last_inner))

        if isinstance(self.top_blocks, LastLevelP6P7):
            last_results = self.top_blocks(x[-1], results[-1])
            results.extend(last_results)
        elif isinstance(self.top_blocks, LastLevelMaxPool):
            last_results = self.top_blocks(results[-1])
            results.extend(last_results)

        return tuple(results)


class LastLevelMaxPool(nn.Module):
    def forward(self, x):
        return [F.max_pool2d(x, 1, 2, 0)]


class LastLevelP6P7(nn.Module):
    """
    This module is used in RetinaNet to generate extra layers, P6 and P7.
    """
    def __init__(self, in_channels, out_channels):
        super(LastLevelP6P7, self).__init__()
        # two more conv layers on top of C5 (or P5) produce P6 and P7
        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)
        for module in [self.p6, self.p7]:
            nn.init.kaiming_uniform_(module.weight, a=1)
            nn.init.constant_(module.bias, 0)
        self.use_P5 = in_channels == out_channels

    def forward(self, c5, p5):
        x = p5 if self.use_P5 else c5
        p6 = self.p6(x)
        p7 = self.p7(F.relu(p6))
        return [p6, p7]

Printing it out looks like this:

(fpn): FPN(
  (fpn_inner2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_layer2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fpn_inner3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_layer3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fpn_inner4): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (top_blocks): LastLevelP6P7(
    (p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  )
)

A few Python list methods show up here: results.append, results.insert and results.extend (look them up if they are unfamiliar). The figure below is my own drawing of the fpn forward flow; read it alongside the forward code as a memory aid:
FPN forward-pass flow diagram
One small doubt I had here: I wrote above that x in the fpn's forward is [C3, C4, C5], but by the analysis it should be [C2, C3, C4, C5], since all four stages of ResNet50FPNStagesTo5 have return_features=True, and then the counts seem not to match. Looking at the code again, it does work out: only three inner/layer blocks were registered (C2 was skipped because its entry in in_channels_list is 0), and zip() stops at its shortest argument, so the C2 feature map is silently dropped in the top-down loop, as the little demo below shows.
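
# zip() truncates to its shortest argument, so the extra C2 map is never touched:
features = ["C2", "C3", "C4"]                 # x[:-1] for x = [C2, C3, C4, C5]
inner_blocks = ["fpn_inner2", "fpn_inner3"]   # self.inner_blocks[:-1]
print(list(zip(features[::-1], inner_blocks[::-1])))
# -> [('C4', 'fpn_inner3'), ('C3', 'fpn_inner2')]  -- 'C2' is dropped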
At this point the fpn returns the per-level features (P3, P4, P5, P6, P7) as a tuple, ready to be fed into the tower and head that follow; a quick sketch of the resulting feature-map sizes is below.
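
For intuition about what those five levels look like spatially, here is a back-of-the-envelope sketch; the strides 8, 16, 32, 64, 128 for P3..P7 are the ones given in the FCOS paper, and the 800×1024 input size is just an example I picked:

h, w = 800, 1024
for level, stride in zip(["P3", "P4", "P5", "P6", "P7"], [8, 16, 32, 64, 128]):
    print(level, (-(-h // stride), -(-w // stride)))  # ceil division
# P3 (100, 128), P4 (50, 64), P5 (25, 32), P6 (13, 16), P7 (7, 8)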

fcos_head

Recall this line in GeneralizedRCNN's initialization: self.rpn = build_rpn(cfg, self.backbone.out_channels). The name was simply never changed; what is actually constructed is the fcos_head, returned by build_fcos(cfg, in_channels). The code lives in fcos_core/modeling/rpn/fcos/fcos.py.
This post has gotten long enough, so let's pause here and continue in the next one.

Link to the next part:

FCOS Official Code Explained (Part 2): Architecture (head)

References

MaskrcnnBenchmark 源碼解析-模型定義(modeling)之骨架網絡(backbone)

