DLA stands for Deep Layer Aggregation, published at CVPR 2018. It has been adopted by frameworks such as CenterNet and FairMOT; it performs well and strikes a good balance between accuracy and model complexity.
The DLASeg used in CenterNet is a segmentation-style network obtained by adding Deformable Convolutions on top of DLA-34.
1. Introduction
Aggregation is a common technique in modern network design. How to fuse information across different depths, and across different stages and blocks, is what this paper explores.
The most common form of aggregation today is the skip connection, as in ResNet, but that kind of fusion stays inside a block and is limited to simple addition.
This paper proposes the DLA structure, which iteratively fuses feature information across the network, giving the model higher accuracy with fewer parameters.
The figure above shows the design rationale behind DLA. Dense Connections, from DenseNet, aggregate semantic information; Feature Pyramids aggregate spatial information. DLA combines the two so the network better captures both the "what" and the "where". Take a closer look at one of DLA's modules, shown below:
After working through the code, you can see that this ornate-looking structure is actually organized as trees. The red boxes mark two trees, and the trees are linked to each other with ResNet-style residual connections.
2. Core Ideas
Let's first revisit the semantic and spatial information mentioned above; the paper defines them precisely:
- Semantic fusion: aggregation along the channel dimension, which improves the model's ability to infer what something is ("what").
- Spatial fusion: fusion across resolutions and scales, which improves the model's ability to infer where something is ("where"). A toy sketch contrasting the two follows this list.
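To make the distinction concrete, here is a toy illustration (my own sketch, not from the paper) contrasting channel-direction aggregation with resolution-direction aggregation:

import torch
import torch.nn.functional as F

# two maps at the same resolution but from different branches
feat_a = torch.randn(1, 128, 14, 14)
feat_b = torch.randn(1, 256, 14, 14)
# semantic fusion: aggregate along the channel dimension ("what")
fused_sem = torch.cat([feat_a, feat_b], dim=1)    # (1, 384, 14, 14)

# a deep, coarse map and a shallow, fine map
deep = torch.randn(1, 256, 14, 14)
shallow = torch.randn(1, 256, 28, 28)
# spatial fusion: match resolutions first, then combine ("where")
up = F.interpolate(deep, scale_factor=2, mode='bilinear', align_corners=False)
fused_spa = shallow + up                          # (1, 256, 28, 28)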
Deep Layer Aggregation has two core modules, IDA (Iterative Deep Aggregation) and HDA (Hierarchical Deep Aggregation), as shown in the figure above.
- The red boxes mark the tree-structured hierarchy, which propagates features and gradients more effectively.
- The yellow links are IDA; they connect the features of adjacent stages so that deep and shallow representations fuse better.
- The blue links indicate downsampling; like ResNet, the network downsamples aggressively at the very beginning.
The paper also provides formal derivations for those who want to dig deeper. Here we focus on the implementation.
3. Implementation
The code in this section is copied from the official CenterNet implementation: https://github.com/pprp/SimpleCVReproduction/blob/master/CenterNet/nets/dla34.py
3.1 Basic blocks
First come three building blocks. BasicBlock and Bottleneck match their ResNet counterparts, while BottleneckX is the basic block from ResNeXt; any of them can serve as DLA's building block. DLA-34 uses BasicBlock.
import math
import numpy as np
import torch
import torch.nn as nn

# in the original file, BatchNorm is simply an alias for nn.BatchNorm2d
BatchNorm = nn.BatchNorm2d

class BasicBlock(nn.Module):
    def __init__(self, inplanes, planes, stride=1, dilation=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3,
                               stride=stride, padding=dilation,
                               bias=False, dilation=dilation)
        self.bn1 = BatchNorm(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=dilation,
                               bias=False, dilation=dilation)
        self.bn2 = BatchNorm(planes)
        self.stride = stride

    def forward(self, x, residual=None):
        # unlike torchvision's BasicBlock, the residual can be passed in
        # from outside; Tree uses this to inject the projected input
        if residual is None:
            residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += residual
        out = self.relu(out)
        return out
class Bottleneck(nn.Module):
    expansion = 2

    def __init__(self, inplanes, planes, stride=1, dilation=1):
        super(Bottleneck, self).__init__()
        expansion = Bottleneck.expansion
        # note: unlike torchvision's Bottleneck, the middle channels are
        # planes // expansion rather than an expanded width
        bottle_planes = planes // expansion
        self.conv1 = nn.Conv2d(inplanes, bottle_planes,
                               kernel_size=1, bias=False)
        self.bn1 = BatchNorm(bottle_planes)
        self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3,
                               stride=stride, padding=dilation,
                               bias=False, dilation=dilation)
        self.bn2 = BatchNorm(bottle_planes)
        self.conv3 = nn.Conv2d(bottle_planes, planes,
                               kernel_size=1, bias=False)
        self.bn3 = BatchNorm(planes)
        self.relu = nn.ReLU(inplace=True)
        self.stride = stride

    def forward(self, x, residual=None):
        if residual is None:
            residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += residual
        out = self.relu(out)
        return out
class BottleneckX(nn.Module):
    expansion = 2
    cardinality = 32

    def __init__(self, inplanes, planes, stride=1, dilation=1):
        super(BottleneckX, self).__init__()
        cardinality = BottleneckX.cardinality
        # dim = int(math.floor(planes * (BottleneckV5.expansion / 64.0)))
        # bottle_planes = dim * cardinality
        bottle_planes = planes * cardinality // 32
        self.conv1 = nn.Conv2d(inplanes, bottle_planes,
                               kernel_size=1, bias=False)
        self.bn1 = BatchNorm(bottle_planes)
        # grouped 3x3 convolution, as in ResNeXt
        self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3,
                               stride=stride, padding=dilation, bias=False,
                               dilation=dilation, groups=cardinality)
        self.bn2 = BatchNorm(bottle_planes)
        self.conv3 = nn.Conv2d(bottle_planes, planes,
                               kernel_size=1, bias=False)
        self.bn3 = BatchNorm(planes)
        self.relu = nn.ReLU(inplace=True)
        self.stride = stride

    def forward(self, x, residual=None):
        if residual is None:
            residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += residual
        out = self.relu(out)
        return out
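As a quick sanity check, here is a minimal usage sketch of BasicBlock (my own example, not from the repo). Note that forward accepts an external residual, which Tree exploits; with stride=2 the residual must already have the strided shape:

block = BasicBlock(inplanes=64, planes=64, stride=2)
x = torch.randn(1, 64, 56, 56)
residual = torch.randn(1, 64, 28, 28)  # must match the strided output shape
out = block(x, residual)
print(out.shape)  # torch.Size([1, 64, 28, 28])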
3.2 The Root class
Next comes the Root class, which corresponds to the green node in the figure below.
All aggregation nodes are built by calling this module. The green node is also the root that joins two trees, hence the name Root. The implementation is below; forward accepts a variable number of inputs and aggregates information from multiple layers.
class Root(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, residual):
        super(Root, self).__init__()
        # 1x1 convolution that fuses the concatenated inputs
        self.conv = nn.Conv2d(
            in_channels, out_channels, 1,
            stride=1, bias=False, padding=(kernel_size - 1) // 2)
        self.bn = BatchNorm(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.residual = residual

    def forward(self, *x):
        # x collects the outputs of several layers
        children = x
        x = self.conv(torch.cat(x, 1))
        x = self.bn(x)
        if self.residual:
            x += children[0]
        x = self.relu(x)
        return x
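A minimal usage sketch (hypothetical shapes): Root concatenates its inputs along the channel dimension, fuses them with the 1x1 convolution, and, when residual=True, adds the first input back:

# aggregate two 64-channel maps into one 64-channel map
root = Root(in_channels=128, out_channels=64, kernel_size=1, residual=True)
a = torch.randn(1, 64, 28, 28)
b = torch.randn(1, 64, 28, 28)
out = root(a, b)   # cat -> 128 channels -> 1x1 conv -> 64 channels (+ a)
print(out.shape)   # torch.Size([1, 64, 28, 28])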
3.3 The Tree class
The Tree class corresponds to the HDA module in the figure. It is the most central and most intricate part of the code, and drawing it out by hand is recommended. The key idea is that Tree constructs itself recursively. The code follows.
class Tree(nn.Module):
    '''
    self.level5 = Tree(levels[5], block, channels[4], channels[5], 2,
                       level_root=True, root_residual=residual_root)
    '''
    def __init__(self, levels, block, in_channels, out_channels, stride=1,
                 level_root=False, root_dim=0, root_kernel_size=1,
                 dilation=1, root_residual=False):
        super(Tree, self).__init__()
        if root_dim == 0:
            root_dim = 2 * out_channels
        if level_root:
            root_dim += in_channels
        if levels == 1:
            # leaf level: tree1 and tree2 are plain blocks
            self.tree1 = block(in_channels, out_channels, stride,
                               dilation=dilation)
            self.tree2 = block(out_channels, out_channels, 1,
                               dilation=dilation)
        else:
            # recursive case: tree1 and tree2 are themselves subtrees
            self.tree1 = Tree(levels - 1, block, in_channels, out_channels,
                              stride, root_dim=0,
                              root_kernel_size=root_kernel_size,
                              dilation=dilation, root_residual=root_residual)
            self.tree2 = Tree(levels - 1, block, out_channels, out_channels,
                              root_dim=root_dim + out_channels,
                              root_kernel_size=root_kernel_size,
                              dilation=dilation, root_residual=root_residual)
        if levels == 1:
            self.root = Root(root_dim, out_channels, root_kernel_size,
                             root_residual)
        self.level_root = level_root
        self.root_dim = root_dim
        self.downsample = None
        self.project = None
        self.levels = levels
        if stride > 1:
            self.downsample = nn.MaxPool2d(stride, stride=stride)
        if in_channels != out_channels:
            self.project = nn.Sequential(
                nn.Conv2d(in_channels, out_channels,
                          kernel_size=1, stride=1, bias=False),
                BatchNorm(out_channels)
            )

    def forward(self, x, residual=None, children=None):
        children = [] if children is None else children
        bottom = self.downsample(x) if self.downsample else x
        # project maps the input channels to the output channels
        # whenever they differ
        residual = self.project(bottom) if self.project else bottom
        if self.level_root:
            children.append(bottom)
        x1 = self.tree1(x, residual)
        if self.levels == 1:
            x2 = self.tree2(x1)
            # root is the exit point that aggregates everything
            x = self.root(x2, x1, *children)
        else:
            children.append(x1)
            x = self.tree2(x1, children=children)
        return x
From my reading, two parameters matter most here: levels and level_root.
The class has two important members, tree1 and tree2, which are generated recursively; the recursion depth is controlled by levels. Two examples follow. The first is levels=1 with level_root=True, which you can match against the code and the figure below:
That corresponds to:
The children argument in the code is a list holding everything that will be passed to Root; those entries become the leaves attached to the root node.
The second example is levels=2 with level_root=True, as shown below:
This corresponds to the following part of the code:
The pink arrows are the children objects, all of which are handed to Root for aggregation. A small usage sketch follows.
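To tie the two examples to running code, here is a minimal sketch (my own; the shapes mirror how level3 of DLA-34 is built further below):

# a two-level HDA stage that downsamples by 2 and widens 64 -> 128 channels
tree = Tree(levels=2, block=BasicBlock, in_channels=64, out_channels=128,
            stride=2, level_root=True)
x = torch.randn(1, 64, 56, 56)
out = tree(x)
print(out.shape)  # torch.Size([1, 128, 28, 28])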
3.4 DLA
Tree is DLA's most important module; once Tree is in place, DLA is simply assembled stage by stage.
class DLA(nn.Module):
    '''
    DLA([1, 1, 1, 2, 2, 1],
        [16, 32, 64, 128, 256, 512],
        block=BasicBlock, **kwargs)
    '''
    def __init__(self, levels, channels, num_classes=1000,
                 block=BasicBlock, residual_root=False, return_levels=False,
                 pool_size=7, linear_root=False):
        super(DLA, self).__init__()
        self.channels = channels
        self.return_levels = return_levels
        self.num_classes = num_classes
        self.base_layer = nn.Sequential(
            nn.Conv2d(3, channels[0], kernel_size=7, stride=1,
                      padding=3, bias=False),
            BatchNorm(channels[0]),
            nn.ReLU(inplace=True))
        # the first two levels are plain convolutional stages
        self.level0 = self._make_conv_level(
            channels[0], channels[0], levels[0])
        self.level1 = self._make_conv_level(
            channels[0], channels[1], levels[1], stride=2)
        '''
        if level_root:
            root_dim += in_channels
        '''
        self.level2 = Tree(levels[2], block, channels[1], channels[2], 2,
                           level_root=False, root_residual=residual_root)
        self.level3 = Tree(levels[3], block, channels[2], channels[3], 2,
                           level_root=True, root_residual=residual_root)
        self.level4 = Tree(levels[4], block, channels[3], channels[4], 2,
                           level_root=True, root_residual=residual_root)
        self.level5 = Tree(levels[5], block, channels[4], channels[5], 2,
                           level_root=True, root_residual=residual_root)
        self.avgpool = nn.AvgPool2d(pool_size)
        self.fc = nn.Conv2d(channels[-1], num_classes, kernel_size=1,
                            stride=1, padding=0, bias=True)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, BatchNorm):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1):
        # helper from the same file (omitted in the original excerpt):
        # stacks `convs` conv-BN-ReLU units, striding only in the first
        modules = []
        for i in range(convs):
            modules.extend([
                nn.Conv2d(inplanes, planes, kernel_size=3,
                          stride=stride if i == 0 else 1,
                          padding=dilation, bias=False, dilation=dilation),
                BatchNorm(planes),
                nn.ReLU(inplace=True)])
            inplanes = planes
        return nn.Sequential(*modules)

    def forward(self, x):
        y = []
        x = self.base_layer(x)
        for i in range(6):
            # chain the six levels, keeping each level's output
            x = getattr(self, 'level{}'.format(i))(x)
            y.append(x)
        if self.return_levels:
            return y
        else:
            x = self.avgpool(x)
            x = self.fc(x)
            x = x.view(x.size(0), -1)
            return x
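Putting it all together, here is a minimal smoke test of the DLA-34 configuration from the docstring (my own check; with return_levels=True the network returns all six feature levels):

net = DLA([1, 1, 1, 2, 2, 1],
          [16, 32, 64, 128, 256, 512],
          block=BasicBlock, return_levels=True)
x = torch.randn(1, 3, 224, 224)
for i, feat in enumerate(net(x)):
    print(i, feat.shape)
# downsampling factors 1, 2, 4, 8, 16, 32 relative to the input:
# 0 torch.Size([1, 16, 224, 224])
# 1 torch.Size([1, 32, 112, 112])
# 2 torch.Size([1, 64, 56, 56])
# 3 torch.Size([1, 128, 28, 28])
# 4 torch.Size([1, 256, 14, 14])
# 5 torch.Size([1, 512, 7, 7])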
4. DLASeg
DLASeg builds on DLA, combining Deformable Convolutions and upsampling layers to extract information while recovering spatial resolution.
class DLASeg(nn.Module):
    '''
    DLASeg('dla{}'.format(num_layers), heads,
           pretrained=True,
           down_ratio=down_ratio,
           final_kernel=1,
           last_level=5,
           head_conv=head_conv)
    '''
    def __init__(self, base_name, heads, pretrained, down_ratio, final_kernel,
                 last_level, head_conv, out_channel=0):
        super(DLASeg, self).__init__()
        assert down_ratio in [2, 4, 8, 16]
        self.first_level = int(np.log2(down_ratio))
        self.last_level = last_level
        # globals() returns this module's global names as a dict,
        # so self.base is just the DLA-34 constructor looked up by name
        self.base = globals()[base_name](pretrained=pretrained)
        channels = self.base.channels
        scales = [2 ** i for i in range(len(channels[self.first_level:]))]
        # first_level = 2 if down_ratio = 4
        # channels = [16, 32, 64, 128, 256, 512] -> [64, 128, 256, 512]
        # scales = [1, 2, 4, 8]
        self.dla_up = DLAUp(self.first_level, channels[self.first_level:], scales)
        if out_channel == 0:
            out_channel = channels[self.first_level]
        # upsample back to the first_level resolution
        self.ida_up = IDAUp(out_channel, channels[self.first_level:self.last_level],
                            [2 ** i for i in range(self.last_level - self.first_level)])
        self.heads = heads
        for head in self.heads:
            classes = self.heads[head]
            if head_conv > 0:
                fc = nn.Sequential(
                    nn.Conv2d(channels[self.first_level], head_conv,
                              kernel_size=3, padding=1, bias=True),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(head_conv, classes,
                              kernel_size=final_kernel, stride=1,
                              padding=final_kernel // 2, bias=True))
                if 'hm' in head:
                    # bias init for the heatmap head (focal-loss prior)
                    fc[-1].bias.data.fill_(-2.19)
                else:
                    fill_fc_weights(fc)
            else:
                fc = nn.Conv2d(channels[self.first_level], classes,
                               kernel_size=final_kernel, stride=1,
                               padding=final_kernel // 2, bias=True)
                if 'hm' in head:
                    fc.bias.data.fill_(-2.19)
                else:
                    fill_fc_weights(fc)
            self.__setattr__(head, fc)

    def forward(self, x):
        x = self.base(x)
        x = self.dla_up(x)
        y = []
        for i in range(self.last_level - self.first_level):
            y.append(x[i].clone())
        self.ida_up(y, 0, len(y))
        z = {}
        for head in self.heads:
            z[head] = self.__getattr__(head)(y[-1])
        return [z]
That covers the main DLASeg code. The part responsible for upsampling is:
self.ida_up = IDAUp(out_channel, channels[self.first_level:self.last_level],
                    [2 ** i for i in range(self.last_level - self.first_level)])
This part acts as the decoder, raising the spatial resolution.
class IDAUp(nn.Module):
    '''
    IDAUp(channels[j], in_channels[j:], scales[j:] // scales[j])
    ida(layers, len(layers) - i - 2, len(layers))
    '''
    def __init__(self, o, channels, up_f):
        super(IDAUp, self).__init__()
        for i in range(1, len(channels)):
            c = channels[i]
            f = int(up_f[i])
            # proj aligns channels, node fuses the two branches;
            # both are deformable convolutions
            proj = DeformConv(c, o)
            node = DeformConv(o, o)
            # depthwise transposed convolution for x f upsampling
            up = nn.ConvTranspose2d(o, o, f * 2, stride=f,
                                    padding=f // 2, output_padding=0,
                                    groups=o, bias=False)
            fill_up_weights(up)
            setattr(self, 'proj_' + str(i), proj)
            setattr(self, 'up_' + str(i), up)
            setattr(self, 'node_' + str(i), node)

    def forward(self, layers, startp, endp):
        for i in range(startp + 1, endp):
            upsample = getattr(self, 'up_' + str(i - startp))
            project = getattr(self, 'proj_' + str(i - startp))
            layers[i] = upsample(project(layers[i]))
            node = getattr(self, 'node_' + str(i - startp))
            layers[i] = node(layers[i] + layers[i - 1])
The core of the decoder is DLAUp and IDAUp. Both classes use pairs of Deformable Convolutions and then a ConvTranspose2d for upsampling; the full network structure is shown in the figure below.
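fill_up_weights, called above, initializes the transposed convolution as a fixed bilinear upsampling kernel. For completeness, here is a sketch of that initializer as I recall it from the CenterNet repo (treat the exact formula as an assumption):

def fill_up_weights(up):
    # initialize a depthwise ConvTranspose2d to perform bilinear upsampling
    w = up.weight.data
    f = math.ceil(w.size(2) / 2)
    c = (2 * f - 1 - f % 2) / (2. * f)
    for i in range(w.size(2)):
        for j in range(w.size(3)):
            w[0, 0, i, j] = \
                (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c))
    for ch in range(1, w.size(0)):
        w[ch, 0, :, :] = w[0, 0, :, :]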
5. References
https://arxiv.org/abs/1707.06484
https://github.com/pprp/SimpleCVReproduction/blob/master/CenterNet/nets/dla34.py