Toward fast and accurate human pose estimation via soft-gated skip connections
1. Paper Overview
The paper designs a small block and a feature-map fusion scheme to improve the computational efficiency and accuracy of human pose estimation.
The main contributions are as follows (some may have been proposed before):
- The block uses a soft gate
- Several feature-fusion schemes are compared and the best one is selected
2. Module Details
2.1 Soft-Gate Block
- Put simply, as shown in the figure below, an attention weight is applied to each channel (see the rough formula below). SE (Squeeze-and-Excitation) essentially already does this; only the intermediate steps differ slightly.
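As I read the paper, the soft gate re-weights the identity (skip) path rather than the convolutional branch, roughly:

\[ x_{l+1} = \alpha \cdot x_l + F(x_l) \]

where \(\alpha\) is the learned weight and \(F\) is the convolutional branch; the demo code below realizes the \(\alpha \cdot x_l\) term with an SE-style module.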

- The reference code below uses SE weights as the gate. I have not tried running it, but the overall idea is as follows.
```python
# FIXME: This is demo code for the paper; it may not run as-is, so fix any remaining bugs before use.
import torch
import torch.nn as nn
import torch.nn.functional as F


def hard_sigmoid(x, inplace=False):
    # piecewise-linear approximation of sigmoid (as used in MobileNetV3-style SE modules)
    if inplace:
        return x.add_(3.).clamp_(0., 6.).div_(6.)
    return F.relu6(x + 3.) / 6.


def _make_divisible(v, divisor, min_value=None):
    # round a channel count to the nearest multiple of `divisor`
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # make sure rounding does not go down by more than 10%
        new_v += divisor
    return new_v


class ConvBNReLU(nn.Sequential):
    '''
    #FIXME Only for 3*3 and 1*1 convolutions with dilation equal to 1 or 2
    '''
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, dilation=1, norm_layer=None):
        padding = (kernel_size - 1) // 2
        if dilation == 2 and kernel_size == 3 and stride == 1:
            padding = 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride,
                      padding=padding, groups=groups, dilation=dilation, bias=False),
            norm_layer(out_planes),
            nn.ReLU6(inplace=True)
        )


# SE module used as the attention-style gate
class SqueezeExcite(nn.Module):
    def __init__(self, in_chs, se_ratio=0.25, reduced_base_chs=None,
                 act_layer=nn.ReLU, gate_fn=hard_sigmoid, divisor=4, **_):
        super(SqueezeExcite, self).__init__()
        self.gate_fn = gate_fn
        reduced_chs = _make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
        self.act1 = act_layer(inplace=True)
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)

    def forward(self, x):
        x_se = self.avg_pool(x)
        x_se = self.conv_reduce(x_se)
        x_se = self.act1(x_se)
        x_se = self.conv_expand(x_se)
        return x * self.gate_fn(x_se)


class SoftGateBlock(nn.Module):
    def __init__(self, inp, gate=SqueezeExcite):
        super(SoftGateBlock, self).__init__()
        assert inp % 4 == 0
        self.layers = self.build_layer(inp, gate)

    def forward(self, x):
        # gated skip path: the gate re-weights the input channels
        alpha = self.layers[0](x)
        # residual path: three convolutions producing C/2, C/4 and C/4 channels
        branch_y1 = self.layers[1](x)
        branch_y2 = self.layers[2](branch_y1)
        branch_y3 = self.layers[3](branch_y2)
        # concatenation restores the original channel count (C/2 + C/4 + C/4 = C)
        branch = torch.cat([branch_y1, branch_y2, branch_y3], dim=1)
        return alpha + branch

    def build_layer(self, chs, gate):
        layers = [gate(chs)]
        layers.append(ConvBNReLU(chs, chs // 2))
        layers.append(ConvBNReLU(chs // 2, chs // 4))
        layers.append(ConvBNReLU(chs // 4, chs // 4))
        return nn.ModuleList(layers)
```
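A quick shape sanity check for the block above (assuming all the classes are in one file; the sizes here are arbitrary):

```python
if __name__ == "__main__":
    block = SoftGateBlock(64)           # channel count must be divisible by 4
    x = torch.randn(2, 64, 32, 32)      # dummy batch: N=2, C=64, H=W=32
    y = block(x)
    print(y.shape)                      # expected: torch.Size([2, 64, 32, 32])
```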
The details given in the paper are incomplete, for example:
- How is the weight \(\alpha\) obtained? There are many possible methods; is everyone reproducing the paper supposed to try them one by one? (One possible choice is sketched after this list.)
  - The demo here uses the SE module as an example
- The block is only described for the case where the channel count stays the same; what about the downsampling and upsampling operations?
  - Either reuse the original ResNet blocks or design your own. I suggest designing your own with reference to the paper: the goal is to improve accuracy, so reusing the former defeats the purpose.
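For reference, one simple possibility (purely my assumption, not something the paper confirms) is to learn \(\alpha\) directly as a per-channel parameter instead of computing it with SE. A minimal sketch:

```python
import torch
import torch.nn as nn

class LearnableGate(nn.Module):
    """Hypothetical gate: a directly learned per-channel scale (not taken from the paper's code)."""
    def __init__(self, channels, init_value=1.0):
        super().__init__()
        # one learnable scalar per channel, broadcast over H and W
        self.alpha = nn.Parameter(torch.full((1, channels, 1, 1), init_value))

    def forward(self, x):
        return x * self.alpha
```

It has the same call signature as the SqueezeExcite gate above, so it could be swapped into the demo via SoftGateBlock(inp, gate=LearnableGate).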
2.2 Feature Fusion
- This part is straightforward: several connection schemes are compared, and the one labeled \((b)\) in the figure below is chosen. Why that one? The paper provides a table comparing the results. (A generic illustration follows below.)

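Since the figure is not reproduced here, the sketch below only illustrates two common ways of fusing skip features; which of them (if either) corresponds to option \((b)\) should be checked against the paper's figure, and all names and shapes here are my assumptions.

```python
import torch
import torch.nn as nn

def fuse_by_add(skip_feat, up_feat):
    # element-wise sum: both features must have identical shapes
    return skip_feat + up_feat

class FuseByConcat(nn.Module):
    """Concatenate along channels, then mix back to the original width with a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False)

    def forward(self, skip_feat, up_feat):
        return self.proj(torch.cat([skip_feat, up_feat], dim=1))
```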