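Preface: The [Learning YOLOv3 from Scratch] series keeps growing. I originally planned fewer installments, but reading the code kept surfacing new points worth covering, so I have been adding them to the series. So far the series has read through the YOLOv3 code, covering the cfg file, model construction, and related topics. Building on that, this post modifies the model code to integrate the SE and CBAM modules from the earlier Attention series into YOLOv3.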
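1. Defining the format
Just as layers such as [convolutional], [maxpool], [net], and [route] have a defined form in the cfg file, a brand-new module needs a cfg format of its own. We adopt the following conventions:
The SE module (explained in detail in: [Attention Mechanisms in CV] The Simplest, Easiest-to-Implement SE Module) has one parameter, reduction, whose default is 16, so its block in the cfg file is written as: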
[se]
reduction=16
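The CBAM module (explained in detail in: [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module) has two parameters across its spatial attention and channel attention parts: ratio and kernel_size. In the cfg file the second one is spelled kernelsize, without the underscore, so the CBAM block is written as: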
[cbam]
ratio=16
kernelsize=7
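To make the two formats concrete, here is a minimal sketch of how such blocks turn into dictionaries. The function name parse_sections is hypothetical and the parser is a simplified stand-in for the real one shown in the next section; note that every value stays a string at this stage.

```python
def parse_sections(text):
    """Parse darknet-style cfg text into a list of dicts (simplified sketch)."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # a bracketed name opens a new module definition
            sections.append({'type': line[1:-1].strip()})
        else:
            key, val = line.split('=', 1)
            sections[-1][key.strip()] = val.strip()
    return sections


cfg_text = """
[se]
reduction=16

[cbam]
ratio=16
kernelsize=7
"""

sections = parse_sections(cfg_text)
for s in sections:
    print(s)
```

Because the values are strings, whoever consumes them later has to convert them, for example with int().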
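2. Modifying the parser
Because these parameters are custom, the function that parses the cfg file has to change. As covered earlier in the series, the relevant code is in parse_config.py: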
import os
import numpy as np

def parse_model_cfg(path):
    # Example value of path: cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and \
            os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path
    with open(path, 'r') as f:
        lines = f.read().split('\n')
    # Drop empty lines and comment lines that start with #
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a module
            '''
            eg:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0
        else:
            key, val = line.split("=")
            key = key.rstrip()
            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))
            else:
                mdefs[-1][key] = val.strip()
    # Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size',
                 'stride', 'pad', 'activation', 'layers',
                 'groups', 'from', 'mask', 'anchors',
                 'classes', 'num', 'jitter', 'ignore_thresh',
                 'truth_thresh', 'random',
                 'stride_x', 'stride_y']
    f = []  # fields
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)
    return mdefs
In the code above, the part that needs to change is the supported list; add our new fields to it:
supported = ['type', 'batch_normalize', 'filters', 'size',
             'stride', 'pad', 'activation', 'layers',
             'groups', 'from', 'mask', 'anchors',
             'classes', 'num', 'jitter', 'ignore_thresh',
             'truth_thresh', 'random',
             'stride_x', 'stride_y',
             'ratio', 'reduction', 'kernelsize']
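The effect of extending the list can be checked in isolation. The sketch below pulls the field check out into a standalone function; unsupported_fields, BASE_FIELDS, and ATTENTION_FIELDS are hypothetical names used only for illustration, and the logic mirrors the check at the end of parse_model_cfg.

```python
BASE_FIELDS = ['type', 'batch_normalize', 'filters', 'size',
               'stride', 'pad', 'activation', 'layers',
               'groups', 'from', 'mask', 'anchors',
               'classes', 'num', 'jitter', 'ignore_thresh',
               'truth_thresh', 'random',
               'stride_x', 'stride_y']
ATTENTION_FIELDS = ['ratio', 'reduction', 'kernelsize']


def unsupported_fields(mdefs, supported):
    """Return every key used in mdefs that is missing from supported."""
    seen = []
    for mdef in mdefs:
        for key in mdef:
            if key not in seen:
                seen.append(key)
    return [k for k in seen if k not in supported]


mdefs = [
    {'type': 'convolutional', 'batch_normalize': '1', 'filters': '1024'},
    {'type': 'se', 'reduction': '16'},
    {'type': 'cbam', 'ratio': '16', 'kernelsize': '7'},
]

before = unsupported_fields(mdefs, BASE_FIELDS)
after = unsupported_fields(mdefs, BASE_FIELDS + ATTENTION_FIELDS)
print(before)  # → ['reduction', 'ratio', 'kernelsize']
print(after)   # → []
```

With the original list the assertion in parse_model_cfg would fail on any cfg that contains [se] or [cbam]; with the three extra names it passes. The real function skips the first section ([net]) via mdefs[1:], whereas this sketch checks whatever list it is given.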
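3. Implementing SE and CBAM
For the underlying theory, see the two posts [Attention Mechanisms in CV] The Simplest, Easiest-to-Implement SE Module and [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module. The code below is taken directly from those two posts: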
SE
import torch.nn as nn

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        # Squeeze: global average pooling down to one value per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck MLP that outputs per-channel weights in (0, 1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
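As a rough sense of cost, the number of weights in SELayer follows directly from its two bias-free nn.Linear layers. The sketch below just does that arithmetic; se_param_count is a hypothetical helper, and the comparison assumes the SE block follows a 3x3 convolution from 512 to 1024 channels, as in the yolov3-tiny based cfg used later in this post.

```python
def se_param_count(channel, reduction=16):
    """Weights in SELayer: two bias-free Linear layers around a bottleneck."""
    hidden = channel // reduction
    return channel * hidden + hidden * channel


for c in (256, 512, 1024):
    print(c, se_param_count(c))

# Weights of a 3x3 convolution from 512 to 1024 channels (BatchNorm excluded)
conv_weights = 512 * 1024 * 3 * 3
se_weights = se_param_count(1024)
print(se_weights, conv_weights, round(se_weights / conv_weights, 4))
```

Under these assumptions the SE block adds 131072 weights next to 4718592 in the convolution before it, a bit under 3%.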
CBAM
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        # "same" padding so the attention map keeps the input's height and width
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel axis, then stack the two maps as a 2-channel input
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # The same MLP (1x1 convolutions) is shared by both pooled descriptors
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)
Those are the two modules; add them to the models.py file.
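Note that ChannelAttention and SpatialAttention each return an attention map, not the reweighted feature map, so something still has to multiply them back onto the input. Below is a minimal sketch of one way to do that. The class name CBAM is my own for illustration, the two sub-modules are repeated in condensed form so the block runs on its own, and the channel-then-spatial order follows my reading of the CBAM paper; treat it as an assumption rather than the definitive implementation.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)  # shape (B, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        return self.sigmoid(self.conv(x))  # shape (B, 1, H, W)


class CBAM(nn.Module):
    """Hypothetical wrapper: channel attention first, then spatial attention."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, ratio)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)  # per-channel weights broadcast over H and W
        x = x * self.sa(x)  # per-location weights broadcast over channels
        return x


x = torch.randn(2, 64, 13, 13)
y = CBAM(64)(x)
print(y.shape)
```

The output has the same shape as the input, which is what lets such a block be dropped between two existing layers without touching their channel counts.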
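4. Designing the cfg file
Here yolov3-tiny.cfg serves as the baseline, and the attention module is added on top of it.
CBAM works much like SE, so SE is used as the example: it is placed right after the backbone to refine the features.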
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[se]
reduction=16
# add the SE module at the end of the backbone
#####backbone######
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
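One thing worth double-checking after inserting a new section is whether the [route] references still point at the intended layers, because every section after [net] counts as one layer. The sketch below lists the layer types of the cfg above and resolves the two routes; resolve is a hypothetical helper, and it assumes negative values are relative to the current layer while non-negative values are absolute indices, which matches how the forward function indexes layer_outputs.

```python
# Layer types of the cfg above, in order, with [net] excluded
layers = (
    ['convolutional', 'maxpool'] * 6   # 0-11: backbone conv/pool pairs
    + ['convolutional']                # 12: filters=1024
    + ['se']                           # 13: the new attention block
    + ['convolutional'] * 3            # 14-16
    + ['yolo']                         # 17
    + ['route']                        # 18: layers = -4
    + ['convolutional', 'upsample']    # 19-20
    + ['route']                        # 21: layers = -1, 8
    + ['convolutional'] * 2            # 22-23
    + ['yolo']                         # 24
)


def resolve(current, refs):
    """Turn route references into absolute layer indices."""
    return [r if r >= 0 else current + r for r in refs]


first = resolve(18, [-4])
second = resolve(21, [-1, 8])
print(first, [layers[i] for i in first])    # → [14] ['convolutional']
print(second, [layers[i] for i in second])  # → [20, 8] ['upsample', 'convolutional']
```

Here nothing needs to change: the relative references move together with the layers around them, and the absolute reference 8 lies before the inserted block. Had [se] been inserted before layer 8, that absolute index would have needed to be increased by one.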
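5. Building the model
Everything so far has been preparation. Taking SE as the example, we modify the model-loading part of models.py and the forward function so that the new module actually takes effect:
In the create_modules function in models.py, add: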
elif mdef['type'] == 'se':
    modules.add_module(
        'se_module',
        SELayer(output_filters[-1], reduction=int(mdef['reduction'])))
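The int(...) conversion in that branch matters because the parser stores every cfg value as a string. A small helper can keep those conversions in one place and also cover CBAM; attention_kwargs is a hypothetical name, and the defaults of 16 and 7 are taken from the class signatures earlier in this post.

```python
def attention_kwargs(mdef):
    """Convert string cfg values into typed keyword arguments (sketch)."""
    if mdef['type'] == 'se':
        return {'reduction': int(mdef.get('reduction', 16))}
    if mdef['type'] == 'cbam':
        return {
            'ratio': int(mdef.get('ratio', 16)),
            # the cfg key is `kernelsize`; the Python argument is `kernel_size`
            'kernel_size': int(mdef.get('kernelsize', 7)),
        }
    raise ValueError('not an attention block: %s' % mdef['type'])


se_args = attention_kwargs({'type': 'se', 'reduction': '16'})
cbam_args = attention_kwargs({'type': 'cbam', 'ratio': '8', 'kernelsize': '3'})
print(se_args)    # → {'reduction': 16}
print(cbam_args)  # → {'ratio': 8, 'kernel_size': 3}
```

An elif mdef['type'] == 'cbam' branch in create_modules could then pass these arguments, together with output_filters[-1], to a module that chains ChannelAttention and SpatialAttention. Since attention blocks leave the channel count unchanged, the channel bookkeeping should simply carry the previous value forward.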
Then modify the forward function in Darknet. The relevant part currently looks like this:
def forward(self, x, var=None):
    img_size = x.shape[-2:]
    layer_outputs = []
    output = []
    for i, (mdef,
            module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool']:
            x = module(x)
        elif mtype == 'route':
            layers = [int(x) for x in mdef['layers'].split(',')]
            if len(layers) == 1:
                x = layer_outputs[layers[0]]
            else:
                try:
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
                except:  # apply stride 2 for darknet reorg layer
                    layer_outputs[layers[1]] = F.interpolate(
                        layer_outputs[layers[1]], scale_factor=[0.5, 0.5])
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
        elif mtype == 'shortcut':
            x = x + layer_outputs[int(mdef['from'])]
        elif mtype == 'yolo':
            output.append(module(x, img_size))
        layer_outputs.append(x if i in self.routs else [])
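Adding the SE module to forward is simple. The SE module has the same status as the convolutional, upsampling, and max-pooling layers and needs no extra handling, so it is enough to change the start of the loop above as follows: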
for i, (mdef,
        module) in enumerate(zip(self.module_defs, self.module_list)):
    mtype = mdef['type']
    if mtype in ['convolutional', 'upsample', 'maxpool', 'se']:
        x = module(x)
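The overall process for CBAM is similar. Try it yourself; it is also a good way to get familiar with the overall YOLOv3 workflow.
Afterword: The content of this post is simple. It only adds attention modules, which is easy to implement. Where to place the attention mechanism, how many modules to use, and so on still have to be validated by experiment. Attention is not a cure-all; it takes a lot of tuning and trial and error to get satisfactory results. Feel free to contact me to join the group chat and report the results on your own datasets.
PS: Take care of your health these days, and wear a mask when you go out.
For more attention modules and plug-and-play modules, see: https://github.com/pprp/SimpleCVReproduction/tree/master/Plug-and-play module (stars welcome)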