Please credit the source when reposting:
https://www.cnblogs.com/darkknightzh/p/12152119.html
Paper:
https://arxiv.org/abs/1811.12004
Official PyTorch code:
https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch
1 Introduction
Lightweight OpenPose is a simplified version of OpenPose that keeps its overall pipeline.
The differences between Lightweight OpenPose and OpenPose are:
a. The former uses MobileNet V1 (up to conv5_5) as the backbone; the latter uses the first 10 layers of VGG-19.
b. The former uses dilated convolutions in some layers to enlarge the receptive field; the latter uses ordinary convolutions.
c. The former uses 3×3 convolution kernels; the latter uses 7×7.
d. The former has only one refinement stage; the latter has 5 stages.
e. In the former, the two branches (heatmaps and pafs) of the initial stage and the refinement stage share the weights of their first layers; in the latter the two branches run fully in parallel.
2 Improvements
2.1 Backbone
The paper analyzes the mAP and GFLOPs of each OpenPose stage and finds that beyond refinement stage 1 the accuracy gains are marginal while the GFLOPs grow considerably, so only refinement stage 1 is kept and the later stages are removed.
2.2 Weight sharing
Each OpenPose stage uses the two parallel branches shown on the left of the figure below to predict heatmaps and pafs separately. To cut computation further, Lightweight OpenPose shares the weights of the first few layers of the two branches, as shown on the right of the figure.
2.3 Dilated convolution
Furthermore, Lightweight OpenPose replaces the VGG feature extractor (the first 10 layers of VGG-19) with a MobileNet V1 containing dilated convolutions, lowering the GFLOPs considerably, as shown in the figure below. (The n/a entry in the 2-stage network row means that all refinement stages are used during training but inference only runs up to refinement stage 1; the later stages therefore add no test-time computation, hence n/a, and the GFLOPs column still reads 9.)
2.4 3×3 convolutions
To preserve the receptive field of the original 7×7 convolutions, Lightweight OpenPose replaces them with the convolution block below. (The original post was unsure how the receptive field works out; in fact a 3×3 convolution has a receptive field of 3, and stacking a 3×3 convolution with dilation 2 on top — effective kernel size 5 — adds 4, giving 3 + 4 = 7, the same as one 7×7 convolution.) This block corresponds to RefinementStageBlock in the code.
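To double-check that receptive-field claim, here is a small, self-contained calculation (illustrative helper, not part of the repository):

```python
def receptive_field(layers):
    """Each layer is (kernel_size, stride, dilation); returns the receptive field of the stack."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = k + (k - 1) * (d - 1)  # dilation enlarges the effective kernel
        rf += (k_eff - 1) * jump
        jump *= s
    return rf

# trunk of RefinementStageBlock: 1x1 conv, 3x3 conv, 3x3 conv with dilation 2
print(receptive_field([(1, 1, 1), (3, 1, 1), (3, 1, 2)]))  # -> 7, same as one 7x7 conv
```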
3 Training procedure
Training is done in three phases (not to be confused with the network's initial stage / refinement stages):
a. Train a 1-stage Lightweight OpenPose (initial stage + refinement stage 1) from the MobileNet V1 pretrained model. mAP reaches about 38% in this phase.
b. Continue training Lightweight OpenPose from the result of (a). mAP reaches about 39%.
c. Starting from the result of (b), set the number of stages to 3 (initial stage + refinement stages 1–3) and continue training; at test time, only the output of refinement stage 1 is used to estimate poses. mAP reaches about 40%.
Notes:
a. Each phase resumes directly from the last checkpoint of the previous phase, without changing the learning rate or other hyperparameters.
b. To save time, each phase can be validated on a subset of the validation set (the gap versus the full validation set is small). The three phases map onto train.py invocations roughly as sketched below.
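For reference, the phases correspond roughly to the following commands (paths and checkpoint names are placeholders; flag names follow the repository's README, so double-check against your checkout):

```
# Phase a: start from the MobileNet V1 pretrained weights
python train.py --train-images-folder <COCO>/train2017/ \
    --prepared-train-labels prepared_train_annotation.pkl \
    --val-labels val_subset.json --val-images-folder <COCO>/val2017/ \
    --checkpoint-path mobilenet_sgd_68.848.pth.tar --from-mobilenet

# Phase b: continue from the best checkpoint of phase a (weights only)
python train.py ... --checkpoint-path <checkpoint_from_a>.pth --weights-only

# Phase c: train with 3 refinement stages, initialized from phase b
python train.py ... --num-refinement-stages 3 \
    --checkpoint-path <checkpoint_from_b>.pth --weights-only
```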
4 Code
4.1 Overall network structure
The main network code is as follows:

```python
class PoseEstimationWithMobileNet(nn.Module):
    def __init__(self, num_refinement_stages=1, num_channels=128, num_heatmaps=19, num_pafs=38):
        super().__init__()
        self.model = nn.Sequential(  # MobileNet V1 backbone
            conv(      3,  32, stride=2, bias=False),  # conv + BN + ReLU
            conv_dw(  32,  64),  # dw_conv(in, in, stride) + BN + ReLU, then conv(in, out) + BN + ReLU
            conv_dw(  64, 128, stride=2),
            conv_dw( 128, 128),
            conv_dw( 128, 256, stride=2),
            conv_dw( 256, 256),
            conv_dw( 256, 512),  # conv4_2
            conv_dw( 512, 512, dilation=2, padding=2),
            conv_dw( 512, 512),
            conv_dw( 512, 512),
            conv_dw( 512, 512),
            conv_dw( 512, 512)   # conv5_5
        )
        self.cpm = Cpm(512, num_channels)  # dimensionality-reduction module

        self.initial_stage = InitialStage(num_channels, num_heatmaps, num_pafs)
        self.refinement_stages = nn.ModuleList()
        for idx in range(num_refinement_stages):
            self.refinement_stages.append(RefinementStage(num_channels + num_heatmaps + num_pafs,
                                                          num_channels, num_heatmaps, num_pafs))

    def forward(self, x):
        backbone_features = self.model(x)
        backbone_features = self.cpm(backbone_features)

        stages_output = self.initial_stage(backbone_features)
        for refinement_stage in self.refinement_stages:
            stages_output.extend(refinement_stage(
                torch.cat([backbone_features, stages_output[-2], stages_output[-1]], dim=1)))

        return stages_output
```

Since MobileNet V1 outputs 512 channels, the Cpm module reduces them to 128:

```python
class Cpm(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.align = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)  # conv + ReLU
        self.trunk = nn.Sequential(
            conv_dw_no_bn(out_channels, out_channels),  # dw_conv(in, in) + ELU, then conv(in, out) + ELU
            conv_dw_no_bn(out_channels, out_channels),
            conv_dw_no_bn(out_channels, out_channels)
        )
        self.conv = conv(out_channels, out_channels, bn=False)  # conv + ReLU

    def forward(self, x):
        x = self.align(x)
        x = self.conv(x + self.trunk(x))
        return x
```
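A quick shape check on a dummy input (a minimal sketch, assuming PoseEstimationWithMobileNet and its building blocks above are importable):

```python
import torch

net = PoseEstimationWithMobileNet(num_refinement_stages=1)
out = net(torch.randn(1, 3, 368, 368))
print([tuple(o.shape) for o in out])
# [(1, 19, 46, 46), (1, 38, 46, 46), (1, 19, 46, 46), (1, 38, 46, 46)]
# three stride-2 convolutions in the backbone give an output stride of 8 (368 / 8 = 46);
# 19 = 18 keypoints + background, 38 = 19 limbs x 2 paf channels (x, y)
```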
4.2 initial stage

```python
class InitialStage(nn.Module):
    def __init__(self, num_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(  # shared between the two branches
            conv(num_channels, num_channels, bn=False),  # conv + ReLU
            conv(num_channels, num_channels, bn=False),
            conv(num_channels, num_channels, bn=False)
        )
        self.heatmaps = nn.Sequential(  # heatmap head
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),             # 1x1 conv + ReLU
            conv(512, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)  # 1x1 conv
        )
        self.pafs = nn.Sequential(  # paf head
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),
            conv(512, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]
```
4.3 refine stage
The refinement stage consists of 5 identical RefinementStageBlocks, which form the trunk shared by the two heads. Each RefinementStageBlock is the block described in section 2.4.

```python
class RefinementStageBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.initial = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)  # 1x1 conv + ReLU
        self.trunk = nn.Sequential(
            conv(out_channels, out_channels),                        # conv + BN + ReLU
            conv(out_channels, out_channels, dilation=2, padding=2)  # conv + BN + ReLU
        )

    def forward(self, x):
        initial_features = self.initial(x)
        trunk_features = self.trunk(initial_features)
        return initial_features + trunk_features  # the paper's two 3x3 convs replacing a 7x7 conv


class RefinementStage(nn.Module):
    def __init__(self, in_channels, out_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(  # shared between the two branches
            RefinementStageBlock(in_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels)
        )
        self.heatmaps = nn.Sequential(  # heatmap head
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),             # 1x1 conv + ReLU
            conv(out_channels, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)  # 1x1 conv
        )
        self.pafs = nn.Sequential(  # paf head
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),
            conv(out_channels, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]
```
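A rough parameter count shows what this block saves relative to the 7×7 convolution it replaces (back-of-the-envelope, ignoring biases and BatchNorm):

```python
c = 128
params_7x7 = c * c * 7 * 7                         # one 7x7 conv: 802,816 weights
params_block = c * c * 1 * 1 + 2 * c * c * 3 * 3   # 1x1 conv + two 3x3 convs: 311,296 weights
print(params_7x7, params_block, params_block / params_7x7)  # ~0.39x the weights, same receptive field
```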
4.4 The custom conv helpers
The conv building blocks used in the network above are defined as follows:

```python
def conv(in_channels, out_channels, kernel_size=3, padding=1, bn=True, dilation=1, stride=1, relu=True, bias=True):
    modules = [nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, bias=bias)]
    if bn:
        modules.append(nn.BatchNorm2d(out_channels))
    if relu:
        modules.append(nn.ReLU(inplace=True))
    return nn.Sequential(*modules)


def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    # depthwise conv + BN + ReLU, followed by pointwise 1x1 conv + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


def conv_dw_no_bn(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    # same as conv_dw but without BatchNorm and with ELU instead of ReLU
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.ELU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.ELU(inplace=True),
    )
```
The ELU activation used in conv_dw_no_bn is ELU(x) = x for x > 0 and α(eˣ − 1) for x ≤ 0, with α = 1 by PyTorch's default.
4.5 Loss function
The network's loss function is shown below. Since COCO leaves some very small people unannotated, the mask is set to 0 at those locations so that these people do not interfere with training.

```python
def l2_loss(input, target, mask, batch_size):
    loss = (input - target) * mask  # zero out the unannotated regions
    loss = (loss * loss) / 2 / batch_size
    return loss.sum()
```
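A minimal check of the masking behavior on dummy tensors (values are illustrative only):

```python
import torch

pred = torch.ones(1, 1, 4, 4)
target = torch.zeros(1, 1, 4, 4)
mask = torch.ones(1, 1, 4, 4)
mask[..., :, :2] = 0  # pretend the left half of the map is unannotated

# contributions from the masked-out half vanish: 8 remaining cells * 1^2 / 2 / 1 = 4.0
print(l2_loss(pred, target, mask, batch_size=1))  # tensor(4.)
```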
In the figure below, (a) is the image and (b) is mask_miss. COCO segments distant people but gives them no keypoint annotations; mask_miss exists to keep them from disturbing training. Subtracting mask_miss from the mask of all people yields the mask used above.
(figures omitted: (a) input image; (b) mask_miss)
4.6 train
train uses the ConvertKeypoints, Scale, Rotate, CropPad, and Flip transforms; see section 4.7.

```python
def train(prepared_train_labels, train_images_folder, num_refinement_stages, base_lr, batch_size, batches_per_iter,
          num_workers, checkpoint_path, weights_only, from_mobilenet, checkpoints_folder, log_after,
          val_labels, val_images_folder, val_output_name, checkpoint_after, val_after):
    net = PoseEstimationWithMobileNet(num_refinement_stages)

    stride = 8          # the input image is this multiple of the feature map
    sigma = 7           # std of the Gaussian used to render keypoint heatmaps
    path_thickness = 1  # limb width used when rendering pafs
    dataset = CocoTrainDataset(prepared_train_labels, train_images_folder,
                               stride, sigma, path_thickness,
                               transform=transforms.Compose([
                                   ConvertKeypoints(),
                                   Scale(),
                                   Rotate(pad=(128, 128, 128)),
                                   CropPad(pad=(128, 128, 128)),
                                   Flip()]))
    train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

    optimizer = optim.Adam([
        {'params': get_parameters_conv(net.model, 'weight')},
        {'params': get_parameters_conv_depthwise(net.model, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.model, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.model, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv(net.cpm, 'weight'), 'lr': base_lr},
        {'params': get_parameters_conv(net.cpm, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv_depthwise(net.cpm, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_conv(net.initial_stage, 'weight'), 'lr': base_lr},
        {'params': get_parameters_conv(net.initial_stage, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv(net.refinement_stages, 'weight'), 'lr': base_lr * 4},
        {'params': get_parameters_conv(net.refinement_stages, 'bias'), 'lr': base_lr * 8, 'weight_decay': 0},
        {'params': get_parameters_bn(net.refinement_stages, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.refinement_stages, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
    ], lr=base_lr, weight_decay=5e-4)

    num_iter = 0
    current_epoch = 0
    drop_after_epoch = [100, 200, 260]
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=drop_after_epoch, gamma=0.333)
    if checkpoint_path:
        checkpoint = torch.load(checkpoint_path)
        if from_mobilenet:
            load_from_mobilenet(net, checkpoint)
        else:
            load_state(net, checkpoint)
            if not weights_only:
                optimizer.load_state_dict(checkpoint['optimizer'])
                scheduler.load_state_dict(checkpoint['scheduler'])
                num_iter = checkpoint['iter']
                current_epoch = checkpoint['current_epoch']

    net = DataParallel(net).cuda()
    net.train()
    for epochId in range(current_epoch, 280):
        scheduler.step()
        total_losses = [0, 0] * (num_refinement_stages + 1)  # heatmap loss and paf loss per stage (initial + refinement)
        batch_per_iter_idx = 0
        for batch_data in train_loader:
            if batch_per_iter_idx == 0:
                optimizer.zero_grad()

            images = batch_data['image'].cuda()
            keypoint_masks = batch_data['keypoint_mask'].cuda()
            paf_masks = batch_data['paf_mask'].cuda()
            keypoint_maps = batch_data['keypoint_maps'].cuda()
            paf_maps = batch_data['paf_maps'].cuda()

            stages_output = net(images)

            losses = []
            for loss_idx in range(len(total_losses) // 2):
                losses.append(l2_loss(stages_output[loss_idx * 2], keypoint_maps, keypoint_masks, images.shape[0]))      # even outputs: heatmaps
                losses.append(l2_loss(stages_output[loss_idx * 2 + 1], paf_maps, paf_masks, images.shape[0]))            # odd outputs: pafs
                total_losses[loss_idx * 2] += losses[-2].item() / batches_per_iter      # accumulate losses for logging
                total_losses[loss_idx * 2 + 1] += losses[-1].item() / batches_per_iter

            loss = losses[0]
            for loss_idx in range(1, len(losses)):
                loss += losses[loss_idx]  # sum the losses of all stages
            loss /= batches_per_iter      # average over the accumulated batches
            loss.backward()
            batch_per_iter_idx += 1
            if batch_per_iter_idx == batches_per_iter:
                optimizer.step()
                batch_per_iter_idx = 0
                num_iter += 1
            else:
                continue

            if num_iter % log_after == 0:
                print('Iter: {}'.format(num_iter))
                for loss_idx in range(len(total_losses) // 2):
                    print('\n'.join(['stage{}_pafs_loss:     {}', 'stage{}_heatmaps_loss: {}']).format(
                        loss_idx + 1, total_losses[loss_idx * 2 + 1] / log_after,
                        loss_idx + 1, total_losses[loss_idx * 2] / log_after))
                for loss_idx in range(len(total_losses)):
                    total_losses[loss_idx] = 0
            if num_iter % checkpoint_after == 0:
                snapshot_name = '{}/checkpoint_iter_{}.pth'.format(checkpoints_folder, num_iter)
                torch.save({'state_dict': net.module.state_dict(),
                            'optimizer': optimizer.state_dict(),
                            'scheduler': scheduler.state_dict(),
                            'iter': num_iter,
                            'current_epoch': epochId},
                           snapshot_name)
            # if num_iter % val_after == 0:
            #     print('Validation...')
            #     evaluate(val_labels, val_output_name, val_images_folder, net)
            #     net.train()
```
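The batches_per_iter bookkeeping in the loop above is ordinary gradient accumulation: gradients from several mini-batches are summed before a single optimizer step, emulating a larger batch at the memory cost of a small one. Stripped to its essentials it looks like this (a sketch with placeholder names — model, loader, compute_loss — not the repository's code):

```python
optimizer.zero_grad()
for i, batch in enumerate(loader):
    loss = compute_loss(model(batch)) / batches_per_iter  # pre-divide so the summed gradients average out
    loss.backward()                                       # gradients accumulate into .grad across iterations
    if (i + 1) % batches_per_iter == 0:
        optimizer.step()                                  # one parameter update per batches_per_iter batches
        optimizer.zero_grad()
```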
4.7 transformations
The transformations consist mainly of ConvertKeypoints, Scale, Rotate, CropPad, and Flip.
4.7.1 ConvertKeypoints
ConvertKeypoints converts the COCO keypoint order into the keypoint order used by this code.

```python
class ConvertKeypoints(object):
    def __call__(self, sample):
        label = sample['label']
        h, w, _ = sample['image'].shape
        keypoints = label['keypoints']
        for keypoint in keypoints:  # keypoint[2] == 0: occluded, == 1: visible, == 2: not in image
            if keypoint[0] == keypoint[1] == 0:
                keypoint[2] = 2
            if (keypoint[0] < 0 or keypoint[0] >= w or keypoint[1] < 0 or keypoint[1] >= h):
                keypoint[2] = 2
        for other_label in label['processed_other_annotations']:
            keypoints = other_label['keypoints']
            for keypoint in keypoints:
                if keypoint[0] == keypoint[1] == 0:
                    keypoint[2] = 2
                if (keypoint[0] < 0 or keypoint[0] >= w or keypoint[1] < 0 or keypoint[1] >= h):
                    keypoint[2] = 2
        label['keypoints'] = self._convert(label['keypoints'], w, h)  # reorder to the paper's keypoint order and add the neck

        for other_label in label['processed_other_annotations']:
            other_label['keypoints'] = self._convert(other_label['keypoints'], w, h)
        return sample

    def _convert(self, keypoints, w, h):
        # Nose, Neck, R hand, L hand, R leg, L leg, Eyes, Ears
        reorder_map = [1, 7, 9, 11, 6, 8, 10, 13, 15, 17, 12, 14, 16, 3, 2, 5, 4]  # COCO keypoints -> paper's order (1-based)
        converted_keypoints = list(keypoints[i - 1] for i in reorder_map)
        # Add neck as a mean of shoulders
        converted_keypoints.insert(1, [(keypoints[5][0] + keypoints[6][0]) / 2,
                                       (keypoints[5][1] + keypoints[6][1]) / 2, 0])
        if keypoints[5][2] == 2 and keypoints[6][2] == 2:
            converted_keypoints[1][2] = 2
        elif keypoints[5][2] == 3 and keypoints[6][2] == 3:
            converted_keypoints[1][2] = 3
        elif keypoints[5][2] == 1 and keypoints[6][2] == 1:
            converted_keypoints[1][2] = 1
        if (converted_keypoints[1][0] < 0 or converted_keypoints[1][0] >= w
                or converted_keypoints[1][1] < 0 or converted_keypoints[1][1] >= h):
            converted_keypoints[1][2] = 2
        return converted_keypoints
```
The COCO keypoint order and the code's keypoint order are shown in the two figures (omitted here); each value of reorder_map minus 1 indexes into the COCO keypoints, and a neck keypoint is then inserted, as spelled out below.
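Applying _convert's reorder_map to the COCO keypoint names makes the resulting 18-keypoint layout explicit (the name lists are written out here for illustration; they are not part of the repository):

```python
coco_names = ['nose', 'l_eye', 'r_eye', 'l_ear', 'r_ear', 'l_sho', 'r_sho', 'l_elb', 'r_elb',
              'l_wri', 'r_wri', 'l_hip', 'r_hip', 'l_knee', 'r_knee', 'l_ank', 'r_ank']
reorder_map = [1, 7, 9, 11, 6, 8, 10, 13, 15, 17, 12, 14, 16, 3, 2, 5, 4]  # 1-based COCO indices

converted = [coco_names[i - 1] for i in reorder_map]
converted.insert(1, 'neck')  # mean of the two shoulders
print(list(enumerate(converted)))
# 0 nose, 1 neck, 2 r_sho, 3 r_elb, 4 r_wri, 5 l_sho, 6 l_elb, 7 l_wri,
# 8 r_hip, 9 r_knee, 10 r_ank, 11 l_hip, 12 l_knee, 13 l_ank,
# 14 r_eye, 15 l_eye, 16 r_ear, 17 l_ear
```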
4.7.2 Scale
Scale rescales the image and the keypoint annotations.

```python
class Scale(object):
    def __init__(self, prob=1, min_scale=0.5, max_scale=1.1, target_dist=0.6):
        self._prob = prob
        self._min_scale = min_scale
        self._max_scale = max_scale
        self._target_dist = target_dist

    def __call__(self, sample):
        prob = random.random()
        scale_multiplier = 1
        if prob <= self._prob:
            prob = random.random()
            scale_multiplier = (self._max_scale - self._min_scale) * prob + self._min_scale
        label = sample['label']
        scale_abs = self._target_dist / label['scale_provided']
        scale = scale_abs * scale_multiplier
        sample['image'] = cv2.resize(sample['image'], dsize=(0, 0), fx=scale, fy=scale)
        label['img_height'], label['img_width'], _ = sample['image'].shape
        sample['mask'] = cv2.resize(sample['mask'], dsize=(0, 0), fx=scale, fy=scale)

        label['objpos'][0] *= scale
        label['objpos'][1] *= scale
        for keypoint in sample['label']['keypoints']:
            keypoint[0] *= scale
            keypoint[1] *= scale
        for other_annotation in sample['label']['processed_other_annotations']:
            other_annotation['objpos'][0] *= scale
            other_annotation['objpos'][1] *= scale
            for keypoint in other_annotation['keypoints']:
                keypoint[0] *= scale
                keypoint[1] *= scale
        return sample
```
4.7.3 Rotate
Rotate rotates the image and the keypoint annotations.

```python
class Rotate(object):
    def __init__(self, pad, max_rotate_degree=40):
        self._pad = pad
        self._max_rotate_degree = max_rotate_degree

    def __call__(self, sample):
        prob = random.random()
        degree = (prob - 0.5) * 2 * self._max_rotate_degree
        h, w, _ = sample['image'].shape
        img_center = (w / 2, h / 2)
        R = cv2.getRotationMatrix2D(img_center, degree, 1)

        abs_cos = abs(R[0, 0])
        abs_sin = abs(R[0, 1])

        # enlarge the canvas so the rotated image fits entirely
        bound_w = int(h * abs_sin + w * abs_cos)
        bound_h = int(h * abs_cos + w * abs_sin)
        dsize = (bound_w, bound_h)

        R[0, 2] += dsize[0] / 2 - img_center[0]
        R[1, 2] += dsize[1] / 2 - img_center[1]
        sample['image'] = cv2.warpAffine(sample['image'], R, dsize=dsize,
                                         borderMode=cv2.BORDER_CONSTANT, borderValue=self._pad)
        sample['label']['img_height'], sample['label']['img_width'], _ = sample['image'].shape
        sample['mask'] = cv2.warpAffine(sample['mask'], R, dsize=dsize,
                                        borderMode=cv2.BORDER_CONSTANT, borderValue=(1, 1, 1))  # border is ok
        label = sample['label']
        label['objpos'] = self._rotate(label['objpos'], R)  # rotate the position coordinates
        for keypoint in label['keypoints']:
            point = [keypoint[0], keypoint[1]]
            point = self._rotate(point, R)
            keypoint[0], keypoint[1] = point[0], point[1]
        for other_annotation in label['processed_other_annotations']:
            for keypoint in other_annotation['keypoints']:
                point = [keypoint[0], keypoint[1]]
                point = self._rotate(point, R)
                keypoint[0], keypoint[1] = point[0], point[1]
        return sample

    def _rotate(self, point, R):
        return [R[0, 0] * point[0] + R[0, 1] * point[1] + R[0, 2],
                R[1, 0] * point[0] + R[1, 1] * point[1] + R[1, 2]]
```
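The bound_w/bound_h lines compute the axis-aligned bounding box of the rotated image so nothing gets cropped. A quick numeric check with made-up dimensions:

```python
import math

w, h, degree = 640, 480, 30.0
abs_cos = abs(math.cos(math.radians(degree)))
abs_sin = abs(math.sin(math.radians(degree)))
bound_w = int(h * abs_sin + w * abs_cos)  # 480*0.5 + 640*0.866 -> 794
bound_h = int(h * abs_cos + w * abs_sin)  # 480*0.866 + 640*0.5 -> 735
print(bound_w, bound_h)
```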
4.7.4 CropPad
CropPad performs a random crop, padding with a constant color where the crop window extends past the image.

```python
class CropPad(object):
    def __init__(self, pad, center_perterb_max=40, crop_x=368, crop_y=368):
        self._pad = pad
        self._center_perterb_max = center_perterb_max
        self._crop_x = crop_x
        self._crop_y = crop_y

    def __call__(self, sample):
        prob_x = random.random()
        prob_y = random.random()

        offset_x = int((prob_x - 0.5) * 2 * self._center_perterb_max)
        offset_y = int((prob_y - 0.5) * 2 * self._center_perterb_max)
        label = sample['label']
        shifted_center = (label['objpos'][0] + offset_x, label['objpos'][1] + offset_y)
        offset_left = -int(shifted_center[0] - self._crop_x / 2)
        offset_up = -int(shifted_center[1] - self._crop_y / 2)

        cropped_image = np.empty(shape=(self._crop_y, self._crop_x, 3), dtype=np.uint8)
        for i in range(3):
            cropped_image[:, :, i].fill(self._pad[i])
        cropped_mask = np.empty(shape=(self._crop_y, self._crop_x), dtype=np.uint8)
        cropped_mask.fill(1)

        image_x_start = int(shifted_center[0] - self._crop_x / 2)
        image_y_start = int(shifted_center[1] - self._crop_y / 2)
        image_x_finish = image_x_start + self._crop_x
        image_y_finish = image_y_start + self._crop_y
        crop_x_start = 0
        crop_y_start = 0
        crop_x_finish = self._crop_x
        crop_y_finish = self._crop_y

        w, h = label['img_width'], label['img_height']
        should_crop = True
        if image_x_start < 0:  # adjust the crop area
            crop_x_start -= image_x_start
            image_x_start = 0
        if image_x_start >= w:
            should_crop = False

        if image_y_start < 0:
            crop_y_start -= image_y_start
            image_y_start = 0
        if image_y_start >= w:  # note: upstream compares against w here; h looks like the intended bound
            should_crop = False

        if image_x_finish > w:
            diff = image_x_finish - w
            image_x_finish -= diff
            crop_x_finish -= diff
        if image_x_finish < 0:
            should_crop = False

        if image_y_finish > h:
            diff = image_y_finish - h
            image_y_finish -= diff
            crop_y_finish -= diff
        if image_y_finish < 0:
            should_crop = False

        if should_crop:
            cropped_image[crop_y_start:crop_y_finish, crop_x_start:crop_x_finish, :] = \
                sample['image'][image_y_start:image_y_finish, image_x_start:image_x_finish, :]
            cropped_mask[crop_y_start:crop_y_finish, crop_x_start:crop_x_finish] = \
                sample['mask'][image_y_start:image_y_finish, image_x_start:image_x_finish]

        sample['image'] = cropped_image
        sample['mask'] = cropped_mask
        label['img_width'] = self._crop_x
        label['img_height'] = self._crop_y

        label['objpos'][0] += offset_left
        label['objpos'][1] += offset_up
        for keypoint in label['keypoints']:
            keypoint[0] += offset_left
            keypoint[1] += offset_up
        for other_annotation in label['processed_other_annotations']:
            for keypoint in other_annotation['keypoints']:
                keypoint[0] += offset_left
                keypoint[1] += offset_up

        return sample

    def _inside(self, point, width, height):
        if point[0] < 0 or point[1] < 0:
            return False
        if point[0] >= width or point[1] >= height:
            return False
        return True
```
4.7.5 Flip
The Flip here mirrors the image horizontally during training. Only the keypoints need their left/right roles swapped (the right and left index lists in _swap_left_right); since the pafs have not been generated at this point, nothing needs to be done for them.

```python
class Flip(object):
    def __init__(self, prob=0.5):
        self._prob = prob

    def __call__(self, sample):
        prob = random.random()
        do_flip = prob <= self._prob
        if not do_flip:
            return sample

        sample['image'] = cv2.flip(sample['image'], 1)
        sample['mask'] = cv2.flip(sample['mask'], 1)

        label = sample['label']
        w, h = label['img_width'], label['img_height']
        label['objpos'][0] = w - 1 - label['objpos'][0]  # horizontal mirror: only x needs recomputing
        for keypoint in label['keypoints']:
            keypoint[0] = w - 1 - keypoint[0]
        label['keypoints'] = self._swap_left_right(label['keypoints'])  # swap the left/right keypoints

        for other_annotation in label['processed_other_annotations']:
            other_annotation['objpos'][0] = w - 1 - other_annotation['objpos'][0]
            for keypoint in other_annotation['keypoints']:
                keypoint[0] = w - 1 - keypoint[0]
            other_annotation['keypoints'] = self._swap_left_right(other_annotation['keypoints'])

        return sample

    def _swap_left_right(self, keypoints):
        right = [2, 3, 4, 8, 9, 10, 14, 16]  # indices of the right-side keypoints
        left = [5, 6, 7, 11, 12, 13, 15, 17]
        for r, l in zip(right, left):
            keypoints[r], keypoints[l] = keypoints[l], keypoints[r]
        return keypoints
```
4.8 val
There is not much to the val code; the relevant piece is convert_to_coco_format:

```python
def convert_to_coco_format(pose_entries, all_keypoints):
    coco_keypoints = []
    scores = []
    for n in range(len(pose_entries)):
        if len(pose_entries[n]) == 0:
            continue
        keypoints = [0] * 17 * 3
        to_coco_map = [0, -1, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]
        person_score = pose_entries[n][-2]
        position_id = -1
        for keypoint_id in pose_entries[n][:-2]:  # the last slot holds the person's keypoint count,
            position_id += 1                      # the second-to-last the score, so both are dropped
            if position_id == 1:  # no 'neck' in COCO; neck has idx 1 in this code, so skip it
                continue

            cx, cy, score, visibility = 0, 0, 0, 0  # keypoint not found
            if keypoint_id != -1:
                cx, cy, score = all_keypoints[int(keypoint_id), 0:3]
                cx = cx + 0.5
                cy = cy + 0.5
                visibility = 1
            keypoints[to_coco_map[position_id] * 3 + 0] = cx
            keypoints[to_coco_map[position_id] * 3 + 1] = cy
            keypoints[to_coco_map[position_id] * 3 + 2] = visibility
        coco_keypoints.append(keypoints)
        scores.append(person_score * max(0, (pose_entries[n][-1] - 1)))  # -1 for 'neck'
    return coco_keypoints, scores
```
4.9 Generating the gt labels
The gt labels are generated by coco.py, shown below. BODY_PARTS_KPT_IDS maps the OpenPose keypoints from 4.7 onto the limbs below.

```python
BODY_PARTS_KPT_IDS = [[1, 8], [8, 9], [9, 10], [1, 11], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [2, 16],
                      [1, 5], [5, 6], [6, 7], [5, 17], [1, 0], [0, 14], [0, 15], [14, 16], [15, 17]]


def get_mask(segmentations, mask):
    for segmentation in segmentations:
        rle = pycocotools.mask.frPyObjects(segmentation, mask.shape[0], mask.shape[1])
        mask[pycocotools.mask.decode(rle) > 0.5] = 0
    return mask


class CocoTrainDataset(Dataset):
    def __init__(self, labels, images_folder, stride, sigma, paf_thickness, transform=None):
        super().__init__()
        self._images_folder = images_folder
        self._stride = stride
        self._sigma = sigma
        self._paf_thickness = paf_thickness
        self._transform = transform
        with open(labels, 'rb') as f:
            self._labels = pickle.load(f)

    def __getitem__(self, idx):
        label = copy.deepcopy(self._labels[idx])  # label modified in transform
        image = cv2.imread(os.path.join(self._images_folder, label['img_paths']), cv2.IMREAD_COLOR)
        mask = np.ones(shape=(label['img_height'], label['img_width']), dtype=np.float32)
        mask = get_mask(label['segmentations'], mask)
        sample = {'label': label, 'image': image, 'mask': mask}
        if self._transform:
            sample = self._transform(sample)

        mask = cv2.resize(sample['mask'], dsize=None, fx=1/self._stride, fy=1/self._stride, interpolation=cv2.INTER_AREA)
        keypoint_maps = self._generate_keypoint_maps(sample)  # render the Gaussian heatmaps
        sample['keypoint_maps'] = keypoint_maps
        keypoint_mask = np.zeros(shape=keypoint_maps.shape, dtype=np.float32)  # heatmap mask
        for idx in range(keypoint_mask.shape[0]):
            keypoint_mask[idx] = mask  # copy the actual mask onto every heatmap channel
        sample['keypoint_mask'] = keypoint_mask

        paf_maps = self._generate_paf_maps(sample)  # render the pafs
        sample['paf_maps'] = paf_maps
        paf_mask = np.zeros(shape=paf_maps.shape, dtype=np.float32)
        for idx in range(paf_mask.shape[0]):
            paf_mask[idx] = mask  # copy the actual mask onto every paf channel
        sample['paf_mask'] = paf_mask

        image = sample['image'].astype(np.float32)
        image = (image - 128) / 256  # normalize
        sample['image'] = image.transpose((2, 0, 1))  # HWC -> CHW
        return sample

    def __len__(self):
        return len(self._labels)

    def _generate_keypoint_maps(self, sample):
        n_keypoints = 18  # total number of keypoints
        n_rows, n_cols, _ = sample['image'].shape
        keypoint_maps = np.zeros(shape=(n_keypoints + 1, n_rows // self._stride, n_cols // self._stride),
                                 dtype=np.float32)  # +1 for the background channel

        label = sample['label']
        for keypoint_idx in range(n_keypoints):
            keypoint = label['keypoints'][keypoint_idx]
            if keypoint[2] <= 1:
                self._add_gaussian(keypoint_maps[keypoint_idx], keypoint[0], keypoint[1], self._stride, self._sigma)
            for another_annotation in label['processed_other_annotations']:
                keypoint = another_annotation['keypoints'][keypoint_idx]
                if keypoint[2] <= 1:
                    self._add_gaussian(keypoint_maps[keypoint_idx], keypoint[0], keypoint[1], self._stride, self._sigma)
        keypoint_maps[-1] = 1 - keypoint_maps.max(axis=0)  # background
        return keypoint_maps

    def _add_gaussian(self, keypoint_map, x, y, stride, sigma):
        n_sigma = 4
        tl = [int(x - n_sigma * sigma), int(y - n_sigma * sigma)]  # top-left of the 4-sigma window around the keypoint
        tl[0] = max(tl[0], 0)
        tl[1] = max(tl[1], 0)

        br = [int(x + n_sigma * sigma), int(y + n_sigma * sigma)]  # bottom-right of the 4-sigma window
        map_h, map_w = keypoint_map.shape  # feature-map size
        br[0] = min(br[0], map_w * stride)  # clamp in input-image coordinates
        br[1] = min(br[1], map_h * stride)

        shift = stride / 2 - 0.5
        for map_y in range(tl[1] // stride, br[1] // stride):      # y range on the feature map
            for map_x in range(tl[0] // stride, br[0] // stride):  # x range on the feature map
                d2 = (map_x * stride + shift - x) * (map_x * stride + shift - x) + \
                     (map_y * stride + shift - y) * (map_y * stride + shift - y)  # squared distance
                exponent = d2 / 2 / sigma / sigma
                if exponent > 4.6052:  # threshold, ln(100), ~0.01
                    continue
                keypoint_map[map_y, map_x] += math.exp(-exponent)  # heatmaps of different people are summed,
                if keypoint_map[map_y, map_x] > 1:                 # not combined with max as in the paper
                    keypoint_map[map_y, map_x] = 1

    def _generate_paf_maps(self, sample):
        n_pafs = len(BODY_PARTS_KPT_IDS)
        n_rows, n_cols, _ = sample['image'].shape
        paf_maps = np.zeros(shape=(n_pafs * 2, n_rows // self._stride, n_cols // self._stride), dtype=np.float32)

        label = sample['label']
        for paf_idx in range(n_pafs):
            keypoint_a = label['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][0]]  # start of the current limb
            keypoint_b = label['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][1]]  # end of the current limb
            if keypoint_a[2] <= 1 and keypoint_b[2] <= 1:  # add the paf only if both endpoints are inside the image
                self._set_paf(paf_maps[paf_idx * 2:paf_idx * 2 + 2],
                              keypoint_a[0], keypoint_a[1], keypoint_b[0], keypoint_b[1],
                              self._stride, self._paf_thickness)
            for another_annotation in label['processed_other_annotations']:
                keypoint_a = another_annotation['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][0]]
                keypoint_b = another_annotation['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][1]]
                if keypoint_a[2] <= 1 and keypoint_b[2] <= 1:
                    self._set_paf(paf_maps[paf_idx * 2:paf_idx * 2 + 2],
                                  keypoint_a[0], keypoint_a[1], keypoint_b[0], keypoint_b[1],
                                  self._stride, self._paf_thickness)
        return paf_maps

    def _set_paf(self, paf_map, x_a, y_a, x_b, y_b, stride, thickness):
        x_a /= stride  # map input-image coordinates onto the feature map
        y_a /= stride
        x_b /= stride
        y_b /= stride
        x_ba = x_b - x_a  # limb extent along x
        y_ba = y_b - y_a  # limb extent along y
        _, h_map, w_map = paf_map.shape
        x_min = int(max(min(x_a, x_b) - thickness, 0))  # bounding box of the limb, enlarged by `thickness` pixels
        x_max = int(min(max(x_a, x_b) + thickness, w_map))
        y_min = int(max(min(y_a, y_b) - thickness, 0))
        y_max = int(min(max(y_a, y_b) + thickness, h_map))
        norm_ba = (x_ba * x_ba + y_ba * y_ba) ** 0.5  # length of the start-to-end vector
        if norm_ba < 1e-7:  # same points, no paf
            return
        x_ba /= norm_ba  # x component of the unit vector from start to end
        y_ba /= norm_ba  # y component

        for y in range(y_min, y_max):  # visit every point inside the bounding box
            for x in range(x_min, x_max):
                x_ca = x - x_a  # vector from the limb start to the current point
                y_ca = y - y_a
                d = math.fabs(x_ca * y_ba - y_ca * x_ba)  # distance of the point from the limb axis
                # (projection onto the unit vector perpendicular to the limb direction)
                if d <= thickness:  # close enough: write the unit vector into this limb's paf channels
                    paf_map[0, y, x] = x_ba
                    paf_map[1, y, x] = y_ba


class CocoValDataset(Dataset):
    def __init__(self, labels, images_folder):
        super().__init__()
        with open(labels, 'r') as f:
            self._labels = json.load(f)
        self._images_folder = images_folder

    def __getitem__(self, idx):
        file_name = self._labels['images'][idx]['file_name']
        img = cv2.imread(os.path.join(self._images_folder, file_name), cv2.IMREAD_COLOR)
        return {'img': img, 'file_name': file_name}

    def __len__(self):
        return len(self._labels['images'])
```
Note: in the last two lines of _add_gaussian, when the Gaussian confidence maps of multiple people are merged, the code does not take the max as described in the paper but uses min(sum(peaks), 1). This matches the official OpenPose training code in caffe_train-master/src/caffe/cpm_data_transformer.cpp (snippet omitted here).
Separately, the last two lines of _set_paf simply overwrite the paf with the current unit vector. If one person's limb occludes (or crosses) the same limb of another person, only one of them ends up encoded — whichever is written last in iteration order — but in practice this should be rare. The sketch below illustrates the heatmap combination rule just mentioned.
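To make the sum-versus-max difference concrete, here is a tiny 1-D sketch with two nearby peaks (illustrative values only):

```python
import numpy as np

xs = np.arange(10.0)
g1 = np.exp(-(xs - 4) ** 2 / (2 * 1.0 ** 2))  # person 1, peak at x = 4 (sigma = 1)
g2 = np.exp(-(xs - 5) ** 2 / (2 * 1.0 ** 2))  # person 2, peak at x = 5

combined_sum = np.minimum(g1 + g2, 1)  # what this code does: sum, clipped to 1
combined_max = np.maximum(g1, g2)      # what the paper describes

print(combined_sum[3], combined_max[3])  # ~0.74 vs ~0.61: summing inflates the region around nearby peaks
```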
4.10 extract_keypoints and group_keypoints
In extract_keypoints, every extracted keypoint is assigned a global index, so all keypoint indices are distinct. group_keypoints then writes that index into the corresponding slot of pose_entries, which guarantees that no keypoint is assigned to two people, as illustrated in figures (a) and (b).
(figures omitted: (a) and (b) illustrate the unique keypoint indexing)
keypoints.py is as follows:

```python
# This file uses a new paf order; it is unclear why the original order from coco.py is not reused
BODY_PARTS_KPT_IDS = [[1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7], [1, 8], [8, 9], [9, 10], [1, 11],
                      [11, 12], [12, 13], [1, 0], [0, 14], [14, 16], [0, 15], [15, 17], [2, 16], [5, 17]]
# Indices of the x and y channels of each limb above inside the original paf layout (coco.py)
BODY_PARTS_PAF_IDS = ([12, 13], [20, 21], [14, 15], [16, 17], [22, 23], [24, 25], [0, 1], [2, 3], [4, 5], [6, 7],
                      [8, 9], [10, 11], [28, 29], [30, 31], [34, 35], [32, 33], [36, 37], [18, 19], [26, 27])


def linspace2d(start, stop, n=10):
    points = 1 / (n - 1) * (stop - start)  # n evenly spaced points between start and stop, endpoint included
    return points[:, None] * np.arange(n) + start[:, None]


def extract_keypoints(heatmap, all_keypoints, total_keypoint_num):
    heatmap[heatmap < 0.1] = 0  # zero out heatmap values below the threshold
    heatmap_with_borders = np.pad(heatmap, [(2, 2), (2, 2)], mode='constant')  # pad each border by 2 pixels
    heatmap_center = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 1:heatmap_with_borders.shape[1]-1]  # heatmap plus a 1-pixel border
    heatmap_left = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 2:heatmap_with_borders.shape[1]]      # actually the map shifted in from the right
    heatmap_right = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 0:heatmap_with_borders.shape[1]-2]   # actually the map shifted in from the left
    heatmap_up = heatmap_with_borders[2:heatmap_with_borders.shape[0], 1:heatmap_with_borders.shape[1]-1]        # actually the map shifted in from below
    heatmap_down = heatmap_with_borders[0:heatmap_with_borders.shape[0]-2, 1:heatmap_with_borders.shape[1]-1]    # actually the map shifted in from above

    heatmap_peaks = (heatmap_center > heatmap_left) & (heatmap_center > heatmap_right) & \
                    (heatmap_center > heatmap_up) & (heatmap_center > heatmap_down)  # a peak exceeds all four neighbors
    heatmap_peaks = heatmap_peaks[1:heatmap_center.shape[0]-1, 1:heatmap_center.shape[1]-1]  # back to the original heatmap size
    keypoints = list(zip(np.nonzero(heatmap_peaks)[1], np.nonzero(heatmap_peaks)[0]))  # (x, y) coordinates of the peaks
    keypoints = sorted(keypoints, key=itemgetter(0))  # sort by x, ascending

    suppressed = np.zeros(len(keypoints), np.uint8)  # flag marking keypoint i as suppressed
    keypoints_with_score_and_id = []
    keypoint_num = 0
    for i in range(len(keypoints)):
        if suppressed[i]:
            continue
        for j in range(i+1, len(keypoints)):  # suppress every later keypoint j closer than the threshold to keypoint i
            if math.sqrt((keypoints[i][0] - keypoints[j][0]) ** 2 +
                         (keypoints[i][1] - keypoints[j][1]) ** 2) < 6:
                suppressed[j] = 1
        keypoint_with_score_and_id = (keypoints[i][0], keypoints[i][1],
                                      heatmap[keypoints[i][1], keypoints[i][0]],
                                      total_keypoint_num + keypoint_num)
        keypoints_with_score_and_id.append(keypoint_with_score_and_id)  # x, y, heatmap value, global index among all keypoints
        keypoint_num += 1
    all_keypoints.append(keypoints_with_score_and_id)  # append the keypoints found in this heatmap
    return keypoint_num  # number of keypoints found


def group_keypoints(all_keypoints_by_type, pafs, pose_entry_size=20, min_paf_score=0.05, demo=False):
    pose_entries = []
    all_keypoints = np.array([item for sublist in all_keypoints_by_type for item in sublist])  # flatten to an N x 4 array
    for part_id in range(len(BODY_PARTS_PAF_IDS)):
        part_pafs = pafs[:, :, BODY_PARTS_PAF_IDS[part_id]]  # 2-channel (x, y) unit-vector field of the current limb
        kpts_a = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][0]]  # all candidate start points of this limb
        kpts_b = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][1]]  # all candidate end points (either list may be empty)
        num_kpts_a = len(kpts_a)
        num_kpts_b = len(kpts_b)
        kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]  # keypoint id of the limb's start
        kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]  # keypoint id of the limb's end

        if num_kpts_a == 0 and num_kpts_b == 0:  # no keypoints for such body part
            continue
        elif num_kpts_a == 0:  # body part has just 'b' keypoints
            for i in range(num_kpts_b):
                num = 0
                for j in range(len(pose_entries)):  # check if already in some pose, was added by another body part
                    if pose_entries[j][kpt_b_id] == kpts_b[i][3]:
                        num += 1
                        continue
                if num == 0:  # this end point belongs to no pose yet: start a new one
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_b_id] = kpts_b[i][3]  # keypoint idx
                    pose_entry[-1] = 1                   # num keypoints in pose
                    pose_entry[-2] = kpts_b[i][2]        # pose score
                    pose_entries.append(pose_entry)
            continue
        elif num_kpts_b == 0:  # body part has just 'a' keypoints
            for i in range(num_kpts_a):
                num = 0
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == kpts_a[i][3]:
                        num += 1
                        continue
                if num == 0:
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_a_id] = kpts_a[i][3]
                    pose_entry[-1] = 1
                    pose_entry[-2] = kpts_a[i][2]
                    pose_entries.append(pose_entry)
            continue

        connections = []  # candidate limb connections; both endpoints have candidates here
        for i in range(num_kpts_a):
            kpt_a = np.array(kpts_a[i][0:2])  # coordinates of the current start candidate
            for j in range(num_kpts_b):
                kpt_b = np.array(kpts_b[j][0:2])  # coordinates of the current end candidate
                mid_point = [(), ()]
                mid_point[0] = (int(round((kpt_a[0] + kpt_b[0]) * 0.5)),
                                int(round((kpt_a[1] + kpt_b[1]) * 0.5)))
                mid_point[1] = mid_point[0]  # midpoint between start and end

                vec = [kpt_b[0] - kpt_a[0], kpt_b[1] - kpt_a[1]]  # vector from start to end
                vec_norm = math.sqrt(vec[0] ** 2 + vec[1] ** 2)
                if vec_norm == 0:
                    continue
                vec[0] /= vec_norm  # normalize to a unit vector
                vec[1] /= vec_norm
                cur_point_score = (vec[0] * part_pafs[mid_point[0][1], mid_point[0][0], 0] +
                                   vec[1] * part_pafs[mid_point[1][1], mid_point[1][0], 1])  # projection of the paf at the midpoint onto the unit vector

                height_n = pafs.shape[0] // 2
                success_ratio = 0
                ratio = 0  # defined up front so the acceptance check below is always safe
                point_num = 10  # number of points for integrating over the paf
                if cur_point_score > -100:
                    passed_point_score = 0
                    passed_point_num = 0
                    x, y = linspace2d(kpt_a, kpt_b)  # sample point_num points between start and end
                    for point_idx in range(point_num):
                        if not demo:
                            px = int(round(x[point_idx]))  # rounded coordinates
                            py = int(round(y[point_idx]))
                        else:
                            px = int(x[point_idx])  # truncated coordinates
                            py = int(y[point_idx])
                        paf = part_pafs[py, px, 0:2]  # paf (x, y) vector at the sampled point
                        cur_point_score = vec[0] * paf[0] + vec[1] * paf[1]  # projection onto the start-to-end unit vector
                        if cur_point_score > min_paf_score:
                            passed_point_score += cur_point_score  # accumulate the scores of the passing points
                            passed_point_num += 1
                    success_ratio = passed_point_num / point_num  # fraction of sampled points above the threshold
                    if passed_point_num > 0:
                        ratio = passed_point_score / passed_point_num  # mean paf score
                    ratio += min(height_n / vec_norm - 1, 0)  # penalize connections longer than half the paf height
                if ratio > 0 and success_ratio > 0.8:  # mean paf positive and enough sampled points passed
                    score_all = ratio + kpts_a[i][2] + kpts_b[j][2]  # paf + start heatmap + end heatmap
                    connections.append([i, j, ratio, score_all])  # local start/end indices, mean paf, connection score
        if len(connections) > 0:
            connections = sorted(connections, key=itemgetter(2), reverse=True)  # sort by mean paf score

        num_connections = min(num_kpts_a, num_kpts_b)  # at most this many limbs of this type in the image
        has_kpt_a = np.zeros(num_kpts_a, dtype=np.int32)  # start candidate already taken
        has_kpt_b = np.zeros(num_kpts_b, dtype=np.int32)  # end candidate already taken
        filtered_connections = []  # cleaned connections: global start index, global end index, mean paf score
        for row in range(len(connections)):
            if len(filtered_connections) == num_connections:  # maximum number reached, stop comparing
                break
            i, j, cur_point_score = connections[row][0:3]
            if not has_kpt_a[i] and not has_kpt_b[j]:  # both free (lower-paf duplicates are skipped since sorted descending)
                filtered_connections.append([kpts_a[i][3], kpts_b[j][3], cur_point_score])
                has_kpt_a[i] = 1
                has_kpt_b[j] = 1
        connections = filtered_connections  # note that score_all ends up unused
        if len(connections) == 0:  # no limb of this type found, go to the next limb
            continue

        if part_id == 0:  # the first limb: create the initial pose entries
            pose_entries = [np.ones(pose_entry_size) * -1 for _ in range(len(connections))]
            # first 18 slots: each keypoint's index among all keypoints; last two: total score and keypoint count
            for i in range(len(connections)):
                pose_entries[i][BODY_PARTS_KPT_IDS[0][0]] = connections[i][0]  # global index of the start point
                pose_entries[i][BODY_PARTS_KPT_IDS[0][1]] = connections[i][1]  # global index of the end point
                pose_entries[i][-1] = 2  # keypoint count of this person
                pose_entries[i][-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2]  # two heatmap values + mean paf
        elif part_id == 17 or part_id == 18:  # the last two limbs
            kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]
            kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]
            for i in range(len(connections)):  # compare each found limb against every existing person
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == connections[i][0] and pose_entries[j][kpt_b_id] == -1:
                        pose_entries[j][kpt_b_id] = connections[i][1]  # attach the limb's end point to this person
                    elif pose_entries[j][kpt_b_id] == connections[i][1] and pose_entries[j][kpt_a_id] == -1:
                        pose_entries[j][kpt_a_id] = connections[i][0]  # attach the limb's start point to this person
            continue
        else:
            kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]
            kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]
            for i in range(len(connections)):  # compare each found limb against every existing person
                num = 0
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == connections[i][0]:  # the limb's start matches this person's start
                        pose_entries[j][kpt_b_id] = connections[i][1]   # attach the end point to this person
                        num += 1
                        pose_entries[j][-1] += 1  # one more keypoint for this person
                        pose_entries[j][-2] += all_keypoints[connections[i][1], 2] + connections[i][2]  # raise this person's score
                if num == 0:  # no matching person: create a new one
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_a_id] = connections[i][0]
                    pose_entry[kpt_b_id] = connections[i][1]
                    pose_entry[-1] = 2
                    pose_entry[-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2]
                    pose_entries.append(pose_entry)

    filtered_entries = []
    for i in range(len(pose_entries)):  # drop a person with fewer than 3 keypoints or a mean score below 0.2
        if pose_entries[i][-1] < 3 or (pose_entries[i][-2] / pose_entries[i][-1] < 0.2):
            continue
        filtered_entries.append(pose_entries[i])
    pose_entries = np.asarray(filtered_entries)
    # pose_entries: first 18 slots are each keypoint's index among all keypoints, the last two are the
    # person's score and keypoint count; all_keypoints holds the info of every detected keypoint
    return pose_entries, all_keypoints
```
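In equation form, the sampling loop scores a candidate connection from keypoint $a$ to keypoint $b$ as follows (my notation, summarizing the code above):

$$\hat{\mathbf{v}} = \frac{b - a}{\lVert b - a \rVert}, \qquad E = \frac{1}{|P|} \sum_{p \in P} \mathbf{L}(p) \cdot \hat{\mathbf{v}}, \qquad P = \{\, p_k : \mathbf{L}(p_k) \cdot \hat{\mathbf{v}} > 0.05 \,\}$$

where $p_1, \dots, p_{10}$ are evenly spaced samples on the segment from $a$ to $b$ and $\mathbf{L}$ is the paf field. The code then adds the length penalty $\min\!\left(\frac{H/2}{\lVert b - a \rVert} - 1,\ 0\right)$ ($H$ is the paf map height) and accepts the connection only if the penalized score is positive and at least 80% of the samples passed the 0.05 threshold.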
4.11 demo
The two functions of the demo are as follows:

```python
def infer_fast(net, img, net_input_height_size, stride, upsample_ratio, cpu,
               pad_value=(0, 0, 0), img_mean=(128, 128, 128), img_scale=1/256):
    height, width, _ = img.shape  # actual height and width
    scale = net_input_height_size / height  # factor that scales the actual height to the desired height

    scaled_img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    scaled_img = normalize(scaled_img, img_mean, img_scale)  # normalize the image
    min_dims = [net_input_height_size, max(scaled_img.shape[1], net_input_height_size)]
    padded_img, pad = pad_width(scaled_img, stride, pad_value, min_dims)  # pad height/width to multiples of stride

    tensor_img = torch.from_numpy(padded_img).permute(2, 0, 1).unsqueeze(0).float()  # HWC -> CHW (BGR order)
    if not cpu:
        tensor_img = tensor_img.cuda()

    stages_output = net(tensor_img)

    stage2_heatmaps = stages_output[-2]  # the heatmaps of the last stage are used as the final heatmaps
    heatmaps = np.transpose(stage2_heatmaps.squeeze().cpu().data.numpy(), (1, 2, 0))
    heatmaps = cv2.resize(heatmaps, (0, 0), fx=upsample_ratio, fy=upsample_ratio,
                          interpolation=cv2.INTER_CUBIC)  # upsample the heatmaps by upsample_ratio

    stage2_pafs = stages_output[-1]  # the pafs of the last stage are used as the final pafs
    pafs = np.transpose(stage2_pafs.squeeze().cpu().data.numpy(), (1, 2, 0))
    pafs = cv2.resize(pafs, (0, 0), fx=upsample_ratio, fy=upsample_ratio, interpolation=cv2.INTER_CUBIC)

    return heatmaps, pafs, scale, pad  # heatmaps, pafs, input-vs-original scale, padding applied to the input


def run_demo(net, image_provider, height_size, cpu):
    net = net.eval()
    if not cpu:
        net = net.cuda()

    stride = 8
    upsample_ratio = 4
    color = [0, 224, 255]
    for img in image_provider:
        orig_img = img.copy()
        heatmaps, pafs, scale, pad = infer_fast(net, img, height_size, stride, upsample_ratio, cpu)

        total_keypoints_num = 0
        all_keypoints_by_type = []  # 18 lists; each holds (x, y, heatmap value, global index) for the found keypoints
        for kpt_idx in range(18):  # the 19th channel is the background, so only the first 18 are used
            total_keypoints_num += extract_keypoints(heatmaps[:, :, kpt_idx], all_keypoints_by_type, total_keypoints_num)

        pose_entries, all_keypoints = group_keypoints(all_keypoints_by_type, pafs, demo=True)
        for kpt_id in range(all_keypoints.shape[0]):  # map every keypoint back onto the original image
            all_keypoints[kpt_id, 0] = (all_keypoints[kpt_id, 0] * stride / upsample_ratio - pad[1]) / scale
            all_keypoints[kpt_id, 1] = (all_keypoints[kpt_id, 1] * stride / upsample_ratio - pad[0]) / scale
        for n in range(len(pose_entries)):  # visit every found person
            if len(pose_entries[n]) == 0:
                continue
            for part_id in range(len(BODY_PARTS_PAF_IDS) - 2):
                kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]    # keypoint id of the limb's start
                global_kpt_a_id = pose_entries[n][kpt_a_id]  # index of this keypoint among all keypoints
                if global_kpt_a_id != -1:  # the keypoint was assigned
                    x_a, y_a = all_keypoints[int(global_kpt_a_id), 0:2]
                    cv2.circle(img, (int(x_a), int(y_a)), 3, color, -1)
                kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]    # keypoint id of the limb's end
                global_kpt_b_id = pose_entries[n][kpt_b_id]
                if global_kpt_b_id != -1:
                    x_b, y_b = all_keypoints[int(global_kpt_b_id), 0:2]
                    cv2.circle(img, (int(x_b), int(y_b)), 3, color, -1)
                if global_kpt_a_id != -1 and global_kpt_b_id != -1:  # both endpoints assigned
                    cv2.line(img, (int(x_a), int(y_a)), (int(x_b), int(y_b)), color, 2)

        img = cv2.addWeighted(orig_img, 0.6, img, 0.4, 0)  # 0.6 * orig_img + 0.4 * img
        cv2.imwrite('res.jpg', img)
```
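Invoking the demo is then a one-liner (flag names per the repository's README; the checkpoint name is a placeholder):

```
python demo.py --checkpoint-path checkpoint_iter_370000.pth --images input.jpg
# or, from a webcam:
python demo.py --checkpoint-path checkpoint_iter_370000.pth --video 0
```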
4.12 Horizontal flipping at test time
The flipping here happens at test time — do not confuse it with the training-time Flip of 4.7.5. Since the heatmaps and pafs have already been predicted, flipping the input image means the output heatmap and paf channels must be remapped back to the un-flipped layout (a left/right channel swap). In addition, the x component of each paf must be negated: a paf is the vector from a limb's start point to its end point, and after a horizontal mirror its y component is unchanged while its x component reverses sign. A minimal sketch follows.
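A minimal sketch of such flip test-time augmentation, assuming hypothetical remap index lists (they would be derived from the left/right keypoint and limb orders above; this is not the repository's code):

```python
import numpy as np

def unflip_outputs(heatmaps, pafs, heatmap_remap, paf_remap):
    """heatmaps: HxWx19, pafs: HxWx38, both predicted from the horizontally flipped image.
    heatmap_remap / paf_remap are permutations swapping left/right channels (assumed given)."""
    heatmaps = heatmaps[:, ::-1, heatmap_remap]  # un-flip spatially and swap the left/right channels
    pafs = pafs[:, ::-1, paf_remap]              # same for the pafs
    pafs[:, :, ::2] *= -1  # negate every x channel: mirroring reverses the x component of each vector
    return heatmaps, pafs

# the un-flipped outputs can then be averaged with those of the original image:
# heatmaps = (heatmaps_orig + heatmaps_unflipped) / 2, and likewise for the pafs
```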