(Original) Human Pose Estimation: Lightweight OpenPose


Please cite the source when reposting:

https://www.cnblogs.com/darkknightzh/p/12152119.html

Paper:

https://arxiv.org/abs/1811.12004

Official PyTorch code:

https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch

1 Introduction

Lightweight OpenPose is a simplified version of OpenPose that follows OpenPose's overall pipeline.

The differences between Lightweight OpenPose and OpenPose are:

a. The former uses MobileNet V1 (up to conv5_5); the latter uses VGG-19 (the first 10 layers).

b. Some layers of the former use dilated convolutions to enlarge the receptive field; the latter uses ordinary convolutions.

c. The former uses 3×3 kernels; the latter uses 7×7.

d. The former has only one refine stage; the latter has 5 stages.

e. In the former, the two branches (heatmaps and pafs) of the initial stage and the refine stage share the weights of their trunk; in the latter, the two branches run in parallel.

2 Improvements

2.1 Backbone

The paper analyzes the mAP and GFLOPs of each OpenPose stage.

It finds that beyond refine stage 1 the accuracy gain is marginal while the GFLOPs grow considerably, so only refine stage 1 is kept and the later stages are removed.

2.2 Weight sharing

Each OpenPose stage uses the two parallel branches shown on the left of the figure below to predict heatmaps and pafs separately. To further reduce computation, Lightweight OpenPose shares the weights of the first few layers between the two branches, as shown on the right of the figure.

 

2.3 Dilated convolution

Further, Lightweight OpenPose replaces the VGG backbone (the first 10 layers of VGG-19) with a MobileNet V1 that contains dilated convolutions, which lowers the GFLOPs considerably, as shown in the figure below. (The n/a entry for the 2-stage network means that all refine stages are used during training, but inference stops at refine stage 1; the later stages then add no inference cost, hence n/a, and the GFLOPs column still reads 9.)

2.4 3×3 convolutions

To keep the same receptive field, Lightweight OpenPose uses the convolution block below in place of the 7×7 convolutions (I was not quite sure how the receptive field works out; see the sketch below). This block corresponds to RefinementStageBlock in the code.
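A quick back-of-the-envelope check (my own sketch, not from the repo): for stride-1 convolutions the receptive field grows by (kernel − 1) × dilation per layer, so the 3×3 conv followed by a 3×3 conv with dilation 2 in RefinementStageBlock's trunk covers the same 7×7 window as a single 7×7 conv:

def receptive_field(kernels, dilations):
    rf = 1
    for k, d in zip(kernels, dilations):
        rf += (k - 1) * d  # stride is 1 throughout the refinement block
    return rf

print(receptive_field([3, 3], [1, 2]))  # 7: 3x3 conv + 3x3 conv with dilation 2
print(receptive_field([7], [1]))        # 7: a single 7x7 conv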

3 Training procedure

Training is divided into three phases (not to be confused with the initial stage and refine stages):

a. Train a 1-stage Lightweight OpenPose (initial stage + stage 1) from the pretrained MobileNet V1 weights. This phase reaches roughly 38% mAP.

b. Continue training Lightweight OpenPose from the result of (a). This phase reaches roughly 39% mAP.

c. Starting from the result of (b), set the number of stages to 3 (initial stage + stage 1 + stage 2 + stage 3) and continue training; at test time, only the output of stage 1 is used to estimate poses. This phase reaches roughly 40% mAP.

Notes:

a. Each phase resumes directly from the last model of the previous phase, without changing the learning rate or other hyperparameters.

b. To save time, each phase can be validated on just a subset of the validation set (the performance gap versus the full validation set is very small).

4 Code

4.1 Overall network structure

The main network code is as follows:

class PoseEstimationWithMobileNet(nn.Module):
    def __init__(self, num_refinement_stages=1, num_channels=128, num_heatmaps=19, num_pafs=38):
        super().__init__()
        self.model = nn.Sequential(                     # MobileNet V1 backbone
            conv(     3,  32, stride=2, bias=False),    # conv + BN + ReLU
            conv_dw( 32,  64),                          # dw_conv(in, in, stride) + BN + ReLU, then conv(in, out) + BN + ReLU
            conv_dw( 64, 128, stride=2),
            conv_dw(128, 128),
            conv_dw(128, 256, stride=2),
            conv_dw(256, 256),
            conv_dw(256, 512),                          # conv4_2
            conv_dw(512, 512, dilation=2, padding=2),   # dilated depthwise conv
            conv_dw(512, 512),
            conv_dw(512, 512),
            conv_dw(512, 512),
            conv_dw(512, 512)                           # conv5_5
        )
        self.cpm = Cpm(512, num_channels)               # channel-reduction module

        self.initial_stage = InitialStage(num_channels, num_heatmaps, num_pafs)  # initial stage
        self.refinement_stages = nn.ModuleList()
        for idx in range(num_refinement_stages):
            self.refinement_stages.append(RefinementStage(num_channels + num_heatmaps + num_pafs, num_channels, num_heatmaps, num_pafs))  # refine stages

    def forward(self, x):
        backbone_features = self.model(x)
        backbone_features = self.cpm(backbone_features)

        stages_output = self.initial_stage(backbone_features)
        for refinement_stage in self.refinement_stages:
            stages_output.extend(refinement_stage(torch.cat([backbone_features, stages_output[-2], stages_output[-1]], dim=1)))

        return stages_output

Since MobileNet V1 outputs 512 channels, a Cpm module reduces the dimensionality to 128, as follows:

class Cpm(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.align = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)  # conv + ReLU
        self.trunk = nn.Sequential(
            conv_dw_no_bn(out_channels, out_channels),  # dw_conv(in, in) + ELU, then conv(in, out) + ELU
            conv_dw_no_bn(out_channels, out_channels),
            conv_dw_no_bn(out_channels, out_channels)
        )
        self.conv = conv(out_channels, out_channels, bn=False)                            # conv + ReLU

    def forward(self, x):
        x = self.align(x)
        x = self.conv(x + self.trunk(x))                # residual connection
        return x
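As a sanity check, a small snippet of mine (the import path follows the official repo's layout; adjust if yours differs) runs a dummy input through the model and prints the per-stage outputs:

import torch
from models.with_mobilenet import PoseEstimationWithMobileNet

net = PoseEstimationWithMobileNet(num_refinement_stages=1)
x = torch.randn(1, 3, 368, 368)          # backbone stride is 8, so 368 -> 46
outputs = net(x)                         # [heatmaps, pafs] of each stage, flattened
print([tuple(o.shape) for o in outputs])
# [(1, 19, 46, 46), (1, 38, 46, 46), (1, 19, 46, 46), (1, 38, 46, 46)]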

4.2 initial stage

class InitialStage(nn.Module):
    def __init__(self, num_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(                                                     # trunk shared by both branches
            conv(num_channels, num_channels, bn=False),                                 # conv + ReLU
            conv(num_channels, num_channels, bn=False),
            conv(num_channels, num_channels, bn=False)
        )
        self.heatmaps = nn.Sequential(                                                  # heatmap branch
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),                # 1x1 conv + ReLU
            conv(512, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)     # 1x1 conv
        )
        self.pafs = nn.Sequential(                                                      # paf branch
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),                # 1x1 conv + ReLU
            conv(512, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)         # 1x1 conv
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]

4.3 refine stage

The refine stage consists of five identical RefinementStageBlocks forming the weight-shared trunk. Each RefinementStageBlock is the block described in Section 2.4.

class RefinementStageBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.initial = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)  # 1x1 conv + ReLU
        self.trunk = nn.Sequential(
            conv(out_channels, out_channels),                                               # conv + BN + ReLU
            conv(out_channels, out_channels, dilation=2, padding=2)                         # dilated conv + BN + ReLU
        )

    def forward(self, x):
        initial_features = self.initial(x)
        trunk_features = self.trunk(initial_features)
        return initial_features + trunk_features        # two 3x3 convs replacing the paper's 7x7 conv, with a residual connection


class RefinementStage(nn.Module):
    def __init__(self, in_channels, out_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(                                                            # trunk shared by both branches
            RefinementStageBlock(in_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels)
        )
        self.heatmaps = nn.Sequential(                                                         # heatmap branch
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),              # 1x1 conv + ReLU
            conv(out_channels, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)   # 1x1 conv
        )
        self.pafs = nn.Sequential(                                                             # paf branch
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),              # 1x1 conv + ReLU
            conv(out_channels, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)       # 1x1 conv
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]

4.4 Custom conv building blocks

The conv building blocks used above are defined as follows:

def conv(in_channels, out_channels, kernel_size=3, padding=1, bn=True, dilation=1, stride=1, relu=True, bias=True):
    modules = [nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, bias=bias)]
    if bn:
        modules.append(nn.BatchNorm2d(out_channels))
    if relu:
        modules.append(nn.ReLU(inplace=True))
    return nn.Sequential(*modules)


def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


def conv_dw_no_bn(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.ELU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.ELU(inplace=True),
    )

The ELU activation is ELU(x) = x for x > 0 and α(exp(x) − 1) for x ≤ 0, with α = 1 by default.

4.5 Loss function

The network's loss is shown below. Since COCO leaves some very small people unannotated, the mask is set to 0 in those regions so that these people do not interfere with training.

def l2_loss(input, target, mask, batch_size):
    loss = (input - target) * mask              # zero out unannotated regions
    loss = (loss * loss) / 2 / batch_size       # squared error, averaged over the batch

    return loss.sum()
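A minimal usage sketch with synthetic tensors (shapes follow the training code, where each heatmap tensor is batch × 19 × H/8 × W/8); the zeroed mask region contributes nothing to the loss:

import torch

pred   = torch.randn(2, 19, 46, 46)    # predicted heatmaps
target = torch.zeros(2, 19, 46, 46)    # ground-truth heatmaps
mask   = torch.ones(2, 19, 46, 46)
mask[:, :, :10, :10] = 0               # unannotated region: no gradient from here
loss = l2_loss(pred, target, mask, batch_size=2)
print(loss.item())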

In the figures below, (a) is an image and (b) is its mask_miss. COCO annotates some distant people but provides no keypoint annotations for them; mask_miss exists to keep such people from interfering with training. Subtracting mask_miss from the mask of all people gives the mask used above.

 

 

(a) image

(b) mask_miss

4.6 train

train uses the ConvertKeypoints, Scale, Rotate, CropPad, and Flip transforms; see Section 4.7.

def train(prepared_train_labels, train_images_folder, num_refinement_stages, base_lr, batch_size, batches_per_iter,
          num_workers, checkpoint_path, weights_only, from_mobilenet, checkpoints_folder, log_after,
          val_labels, val_images_folder, val_output_name, checkpoint_after, val_after):
    net = PoseEstimationWithMobileNet(num_refinement_stages)

    stride = 8  # the input image is `stride` times larger than the feature maps
    sigma = 7  # std of the Gaussian used for the keypoint heatmaps
    path_thickness = 1  # limb width used when generating pafs
    dataset = CocoTrainDataset(prepared_train_labels, train_images_folder,
                               stride, sigma, path_thickness,
                               transform=transforms.Compose([
                                   ConvertKeypoints(),
                                   Scale(),
                                   Rotate(pad=(128, 128, 128)),
                                   CropPad(pad=(128, 128, 128)),
                                   Flip()]))
    train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

    optimizer = optim.Adam([
        {'params': get_parameters_conv(net.model, 'weight')},
        {'params': get_parameters_conv_depthwise(net.model, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.model, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.model, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv(net.cpm, 'weight'), 'lr': base_lr},
        {'params': get_parameters_conv(net.cpm, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv_depthwise(net.cpm, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_conv(net.initial_stage, 'weight'), 'lr': base_lr},
        {'params': get_parameters_conv(net.initial_stage, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
        {'params': get_parameters_conv(net.refinement_stages, 'weight'), 'lr': base_lr * 4},
        {'params': get_parameters_conv(net.refinement_stages, 'bias'), 'lr': base_lr * 8, 'weight_decay': 0},
        {'params': get_parameters_bn(net.refinement_stages, 'weight'), 'weight_decay': 0},
        {'params': get_parameters_bn(net.refinement_stages, 'bias'), 'lr': base_lr * 2, 'weight_decay': 0},
    ], lr=base_lr, weight_decay=5e-4)

    num_iter = 0
    current_epoch = 0
    drop_after_epoch = [100, 200, 260]
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=drop_after_epoch, gamma=0.333)
    if checkpoint_path:
        checkpoint = torch.load(checkpoint_path)
        if from_mobilenet:
            load_from_mobilenet(net, checkpoint)
        else:
            load_state(net, checkpoint)
            if not weights_only:
                optimizer.load_state_dict(checkpoint['optimizer'])
                scheduler.load_state_dict(checkpoint['scheduler'])
                num_iter = checkpoint['iter']
                current_epoch = checkpoint['current_epoch']

    net = DataParallel(net).cuda()
    net.train()
    for epochId in range(current_epoch, 280):
        scheduler.step()
        total_losses = [0, 0] * (num_refinement_stages + 1)  # heatmap loss and paf loss per stage (initial stage + refine stages)
        batch_per_iter_idx = 0
        for batch_data in train_loader:
            if batch_per_iter_idx == 0:
                optimizer.zero_grad()

            images = batch_data['image'].cuda()
            keypoint_masks = batch_data['keypoint_mask'].cuda()
            paf_masks = batch_data['paf_mask'].cuda()
            keypoint_maps = batch_data['keypoint_maps'].cuda()
            paf_maps = batch_data['paf_maps'].cuda()

            stages_output = net(images)

            losses = []
            for loss_idx in range(len(total_losses) // 2):
                losses.append(l2_loss(stages_output[loss_idx * 2], keypoint_maps, keypoint_masks, images.shape[0]))  # even outputs are heatmaps
                losses.append(l2_loss(stages_output[loss_idx * 2 + 1], paf_maps, paf_masks, images.shape[0]))   # odd outputs are pafs
                total_losses[loss_idx * 2] += losses[-2].item() / batches_per_iter  # accumulate losses for logging
                total_losses[loss_idx * 2 + 1] += losses[-1].item() / batches_per_iter

            loss = losses[0]
            for loss_idx in range(1, len(losses)):
                loss += losses[loss_idx]  # sum the losses of all stages
            loss /= batches_per_iter  # average over the accumulated batches
            loss.backward()
            batch_per_iter_idx += 1
            if batch_per_iter_idx == batches_per_iter:
                optimizer.step()
                batch_per_iter_idx = 0
                num_iter += 1
            else:
                continue

            if num_iter % log_after == 0:
                print('Iter: {}'.format(num_iter))
                for loss_idx in range(len(total_losses) // 2):
                    print('\n'.join(['stage{}_pafs_loss:     {}', 'stage{}_heatmaps_loss: {}']).format(
                        loss_idx + 1, total_losses[loss_idx * 2 + 1] / log_after, loss_idx + 1, total_losses[loss_idx * 2] / log_after))
                for loss_idx in range(len(total_losses)):
                    total_losses[loss_idx] = 0
            if num_iter % checkpoint_after == 0:
                snapshot_name = '{}/checkpoint_iter_{}.pth'.format(checkpoints_folder, num_iter)
                torch.save({'state_dict': net.module.state_dict(),
                            'optimizer': optimizer.state_dict(),
                            'scheduler': scheduler.state_dict(),
                            'iter': num_iter,
                            'current_epoch': epochId},
                           snapshot_name)
            # if num_iter % val_after == 0:
            #     print('Validation...')
            #     evaluate(val_labels, val_output_name, val_images_folder, net)
            #     net.train()
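Note the gradient accumulation in the loop above: gradients are accumulated over batches_per_iter mini-batches before each optimizer step. A minimal sketch of the same pattern on a toy model (names here are illustrative, not from the repo):

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 1)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
batches_per_iter = 2

optimizer.zero_grad()
for step in range(6):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = nn.functional.mse_loss(net(x), y) / batches_per_iter  # average over accumulated batches
    loss.backward()                        # gradients add up across backward() calls
    if (step + 1) % batches_per_iter == 0:
        optimizer.step()                   # one parameter update per batches_per_iter batches
        optimizer.zero_grad()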

4.7 transformations

The transformations mainly include ConvertKeypoints, Scale, Rotate, CropPad, and Flip.

4.7.1 ConvertKeypoints

ConvertKeypoints converts the COCO keypoint order into the keypoint order used by this code.

class ConvertKeypoints(object):
    def __call__(self, sample):
        label = sample['label']
        h, w, _ = sample['image'].shape
        keypoints = label['keypoints']
        for keypoint in keypoints:  # keypoint[2] == 0: occluded, == 1: visible, == 2: not in image
            if keypoint[0] == keypoint[1] == 0:
                keypoint[2] = 2
            if (keypoint[0] < 0 or keypoint[0] >= w or keypoint[1] < 0 or keypoint[1] >= h):
                keypoint[2] = 2
        for other_label in label['processed_other_annotations']:
            keypoints = other_label['keypoints']
            for keypoint in keypoints:
                if keypoint[0] == keypoint[1] == 0:
                    keypoint[2] = 2
                if (keypoint[0] < 0 or keypoint[0] >= w or keypoint[1] < 0 or keypoint[1] >= h):
                    keypoint[2] = 2
        label['keypoints'] = self._convert(label['keypoints'], w, h)  # reorder into the paper's keypoint order and add the neck

        for other_label in label['processed_other_annotations']:
            other_label['keypoints'] = self._convert(other_label['keypoints'], w, h)
        return sample

    def _convert(self, keypoints, w, h):
        # Nose, Neck, R hand, L hand, R leg, L leg, Eyes, Ears
        reorder_map = [1, 7, 9, 11, 6, 8, 10, 13, 15, 17, 12, 14, 16, 3, 2, 5, 4]  # 1-based mapping from COCO keypoints to the paper's order
        converted_keypoints = list(keypoints[i - 1] for i in reorder_map)
        # Add neck as a mean of shoulders
        converted_keypoints.insert(1, [(keypoints[5][0] + keypoints[6][0]) / 2, (keypoints[5][1] + keypoints[6][1]) / 2, 0])
        if keypoints[5][2] == 2 and keypoints[6][2] == 2:
            converted_keypoints[1][2] = 2
        elif keypoints[5][2] == 3 and keypoints[6][2] == 3:
            converted_keypoints[1][2] = 3
        elif keypoints[5][2] == 1 and keypoints[6][2] == 1:
            converted_keypoints[1][2] = 1
        if (converted_keypoints[1][0] < 0 or converted_keypoints[1][0] >= w or converted_keypoints[1][1] < 0 or converted_keypoints[1][1] >= h):
            converted_keypoints[1][2] = 2
        return converted_keypoints
The COCO keypoint order and the code's keypoint order are shown in the figures below; the conversion indexes the COCO keypoints with each reorder_map value minus 1 and then inserts the neck.
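To see concretely what reorder_map produces, a small check of mine (assuming COCO's standard 17-keypoint order):

COCO_NAMES = ['nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
              'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
              'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
              'left_knee', 'right_knee', 'left_ankle', 'right_ankle']
reorder_map = [1, 7, 9, 11, 6, 8, 10, 13, 15, 17, 12, 14, 16, 3, 2, 5, 4]
converted = [COCO_NAMES[i - 1] for i in reorder_map]
converted.insert(1, 'neck')  # added as the mean of both shoulders
print(converted)
# ['nose', 'neck', 'right_shoulder', 'right_elbow', 'right_wrist',
#  'left_shoulder', 'left_elbow', 'left_wrist', 'right_hip', 'right_knee',
#  'right_ankle', 'left_hip', 'left_knee', 'left_ankle', 'right_eye',
#  'left_eye', 'right_ear', 'left_ear']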

4.7.2 Scale

Scale rescales the image and the keypoint annotations.

class Scale(object):
    def __init__(self, prob=1, min_scale=0.5, max_scale=1.1, target_dist=0.6):
        self._prob = prob
        self._min_scale = min_scale
        self._max_scale = max_scale
        self._target_dist = target_dist

    def __call__(self, sample):
        prob = random.random()
        scale_multiplier = 1
        if prob <= self._prob:
            prob = random.random()
            scale_multiplier = (self._max_scale - self._min_scale) * prob + self._min_scale
        label = sample['label']
        scale_abs = self._target_dist / label['scale_provided']
        scale = scale_abs * scale_multiplier
        sample['image'] = cv2.resize(sample['image'], dsize=(0, 0), fx=scale, fy=scale)
        label['img_height'], label['img_width'], _ = sample['image'].shape
        sample['mask'] = cv2.resize(sample['mask'], dsize=(0, 0), fx=scale, fy=scale)

        label['objpos'][0] *= scale
        label['objpos'][1] *= scale
        for keypoint in sample['label']['keypoints']:
            keypoint[0] *= scale
            keypoint[1] *= scale
        for other_annotation in sample['label']['processed_other_annotations']:
            other_annotation['objpos'][0] *= scale
            other_annotation['objpos'][1] *= scale
            for keypoint in other_annotation['keypoints']:
                keypoint[0] *= scale
                keypoint[1] *= scale
        return sample

4.7.3 Rotate

Rotate rotates the image and the keypoint annotations.

class Rotate(object):
    def __init__(self, pad, max_rotate_degree=40):
        self._pad = pad
        self._max_rotate_degree = max_rotate_degree

    def __call__(self, sample):
        prob = random.random()
        degree = (prob - 0.5) * 2 * self._max_rotate_degree
        h, w, _ = sample['image'].shape
        img_center = (w / 2, h / 2)
        R = cv2.getRotationMatrix2D(img_center, degree, 1)

        abs_cos = abs(R[0, 0])
        abs_sin = abs(R[0, 1])

        bound_w = int(h * abs_sin + w * abs_cos)
        bound_h = int(h * abs_cos + w * abs_sin)
        dsize = (bound_w, bound_h)

        R[0, 2] += dsize[0] / 2 - img_center[0]
        R[1, 2] += dsize[1] / 2 - img_center[1]
        sample['image'] = cv2.warpAffine(sample['image'], R, dsize=dsize, borderMode=cv2.BORDER_CONSTANT, borderValue=self._pad)
        sample['label']['img_height'], sample['label']['img_width'], _ = sample['image'].shape
        sample['mask'] = cv2.warpAffine(sample['mask'], R, dsize=dsize, borderMode=cv2.BORDER_CONSTANT, borderValue=(1, 1, 1))  # border is ok
        label = sample['label']
        label['objpos'] = self._rotate(label['objpos'], R)  # rotate the coordinates
        for keypoint in label['keypoints']:
            point = [keypoint[0], keypoint[1]]
            point = self._rotate(point, R)
            keypoint[0], keypoint[1] = point[0], point[1]
        for other_annotation in label['processed_other_annotations']:
            for keypoint in other_annotation['keypoints']:
                point = [keypoint[0], keypoint[1]]
                point = self._rotate(point, R)
                keypoint[0], keypoint[1] = point[0], point[1]
        return sample

    def _rotate(self, point, R):
        return [R[0, 0] * point[0] + R[0, 1] * point[1] + R[0, 2], R[1, 0] * point[0] + R[1, 1] * point[1] + R[1, 2]]

4.7.4 CropPad

CropPad performs a random crop.

class CropPad(object):
    def __init__(self, pad, center_perterb_max=40, crop_x=368, crop_y=368):
        self._pad = pad
        self._center_perterb_max = center_perterb_max
        self._crop_x = crop_x
        self._crop_y = crop_y

    def __call__(self, sample):
        prob_x = random.random()
        prob_y = random.random()

        offset_x = int((prob_x - 0.5) * 2 * self._center_perterb_max)
        offset_y = int((prob_y - 0.5) * 2 * self._center_perterb_max)
        label = sample['label']
        shifted_center = (label['objpos'][0] + offset_x, label['objpos'][1] + offset_y)
        offset_left = -int(shifted_center[0] - self._crop_x / 2)
        offset_up = -int(shifted_center[1] - self._crop_y / 2)

        cropped_image = np.empty(shape=(self._crop_y, self._crop_x, 3), dtype=np.uint8)
        for i in range(3):
            cropped_image[:, :, i].fill(self._pad[i])
        cropped_mask = np.empty(shape=(self._crop_y, self._crop_x), dtype=np.uint8)
        cropped_mask.fill(1)

        image_x_start = int(shifted_center[0] - self._crop_x / 2)
        image_y_start = int(shifted_center[1] - self._crop_y / 2)
        image_x_finish = image_x_start + self._crop_x
        image_y_finish = image_y_start + self._crop_y
        crop_x_start = 0
        crop_y_start = 0
        crop_x_finish = self._crop_x
        crop_y_finish = self._crop_y

        w, h = label['img_width'], label['img_height']
        should_crop = True
        if image_x_start < 0:  # Adjust crop area
            crop_x_start -= image_x_start
            image_x_start = 0
        if image_x_start >= w:
            should_crop = False

        if image_y_start < 0:
            crop_y_start -= image_y_start
            image_y_start = 0
        if image_y_start >= h:  # off the bottom edge
            should_crop = False

        if image_x_finish > w:
            diff = image_x_finish - w
            image_x_finish -= diff
            crop_x_finish -= diff
        if image_x_finish < 0:
            should_crop = False

        if image_y_finish > h:
            diff = image_y_finish - h
            image_y_finish -= diff
            crop_y_finish -= diff
        if image_y_finish < 0:
            should_crop = False

        if should_crop:
            cropped_image[crop_y_start:crop_y_finish, crop_x_start:crop_x_finish, :] =\
                sample['image'][image_y_start:image_y_finish, image_x_start:image_x_finish, :]
            cropped_mask[crop_y_start:crop_y_finish, crop_x_start:crop_x_finish] =\
                sample['mask'][image_y_start:image_y_finish, image_x_start:image_x_finish]

        sample['image'] = cropped_image
        sample['mask'] = cropped_mask
        label['img_width'] = self._crop_x
        label['img_height'] = self._crop_y

        label['objpos'][0] += offset_left
        label['objpos'][1] += offset_up
        for keypoint in label['keypoints']:
            keypoint[0] += offset_left
            keypoint[1] += offset_up
        for other_annotation in label['processed_other_annotations']:
            for keypoint in other_annotation['keypoints']:
                keypoint[0] += offset_left
                keypoint[1] += offset_up

        return sample

    def _inside(self, point, width, height):
        if point[0] < 0 or point[1] < 0:
            return False
        if point[0] >= width or point[1] >= height:
            return False
        return True

4.7.5 Flip

This Flip mirrors the image horizontally during training. Only the keypoints need swapping left-for-right (see right and left in _swap_left_right); since the pafs have not been generated yet at this point, no paf handling is needed.
class Flip(object):
    def __init__(self, prob=0.5):
        self._prob = prob

    def __call__(self, sample):
        prob = random.random()
        do_flip = prob <= self._prob
        if not do_flip:
            return sample

        sample['image'] = cv2.flip(sample['image'], 1)
        sample['mask'] = cv2.flip(sample['mask'], 1)

        label = sample['label']
        w, h = label['img_width'], label['img_height']
        label['objpos'][0] = w - 1 - label['objpos'][0]
        for keypoint in label['keypoints']:
            keypoint[0] = w - 1 - keypoint[0]
        label['keypoints'] = self._swap_left_right(label['keypoints'])  # swap left/right keypoints

        for other_annotation in label['processed_other_annotations']:
            other_annotation['objpos'][0] = w - 1 - other_annotation['objpos'][0]  # horizontal mirror: only the x coordinate changes
            for keypoint in other_annotation['keypoints']:
                keypoint[0] = w - 1 - keypoint[0]
            other_annotation['keypoints'] = self._swap_left_right(other_annotation['keypoints'])  # swap left/right keypoints

        return sample

    def _swap_left_right(self, keypoints):
        right = [2, 3, 4, 8, 9, 10, 14, 16]  # indices of the right/left keypoints
        left = [5, 6, 7, 11, 12, 13, 15, 17]
        for r, l in zip(right, left):
            keypoints[r], keypoints[l] = keypoints[l], keypoints[r]
        return keypoints

4.8 val

There is not much to say about the val code except convert_to_coco_format:
def convert_to_coco_format(pose_entries, all_keypoints):
    coco_keypoints = []
    scores = []
    for n in range(len(pose_entries)):
        if len(pose_entries[n]) == 0:
            continue
        keypoints = [0] * 17 * 3
        to_coco_map = [0, -1, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]
        person_score = pose_entries[n][-2]
        position_id = -1
        for keypoint_id in pose_entries[n][:-2]:  # the last entry is the keypoint count and the second-to-last is the score, so drop both
            position_id += 1
            if position_id == 1:  # no 'neck' in COCO; in this code neck has idx 1, so skip it
                continue

            cx, cy, score, visibility = 0, 0, 0, 0  # keypoint not found
            if keypoint_id != -1:
                cx, cy, score = all_keypoints[int(keypoint_id), 0:3]
                cx = cx + 0.5
                cy = cy + 0.5
                visibility = 1
            keypoints[to_coco_map[position_id] * 3 + 0] = cx
            keypoints[to_coco_map[position_id] * 3 + 1] = cy
            keypoints[to_coco_map[position_id] * 3 + 2] = visibility
        coco_keypoints.append(keypoints)
        scores.append(person_score * max(0, (pose_entries[n][-1] - 1)))  # -1 for 'neck'
    return coco_keypoints, scores

4.9 Ground-truth label generation

The ground-truth labels are generated by coco.py, shown below; BODY_PARTS_KPT_IDS maps the OpenPose keypoints from Section 4.7 onto the limbs below.

BODY_PARTS_KPT_IDS = [[1, 8], [8, 9], [9, 10], [1, 11], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [2, 16],
                      [1, 5], [5, 6], [6, 7], [5, 17], [1, 0], [0, 14], [0, 15], [14, 16], [15, 17]]


def get_mask(segmentations, mask):
    for segmentation in segmentations:
        rle = pycocotools.mask.frPyObjects(segmentation, mask.shape[0], mask.shape[1])
        mask[pycocotools.mask.decode(rle) > 0.5] = 0
    return mask


class CocoTrainDataset(Dataset):
    def __init__(self, labels, images_folder, stride, sigma, paf_thickness, transform=None):
        super().__init__()
        self._images_folder = images_folder
        self._stride = stride
        self._sigma = sigma
        self._paf_thickness = paf_thickness
        self._transform = transform
        with open(labels, 'rb') as f:
            self._labels = pickle.load(f)

    def __getitem__(self, idx):
        label = copy.deepcopy(self._labels[idx])  # label modified in transform
        image = cv2.imread(os.path.join(self._images_folder, label['img_paths']), cv2.IMREAD_COLOR)
        mask = np.ones(shape=(label['img_height'], label['img_width']), dtype=np.float32)
        mask = get_mask(label['segmentations'], mask)
        sample = {'label': label, 'image': image, 'mask': mask}
        if self._transform:
            sample = self._transform(sample)

        mask = cv2.resize(sample['mask'], dsize=None, fx=1/self._stride, fy=1/self._stride, interpolation=cv2.INTER_AREA)
        keypoint_maps = self._generate_keypoint_maps(sample)  # Gaussian keypoint heatmaps
        sample['keypoint_maps'] = keypoint_maps
        keypoint_mask = np.zeros(shape=keypoint_maps.shape, dtype=np.float32)  # heatmap mask
        for idx in range(keypoint_mask.shape[0]):
            keypoint_mask[idx] = mask  # copy the mask onto every heatmap channel
        sample['keypoint_mask'] = keypoint_mask

        paf_maps = self._generate_paf_maps(sample)  # pafs
        sample['paf_maps'] = paf_maps
        paf_mask = np.zeros(shape=paf_maps.shape, dtype=np.float32)
        for idx in range(paf_mask.shape[0]):
            paf_mask[idx] = mask  # copy the mask onto every paf channel
        sample['paf_mask'] = paf_mask

        image = sample['image'].astype(np.float32)
        image = (image - 128) / 256  # normalize
        sample['image'] = image.transpose((2, 0, 1))  # HWC to CHW
        return sample

    def __len__(self):
        return len(self._labels)

    def _generate_keypoint_maps(self, sample):
        n_keypoints = 18  # total number of keypoints
        n_rows, n_cols, _ = sample['image'].shape
        keypoint_maps = np.zeros(shape=(n_keypoints + 1, n_rows // self._stride, n_cols // self._stride), dtype=np.float32)  # +1 for background

        label = sample['label']
        for keypoint_idx in range(n_keypoints):
            keypoint = label['keypoints'][keypoint_idx]
            if keypoint[2] <= 1:
                self._add_gaussian(keypoint_maps[keypoint_idx], keypoint[0], keypoint[1], self._stride, self._sigma)  # add a Gaussian blob to this keypoint's channel
            for another_annotation in label['processed_other_annotations']:
                keypoint = another_annotation['keypoints'][keypoint_idx]
                if keypoint[2] <= 1:
                    self._add_gaussian(keypoint_maps[keypoint_idx], keypoint[0], keypoint[1], self._stride, self._sigma)
        keypoint_maps[-1] = 1 - keypoint_maps.max(axis=0)  # background channel
        return keypoint_maps

    def _add_gaussian(self, keypoint_map, x, y, stride, sigma):
        n_sigma = 4
        tl = [int(x - n_sigma * sigma), int(y - n_sigma * sigma)]  # top-left corner of the 4-sigma box around the keypoint
        tl[0] = max(tl[0], 0)
        tl[1] = max(tl[1], 0)

        br = [int(x + n_sigma * sigma), int(y + n_sigma * sigma)]  # bottom-right corner of the 4-sigma box
        map_h, map_w = keypoint_map.shape  # feature-map size
        br[0] = min(br[0], map_w * stride)  # clamp in original-image coordinates
        br[1] = min(br[1], map_h * stride)

        shift = stride / 2 - 0.5
        for map_y in range(tl[1] // stride, br[1] // stride):      # y range on the feature map
            for map_x in range(tl[0] // stride, br[0] // stride):  # x range on the feature map
                d2 = (map_x * stride + shift - x) * (map_x * stride + shift - x) + (map_y * stride + shift - y) * (map_y * stride + shift - y)  # squared distance
                exponent = d2 / 2 / sigma / sigma
                if exponent > 4.6052:  # threshold, ln(100), ~0.01
                    continue
                keypoint_map[map_y, map_x] += math.exp(-exponent)  # sum overlapping Gaussians (clipped to 1) instead of the paper's max
                if keypoint_map[map_y, map_x] > 1:
                    keypoint_map[map_y, map_x] = 1

    def _generate_paf_maps(self, sample):
        n_pafs = len(BODY_PARTS_KPT_IDS)
        n_rows, n_cols, _ = sample['image'].shape
        paf_maps = np.zeros(shape=(n_pafs * 2, n_rows // self._stride, n_cols // self._stride), dtype=np.float32)

        label = sample['label']
        for paf_idx in range(n_pafs):
            keypoint_a = label['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][0]]  # start keypoint of this limb
            keypoint_b = label['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][1]]  # end keypoint of this limb
            if keypoint_a[2] <= 1 and keypoint_b[2] <= 1:  # add the paf only if both endpoints are in the image
                self._set_paf(paf_maps[paf_idx * 2:paf_idx * 2 + 2], keypoint_a[0], keypoint_a[1], keypoint_b[0], keypoint_b[1], self._stride, self._paf_thickness)
            for another_annotation in label['processed_other_annotations']:
                keypoint_a = another_annotation['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][0]]
                keypoint_b = another_annotation['keypoints'][BODY_PARTS_KPT_IDS[paf_idx][1]]
                if keypoint_a[2] <= 1 and keypoint_b[2] <= 1:
                    self._set_paf(paf_maps[paf_idx * 2:paf_idx * 2 + 2], keypoint_a[0], keypoint_a[1], keypoint_b[0], keypoint_b[1], self._stride, self._paf_thickness)
        return paf_maps

    def _set_paf(self, paf_map, x_a, y_a, x_b, y_b, stride, thickness):
        x_a /= stride  # map original coordinates onto the feature map
        y_a /= stride
        x_b /= stride
        y_b /= stride
        x_ba = x_b - x_a  # x extent of the limb
        y_ba = y_b - y_a  # y extent of the limb
        _, h_map, w_map = paf_map.shape
        x_min = int(max(min(x_a, x_b) - thickness, 0))  # bounding box of the limb, expanded by `thickness` pixels
        x_max = int(min(max(x_a, x_b) + thickness, w_map))
        y_min = int(max(min(y_a, y_b) - thickness, 0))
        y_max = int(min(max(y_a, y_b) + thickness, h_map))
        norm_ba = (x_ba * x_ba + y_ba * y_ba) ** 0.5  # length of the start-to-end vector
        if norm_ba < 1e-7:  # Same points, no paf
            return
        x_ba /= norm_ba  # x component of the unit vector from start to end
        y_ba /= norm_ba  # y component of the unit vector from start to end

        for y in range(y_min, y_max):  # iterate over every point in the box
            for x in range(x_min, x_max):
                x_ca = x - x_a  # vector from the start point to the current point
                y_ca = y - y_a
                d = math.fabs(x_ca * y_ba - y_ca * x_ba)  # distance from the current point to the limb line (cross product with the unit vector)
                if d <= thickness:  # within the limb: write the unit vector into this limb's paf channels
                    paf_map[0, y, x] = x_ba
                    paf_map[1, y, x] = y_ba


class CocoValDataset(Dataset):
    def __init__(self, labels, images_folder):
        super().__init__()
        with open(labels, 'r') as f:
            self._labels = json.load(f)
        self._images_folder = images_folder

    def __getitem__(self, idx):
        file_name = self._labels['images'][idx]['file_name']
        img = cv2.imread(os.path.join(self._images_folder, file_name), cv2.IMREAD_COLOR)
        return {'img': img, 'file_name': file_name}

    def __len__(self):
        return len(self._labels['images'])

Note: in the last two lines of _add_gaussian, multiple Gaussian confidence maps are merged with min(sum(peaks), 1) rather than the max used in the paper. This matches the official OpenPose training code (caffe_train-master/src/caffe/cpm_data_transformer.cpp).
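A toy illustration of the difference between the two merging rules:

import numpy as np

g1 = np.array([0.7, 0.2])        # two overlapping Gaussian peaks, per pixel
g2 = np.array([0.6, 0.1])
print(np.minimum(g1 + g2, 1))    # code: sum clipped to 1 -> [1.0, 0.3]
print(np.maximum(g1, g2))        # paper: pixel-wise max  -> [0.7, 0.2]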

On the other hand, the last two lines of _set_paf simply write the current unit vector into the pafs. If one person's limb occludes the same limb of another person (or the two limbs cross), only one of the limbs gets recorded (depending on traversal order), but in practice this situation should be quite rare.

4.10 extract_keypoints and group_keypoints

In extract_keypoints, each extracted keypoint is assigned a global index, so all keypoint indices are distinct. In group_keypoints, that index is written into the corresponding slot of pose_entries, so no keypoint can be assigned to two people, as shown in figures (a) and (b) below.

(a)

(b)

keypoints.py is as follows:

# new paf order used in this file; it is unclear why the original order from coco.py is not reused
BODY_PARTS_KPT_IDS = [[1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7], [1, 8], [8, 9], [9, 10], [1, 11],
                      [11, 12], [12, 13], [1, 0], [0, 14], [14, 16], [0, 15], [15, 17], [2, 16], [5, 17]]
# indices of the x and y channels of the original pafs (coco.py) for each limb in this file's order
BODY_PARTS_PAF_IDS = ([12, 13], [20, 21], [14, 15], [16, 17], [22, 23], [24, 25], [0, 1], [2, 3], [4, 5], [6, 7],
                      [8, 9], [10, 11], [28, 29], [30, 31], [34, 35], [32, 33], [36, 37], [18, 19], [26, 27])


def linspace2d(start, stop, n=10):
    points = 1 / (n - 1) * (stop - start)  # n evenly spaced points from start to stop, inclusive
    return points[:, None] * np.arange(n) + start[:, None]


def extract_keypoints(heatmap, all_keypoints, total_keypoint_num):
    heatmap[heatmap < 0.1] = 0  # zero out heatmap values below the threshold
    heatmap_with_borders = np.pad(heatmap, [(2, 2), (2, 2)], mode='constant')  # pad each border by 2 pixels
    heatmap_center = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 1:heatmap_with_borders.shape[1]-1]  # center view, 1 pixel larger than the heatmap on each side
    heatmap_left = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 2:heatmap_with_borders.shape[1]]  # each position holds its right neighbor
    heatmap_right = heatmap_with_borders[1:heatmap_with_borders.shape[0]-1, 0:heatmap_with_borders.shape[1]-2]  # each position holds its left neighbor
    heatmap_up = heatmap_with_borders[2:heatmap_with_borders.shape[0], 1:heatmap_with_borders.shape[1]-1]  # each position holds its lower neighbor
    heatmap_down = heatmap_with_borders[0:heatmap_with_borders.shape[0]-2, 1:heatmap_with_borders.shape[1]-1]  # each position holds its upper neighbor

    heatmap_peaks = (heatmap_center > heatmap_left) & (heatmap_center > heatmap_right) &\
                    (heatmap_center > heatmap_up) & (heatmap_center > heatmap_down)  # a pixel larger than all four neighbors is a peak
    heatmap_peaks = heatmap_peaks[1:heatmap_center.shape[0]-1, 1:heatmap_center.shape[1]-1]  # crop back to the original heatmap size
    keypoints = list(zip(np.nonzero(heatmap_peaks)[1], np.nonzero(heatmap_peaks)[0]))  # (x, y) coordinates of the peaks; np.nonzero returns (rows, cols)
    keypoints = sorted(keypoints, key=itemgetter(0))  # sort by x coordinate

    suppressed = np.zeros(len(keypoints), np.uint8)  # flag marking keypoint i as suppressed
    keypoints_with_score_and_id = []
    keypoint_num = 0
    for i in range(len(keypoints)):
        if suppressed[i]:
            continue
        for j in range(i+1, len(keypoints)):  # suppress any later point j within distance 6 of point i
            if math.sqrt((keypoints[i][0] - keypoints[j][0]) ** 2 + (keypoints[i][1] - keypoints[j][1]) ** 2) < 6:
                suppressed[j] = 1
        keypoint_with_score_and_id = (keypoints[i][0], keypoints[i][1], heatmap[keypoints[i][1], keypoints[i][0]], total_keypoint_num + keypoint_num)
        keypoints_with_score_and_id.append(keypoint_with_score_and_id)  # (x, y, heatmap score, global keypoint index)
        keypoint_num += 1
    all_keypoints.append(keypoints_with_score_and_id)  # append this heatmap's keypoints to the overall list
    return keypoint_num  # number of keypoints found in this heatmap


def group_keypoints(all_keypoints_by_type, pafs, pose_entry_size=20, min_paf_score=0.05, demo=False):
    pose_entries = []
    all_keypoints = np.array([item for sublist in all_keypoints_by_type for item in sublist])  # flatten all keypoints into an N x 4 array
    for part_id in range(len(BODY_PARTS_PAF_IDS)):  # process one limb type at a time
        part_pafs = pafs[:, :, BODY_PARTS_PAF_IDS[part_id]]  # the 2-channel (x, y) paf field of this limb
        kpts_a = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][0]]  # all candidate start keypoints (BODY_PARTS_KPT_IDS defines the limb connections)
        kpts_b = all_keypoints_by_type[BODY_PARTS_KPT_IDS[part_id][1]]  # all candidate end keypoints; both lists hold 4-vectors and may be empty
        num_kpts_a = len(kpts_a)  # number of start candidates
        num_kpts_b = len(kpts_b)  # number of end candidates
        kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]  # joint id of the limb's start
        kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]  # joint id of the limb's end

        if num_kpts_a == 0 and num_kpts_b == 0:  # no keypoints for such body part
            continue
        elif num_kpts_a == 0:  # body part has just 'b' keypoints
            for i in range(num_kpts_b):
                num = 0
                for j in range(len(pose_entries)):  # check if already in some pose, was added by another body part
                    if pose_entries[j][kpt_b_id] == kpts_b[i][3]:  # this end keypoint is already assigned to a pose
                        num += 1
                        continue
                if num == 0:  # not assigned to anyone yet: create a new pose
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_b_id] = kpts_b[i][3]  # keypoint idx
                    pose_entry[-1] = 1                   # num keypoints in pose
                    pose_entry[-2] = kpts_b[i][2]        # pose score
                    pose_entries.append(pose_entry)
            continue
        elif num_kpts_b == 0:  # body part has just 'a' keypoints
            for i in range(num_kpts_a):
                num = 0
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == kpts_a[i][3]:  # this start keypoint is already assigned to a pose
                        num += 1
                        continue
                if num == 0:  # not assigned to anyone yet: create a new pose
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_a_id] = kpts_a[i][3]
                    pose_entry[-1] = 1
                    pose_entry[-2] = kpts_a[i][2]
                    pose_entries.append(pose_entry)
            continue

        connections = []  # candidate connections (both endpoints have candidates)
        for i in range(num_kpts_a):
            kpt_a = np.array(kpts_a[i][0:2])  # coordinates of the current start candidate
            for j in range(num_kpts_b):
                kpt_b = np.array(kpts_b[j][0:2])  # coordinates of the current end candidate
                mid_point = [(), ()]
                mid_point[0] = (int(round((kpt_a[0] + kpt_b[0]) * 0.5)), int(round((kpt_a[1] + kpt_b[1]) * 0.5)))
                mid_point[1] = mid_point[0]  # midpoint between start and end

                vec = [kpt_b[0] - kpt_a[0], kpt_b[1] - kpt_a[1]]  # vector from start to end (normalized below)
                vec_norm = math.sqrt(vec[0] ** 2 + vec[1] ** 2)
                if vec_norm == 0:
                    continue
                vec[0] /= vec_norm
                vec[1] /= vec_norm
                cur_point_score = (vec[0] * part_pafs[mid_point[0][1], mid_point[0][0], 0] +  # part_pafs is indexed [y, x, channel];
                                   vec[1] * part_pafs[mid_point[1][1], mid_point[1][0], 1])   # projection of the midpoint paf onto the limb direction

                height_n = pafs.shape[0] // 2
                success_ratio = 0
                point_num = 10  # number of points to integrate the paf over
                if cur_point_score > -100:
                    passed_point_score = 0
                    passed_point_num = 0
                    x, y = linspace2d(kpt_a, kpt_b)  # point_num interpolated points between start and end
                    for point_idx in range(point_num):
                        if not demo:
                            px = int(round(x[point_idx]))  # rounded coordinates
                            py = int(round(y[point_idx]))
                        else:
                            px = int(x[point_idx])  # truncated coordinates
                            py = int(y[point_idx])
                        paf = part_pafs[py, px, 0:2]  # paf (x, y) vector at the sampled point
                        cur_point_score = vec[0] * paf[0] + vec[1] * paf[1]  # its projection onto the limb direction
                        if cur_point_score > min_paf_score:  # projection above the threshold
                            passed_point_score += cur_point_score
                            passed_point_num += 1
                    success_ratio = passed_point_num / point_num  # fraction of sampled points above the threshold
                    ratio = 0
                    if passed_point_num > 0:
                        ratio = passed_point_score / passed_point_num  # mean paf score over the passing points
                    ratio += min(height_n / vec_norm - 1, 0)  # penalize limbs longer than half the paf height (the min() term goes negative)
                if ratio > 0 and success_ratio > 0.8:  # accept if the mean paf score is positive and at least 80% of sampled points pass
                    score_all = ratio + kpts_a[i][2] + kpts_b[j][2]  # paf score + both keypoint heatmap scores
                    connections.append([i, j, ratio, score_all])  # (start candidate index, end candidate index, mean paf score, total score)
        if len(connections) > 0:
            connections = sorted(connections, key=itemgetter(2), reverse=True)  # sort by mean paf score, descending

        num_connections = min(num_kpts_a, num_kpts_b)  # at most min(#starts, #ends) instances of this limb in the image
        has_kpt_a = np.zeros(num_kpts_a, dtype=np.int32)  # start candidate already used
        has_kpt_b = np.zeros(num_kpts_b, dtype=np.int32)  # end candidate already used
        filtered_connections = []  # filtered connections: (global start index, global end index, mean paf score)
        for row in range(len(connections)):
            if len(filtered_connections) == num_connections:  # reached the maximum number of connections
                break
            i, j, cur_point_score = connections[row][0:3]
            if not has_kpt_a[i] and not has_kpt_b[j]:  # both endpoints still free (greedy matching: lower-scoring conflicts are dropped)
                filtered_connections.append([kpts_a[i][3], kpts_b[j][3], cur_point_score])
                has_kpt_a[i] = 1  # mark the start as used
                has_kpt_b[j] = 1  # mark the end as used
        connections = filtered_connections  # keep only the filtered connections (score_all is in fact unused)
        if len(connections) == 0:  # no connections for this limb; go to the next one
            continue

        if part_id == 0:  # first limb: create the initial poses
            pose_entries = [np.ones(pose_entry_size) * -1 for _ in range(len(connections))]  # first 18 entries: global keypoint index per joint; last two: total score and keypoint count
            for i in range(len(connections)):
                pose_entries[i][BODY_PARTS_KPT_IDS[0][0]] = connections[i][0]  # global index of the start keypoint
                pose_entries[i][BODY_PARTS_KPT_IDS[0][1]] = connections[i][1]  # global index of the end keypoint
                pose_entries[i][-1] = 2  # number of keypoints in this pose
                pose_entries[i][-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2]  # both heatmap scores + mean paf score
        elif part_id == 17 or part_id == 18:  # the last two limbs only fill in missing joints of existing poses
            kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]
            kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]
            for i in range(len(connections)):  # compare each connection against every existing pose
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == connections[i][0] and pose_entries[j][kpt_b_id] == -1:  # start matches and the pose's end is unassigned
                        pose_entries[j][kpt_b_id] = connections[i][1]
                    elif pose_entries[j][kpt_b_id] == connections[i][1] and pose_entries[j][kpt_a_id] == -1:  # end matches and the pose's start is unassigned
                        pose_entries[j][kpt_a_id] = connections[i][0]
            continue
        else:
            kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]
            kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]
            for i in range(len(connections)):  # compare each connection against every existing pose
                num = 0
                for j in range(len(pose_entries)):
                    if pose_entries[j][kpt_a_id] == connections[i][0]:  # the connection's start matches this pose's start
                        pose_entries[j][kpt_b_id] = connections[i][1]  # assign the connection's end to the pose
                        num += 1
                        pose_entries[j][-1] += 1  # one more keypoint in this pose
                        pose_entries[j][-2] += all_keypoints[connections[i][1], 2] + connections[i][2]  # update the pose score
                if num == 0:  # no matching pose: create a new one
                    pose_entry = np.ones(pose_entry_size) * -1
                    pose_entry[kpt_a_id] = connections[i][0]
                    pose_entry[kpt_b_id] = connections[i][1]
                    pose_entry[-1] = 2
                    pose_entry[-2] = np.sum(all_keypoints[connections[i][0:2], 2]) + connections[i][2]
                    pose_entries.append(pose_entry)

    filtered_entries = []
    for i in range(len(pose_entries)):
        if pose_entries[i][-1] < 3 or (pose_entries[i][-2] / pose_entries[i][-1] < 0.2):  # drop poses with fewer than 3 keypoints or a mean score below 0.2
            continue
        filtered_entries.append(pose_entries[i])
    pose_entries = np.asarray(filtered_entries)
    return pose_entries, all_keypoints  # poses (first 18 entries: global keypoint indices; last two: score and keypoint count) and all keypoints
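To make the connection scoring concrete, here is a condensed sketch of mine of the line-integral score computed inside group_keypoints (it omits the long-limb penalty and the greedy matching):

import numpy as np

def paf_score(paf_xy, kpt_a, kpt_b, n=10, min_paf_score=0.05):
    vec = np.array(kpt_b, float) - np.array(kpt_a, float)
    norm = np.linalg.norm(vec)
    if norm < 1e-7:
        return 0.0
    vec /= norm                                    # unit vector along the limb
    xs = np.linspace(kpt_a[0], kpt_b[0], n).round().astype(int)
    ys = np.linspace(kpt_a[1], kpt_b[1], n).round().astype(int)
    scores = paf_xy[ys, xs] @ vec                  # projection of each sampled paf onto the limb direction
    passed = scores > min_paf_score
    return scores[passed].mean() if passed.mean() > 0.8 else 0.0  # accept only if 80% of samples pass

field = np.zeros((46, 46, 2))
field[..., 0] = 1.0                                # toy paf pointing along +x everywhere
print(paf_score(field, (5, 10), (30, 10)))         # ~1.0: a strong horizontal connection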

4.11 demo

The demo's two functions are as follows:

def infer_fast(net, img, net_input_height_size, stride, upsample_ratio, cpu,
               pad_value=(0, 0, 0), img_mean=(128, 128, 128), img_scale=1/256):
    height, width, _ = img.shape  # original height and width
    scale = net_input_height_size / height  # scale factor from the original height to the network input height

    scaled_img = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)  # resized image
    scaled_img = normalize(scaled_img, img_mean, img_scale)  # normalized image
    min_dims = [net_input_height_size, max(scaled_img.shape[1], net_input_height_size)]
    padded_img, pad = pad_width(scaled_img, stride, pad_value, min_dims)  # pad so height and width are multiples of stride

    tensor_img = torch.from_numpy(padded_img).permute(2, 0, 1).unsqueeze(0).float()  # HWC to CHW (BGR order)
    if not cpu:
        tensor_img = tensor_img.cuda()

    stages_output = net(tensor_img)  # network outputs

    stage2_heatmaps = stages_output[-2]  # heatmaps of the last stage, used as the final heatmaps
    heatmaps = np.transpose(stage2_heatmaps.squeeze().cpu().data.numpy(), (1, 2, 0))
    heatmaps = cv2.resize(heatmaps, (0, 0), fx=upsample_ratio, fy=upsample_ratio, interpolation=cv2.INTER_CUBIC)  # upsample by upsample_ratio

    stage2_pafs = stages_output[-1]  # pafs of the last stage, used as the final pafs
    pafs = np.transpose(stage2_pafs.squeeze().cpu().data.numpy(), (1, 2, 0))
    pafs = cv2.resize(pafs, (0, 0), fx=upsample_ratio, fy=upsample_ratio, interpolation=cv2.INTER_CUBIC)  # upsample by upsample_ratio

    return heatmaps, pafs, scale, pad  # heatmaps, pafs, input scale relative to the original image, and the padding


def run_demo(net, image_provider, height_size, cpu):
    net = net.eval()
    if not cpu:
        net = net.cuda()

    stride = 8
    upsample_ratio = 4
    color = [0, 224, 255]
    for img in image_provider:
        orig_img = img.copy()
        heatmaps, pafs, scale, pad = infer_fast(net, img, height_size, stride, upsample_ratio, cpu)

        total_keypoints_num = 0
        all_keypoints_by_type = []  # 18 lists of (x, y, score, global index) tuples
        for kpt_idx in range(18):  # 19th channel is background; only the 18 keypoint channels are used
            total_keypoints_num += extract_keypoints(heatmaps[:, :, kpt_idx], all_keypoints_by_type, total_keypoints_num)

        pose_entries, all_keypoints = group_keypoints(all_keypoints_by_type, pafs, demo=True)  # group keypoints into poses
        for kpt_id in range(all_keypoints.shape[0]):  # map keypoints back onto the original image
            all_keypoints[kpt_id, 0] = (all_keypoints[kpt_id, 0] * stride / upsample_ratio - pad[1]) / scale
            all_keypoints[kpt_id, 1] = (all_keypoints[kpt_id, 1] * stride / upsample_ratio - pad[0]) / scale
        for n in range(len(pose_entries)):  # iterate over the detected poses
            if len(pose_entries[n]) == 0:
                continue
            for part_id in range(len(BODY_PARTS_PAF_IDS) - 2):  # skip the last two limbs when drawing
                kpt_a_id = BODY_PARTS_KPT_IDS[part_id][0]  # joint id of the limb's start
                global_kpt_a_id = pose_entries[n][kpt_a_id]  # global keypoint index
                if global_kpt_a_id != -1:  # keypoint was assigned
                    x_a, y_a = all_keypoints[int(global_kpt_a_id), 0:2]  # coordinates on the original image
                    cv2.circle(img, (int(x_a), int(y_a)), 3, color, -1)  # draw the joint
                kpt_b_id = BODY_PARTS_KPT_IDS[part_id][1]  # joint id of the limb's end
                global_kpt_b_id = pose_entries[n][kpt_b_id]
                if global_kpt_b_id != -1:
                    x_b, y_b = all_keypoints[int(global_kpt_b_id), 0:2]
                    cv2.circle(img, (int(x_b), int(y_b)), 3, color, -1)
                if global_kpt_a_id != -1 and global_kpt_b_id != -1:  # both endpoints assigned
                    cv2.line(img, (int(x_a), int(y_a)), (int(x_b), int(y_b)), color, 2)  # draw the limb

        img = cv2.addWeighted(orig_img, 0.6, img, 0.4, 0)  # 0.6 * orig_img + 0.4 * img
        cv2.imwrite('res.jpg', img)

4.12 Left-right mirroring

This mirroring refers to test-time left-right flipping; do not confuse it with the training-time Flip of Section 4.7.5. At test time the keypoints and pafs have already been produced, so if the image is mirrored, the heatmap and paf channels must be remapped, as shown in the table below. In addition, the x components of the pafs must be negated: a paf is a vector pointing from a limb's start to its end, and mirroring leaves its y component unchanged but reverses its x component.
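A hedged sketch of this test-time flipping (not the repo's code; the channel maps are illustrative placeholders that must encode the actual left/right swaps):

import torch

def infer_with_flip(net, tensor_img, kpt_flip_map, paf_flip_map, paf_x_channels):
    # tensor_img: 1x3xHxW. The three maps must swap left/right keypoint channels,
    # swap left/right paf channels, and list the x-component paf channels.
    out = net(torch.flip(tensor_img, dims=[3]))                 # run on the mirrored image
    heatmaps, pafs = out[-2], out[-1]
    heatmaps = torch.flip(heatmaps, dims=[3])[:, kpt_flip_map]  # un-mirror, swap L/R joints
    pafs = torch.flip(pafs, dims=[3])[:, paf_flip_map]          # un-mirror, swap L/R limbs
    pafs[:, paf_x_channels] *= -1                               # x components change sign under mirroring
    return heatmaps, pafs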

 

