End to end recovery of human shape and pose

Reposted article, 2020-06-23 21:17. Category: pose estimation.

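1. Paper overview

The paper recovers 2D keypoints, 3D keypoints, a mesh, and camera intrinsics from a single image. The intrinsics here map camera-space points to image coordinates; there is no separate world coordinate system. The main focus is weak supervision when training data is scarce.

The main contributions are as follows (some may have been proposed in earlier work):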

Mesh supervised weakly

Iteration regression

Discriminator

projection

2. Module details

2.1 Mesh supervised weakly

The paper does not use this for weak supervision. I saw it in other papers, so it is skipped for now.

2.2 Iteration regression

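Note that the code below applies the same regressor (fc_blocks) at every iteration, so its weights are reused across iterations.
Note that a loss is computed on the result of every iteration, not only on the final result.
The forward pass at inference time has to run the same iterations.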

import numpy as np
import torch

# LinearModel, util and args are not defined in this excerpt; they come from the surrounding project.
class ThetaRegressor(LinearModel):
    def __init__(self, fc_layers, use_dropout, drop_prob, use_ac_func, iterations):
        super(ThetaRegressor, self).__init__(fc_layers, use_dropout, drop_prob, use_ac_func)
        self.iterations = iterations
        # Pre-tile the mean theta for the largest batch size that will be used.
        batch_size = max(args.batch_size + args.batch_3d_size, args.eval_batch_size)
        mean_theta = np.tile(util.load_mean_theta(), batch_size).reshape((batch_size, -1))
        self.register_buffer('mean_theta', torch.from_numpy(mean_theta).float())

    def forward(self, inputs):
        '''
        param:
            inputs: the output of the encoder, which has 2048 features

        return:
            a list [theta_1, theta_2, ...] with one N x 85 tensor (or other theta size) per iteration
        '''
        thetas = []
        shape = inputs.shape
        theta = self.mean_theta[:shape[0], :]  # start from the mean theta
        for _ in range(self.iterations):
            total_inputs = torch.cat([inputs, theta], 1)
            theta = theta + self.fc_blocks(total_inputs)  # same fc_blocks each iteration; it predicts a residual
            thetas.append(theta)  # keep the theta of every iteration so each one receives a loss
        return thetas
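The loop above can be tried in isolation with a small stand-in model. The sketch below is not the repository's code: `TinyIterativeRegressor`, the hidden size, the zero initial estimate (the post starts from a mean theta instead) and the plain L2 loss are all assumptions made for illustration. It only shows the two points from the notes: one regressor is reused at every step, and every intermediate estimate contributes to the loss.

```python
import torch
import torch.nn as nn


class TinyIterativeRegressor(nn.Module):
    """Hypothetical stand-in for ThetaRegressor: one MLP reused at every step."""

    def __init__(self, feat_dim=2048, theta_dim=85, hidden=256, iterations=3):
        super().__init__()
        self.iterations = iterations
        # Input is [image features, current theta]; output is a theta residual.
        self.fc_blocks = nn.Sequential(
            nn.Linear(feat_dim + theta_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, theta_dim),
        )
        # Assumption: start from zeros rather than a loaded mean theta.
        self.register_buffer("init_theta", torch.zeros(1, theta_dim))

    def forward(self, feats):
        theta = self.init_theta.expand(feats.shape[0], -1)
        thetas = []
        for _ in range(self.iterations):
            # Feed the current estimate back in and add the predicted correction.
            theta = theta + self.fc_blocks(torch.cat([feats, theta], dim=1))
            thetas.append(theta)
        return thetas


def loss_over_iterations(thetas, target):
    """Sum an L2 loss over every intermediate estimate, not only the last."""
    return sum(torch.mean((t - target) ** 2) for t in thetas)


torch.manual_seed(0)
model = TinyIterativeRegressor()
feats = torch.randn(4, 2048)   # fake encoder features
target = torch.zeros(4, 85)    # fake ground-truth theta
thetas = model(feats)
loss = loss_over_iterations(thetas, target)
loss.backward()
```

Because a single `fc_blocks` module is reused, the parameter count in this sketch does not depend on `iterations`; only the compute does.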

2.3 Discriminator

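The discriminator is the basic building block of the GAN setup.
The real human bodies (the "real" samples) come from the SMPL model.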

    # Forward pass of the Discriminator class: a few FC-layer transforms that output one vector per sample
    def forward(self, thetas):
        batch_size = thetas.shape[0]
        # theta layout: 3 camera values, 72 pose values (24 joints x 3 axis-angle), then shape
        cams, poses, shapes = thetas[:, :3], thetas[:, 3:75], thetas[:, 75:]
        shape_disc_value = self.shape_discriminator(shapes)
        # axis-angle -> flattened 3x3 rotation matrices; [:, 1:, :] drops joint 0 (the global rotation)
        rotate_matrixs = util.batch_rodrigues(poses.contiguous().view(-1, 3)).view(-1, 24, 9)[:, 1:, :]
        pose_disc_value, pose_inter_disc_value = self.pose_discriminator(rotate_matrixs)
        full_pose_disc_value = self.full_pose_discriminator(pose_inter_disc_value.contiguous().view(batch_size, -1))
        return torch.cat((pose_disc_value, full_pose_disc_value, shape_disc_value), 1)

    # Real is 1, fake is 0; this encoder-side loss pushes the discriminator output on predictions toward 1
    def batch_encoder_disc_l2_loss(self, disc_value):
        k = disc_value.shape[0]
        return torch.sum((disc_value - 1.0) ** 2) * 1.0 / k
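The excerpt only shows the encoder side of the adversarial loss. As a rough sketch of how the matching discriminator side could look under the same "real is 1, fake is 0" least-squares convention, the functions below are written from scratch. The names `encoder_adv_loss` and `discriminator_adv_loss` are hypothetical, and the 25 outputs per sample are an assumption read off the forward pass above (23 per-joint values, one full-pose value, one shape value).

```python
import torch


def encoder_adv_loss(disc_fake):
    """Encoder side: push discriminator outputs on predictions toward 1."""
    return torch.sum((disc_fake - 1.0) ** 2) / disc_fake.shape[0]


def discriminator_adv_loss(disc_real, disc_fake):
    """Discriminator side: real samples toward 1, predicted samples toward 0."""
    loss_real = torch.sum((disc_real - 1.0) ** 2) / disc_real.shape[0]
    loss_fake = torch.sum(disc_fake ** 2) / disc_fake.shape[0]
    return loss_real + loss_fake


# Assumed layout: 25 discriminator outputs per sample, batch of 2.
disc_real = torch.ones(2, 25)    # discriminator is sure these are real
disc_fake = torch.zeros(2, 25)   # discriminator is sure these are fake

perfect_d = discriminator_adv_loss(disc_real, disc_fake)  # nothing left to learn
fooled_g = encoder_adv_loss(torch.ones(2, 25))            # encoder fully fools D
unfooled_g = encoder_adv_loss(disc_fake)                  # encoder fools nothing
```

In this toy case a perfect discriminator has zero loss, while the encoder loss is zero only when every discriminator output on its predictions equals 1.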

2.4 projection

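This step projects from the camera coordinate system to the image coordinate system so that a 2D loss can be applied.
If 3D coordinates are available, a 3D loss is used directly (if the 3D points produced by the mesh differ from the annotated 3D points, the mesh definition takes precedence).
s denotes the scale, T the translation, and R the rotation. s and T can be regarded as the intrinsics, while R is kept inside the mesh as a parameter.
Weak-perspective camera model: no pinhole imaging is used; points are projected orthographically. Its trade-offs:
- Pro: points can be converted directly from the (root-relative) camera coordinate system to the image coordinate system for supervision.
- Con: when the focal length is unknown, forcing an orthographic fit introduces error (unavoidable, because of the scale ambiguity).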
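To get a feel for the error mentioned in the last point, the sketch below compares a pinhole projection with a weak-perspective one that divides every point by the same mean depth. Everything numeric here is made up for illustration (a random point cloud about 1 m across, a focal length of 500 pixels, subject distances of 2 m and 10 m), and the principal point is ignored.

```python
import numpy as np


def pinhole_project(points, focal):
    """Perspective projection: each point is divided by its own depth."""
    return focal * points[:, :2] / points[:, 2:3]


def weak_perspective_project(points, focal):
    """Weak perspective: every point is divided by the mean depth instead."""
    scale = focal / points[:, 2].mean()
    return scale * points[:, :2]


rng = np.random.default_rng(0)
# Hypothetical body-sized point cloud, centred on the optical axis.
body = rng.uniform(-0.5, 0.5, size=(100, 3))
focal = 500.0  # made-up focal length in pixels


def max_error(distance):
    """Largest pixel gap between the two projections at a given distance."""
    pts = body + np.array([0.0, 0.0, distance])
    diff = pinhole_project(pts, focal) - weak_perspective_project(pts, focal)
    return np.abs(diff).max()


err_near = max_error(2.0)    # subject 2 m from the camera
err_far = max_error(10.0)    # subject 10 m from the camera
```

The gap shrinks as the subject moves away from the camera, since the depth variation inside the body becomes small relative to the distance. That is why the approximation is usually acceptable for people who are not very close to the lens.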

def batch_orth_proj(X, camera):
    '''
        X is N x num_points x 3
        camera is N x 3, laid out as [s, tx, ty]
    '''
    camera = camera.view(-1, 1, 3)
    # Drop the depth, add the 2D translation, then apply the scale.
    X_trans = X[:, :, :2] + camera[:, :, 1:]
    shape = X_trans.shape
    return (camera[:, :, 0] * X_trans.view(shape[0], -1)).view(shape)

    # Camera space to image space (method excerpt from the model class)
    def _calc_detail_info(self, theta):
        cam = theta[:, 0:3].contiguous()
        pose = theta[:, 3:75].contiguous()
        shape = theta[:, 75:].contiguous()
        # SMPL returns the mesh vertices, the 3D joints and the per-joint rotations.
        verts, j3d, Rs = self.smpl(beta = shape, theta = pose, get_skin = True)
        j2d = util.batch_orth_proj(j3d, cam)

        return (theta, verts, j2d, j3d, Rs)
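As a small worked example of the projection, the sketch below re-implements `batch_orth_proj` with broadcasting (intended to be equivalent to the version above, but written independently) and feeds the result into a 2D keypoint loss. The loss function, its L1 form and the visibility mask are assumptions added for illustration; the post does not show how the 2D loss is computed.

```python
import torch


def batch_orth_proj(X, camera):
    """X: N x num_points x 3, camera: N x 3 laid out as [s, tx, ty]."""
    camera = camera.view(-1, 1, 3)
    X_trans = X[:, :, :2] + camera[:, :, 1:]   # drop depth, add translation
    return camera[:, :, :1] * X_trans          # scale broadcasts over points


def keypoint_2d_loss(pred_2d, gt_2d, visible):
    """L1 loss over visible keypoints only (the mask is an assumption)."""
    diff = torch.abs(pred_2d - gt_2d) * visible.unsqueeze(-1)
    return diff.sum() / visible.sum().clamp(min=1.0)


# Two joints at different depths; the depth has no effect on the result.
j3d = torch.tensor([[[0.0, 0.0, 5.0], [1.0, -1.0, 7.0]]])   # 1 x 2 x 3
cam = torch.tensor([[2.0, 0.5, 0.5]])                        # s=2, tx=ty=0.5
j2d = batch_orth_proj(j3d, cam)

# Fake annotation: shifted by one pixel, second joint marked invisible.
gt_2d = j2d + 1.0
visible = torch.tensor([[1.0, 0.0]])
loss = keypoint_2d_loss(j2d, gt_2d, visible)
```

With these inputs the first joint maps to (1, 1) and the second to (3, -1): each point is translated by (0.5, 0.5) and then scaled by 2, regardless of its z value.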

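3. Drawbacks

Directly regressing the intrinsics is difficult.
With little data, GAN-based supervision rarely gives robust results and may even make them worse.
The mesh itself is not supervised, which wastes available information.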

PyTorch code

TensorFlow code
