Learning the SMPL Model

This article is a repost, published 2021-01-24 18:35. Tag: HMR

Animation terminology

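Vertex: an animated model can be viewed as a mesh made of many small triangles (or quads); the corners of these faces are the vertices. The more vertices, the finer the model.
Joint (bone point): one of the body's joints, similar to the keypoints in human pose estimation. Each joint is controlled by a three-component parameter describing its rotation (see Euler angles and quaternions; SMPL uses an axis-angle vector).
Skinning: deforming the model surface from one pose to another; the transformation matrices used for this are called skinning matrices.
Rigging (Rig): establishing the association between joints and vertices. Each joint influences many vertices, each with a different weight. Through this association, the motion of the whole body can be controlled by the rotation vectors of the joints.
Texture map: the surface appearance of the animated body model, e.g. clothes and trousers.
Texture mapping: unwrapping the texture of a 3D polygon mesh surface onto a 2D plane gives the texture image.
BlendShape: there are two ways to drive an animated character: the rig described above, and blend shapes. For example, given a smiling face and a neutral face, blend shapes can automatically generate the transition animation between them. Compared with rigging, no joints need to be defined, which is convenient. A blend shape is a deformation relative to a base shape; the deformation is usually represented as vertex displacements and is determined by a function of some parameters.
Blend skinning: a method in which the mesh deforms with its underlying skeletal structure. Each vertex has a different weighted influence for each joint, and the vertex's deformation depends on these weights.
LBS (Linear Blend Skinning): the most widely used method, but it produces unrealistic deformations at the joints.
DQBS (Dual-Quaternion Blend Skinning): blend skinning that blends the joint transformations as dual quaternions instead of linearly blending matrices.
Vertex weights: the per-vertex joint weights used to deform the mesh.
UV map: the 2D image obtained by unwrapping a 3D polygon mesh onto a 2D plane.
Topology: retopology is the process of converting a high-resolution model into a smaller model suitable for animation. Two meshes have the same topology when their faces are built from the same vertex IDs (e.g. if one mesh has a triangle with vertices 2, 5, 8, the other mesh must also have a triangle made of vertices 2, 5, 8).
Pose-dependent blend shape: a blend shape whose deformation depends on the pose.
Regressor from shape to joint locations: a function that regresses the joint locations from the body shape.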
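The topology definition above can be checked mechanically. A minimal sketch, assuming faces are stored as an (F, 3) integer array of vertex IDs; the function name `same_topology` is hypothetical. It compares faces as unordered triples, so it ignores triangle winding order.

```python
import numpy as np


def same_topology(faces_a: np.ndarray, faces_b: np.ndarray) -> bool:
    """True if two triangle meshes have the same faces (as sets of vertex IDs).

    faces_*: (F, 3) integer arrays; vertex positions are irrelevant here.
    """
    if faces_a.shape != faces_b.shape:
        return False
    # Sort IDs inside each face so (2, 5, 8) and (5, 8, 2) compare equal,
    # then compare the two collections of faces regardless of face order.
    set_a = {tuple(f) for f in np.sort(faces_a, axis=1)}
    set_b = {tuple(f) for f in np.sort(faces_b, axis=1)}
    return set_a == set_b
```

For example, `same_topology(np.array([[2, 5, 8]]), np.array([[8, 2, 5]]))` is `True`, while changing any vertex ID makes it `False`.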

Algorithms in detail

SMPL

1.Introduction

SMPL (Skinned Multi-Person Linear Model) is a skinned (deformed by blend skinning), vertex-based 3D model of the unclothed human body that can accurately represent different body shapes and poses.

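SMPL is suited to animation: it deforms naturally with pose, with natural soft-tissue motion. SMPL is compatible with many existing graphics rendering pipelines.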

SMPL is a learned model: its parameters are trained from data to better fit human body shapes and their deformations in different poses.

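It splits body shape into identity-dependent shape and non-rigid pose-dependent shape. A body can be viewed as a base (template) model plus deformations applied to it; applying PCA to these deformations yields low-dimensional parameters describing the shape, the shape parameters (shape). Pose is represented with a kinematic tree: the rotation of each joint relative to its parent is a 3D vector, and the local rotation vectors of all joints together form SMPL's pose parameters (pose).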

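The main difference from traditional LBS is the way pose affects the body surface: SMPL can model the bulging and denting of muscles as the limbs move. This avoids surface distortion during motion and accurately captures the shape of muscles as they stretch and contract.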

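The SMPL paper describes the overall model, which is learned from data and consists of a set of parameters. Its inputs are β and θ. β is a 10-D vector of 10 parameters describing body proportions such as height, weight and head-to-body ratio. θ is a 72-D vector describing the global orientation of the body and the relative angles of its 24 joints: 3 degrees of freedom (an axis-angle rotation) for each of the 23 joints plus 3 for the root joint, i.e. 3×23 + 3 = 24×3 = 72. In general θ is a 3(K+1)-D vector, where K = 23 is the number of skeleton joints excluding the root and 3 is the number of degrees of freedom per joint. β are the coefficients of the shape blend shapes: 10 incremental templates control how the body shape changes. (The original article illustrated the effect of each shape parameter with animated images.)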

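The SMPL skeleton has 24 joints, covering the main joints that determine body pose: the root joint (pelvis) plus the 23 joints listed below: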

smpl_names = [
    # The 23 non-root SMPL joints, in SMPL joint-index order 1..23;
    # the root joint (pelvis, index 0) is not included in this list.
    'Left_Hip', 'Right_Hip', 'Waist', 'Left_Knee', 'Right_Knee',
    'Upper_Waist', 'Left_Ankle', 'Right_Ankle', 'Chest',
    'Left_Toe', 'Right_Toe', 'Base_Neck', 'Left_Shoulder',
    'Right_Shoulder', 'Upper_Neck', 'Left_Arm', 'Right_Arm',
    'Left_Elbow', 'Right_Elbow', 'Left_Wrist', 'Right_Wrist',
    'Left_Finger', 'Right_Finger'
]

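Adding three more dimensions for position (in HMR these are the camera parameters: a scale and a 2D translation), the total input is 10 + 3 + 3×24 = 85-D.
From this input the standard template is transformed step by step, roughly as follows: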

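1. Add shape blend shapes (body shape: height, build and proportions).
2. Infer shape-dependent joint locations (adjust the joints to the shape).
3. Add pose blend shapes (pose-dependent corrective deformations, such as muscle bulging).
4. Get the global joint locations (pose the skeleton along the kinematic tree).
5. Do skinning (wrap the skin around the posed skeleton).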

The resulting model is a mesh with 6890 vertices.
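A minimal sketch of unpacking the 85-D input vector described above. The layout [camera (3), pose (72), shape (10)] is an assumption for illustration; check the ordering used by whatever HMR implementation you work with. `split_hmr_params` is a hypothetical name.

```python
import numpy as np

NUM_CAM, NUM_POSE, NUM_SHAPE = 3, 72, 10   # 3 + 72 + 10 = 85


def split_hmr_params(params: np.ndarray):
    """Split an 85-D vector (or a batch of them) into camera, pose and shape.

    Assumed layout: [s, tx, ty | 24 joints x 3 axis-angle | 10 betas].
    """
    assert params.shape[-1] == NUM_CAM + NUM_POSE + NUM_SHAPE
    cam = params[..., :NUM_CAM]
    pose = params[..., NUM_CAM:NUM_CAM + NUM_POSE]
    betas = params[..., NUM_CAM + NUM_POSE:]
    # Row 0 is the root (global) rotation; rows 1..23 are the child joints.
    pose = pose.reshape(*pose.shape[:-1], 24, 3)
    return cam, pose, betas
```

Reshaping θ to (24, 3) makes each row one joint's axis-angle vector $\vec\omega_k$, which is the form the later formulas use.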

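Physical meaning of SMPL's 10 shape parameters (there are actually 50 parameters; only 10 are open-sourced). The Unity model on the SMPL website lets you change them with sliders: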

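0 Overall body size and weight. Starting from 0, positive values make the body thinner and smaller, negative values larger and fatter (range ±5)
1 Lateral compression/stretching; positive values compress
2 Positive values make the body fatter and larger
3 Negative values make the belly much larger while the body shrinks
4 Size of the chest, hips and abdomen. Starting from 0, positive values enlarge them, negative values shrink them (range ±5)
5 Negative values give a big belly with an overall thinner body
6 Positive values make the belly very large while the other body parts become very thin
7 Positive values compress the body vertically
8 Positive values make the body fatter horizontally
9 Positive values widen the shoulders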

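2.Parameters:

Vertices: N = 6890

Joints: K = 23 (plus the root joint, 24 in total)

The mesh has the same topology for men and women, spatially varying resolution, a clean quad structure, a segmentation into parts, initial blend weights, and a skeletal rig.

Model functions: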

$M(\vec \beta ,\vec \theta ;\Phi ) = W\left( T_{P}(\vec \beta ,\vec \theta ),J(\vec \beta ),\vec \theta ,\mathcal{W} \right)$, with $M: \mathbb{R}^{|\vec{\theta}| \times |\vec{\beta}|} \mapsto \mathbb{R}^{3N}$: the SMPL function, which maps shape and pose parameters to vertices. $\Phi$ denotes the learned parameters.
$W(\overline{\mathrm{T}}, \mathbf{J}, \vec{\theta}, \mathcal{W}): \mathbb{R}^{3 N \times 3 K \times|\vec{\theta}| \times|\mathcal{W}|} \mapsto \mathbb{R}^{3 N}$: the standard linear blend skinning function. It takes the template vertices in the rest pose $\overline{\mathrm{T}}$, the joint locations $\mathbf{J}$, a pose $\vec{\theta}$ and the blend weights $\mathcal{W}$, and outputs the posed vertices.
$B_{P}(\vec{\theta}): \mathbb{R}^{|\vec{\theta}|} \mapsto \mathbb{R}^{3 N}$: a pose-dependent blend shape function. Its input is the pose vector $\vec{\theta}$ and its output accounts for pose-dependent deformations; it is added to the shape blend shapes and applied to the rest pose.
$B_{S}(\vec{\beta}): \mathbb{R}^{|\vec{\beta}|} \mapsto \mathbb{R}^{3N}$: a blend shape function. Its input is the shape parameter vector $\vec{\beta}$, and its output is a blend shape sculpting the subject identity, i.e. the subject's body shape.
$B_D$: dynamic blend shape function (used by the dynamic extension, DMPL)
$J(\vec{\beta}): \mathbb{R}^{|\vec{\beta}|} \mapsto \mathbb{R}^{3K}$: a function that predicts the K joint locations.

Model inputs:

$\vec{\beta}$: $\vec{\beta}=\left[\beta_{1}, \ldots, \beta_{|\vec{\beta}|}\right]^{T}$, the shape parameters, where $|\vec{\beta}|$ is the number of linear shape coefficients.

$\vec{\theta}$: $\vec{\theta}=\left[\vec{\omega}_{0}^{T}, \ldots, \vec{\omega}_{K}^{T}\right]^{T}$, the pose parameters, where $\vec{\omega}_{k} \in \mathbb{R}^{3}$ is the axis-angle rotation of joint k relative to its parent in the kinematic tree ($\vec{\omega}_{0}$ is the global rotation of the root).

$\vec{\omega}$: scaled axis of rotation (the direction is the axis, the length is the angle); the 3 pose parameters of a particular joint.

$\vec{\theta}^{*}$: the rest (zero) pose

$\vec{\phi}$: dynamic control vector (DMPL)

$\vec{\delta}$: dynamic shape coefficients (DMPL)

Parameters learned from the training data:

$\overline{\mathrm{T}} \in \mathbb{R}^{3 N}$: mean shape of the template, represented by a vector of N concatenated vertices.

$\mathcal{S}=\left[\mathbf{S}_{1}, \ldots, \mathbf{S}_{|\vec{\beta}|}\right] \in \mathbb{R}^{3 N \times|\vec{\beta}|}$: shape blend shapes, the matrix of orthonormal principal components of shape displacements.

$\mathcal{P} \in \mathbb{R}^{3 N \times 9 K}$: pose blend shapes, the matrix of all 207 pose blend shapes (vectors of pose-induced vertex displacements).

$\mathcal{W} \in \mathbb{R}^{N \times K}$: a set of blend weights, the LBS/DQBS blend weight matrix giving each joint's influence on each vertex (which joints affect a given vertex, and with what weights).

$\mathcal{J}$: joint regressor matrix that maps rest vertices to rest joints, i.e. it computes the joint coordinates in the T-pose from the mesh vertices.
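The dimensions above can be collected into a quick shape check. A minimal sketch, assuming each learned matrix is stored with xyz split out per vertex (e.g. $\mathcal{S}$ as (N, 3, |β|) instead of 3N×|β|) and assuming the weights and regressor have one entry per joint including the root (24), whereas the paper writes $\mathcal{W} \in \mathbb{R}^{N \times K}$. The key names and `check_params` are hypothetical.

```python
import numpy as np

N, K, NUM_BETAS = 6890, 23, 10   # vertices, non-root joints, released shape coefficients

# Hypothetical key names; each entry is the paper's matrix reshaped per vertex.
EXPECTED_SHAPES = {
    "T_bar": (N, 3),              # mean template, 3N values
    "S": (N, 3, NUM_BETAS),       # shape blend shapes, 3N x |beta|
    "P": (N, 3, 9 * K),           # pose blend shapes, 3N x 207
    "W": (N, K + 1),              # blend weights, one column per joint incl. root (assumption)
    "J_regressor": (K + 1, N),    # rest vertices -> rest joints
}


def check_params(params: dict) -> None:
    """Raise ValueError if any learned array in `params` has an unexpected shape."""
    for name, expected in EXPECTED_SHAPES.items():
        actual = np.shape(params[name])
        if actual != expected:
            raise ValueError(f"{name}: expected {expected}, got {actual}")
```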

Training data:

$V$: A registration

$V^P$: Pose dataset registration

$V^S$: Shape dataset registration

$\hat{\mathbf{T}}^{P}$:Pose dataset subject shape; body vertices in the template pose

$\hat{\mathbf{J}}^{P}$:Pose dataset subject joint locations in the template pose

$\hat{\mathbf{T}}_{\mu}^{P}$:Mean shape of a pose subject; body vertices in the template pose

$\hat{\mathbf{T}}^{S}$: Shape dataset subject shape; body vertices in the template pose

$\hat{\mathbf{T}}_{\mu}^{S}$:Mean shape of a subject in the shape dataset; body vertices in the template pose

3.Blend skinning

The axis-angle rotation of each joint j is converted to a rotation matrix with Rodrigues' formula:

\[\exp \left(\vec{\omega}_{j}\right)=\mathcal{I}+\widehat{\bar{\omega}}_{j} \sin \left(\left\|\vec{\omega}_{j}\right\|\right)+\widehat{\bar{\omega}}_{j}^{2} \left(1-\cos \left(\left\|\vec{\omega}_{j}\right\|\right)\right) \]

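where $\vec{\theta}=\left[\vec{\omega}_{0}^{T}, \ldots, \vec{\omega}_{K}^{T}\right]^{T}$, so the number of pose parameters is $|\vec{\theta}|=3 \times 23+3=72$ (3 per joint for the 23 joints, plus 3 for the root).

$\bar{\omega}=\frac{\vec{\omega}}{\|\vec{\omega}\|}$: the unit-norm axis of rotation; the rotation angle is $\|\vec{\omega}\|$.

$\widehat{\bar{\omega}}$: the skew-symmetric matrix formed from the 3D vector $\bar{\omega}$.

$\mathcal{I}$: the 3×3 identity matrix.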
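Rodrigues' formula translates directly into code. A minimal NumPy sketch (the names `skew` and `rodrigues` are illustrative); the zero-rotation case is handled separately because the axis $\bar\omega$ is undefined when $\|\vec\omega\| = 0$.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Skew-symmetric matrix of v, so that skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def rodrigues(omega: np.ndarray) -> np.ndarray:
    """Axis-angle vector (3,) -> 3x3 rotation matrix exp(omega)."""
    angle = np.linalg.norm(omega)
    if angle < 1e-8:                     # no rotation: return the identity
        return np.eye(3)
    w_hat = skew(np.asarray(omega, dtype=float) / angle)   # hat of the unit axis
    return np.eye(3) + np.sin(angle) * w_hat + (1.0 - np.cos(angle)) * (w_hat @ w_hat)
```

For example, `rodrigues(np.array([0, 0, np.pi / 2]))` rotates the x axis onto the y axis, a 90° turn about z.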

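4.Computing vertex coordinates

Each vertex $\overline{\mathbf{t}}_{i}$ of $\overline{\mathrm{T}}$ is transformed into $\overline{\mathbf{t}}_{i}^{\prime}$ (both are column vectors in homogeneous coordinates):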

\[\begin{aligned} \overline{\mathbf{t}}_{i}^{\prime} &=\sum_{k=1}^{K} w_{k, i} G_{k}^{\prime}(\vec{\theta}, \mathbf{J}) \overline{\mathbf{t}}_{i} \\ G_{k}^{\prime}(\vec{\theta}, \mathbf{J}) &=G_{k}(\vec{\theta}, \mathbf{J}) G_{k}\left(\vec{\theta}^{*}, \mathbf{J}\right)^{-1} \\ G_{k}(\vec{\theta}, \mathbf{J}) &=\prod_{j \in A(k)}\left[\begin{array}{c|c}\exp \left(\vec{\omega}_{j}\right) & \mathbf{j}_{j} \\ \hline \overrightarrow{0} & 1\end{array}\right] \end{aligned} \]

where $w_{k,i}$ is an element of the blend weight matrix $\mathcal{W}$, representing how much the rotation of part $k$ affects vertex $i$.

$\exp \left(\vec{\omega}_{j}\right)$ is the local $3\times 3$ rotation matrix of joint $j$.

$G_{k}(\vec{\theta}, \mathbf{J})$ is the world transformation of joint $k$.

$G_{k}^{\prime}(\vec{\theta}, \mathbf{J})$ is the same transformation after removing the transformation due to the rest pose.

$\mathbf{J}$: the joint locations, predicted from the mesh surface. The 3-element vector in $\mathbf{J}$ corresponding to a single joint center $j$ is denoted $\mathbf{j}_j$.

$A(k)$ is the ordered set of joint ancestors of joint $k$.
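The product over $A(k)$ and the rest-pose correction $G'_k$ can be sketched as follows. Assumptions: joints are ordered so every parent precedes its children (`parents[0] = -1` for the root), and, as is common in implementations, the translation in each local factor is the offset from the parent joint, $\mathbf{j}_k - \mathbf{j}_{\mathrm{parent}(k)}$, so that the chain places every joint at its rest location when $\vec\theta = \vec\theta^*$. `global_transforms` and `relative_transforms` are hypothetical names.

```python
import numpy as np


def rodrigues(omega):
    """Axis-angle (3,) -> 3x3 rotation matrix."""
    angle = np.linalg.norm(omega)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = np.asarray(omega, dtype=float) / angle
    w_hat = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * w_hat + (1.0 - np.cos(angle)) * (w_hat @ w_hat)


def global_transforms(pose, joints, parents):
    """G_k(theta, J) for every joint, as (K+1, 4, 4) homogeneous matrices.

    pose: (K+1, 3) axis-angle; joints: (K+1, 3) rest joint locations;
    parents: parents[0] = -1 and parents[k] < k for every other joint.
    """
    G = np.empty((len(parents), 4, 4))
    for k, p in enumerate(parents):
        local = np.eye(4)
        local[:3, :3] = rodrigues(pose[k])
        # Root: absolute rest position; children: offset from the parent joint.
        local[:3, 3] = joints[k] if p < 0 else joints[k] - joints[p]
        # Chaining with the parent's world transform realizes the product over A(k).
        G[k] = local if p < 0 else G[p] @ local
    return G


def relative_transforms(pose, joints, parents):
    """G'_k = G_k(theta, J) G_k(theta*, J)^{-1}, with theta* the zero pose."""
    G = global_transforms(pose, joints, parents)
    G_rest = global_transforms(np.zeros_like(pose), joints, parents)
    return G @ np.linalg.inv(G_rest)
```

In the rest pose every $G'_k$ is the identity, so rest-pose vertices stay where they are; this is exactly why the rest-pose transform is divided out.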

Note that, for compatibility with existing rendering engines, $\mathcal{W}$ is assumed to be sparse: at most four parts are allowed to influence a vertex.
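To illustrate the four-influence constraint, here is a post-processing sketch that keeps the four largest weights per vertex and renormalizes so that each vertex's weights still sum to one. This is not how the paper enforces sparsity during training; it assumes non-negative weights stored one row per vertex, and the name `sparsify_weights` is hypothetical.

```python
import numpy as np


def sparsify_weights(W: np.ndarray, max_influences: int = 4) -> np.ndarray:
    """Keep the `max_influences` largest weights in each row and renormalize rows to 1.

    W: (N, num_joints) non-negative blend weights; each row must have a positive entry.
    """
    W = np.asarray(W, dtype=float)
    num_joints = W.shape[1]
    W_sparse = W.copy()
    if num_joints > max_influences:
        # Column indices of the smallest (num_joints - max_influences) weights per row.
        drop = np.argsort(W, axis=1)[:, :num_joints - max_influences]
        np.put_along_axis(W_sparse, drop, 0.0, axis=1)
    return W_sparse / W_sparse.sum(axis=1, keepdims=True)
```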

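To stay compatible, SMPL keeps this basic skinning function; instead, it modifies the template additively and learns a function that predicts the joint locations: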

\[\begin{array}{c} M(\vec{\beta}, \vec{\theta})=W\left(T_{P}(\vec{\beta}, \vec{\theta}), J(\vec{\beta}), \vec{\theta}, \mathcal{W}\right) \\ T_{P}(\vec{\beta}, \vec{\theta})=\overline{\mathbf{T}}+B_{S}(\vec{\beta})+B_{P}(\vec{\theta}) \end{array} \]

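$B_{S}(\vec{\beta})$ and $B_{P}(\vec{\theta})$ are the offsets, caused by shape and pose respectively, of the vertices $\overline{\mathbf{t}}_{i}$ relative to the SMPL template, so each vertex is transformed as: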

\[\overline{\mathbf{t}}_{i}^{\prime}=\sum_{k=1}^{K} w_{k, i} G_{k}^{\prime}(\vec{\theta}, J(\vec{\beta}))\left(\overline{\mathbf{t}}_{i}+\mathbf{b}_{S, i}(\vec{\beta})+\mathbf{b}_{P, i}(\vec{\theta})\right) \]

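Here $\mathbf{b}_{S, i}(\vec{\beta})$ and $\mathbf{b}_{P, i}(\vec{\theta})$ are the vertices of $B_{S}(\vec{\beta})$ and $B_{P}(\vec{\theta})$, i.e. the offsets relative to vertex $\overline{\mathbf{t}}_{i}$. This refines plain LBS: skinning is no longer applied to the mean-shape vertices directly but to vertices corrected for body shape and pose, and the joint locations $J(\vec{\beta})$ also change with body shape. Thus the joint centers are a function of body shape, while the template mesh deformed by blend skinning is a function of both pose and shape.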

5.Shape blend shapes

The body shapes of different people are represented by the linear function:

\[B_{S}(\vec{\beta} ; \mathcal{S})=\sum_{n=1}^{|\vec{\beta}|} \beta_{n} \mathbf{S}_{n} \]

where $\vec{\beta}=\left[\beta_{1}, \ldots, \beta_{|\vec{\beta}|}\right]^{T}$ and $|\vec{\beta}|$ is the number of linear shape coefficients.

$\mathbf{S}_{n} \in \mathbb{R}^{3 N}$: orthonormal principal components of shape displacements.

$\mathcal{S}=\left[\mathbf{S}_{1}, \ldots, \mathbf{S}_{|\vec{\beta}|}\right] \in \mathbb{R}^{3 N \times|\vec{\beta}|}$ is the matrix of shape displacements. The linear function $B_{S}(\vec{\beta} ; \mathcal{S})$ is fully defined by $\mathcal{S}$, which is learned from registered training meshes.
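Since $B_S$ is linear, it is a single matrix-vector product. A minimal NumPy sketch, assuming $\mathcal{S}$ is stored per vertex as an (N, 3, |β|) array (the 3N×|β| matrix reshaped), a layout chosen here for convenience; `shape_blend_shapes` is a hypothetical name.

```python
import numpy as np


def shape_blend_shapes(betas: np.ndarray, S: np.ndarray) -> np.ndarray:
    """B_S(beta; S) = sum_n beta_n * S_n, returned as per-vertex offsets (N, 3).

    betas: (|beta|,); S: (N, 3, |beta|).
    """
    # matmul contracts the last axis of S with betas: (N, 3, B) @ (B,) -> (N, 3)
    return S @ betas
```

Adding the result to the template, `T_bar + shape_blend_shapes(betas, S)`, gives the shaped body in the rest pose; with all $\beta_n = 0$ it is just the mean shape.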

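Values to the right of the semicolon are learned parameters, while those on the left are parameters set by an animator. For notational convenience, the learned parameters are often omitted when they are not explicitly being optimized in training.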

6.Pose blend shapes

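Let $R: \mathbb{R}^{|\vec{\theta}|} \mapsto \mathbb{R}^{9 K}$ be the function that maps a pose vector $\vec{\theta}$ to the vector of concatenated part-relative rotation matrices $\exp(\vec{\omega})$. Since the rig has 23 joints (excluding the root), $R(\vec{\theta})$ is a vector of length 23×9 = 207. Its elements are sines and cosines of the joint rotation angles (from Rodrigues' formula), so it is a nonlinear function of $\vec{\theta}$.

The pose blend shapes are therefore defined to be linear in $R^{*}(\vec{\theta})=R(\vec{\theta})-R(\vec{\theta}^{*})$, where $\vec{\theta}^{*}$ is the rest pose. Let $R_n(\vec{\theta})$ denote the n-th element of $R(\vec{\theta})$; the vertex deviation from the rest template is then: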

\[B_{P}(\vec{\theta} ; \mathcal{P})=\sum_{n=1}^{9 K}\left(R_{n}(\vec{\theta})-R_{n}\left(\vec{\theta}^{*}\right)\right) \mathbf{P}_{n} \]

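This expresses the pose-induced shape displacement as a linear combination of the blend shapes $\mathbf{P}_n$, weighted by the rotation features minus those of the rest pose; subtracting the rest-pose term guarantees that the pose blend shapes contribute zero in the rest pose.

Here $\mathbf{P}_{n} \in \mathbb{R}^{3 N}$ is a vector of vertex displacements.

$\mathcal{P}=\left[\mathbf{P}_{1}, \ldots, \mathbf{P}_{9 K}\right] \in \mathbb{R}^{3 N \times 9 K}$ is the matrix of all 207 pose blend shapes, and $B_{P}(\vec{\theta})$ is fully defined by $\mathcal{P}$.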
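A sketch of the 207-D pose feature $R^*(\vec\theta)$ and of $B_P$. Assumptions: the pose is a (24, 3) array with the root in row 0, the root is skipped because only the 23 non-root joints enter $R(\vec\theta)$, $\vec\theta^*$ is the zero pose (so $R(\vec\theta^*)$ is a stack of identity matrices), and $\mathcal{P}$ is stored as (N, 3, 207). Function names are hypothetical.

```python
import numpy as np


def rodrigues(omega):
    """Axis-angle (3,) -> 3x3 rotation matrix."""
    angle = np.linalg.norm(omega)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = np.asarray(omega, dtype=float) / angle
    w_hat = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * w_hat + (1.0 - np.cos(angle)) * (w_hat @ w_hat)


def pose_feature(pose):
    """R*(theta) = R(theta) - R(theta*) over the K non-root joints, flattened to 9K values."""
    mats = np.stack([rodrigues(w) for w in pose[1:]])   # (K, 3, 3); root row skipped
    return (mats - np.eye(3)).reshape(-1)               # (9K,)


def pose_blend_shapes(pose, P):
    """B_P(theta; P) = sum_n R*_n(theta) P_n, with P stored as (N, 3, 9K)."""
    return P @ pose_feature(pose)                        # (N, 3)
```

Because the root is skipped, turning the whole body (changing only $\vec\omega_0$) adds no pose blend shape, while bending any other joint does.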

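7.Joint locations

Different body shapes have different joint locations. Each joint is represented by its 3D location in the rest pose. These locations must be accurate; otherwise the skinning equation produces artifacts at the joints when the model is posed. The 3D joint locations as a function of body shape are: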

\[J(\vec{\beta} ; \mathcal{J}, \overline{\mathbf{T}}, \mathcal{S})=\mathcal{J}\left(\overline{\mathbf{T}}+B_{S}(\vec{\beta} ; \mathcal{S})\right) \]

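where $\mathcal{J}$ is the matrix that transforms rest vertices into rest joints. The regression matrix $\mathcal{J}$ is learned from examples of different people in many poses; it models which mesh vertices are important and how to combine them to estimate the joint locations.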
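The joint regression $J(\vec\beta) = \mathcal{J}(\overline{\mathrm{T}} + B_S(\vec\beta))$ is two matrix products. A minimal sketch, assuming $\mathcal{J}$ is stored as a dense (K+1, N) array and $\mathcal{S}$ as (N, 3, |β|); `regress_joints` is a hypothetical name. The toy regressor in the usage example simply averages pairs of vertices.

```python
import numpy as np


def regress_joints(J_regressor, T_bar, S, betas):
    """J(beta) = J_regressor @ (T_bar + B_S(beta)).

    J_regressor: (K+1, N); T_bar: (N, 3); S: (N, 3, |beta|); betas: (|beta|,).
    Returns the rest-pose joint locations, shape (K+1, 3).
    """
    v_shaped = T_bar + S @ betas      # shaped body in the rest pose
    return J_regressor @ v_shaped     # each joint is a weighted combination of vertices
```

Because the joints are computed from the shaped vertices, changing $\vec\beta$ moves the joints together with the surface, which is what keeps skinning free of joint artifacts across body shapes.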

8.The SMPL model

The model parameters of SMPL are $\Phi = \{ \overline T , \mathcal{W}, \mathcal{S}, \mathcal{J}, \mathcal{P}\}$; varying $\vec{\beta}, \vec{\theta}$ produces different body shapes and poses. SMPL is finally defined as:

\[M(\vec \beta ,\vec \theta ;\Phi ) =W\left(T_{P}(\vec{\beta}, \vec{\theta} ; \overline{\mathbf{T}}, \mathcal{S}, \mathcal{P}), J(\vec{\beta} ; \mathcal{J}, \overline{\mathbf{T}}, \mathcal{S}), \vec{\theta}, \mathcal{W}\right) \]

Each vertex is transformed as:

\[\mathbf{t}_{i}^{\prime}=\sum_{k=1}^{K} w_{k, i} G_{k}^{\prime}(\vec{\theta}, J(\vec{\beta} ; \mathcal{J}, \overline{\mathbf{T}}, \mathcal{S})) \mathbf{t}_{P, i}(\vec{\beta}, \vec{\theta} ; \overline{\mathbf{T}}, \mathcal{S}, \mathcal{P}) \]

where

\[\mathbf{t}_{P, i}(\vec{\beta}, \vec{\theta} ; \overline{\mathbf{T}}, \mathcal{S}, \mathcal{P})=\overline{\mathbf{t}}_{i}+\sum_{m=1}^{|\vec{\beta}|} \beta_{m} \mathbf{s}_{m, i}+\sum_{n=1}^{9 K}\left(R_{n}(\vec{\theta})-R_{n}\left(\vec{\theta}^{*}\right)\right) \mathbf{p}_{n, i} \]

is vertex i after the blend shapes are applied, and $\mathbf{s}_{m, i}, \mathbf{p}_{n, i} \in \mathbb{R}^{3}$ are the elements of the shape and pose blend shapes corresponding to template vertex $\bar{\mathbf{t}}_{i}$.
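Putting sections 3 to 8 together, here is a compact sketch of the forward pass $M(\vec\beta, \vec\theta; \Phi)$ using LBS (not DQBS). Assumptions: hypothetical dict keys for $\Phi$, per-vertex layouts (N, 3, ·) for $\mathcal{S}$ and $\mathcal{P}$, the root in row 0 of the pose, weights with one column per joint including the root, a parents array ordered so parents come first, and local translations taken as offsets from the parent joint. With random toy parameters it is useful for checking shapes and invariants, not for producing a real body.

```python
import numpy as np


def rodrigues(omega):
    """Axis-angle (3,) -> 3x3 rotation matrix."""
    angle = np.linalg.norm(omega)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = np.asarray(omega, dtype=float) / angle
    w_hat = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * w_hat + (1.0 - np.cos(angle)) * (w_hat @ w_hat)


def smpl_forward(betas, pose, model):
    """Toy SMPL forward pass returning the (N, 3) posed vertices t'_i.

    model (hypothetical keys): T_bar (N,3), S (N,3,B), P (N,3,9K), W (N,K+1),
    J_regressor (K+1,N), parents (K+1,) with parents[0] = -1.
    """
    T_bar, S, P = model["T_bar"], model["S"], model["P"]
    W, J_reg, parents = model["W"], model["J_regressor"], model["parents"]

    # Steps 1-2: shape blend shapes, then shape-dependent rest joints J(beta).
    v_shaped = T_bar + S @ betas                       # (N, 3)
    joints = J_reg @ v_shaped                          # (K+1, 3)

    # Step 3: pose blend shapes driven by R(theta) - R(theta*), root excluded.
    rots = np.stack([rodrigues(w) for w in pose])      # (K+1, 3, 3)
    pose_feat = (rots[1:] - np.eye(3)).reshape(-1)     # (9K,)
    v_posed = v_shaped + P @ pose_feat                 # t_{P,i}, (N, 3)

    # Step 4: world transforms G_k along the kinematic tree.
    G = np.empty((len(parents), 4, 4))
    for k, p in enumerate(parents):
        local = np.eye(4)
        local[:3, :3] = rots[k]
        local[:3, 3] = joints[k] if p < 0 else joints[k] - joints[p]
        G[k] = local if p < 0 else G[p] @ local
    # G_k(theta*) is a pure translation by j_k, so G'_k = [R_k | t_k - R_k j_k].
    G_rel = G.copy()
    G_rel[:, :3, 3] -= np.einsum("kij,kj->ki", G[:, :3, :3], joints)

    # Step 5: linear blend skinning in homogeneous coordinates.
    T = np.einsum("nk,kij->nij", W, G_rel)             # per-vertex blended 4x4 transform
    v_h = np.hstack([v_posed, np.ones((v_posed.shape[0], 1))])
    return np.einsum("nij,nj->ni", T, v_h)[:, :3]
```

Two sanity checks follow from the formulas: in the rest pose the output equals $\overline{\mathrm{T}} + B_S(\vec\beta)$, and rotating only the root rotates the whole shaped body rigidly about the root joint.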

Finally, the paper trains the parameters with both LBS and DQBS, giving SMPL-LBS and SMPL-DQBS. SMPL-DQBS is taken as the default model, so "SMPL" refers to it from then on.

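9.Training

The goal of training is to find the parameters $\Phi = \{ \overline T , \mathcal{W}, \mathcal{S}, \mathcal{J}, \mathcal{P}\}$ that minimize the vertex reconstruction error on the training datasets. Because the model factors shape and pose, the two can be trained separately: first $\{\mathcal{W}, \mathcal{J}, \mathcal{P}\}$ on the multi-pose dataset, then $\{ \overline T , \mathcal{S}\}$ on the multi-shape dataset.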
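For the shape part, $\mathcal{S}$ consists of orthonormal principal components of shape displacements. A simplified sketch, assuming the shape-dataset registrations have already been brought into the template pose with the same topology (the paper's pose normalization is not reproduced here): take their mean as $\overline{\mathrm{T}}$ and the leading right singular vectors of the centered data as $\mathcal{S}$. Whether and how released models scale these components is not covered here; `fit_shape_space` is a hypothetical name.

```python
import numpy as np


def fit_shape_space(registrations: np.ndarray, num_betas: int = 10):
    """PCA sketch for {T_bar, S} from pose-normalized shape registrations.

    registrations: (M, N, 3) meshes of M subjects in the template pose.
    Returns T_bar (N, 3) and S (N, 3, num_betas); as 3N-vectors, the
    columns of S are orthonormal.
    """
    M, N, _ = registrations.shape
    X = registrations.reshape(M, 3 * N)
    T_bar = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - T_bar, full_matrices=False)
    S = Vt[:num_betas].T.reshape(N, 3, num_betas)
    return T_bar.reshape(N, 3), S
```

Projecting a registration onto these components gives its $\vec\beta$; keeping only the first few components is what makes the shape space low-dimensional.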
