Matrix Calculus


Original article: https://www.cnblogs.com/faranten/p/16028217.html
Please credit the author and the source when reposting.

1 Denominator Layout and Numerator Layout

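Matrix calculus can be regarded as a special form of multivariable calculus. Its most basic concept is the choice between denominator layout and numerator layout, which determines the shape of every derivative. For \(\mathbf x\in\mathbb R^{M}\) and \(y=f(\mathbf x)\in\mathbb R\):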

\[\begin{aligned} \text{Denominator layout}&\quad\frac{\partial y}{\partial\mathbf x}=[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_M}]^T\quad\text{column vector}\\ \text{Numerator layout}&\quad\frac{\partial y}{\partial\mathbf x}=[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_M}]\quad\text{row vector} \end{aligned} \]

For \(x\in\mathbb R\) and \(\mathbf y=f(x)\in\mathbb R^{N}\), the roles are reversed:

\[\begin{aligned} \text{Denominator layout}&\quad\frac{\partial\mathbf y}{\partial x}=[\frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},\cdots,\frac{\partial y_N}{\partial x}]\quad\text{row vector}\\ \text{Numerator layout}&\quad\frac{\partial\mathbf y}{\partial x}=[\frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},\cdots,\frac{\partial y_N}{\partial x}]^T\quad\text{column vector} \end{aligned} \]
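The two conventions can be made concrete by looking at array shapes. The following is a minimal NumPy sketch; the functions \(y=\sum_m x_m^2\) and \(\mathbf y(t)=[t,t^2,t^3]\) and all variable names are illustrative choices, not taken from the original article.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # x in R^M with M = 3

# Illustrative scalar function y = sum(x_m^2); its partials are dy/dx_m = 2 * x_m.
partials = 2.0 * x

grad_denominator = partials.reshape(-1, 1)  # denominator layout: column, shape (M, 1)
grad_numerator = partials.reshape(1, -1)    # numerator layout: row, shape (1, M)

# Illustrative vector function of a scalar: y(t) = [t, t^2, t^3],
# whose partials are dy_n/dt = [1, 2t, 3t^2].
t = 2.0
dy_dt = np.array([1.0, 2.0 * t, 3.0 * t**2])

dydt_denominator = dy_dt.reshape(1, -1)  # denominator layout: row, shape (1, N)
dydt_numerator = dy_dt.reshape(-1, 1)    # numerator layout: column, shape (N, 1)

print(grad_denominator.shape, grad_numerator.shape)  # (3, 1) (1, 3)
print(dydt_denominator.shape, dydt_numerator.shape)  # (1, 3) (3, 1)
```

In both cases the numbers are identical; only the orientation of the array changes.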

For \(\mathbf x\in\mathbb R^M\) and \(\mathbf y=f(\mathbf x)\in\mathbb R^N\), the first derivative in denominator layout,

\[\frac{\partial f(\mathbf x)}{\partial\mathbf x}= \left[ \begin{array} {ccc} \frac{\partial y_1}{\partial x_1}&\cdots&\frac{\partial y_N}{\partial x_1}\\ \vdots&\ddots&\vdots\\ \frac{\partial y_1}{\partial x_M}&\cdots&\frac{\partial y_N}{\partial x_M} \end{array} \right]\in\mathbb R^{M\times N} \]

is the transpose of the Jacobian matrix (the Jacobian matrix itself uses numerator layout). For \(\mathbf x\in\mathbb R^{M}\) and \(y=f(\mathbf x)\in\mathbb R\), the second derivative in denominator layout,

\[\mathbf H=\frac{\partial^2f(\mathbf x)}{\partial\mathbf x^2}= \left[ \begin{array} {ccc} \frac{\partial^2 y}{\partial x_1^2}&\cdots&\frac{\partial^2 y}{\partial x_1\partial x_M}\\ \vdots&\ddots&\vdots\\ \frac{\partial^2 y}{\partial x_M\partial x_1}&\cdots&\frac{\partial^2 y}{\partial x_M^2} \end{array} \right]\in\mathbb R^{M\times M} \]

is called the Hessian matrix of \(f(\mathbf x)\), also written \(\nabla^2f(\mathbf x)\); its \((m,n)\)-th entry is \(\frac{\partial^2y}{\partial x_m\partial x_n}\).
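Both definitions can be checked numerically with finite differences. The sketch below is only an illustration: the helper names `denominator_layout_derivative` and `hessian`, the linear map \(\mathbf y=\mathbf A\mathbf x\), and the quadratic form \(y=\tfrac12\mathbf x^T\mathbf Q\mathbf x\) are assumptions introduced here, not part of the original article.

```python
import numpy as np

def denominator_layout_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx in denominator layout.

    For f: R^M -> R^N the result has shape (M, N): row m holds the
    partial derivatives of all N outputs with respect to x[m].
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for m in range(x.size):
        step = np.zeros_like(x)
        step[m] = eps
        diff = np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))
        rows.append(diff / (2.0 * eps))
    return np.stack(rows)

def hessian(f, x, eps=1e-3):
    """Central-difference estimate of the Hessian of a scalar function f."""
    x = np.asarray(x, dtype=float)
    M = x.size
    E = eps * np.eye(M)  # E[m] is a step of size eps along coordinate m
    H = np.empty((M, M))
    for m in range(M):
        for n in range(M):
            H[m, n] = (f(x + E[m] + E[n]) - f(x + E[m] - E[n])
                       - f(x - E[m] + E[n]) + f(x - E[m] - E[n])) / (4.0 * eps**2)
    return H

rng = np.random.default_rng(0)
M, N = 4, 3
x0 = rng.standard_normal(M)

# Linear map y = A x: the Jacobian (numerator layout) is A, so the
# denominator-layout derivative should be A^T with shape (M, N).
A = rng.standard_normal((N, M))
D = denominator_layout_derivative(lambda x: A @ x, x0)
print(D.shape, np.allclose(D, A.T, atol=1e-6))

# Quadratic form y = 0.5 * x^T Q x with symmetric Q: the Hessian should be Q.
B = rng.standard_normal((M, M))
Q = B + B.T
H = hessian(lambda x: 0.5 * x @ Q @ x, x0)
print(H.shape, np.allclose(H, Q, atol=1e-5))
```

Finite differences are used so that the check does not depend on any automatic differentiation library; an autodiff tool would serve equally well.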

2 Differentiation Rules

2.1 Sum and Difference Rule

For \(\mathbf x\in\mathbb R^{M}\), \(\mathbf y=f(\mathbf x)\in\mathbb R^N\), and \(\mathbf z=g(\mathbf x)\in\mathbb R^N\):

\[\frac{\partial(\mathbf y+\mathbf z)}{\partial\mathbf x}=\frac{\partial\mathbf y}{\partial\mathbf x}+\frac{\partial\mathbf z}{\partial\mathbf x} \]

2.2 Product Rule

For \(\mathbf x\in\mathbb R^M\), \(\mathbf y=f(\mathbf x)\in\mathbb R^N\), and \(\mathbf z=g(\mathbf x)\in\mathbb R^N\), in denominator layout:

\[\frac{\partial\mathbf y^T\mathbf z}{\partial\mathbf x}=\frac{\partial\mathbf y}{\partial\mathbf x}\mathbf z+\frac{\partial\mathbf z}{\partial\mathbf x}\mathbf y\quad\in\mathbb R^M \]

For \(\mathbf x\in\mathbb R^M\), \(\mathbf y=f(\mathbf x)\in\mathbb R^S\), \(\mathbf z=g(\mathbf x)\in\mathbb R^T\), and a matrix \(\mathbf A\in\mathbb R^{S\times T}\) that does not depend on \(\mathbf x\):

\[\frac{\partial\mathbf y^T\mathbf A\mathbf z}{\partial\mathbf x}=\frac{\partial\mathbf y}{\partial\mathbf x}\mathbf A\mathbf z+\frac{\partial\mathbf z}{\partial\mathbf x}\mathbf A^T\mathbf y\quad\in\mathbb R^M \]

For \(\mathbf x\in\mathbb R^M\), \(y=f(\mathbf x)\in\mathbb R\), and \(\mathbf z=g(\mathbf x)\in\mathbb R^N\):

\[\frac{\partial y\mathbf z}{\partial\mathbf x}=y\frac{\partial\mathbf z}{\partial\mathbf x}+\frac{\partial y}{\partial\mathbf x}\mathbf z^T\quad\in\mathbb R^{M\times N} \]
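The three product rules can be sanity-checked numerically. This is a rough sketch under stated assumptions: the helper `denominator_layout_derivative`, the weight matrices, and the test functions built from `sin`, `tanh`, and `cos` are all hypothetical and chosen only because they are smooth.

```python
import numpy as np

def denominator_layout_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx in denominator layout, shape (M, N)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for m in range(x.size):
        step = np.zeros_like(x)
        step[m] = eps
        diff = np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))
        rows.append(diff / (2.0 * eps))
    return np.stack(rows)

D = denominator_layout_derivative
rng = np.random.default_rng(1)
M, N, S, T = 4, 3, 2, 5
x0 = rng.standard_normal(M)

# Hypothetical smooth test functions with fixed random weights.
Wf, Wg = rng.standard_normal((N, M)), rng.standard_normal((N, M))
f = lambda x: np.sin(Wf @ x)   # R^M -> R^N
g = lambda x: np.tanh(Wg @ x)  # R^M -> R^N

# Rule 1: d(y^T z)/dx = (dy/dx) z + (dz/dx) y, a vector in R^M.
lhs1 = D(lambda x: f(x) @ g(x), x0)[:, 0]
rhs1 = D(f, x0) @ g(x0) + D(g, x0) @ f(x0)

# Rule 2: d(y^T A z)/dx = (dy/dx) A z + (dz/dx) A^T y, with y in R^S, z in R^T.
Wp, Wq = rng.standard_normal((S, M)), rng.standard_normal((T, M))
A = rng.standard_normal((S, T))
p = lambda x: np.sin(Wp @ x)   # R^M -> R^S
q = lambda x: np.tanh(Wq @ x)  # R^M -> R^T
lhs2 = D(lambda x: p(x) @ A @ q(x), x0)[:, 0]
rhs2 = D(p, x0) @ A @ q(x0) + D(q, x0) @ A.T @ p(x0)

# Rule 3: d(y z)/dx = y (dz/dx) + (dy/dx) z^T, a matrix in R^{M x N}.
h = lambda x: np.cos(x).sum()  # R^M -> R
lhs3 = D(lambda x: h(x) * g(x), x0)
rhs3 = h(x0) * D(g, x0) + D(h, x0) @ g(x0)[None, :]  # (M, 1) @ (1, N)

print(np.allclose(lhs1, rhs1, atol=1e-6),
      np.allclose(lhs2, rhs2, atol=1e-6),
      np.allclose(lhs3, rhs3, atol=1e-6))
```

Note how the shapes force the order of the factors: \(\frac{\partial\mathbf y}{\partial\mathbf x}\) is \(M\times N\) in denominator layout, so it must multiply \(\mathbf z\) from the left.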

2.3 Chain Rule

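The chain rule has the same form as in ordinary calculus, except that the factors are matrices, so their order matters. In denominator layout, for \(\mathbf y=f(\mathbf x)\) and \(\mathbf z=g(\mathbf y)\), it reads \(\frac{\partial\mathbf z}{\partial\mathbf x}=\frac{\partial\mathbf y}{\partial\mathbf x}\frac{\partial\mathbf z}{\partial\mathbf y}\).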
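A small numerical check illustrates the ordering of the factors in denominator layout, where the derivative of the inner function comes first. The helper `denominator_layout_derivative` and the two test functions are assumptions made for this sketch, not part of the original article.

```python
import numpy as np

def denominator_layout_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx in denominator layout, shape (M, N)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for m in range(x.size):
        step = np.zeros_like(x)
        step[m] = eps
        diff = np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))
        rows.append(diff / (2.0 * eps))
    return np.stack(rows)

D = denominator_layout_derivative
rng = np.random.default_rng(2)
M, N, K = 4, 3, 2
x0 = rng.standard_normal(M)

# Hypothetical composition z = g(f(x)) with fixed random weights.
W1 = rng.standard_normal((N, M))
W2 = rng.standard_normal((K, N))
f = lambda x: np.tanh(W1 @ x)  # R^M -> R^N
g = lambda y: np.sin(W2 @ y)   # R^N -> R^K

direct = D(lambda x: g(f(x)), x0)  # dz/dx, shape (M, K)
chained = D(f, x0) @ D(g, f(x0))   # (dy/dx)(dz/dy): (M, N) @ (N, K)

print(direct.shape, np.allclose(direct, chained, atol=1e-6))
```

In numerator layout the same identity holds with the factors in the opposite order, since every matrix is transposed.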

3 Complete Definitions

Denominator layout and numerator layout differ only by a transpose.

3.1 Figure 1: \(\partial\text{vector}/\partial\text{vector}\)

3.2 Figure 2: \(\partial\text{scalar}/\partial\text{vector}\)

3.3 Figure 3: \(\partial\text{vector}/\partial\text{scalar}\)

3.4 Figure 4: \(\partial\text{scalar}/\partial\text{matrix}\)

3.5 Figure 5: \(\partial\text{matrix}/\partial\text{scalar}\)

3.6 Figure 6: \(\partial\text{scalar}/\partial\text{scalar}\), chain rule involving matrices

3.7 Figure 7: \(\partial\text{scalar}/\partial\text{scalar}\), chain rule involving matrices

3.8 Figure 8: \(d(\text{matrix})\)

3.9 Figure 9: \(d(\text{matrix})\)

3.10 Figure 10: \(d/d\) form


