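I. Basic Concepts and Properties
For notation conventions, see: Notation Conventions
1. Trace
The trace of a square matrix \(A\) is defined as: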
\[Tr(A) = \sum_{i=1}^nA_i^i \tag{1.1} \]
2. Properties of the Trace
(1)
\[Tr(A) = \sum_{i=1}^n\lambda_{i} \tag{1.2.1} \]
where \(\lambda_i\) is the \(i\)-th eigenvalue of \(A\).
(2)
\[Tr(A) = Tr(A^T) \tag{1.2.2} \]
(3)
\[Tr(AB) = \sum_{i=1}^n\left(\sum_{j=1}^nA_i^jB_j^i\right) = \sum_{j=1}^n\left(\sum_{i=1}^nB_j^iA_i^j\right) = Tr(BA) \tag{1.2.3} \]
(4)
\[Tr(A + B) = Tr(A) + Tr(B) \tag{1.2.4} \]
(5)
\[Tr(\mathbf{x}\mathbf{x}^T) = \sum_{i=1}^n\mathbf{x}_i\cdot \mathbf{x}_i = \mathbf{x}^T\mathbf{x} \tag{1.2.5} \]
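The trace identities above are easy to confirm numerically. The following is a minimal sketch using plain Python lists; the helper names (`trace`, `matmul`, `transpose`) and the sample matrices are illustrative assumptions, not part of the original notes.

```python
def trace(M):
    """Sum of the diagonal entries of a square matrix given as a list of rows."""
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

# Arbitrary sample data (hypothetical values)
A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
x = [2, -1, 3]

tr_ab = trace(matmul(A, B))                    # Tr(AB)
tr_ba = trace(matmul(B, A))                    # Tr(BA)
outer = [[xi * xj for xj in x] for xi in x]    # x x^T
inner = sum(xi * xi for xi in x)               # x^T x
```

Here `tr_ab` and `tr_ba` agree even though `AB` and `BA` are different matrices, and `trace(outer)` equals `inner`.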
3. Determinant
The determinant of a square matrix \(A\) is defined as:
\[\det (A) = \sum_{\sigma \in S_n}(-1)^{\mathrm{sgn}(\sigma)}\prod_{i=1}^n A_i^{\sigma(i)} \tag{1.3.1} \]
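where \(S_n\) is the set of all permutations of \(\{1, 2, \cdots, n\}\), i.e., all one-to-one mappings (bijections) from \(\{1, 2, \cdots, n\}\) to itself.
For example, \(\{2, 3, 1\}\) is a permutation of \(\{1, 2, 3\}\) with \(\sigma(1) = 2, \sigma(2) = 3, \sigma(3) = 1\).
Here \({\rm sgn} (\sigma)\) denotes the number of inversions of \(\sigma\), i.e., the number of pairs with \(\sigma(i) > \sigma(j)\) and \(1 \leq i < j \leq n\).
For example, \({\rm sgn}(\{2, 3, 1\}) = 2\), coming from the inversions \((2, 1)\) and \((3, 1)\).
A set with \(n\) elements has \(n!\) permutations.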
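The permutation definition can be turned into code almost literally. The sketch below is only meant to illustrate the formula (it visits all \(n!\) permutations, so it is practical for small \(n\) only); the function names and the sample matrix are assumptions made for this example, and indices are 0-based.

```python
from itertools import permutations

def inversions(sigma):
    """Count pairs i < j with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def det_leibniz(A):
    """Determinant as a signed sum over all permutations of the column indices."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = (-1) ** inversions(sigma)   # sign from the inversion count
        for i in range(n):
            term *= A[i][sigma[i]]         # pick column sigma(i) in row i
        total += term
    return total

# Hypothetical 3x3 example
A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
det_A = det_leibniz(A)
```

The permutation \(\{2, 3, 1\}\) from the text corresponds to the 0-based tuple `(1, 2, 0)`, for which `inversions` returns 2.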
4. Computing the Determinant
(1)
\[\det (A) = \prod_{i=1}^n \lambda_i \tag{1.4.1} \]
where \(\lambda_i\) is the \(i\)-th eigenvalue of \(A\).
(2)
\[\det(A) \overset{\text{expansion along row } i}{=} \sum_{j=1}^n(-1)^{i + j}A_i^{j}\det\left([A]_i^{j}\right) \overset{\text{expansion along column } j}{=} \sum_{i=1}^n(-1)^{i + j}A_i^{j}\det\left([A]_i^{j}\right) \tag{1.4.2} \]
where \([A]_i^j\) is the submatrix of \(A\) obtained by deleting row \(i\) and column \(j\).
(3)
\[\det(kA) = k^n\det(A) \tag{1.4.3} \]
(4)
\[\det(A^T) = \det(A) \tag{1.4.4} \]
(5)
\[\det(AB) = \det(A)\det(B) \tag{1.4.5} \]
(6)
\[\det(A^{-1}) = \frac{1}{\det(A)} \tag{1.4.6} \]
(7)
\[\det(I + \mathbf{u} \mathbf{v}^T) = 1 + \mathbf{u}^T\mathbf{v} \tag{1.4.7} \]
(8)
\[\mathrm{adj}(A) = \det(A)\cdot A^{-1} \tag{1.4.8} \]
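As a quick sanity check of the rank-one identity \(\det(I + \mathbf{u}\mathbf{v}^T) = 1 + \mathbf{u}^T\mathbf{v}\), consider a small instance. The vectors below are arbitrary illustrative values, not taken from the original notes:
\[\mathbf{u} = \left(\begin{matrix} 1 \\ 2 \end{matrix}\right),\quad \mathbf{v} = \left(\begin{matrix} 3 \\ 4 \end{matrix}\right),\quad I + \mathbf{u}\mathbf{v}^T = \left(\begin{matrix} 4 & 4 \\ 6 & 9 \end{matrix}\right) \]
\[\det(I + \mathbf{u}\mathbf{v}^T) = 4\cdot 9 - 4\cdot 6 = 12 = 1 + (1\cdot 3 + 2\cdot 4) = 1 + \mathbf{u}^T\mathbf{v} \]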
II. Results on Vector and Matrix Operations
1. Matrix Multiplication
(1)
\[\begin{align} A\cdot B &= \left((AB)_i^j\right)_{m\times n} \\ &= \left(\sum_k A_i^kB_k^j\right)_{m\times n} \end{align} \tag{2.1.1} \]
(2)
\[\begin{align} (A\cdot B)\cdot C &= \left(\sum_k(AB)_i^kC_k^j\right)_{m\times n}\\ &= \left(\sum_k\left(\sum_tA_i^tB_t^k\right)C_k^j \right)_{m\times n} \end{align} \tag{2.1.2} \]
(3)
\[A\cdot [E_i^j] = \left(0, \cdots, \underbrace{A^i}_{\text{column } j},\cdots ,0 \right) = [A^i]^j \tag{2.1.3} \]
(4)
\[[E_i^j]\cdot A = \left(\begin{array}{cc} &0\\ &\vdots\\ \text{row } i\left\{\right. &A_j\\ &\vdots \\ &0 \end{array} \right) = [A_j]_i \tag{2.1.4} \]
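A \(2\times 2\) instance may make the last two identities concrete. It assumes, as the later derivative formulas suggest, that \([E_i^j]\) is the matrix with a single 1 in row \(i\), column \(j\) and zeros elsewhere, that \(A^i\) denotes column \(i\) of \(A\), and that \(A_j\) denotes row \(j\) of \(A\):
\[A = \left(\begin{matrix} a & b \\ c & d \end{matrix}\right),\quad [E_1^2] = \left(\begin{matrix} 0 & 1 \\ 0 & 0 \end{matrix}\right) \]
\[A\cdot [E_1^2] = \left(\begin{matrix} 0 & a \\ 0 & c \end{matrix}\right) = [A^1]^2,\qquad [E_1^2]\cdot A = \left(\begin{matrix} c & d \\ 0 & 0 \end{matrix}\right) = [A_2]_1 \]
Right multiplication moves column 1 of \(A\) into column 2, while left multiplication moves row 2 of \(A\) into row 1.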
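III. Vector and Matrix Derivatives
1. Derivative Layouts

- Numerator layout: the first dimension of the derivative follows the numerator.
- Denominator layout: the first dimension of the derivative follows the denominator.
For example, when an \(m\)-dimensional column vector \(\mathbf{y}\) is differentiated with respect to an \(n\)-dimensional column vector \(\mathbf{x}\), the numerator layout gives the \(m\times n\) matrix
\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} \\ \vdots&\ddots &\vdots \\ \frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \]
while the denominator layout gives its transpose, the \(n\times m\) matrix
\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} \\ \vdots &\ddots &\vdots \\ \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \]
Note: all derivatives below use the numerator layout, except that a scalar numerator uses the denominator layout.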
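To see the two layouts side by side, take a hypothetical map from \(\mathbb{R}^2\) to \(\mathbb{R}^3\) (chosen here only for illustration):
\[\mathbf{y} = \left(\begin{matrix} \mathbf{x}_1^2 \\ \mathbf{x}_1\mathbf{x}_2 \\ \mathbf{x}_2 \end{matrix}\right) \]
The numerator layout arranges the partial derivatives as a \(3\times 2\) matrix, one row per component of \(\mathbf{y}\):
\[\left(\begin{matrix} 2\mathbf{x}_1 & 0 \\ \mathbf{x}_2 & \mathbf{x}_1 \\ 0 & 1 \end{matrix}\right) \]
The denominator layout arranges the same entries as the \(2\times 3\) transpose, one row per component of \(\mathbf{x}\):
\[\left(\begin{matrix} 2\mathbf{x}_1 & \mathbf{x}_2 & 0 \\ 0 & \mathbf{x}_1 & 1 \end{matrix}\right) \]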
2. Rules for Differentials
\[\partial C = 0 \quad (C \text{ is a constant matrix, vector, or scalar}) \tag{3.2.1} \]
\[\partial A^T = (\partial A)^T \tag{3.2.2} \]
\[\partial (A + B) = \partial A + \partial B \tag{3.2.3} \]
\[\partial (AB) = \partial A\cdot B + A\cdot \partial B \tag{3.2.4} \]
\[\partial (A\odot B) = \partial A\odot B + A\odot \partial B \tag{3.2.5} \]
\[\partial( A\otimes B) = \partial A\otimes B +A\otimes \partial B \tag{3.2.6} \]
\[\partial ({A^{-1}}) = -A^{-1}\cdot \partial A\cdot A^{-1} \tag{3.2.7} \]
\[\partial\ Tr(A) = Tr(\partial A) \tag{3.2.8} \]
\[\partial \det(A) = Tr(\mathrm{adj}(A) \cdot \partial A) = \det(A)\cdot Tr(A^{-1} \partial A) \tag{3.2.9} \]
Chain rule:
\[\partial g\circ f(A) = \sum_k\sum_t \frac{\partial g\circ f(A)}{\partial f(A)_k^t}\cdot \partial f(A)_k^t = Tr\left(\left(\frac{\partial g\circ f(A)}{\partial f(A)}\right)^T\cdot \partial f(A)\right) \tag{3.2.10} \]
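The determinant rule \(\partial \det(A) = Tr(\mathrm{adj}(A)\cdot \partial A)\) can be verified by hand in the \(2\times 2\) case. The entry names \(a, b, c, d\) are introduced here only for this example:
\[A = \left(\begin{matrix} a & b \\ c & d \end{matrix}\right),\quad \mathrm{adj}(A) = \left(\begin{matrix} d & -b \\ -c & a \end{matrix}\right),\quad \partial A = \left(\begin{matrix} \partial a & \partial b \\ \partial c & \partial d \end{matrix}\right) \]
\[Tr(\mathrm{adj}(A)\cdot \partial A) = (d\,\partial a - b\,\partial c) + (-c\,\partial b + a\,\partial d) = \partial(ad - bc) = \partial \det(A) \]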
3. Vector Derivatives
(1)
\[\frac{\partial \mathbf{x}}{\partial x} = \left( \begin{array}{cc} \frac{\mathrm{d}\mathbf{x}_1}{\mathrm{d}x} \\ \vdots\\ \frac{\mathrm{d}\mathbf{x}_m}{\mathrm{d}x} \end{array} \right) \tag{3.3.1} \]
(2)
\[\frac{\partial \mathbf{x}^T}{\partial x} = \left(\frac{\partial \mathbf{x}}{\partial x}\right)^T \tag{3.3.2} \]
(3)
\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}^T} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} \\ \vdots &\ddots &\vdots \\ \frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \tag{3.3.3}\]
(4)
\[\frac{\partial \mathbf{y}^T}{\partial \mathbf{x}} =\frac{\partial \mathbf{y}^T}{\partial \mathbf{x}^T} = \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^T \tag{3.3.4} \]
(5)
\[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial\mathbf{x}} = \left(\begin{array}{cc} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_n \end{array} \right) = \mathbf{y} \tag{3.3.5} \]
(6)
\[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}^T} = \left( \frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}} \right)^T \tag{3.3.6} \]
(7)
\[\frac{\partial A\mathbf{x}}{\partial\mathbf{x}} = \frac{\partial A\mathbf{x}}{\partial\mathbf{x}^T} = \left( \begin{array}{ccc} A_{1}^1 &\cdots &A_{1}^n \\ \vdots & \ddots &\vdots \\ A_{m}^1 &\cdots &A_{m}^n \\ \end{array} \right) = A \tag{3.3.7}\]
(8)
\[\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}} = (A + A^T)\mathbf{x}, \qquad \frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}^T} = \mathbf{x}^T(A + A^T) \tag{3.3.8} \]
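A finite-difference check is a convenient way to catch layout or sign mistakes in formulas like these. The sketch below compares \((A + A^T)\mathbf{x}\) with a central-difference estimate of the gradient of \(\mathbf{x}^TA\mathbf{x}\); the helper names, step size, and sample values are assumptions chosen for illustration.

```python
def quad_form(A, x):
    """Return the scalar x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

A = [[1.0, 2.0], [0.0, 3.0]]   # deliberately non-symmetric
x = [1.0, -2.0]

# Closed form (A + A^T) x
closed = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]
numeric = numeric_grad(lambda v: quad_form(A, v), x)
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not distinguish \((A + A^T)\mathbf{x}\) from \(2A\mathbf{x}\).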
4. Matrix Derivatives
(1)
\[\frac{\partial \mathbf{x}^TA\mathbf{y}}{\partial A} = \mathbf{x}\mathbf{y}^T \tag{3.4.1} \]
(2)
\[\frac{\partial \mathbf{x}^TA^T\mathbf{y}}{\partial A} = \mathbf{y}\mathbf{x}^T \tag{3.4.2} \]
(3)
\[\frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A} = A(\mathbf{y}\mathbf{x}^T + \mathbf{x}\mathbf{y}^T ) \tag{3.4.3} \]
Proof:
$$\begin{align*} \frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A} &= \left(\frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A_i^j} \right) _{m\times n} \\ &= \left(\frac{\partial \sum_p(\sum_q A_p^q \mathbf{x}_q)(\sum_qA_p^q\mathbf{y}_q)}{\partial A_i^j}\right)_{m\times n}\\ &= \left(\frac{\partial \left(A_i^j\mathbf{x}_j(\sum_qA_i^q\mathbf{y}_q) + A_i^j\mathbf{y}_j(\sum_qA_i^q\mathbf{x}_q) \right)}{\partial A_i^j}\right)_{m\times n} \\ &= \left((\sum_qA_i^q\mathbf{x}_j\mathbf{y}_q) + (\sum_qA_i^q\mathbf{y}_j\mathbf{x}_q) \right)_{m\times n} \\ &= \left(\sum_q A_i^q (\mathbf{y}\mathbf{x}^T)^j_q + \sum_q A_i^q (\mathbf{x}\mathbf{y}^T)^j_q \right)_{m\times n} \\ &= A(\mathbf{y}\mathbf{x}^T + \mathbf{x}\mathbf{y}^T ) \end{align*} $$
(4)
\[\frac{\partial A^TBA}{\partial B_{i}^{j}} = A_i^TA_j \tag{3.4.4} \]
Proof:
$$\begin{align*} \frac{\partial A^TBA}{\partial B_{i}^{j}} &= \left(\frac{\partial (A^TBA)_p^q }{\partial B_i^j}\right)_{n\times n} \\ &= \left( \frac{\partial \sum_k(\sum_t A_t^pB_t^k)A_k^q}{\partial B_i^j} \right)_{n\times n} \\ &= \left( \begin{matrix} A_i^1A_j^1, &A_i^1A_j^2, &\cdots, &A_i^1A_j^n\\ A_i^2 A_j^1, &A_i^2A_j^2, &\cdots, &A_i^2A_j^n\\ \vdots, &\vdots, &\ddots, &\vdots \\ A_i^n A_j^1, &A_i^nA_j^2, &\cdots, &A_i^nA_j^n \end{matrix} \right) \\ &= A_i^TA_j \end{align*}$$
(5)
\[\frac{\partial A^TBA}{\partial A_{i}^j} = [E_j^i]\cdot (BA) + (A^TB)\cdot [E_i^j] \tag{3.4.5} \]
Proof:
$$\begin{align*} \frac{\partial A^TBA}{\partial A_{i}^j} &= \left( \frac{\partial(A^TBA)_p^q}{\partial A_i^j} \right)_{n\times n} \\ &=\left( \frac{\partial(\sum_k(\sum_tA_t^pB_t^k)A_k^q)}{\partial A_i^j} \right)_{n\times n} \\ &= \left(\frac{\partial(\delta(p, j)\cdot \sum_{k}A_i^j B_i^kA_k^q +\delta(q, j)\cdot \sum_t A_t^pB_t^iA_i^j)}{\partial A_i^j}\right)_{n\times n}\\ &= \left(\delta(p, j)\cdot\sum_kB_i^kA_k^q + \delta(q, j)\cdot\sum_tA_t^p B_t^i\right)_{n\times n}\\ &= \left(\delta(p, j)\cdot(BA)_i^q + \delta(q, j)\cdot(A^TB)_p^i\right)_{n\times n}\\ &= \left(\begin{array}{cc} \delta(1, j)\\ \delta(2, j)\\ \vdots \\ \delta(n, j) \end{array}\right)\cdot (BA)_i + (A^TB)^i\cdot (\delta(1, j), \delta(2, j), \cdots, \delta(n, j)) \\ &= I^j\cdot I_i\cdot (BA) + (A^TB)\cdot I^i \cdot I_j \\ &= [E_j^i]\cdot (BA) + (A^TB)\cdot [E_i^j] \end{align*}$$
This can be abbreviated as: \(\frac{\partial A^TBA}{\partial A_i^j} = \frac{\partial A^T}{\partial A_i^j}\cdot BA + A^TB\cdot \frac{\partial A}{\partial A_i^j}\)
(6)
\[\frac{\partial \mathbf{y}^TA^TBA\mathbf{z}}{\partial A} = B^TA\mathbf{y}\mathbf{z}^T + BA\mathbf{z}\mathbf{y}^T \tag{3.4.6} \]
Proof:
\begin{align*} \frac{\partial \mathbf{y}^TA^TBA\mathbf{z}}{\partial A} &= \left(\mathbf{y}^T\,\frac{\partial A^TBA}{\partial A_i^j}\,\mathbf{z}\right)_{m\times n} \\ &\overset{\text{by (3.4.5)}}{=} \left(\mathbf{y}^T\left( A^TB[E_i^j] + [E_j^i]BA\right)\mathbf{z} \right)_{m\times n} \\ &= \left( \mathbf{y}^T[(A^TB)^i]^j\mathbf{z} \right)_{m\times n} + \left( \mathbf{y}^T[(BA)_i]_j\mathbf{z} \right)_{m\times n} \\ &= \left([\mathbf{y}^T\cdot (A^TB)^i]^j\mathbf{z} \right)_{m\times n} + \left(\mathbf{y}^T[(BA)_i\cdot \mathbf{z}]_j \right)_{m\times n}\\ &= \left(\sum_k\mathbf{y}_k(A^TB)_k^i\cdot \mathbf{z}_j\right)_{m\times n} + \left(\mathbf{y}_j\cdot \sum_k(BA)_i^k\mathbf{z}_k\right)_{m\times n} \\ &= \left(\mathbf{y}^T\cdot (A^TB)\right)^T\cdot\mathbf{z}^T + \left( (BA)\cdot \mathbf{z}\cdot \mathbf{y}^T\right) \\ &= B^TA\mathbf{y}\mathbf{z}^T + BA\mathbf{z}\mathbf{y}^T \end{align*}
(7)
\[\frac{\partial }{\partial A}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) = (D + D^T)(A\mathbf{x} + \mathbf{y})\mathbf{x}^T \tag{3.4.7} \]
Proof:
$$\begin{align*} \frac{\partial }{\partial A}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) &= \left(\frac{\partial }{\partial A_i^j}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) \right)_{m\times n} \\ & \overset{\text{by the chain rule}}{=} \left(Tr\left(\left[\left.\frac{\partial\mathbf{z}^TD\mathbf{z}}{\partial \mathbf{z}}\right|_{\mathbf{z} = A\mathbf{x} + \mathbf{y}}\right]^T\frac{\partial (A\mathbf{x} + \mathbf{y})}{\partial A_i^j}\right)\right)_{m\times n}\\ &= \left(Tr\left(\mathbf{z}^T(D + D^T)\frac{\partial(A\mathbf{x} + \mathbf{y})}{\partial A_i^j}\right)\right)_{m\times n} \\ &= \left(Tr\left(\left.\frac{\partial (\mathbf{w}^TA\mathbf{x} + \mathbf{w}^T\mathbf{y})}{\partial A_i^j}\right|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left(\mathbf{w}^T[E_i^j]\mathbf{x}|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left([\mathbf{w}_i]_j\cdot \mathbf{x}|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left(\mathbf{w}_i\cdot\mathbf{x}_j|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \mathbf{w}\cdot \mathbf{x}^T \\ &= (D + D^T)(A\mathbf{x} + \mathbf{y})\mathbf{x}^T \end{align*}$$
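The result for \((A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y})\) can also be checked entry by entry with finite differences. This is a rough sketch under assumed sample values; the function names (`loss`, `closed_form`, `numeric_grad`) are hypothetical and introduced only for this example.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def loss(A, x, y, D):
    """Scalar (Ax + y)^T D (Ax + y)."""
    z = [a + b for a, b in zip(matvec(A, x), y)]
    return sum(zi * di for zi, di in zip(z, matvec(D, z)))

def closed_form(A, x, y, D):
    """(D + D^T)(Ax + y) x^T, a matrix with the same shape as A."""
    z = [a + b for a, b in zip(matvec(A, x), y)]
    m = len(D)
    S = [[D[i][j] + D[j][i] for j in range(m)] for i in range(m)]
    w = matvec(S, z)
    return [[wi * xj for xj in x] for wi in w]

def numeric_grad(A, x, y, D, h=1e-6):
    """Central difference with respect to each entry A[i][j]."""
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            Ap = [r[:] for r in A]
            Am = [r[:] for r in A]
            Ap[i][j] += h
            Am[i][j] -= h
            row.append((loss(Ap, x, y, D) - loss(Am, x, y, D)) / (2 * h))
        G.append(row)
    return G

A = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]   # 2 x 3
x = [1.0, 0.5, -1.0]
y = [0.5, 2.0]
D = [[2.0, 1.0], [0.0, 1.0]]              # non-symmetric on purpose

G_closed = closed_form(A, x, y, D)
G_numeric = numeric_grad(A, x, y, D)
max_err = max(abs(G_closed[i][j] - G_numeric[i][j])
              for i in range(2) for j in range(3))
```

The gradient has the same \(m\times n\) shape as \(A\), which is the layout used for scalar numerators in these notes.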
5. Derivatives of Determinants
(1)
\[\frac{\partial \det(Y)}{\partial x} = \det(Y)\cdot Tr\left(Y^{-1}\frac{\partial Y}{\partial x}\right) \tag{3.5.1} \]
Proof:
$$\begin{align*} \frac{\partial \det(Y)}{\partial x} &= Tr\left(\left(\frac{\partial \det(Y)}{\partial Y}\right)^T\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\left(\frac{\partial \sum_{k=1}^n (-1)^{i+k}Y_{i}^k\det([Y]_i^k)}{\partial Y_i^j}\right)^T_{m\times n}\cdot \frac{\partial Y}{\partial x} \right) \\ &= Tr\left(\left((-1)^{i+j}\det\left([Y]_i^j\right)\right)^T_{m\times n}\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\left(\mathrm{cof}(Y)\right)^T\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\mathrm{adj}(Y)\cdot \frac{\partial Y}{\partial x}\right) \\ &= \det(Y)\cdot Tr(Y^{-1}\frac{\partial Y}{\partial x}) \end{align*}$$
(2)
\[\frac{\partial \det(A)}{\partial A} = \det(A)\cdot \left(A^{-1}\right)^T \tag{3.5.2} \]
Proof:
\begin{align*} \frac{\partial \det(A)}{\partial A} &= \left(\frac{\partial \det(A)}{\partial A_i^j} \right)_{n\times n} \\ &= \left(\det(A)\cdot Tr\left(A^{-1}\cdot \frac{\partial A}{\partial A_i^j}\right)\right)_{n\times n} \\ &= \left(\det (A)\cdot Tr\left(A^{-1}\cdot [E_i^j]\right)\right)_{n\times n} \\ &= \left(\det(A)\cdot Tr\left([(A^{-1})^i]^j\right)\right)_{n\times n}\\ &= \left(\det (A)\cdot \left(A^{-1}\right)_j^i\right)_{n\times n}\\ &= \det(A)\cdot \left(A^{-1}\right)^T \end{align*}
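For a \(2\times 2\) matrix the formula \(\frac{\partial \det(A)}{\partial A} = \det(A)\cdot (A^{-1})^T\) can be checked exactly. Because the determinant is linear in each single entry, a unit step in one entry gives the exact partial derivative. The sketch below uses exact rational arithmetic; the helper names and the sample matrix are assumptions for illustration.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Exact inverse of a 2x2 integer matrix using rational numbers."""
    d = Fraction(det2(M))
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

A = [[3, 1], [4, 2]]   # det(A) = 2

# Closed form det(A) * (A^{-1})^T; entry (i, j) uses inv[j][i] because of the transpose
inv = inv2(A)
closed = [[det2(A) * inv[j][i] for j in range(2)] for i in range(2)]

# Exact partial derivatives via a unit step in each entry
numeric = []
for i in range(2):
    row = []
    for j in range(2):
        Ap = [r[:] for r in A]
        Ap[i][j] += 1
        row.append(det2(Ap) - det2(A))
    numeric.append(row)
```

Both `closed` and `numeric` come out as the cofactor matrix of `A`, matching the step in the scalar-derivative proof that passes through \(\mathrm{cof}(Y)\).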
(3)
\[\frac{\partial \det(X^TAX)}{\partial X} = \det(X^TAX)\cdot\left(AX(X^TAX)^{-1} + A^TX(X^TA^TX)^{-1} \right) \tag{3.5.3} \]
Proof:
\begin{align*} \frac{\partial \det(X^TAX)}{\partial X} &= \left(Tr\left(\left(\frac{\partial\det(X^TAX)}{\partial X^TAX}\right)^T\cdot \frac{\partial X^TAX}{\partial X_i^j}\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(Tr\left((X^TAX)^{-1}\cdot \frac{\partial X^TAX}{\partial X_i^j}\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(Tr\left((X^TAX)^{-1}\cdot [E_j^i]\cdot AX + (X^TAX)^{-1}\cdot X^TA\cdot [E_i^j]\right) \right)_{m\times n}\\ &= \det(X^TAX)\cdot \left(Tr\left([((X^TAX)^{-1})^j]^i\cdot AX\right) + Tr\left((X^TAX)^{-1}\cdot [(X^TA)^i]^j\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(\sum_k((X^TAX)^{-1})^j_k\cdot (AX)_i^k\right)_{m\times n} + \det(X^TAX)\left(\sum_{k}((X^TAX)^{-1})_j^k\cdot (X^TA)_k^i\right)_{m\times n} \\ &= \det(X^TAX)\left(\left((AX)\cdot(X^TAX)^{-1}\right)_{i}^j\right)_{m\times n} + \det(X^TAX)\left(\left((A^TX)\cdot(X^TAX)^{-T}\right)_{i}^j\right)_{m\times n} \\ &= \det(X^TAX)\cdot\left(AX(X^TAX)^{-1} + A^TX(X^TA^TX)^{-1} \right) \end{align*}
(4)
\[\frac{\partial \ln \det(X^TX)}{\partial X}= 2(X^{L+})^T \tag{3.5.4} \]
where \(X^{L+} = (X^TX)^{-1}X^T\) is the left pseudo-inverse of \(X\).
Proof:
$$\begin{align*} \frac{\partial \ln \det(X^TX)}{\partial X} &= \frac{1}{\det(X^TX)}\cdot \left(Tr\left(\frac{\partial \det(X^TX)}{\partial X_{i}^j}\right)\right)_{m\times n} \\ &= \frac{\det(X^TX)}{\det(X^TX)}\cdot \left(Tr\left(2\sum_k X_i^k\left((X^TX)^{-1}\right)_k^j \right)\right)_{m\times n} \\ &= 2\left(\sum_k X_i^k\left((X^TX)^{-1}\right)_k^j \right)_{m\times n} \\ &= 2X(X^TX)^{-1} \\ &= 2(X^{L+})^T \end{align*}$$
6. Derivatives of the Inverse Matrix
(1)
\[\frac{\partial Y^{-1}}{\partial x} = -Y^{-1}\frac{\partial Y}{\partial x}Y^{-1} \tag{3.6.1} \]
Proof:
$$\begin{align*} \frac{\partial Y^{-1}}{\partial x} &= Y^{-1}\frac{\partial (Y\cdot Y^{-1}) - \partial(Y)\cdot Y^{-1} }{\partial x} \\ &= -Y^{-1}\frac{\partial Y}{\partial x}Y^{-1} \end{align*}$$
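The minus sign in \(\frac{\partial Y^{-1}}{\partial x} = -Y^{-1}\frac{\partial Y}{\partial x}Y^{-1}\) is easy to lose, so a numerical check is worthwhile. The sketch below uses a hypothetical \(2\times 2\) matrix function `Y(x)` made up for this example and compares the closed form with a central difference of the inverse.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as a list of rows."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def Y(x):
    """Hypothetical matrix-valued function of a scalar x."""
    return [[1.0 + x, 2.0], [3.0, 4.0 + x * x]]

def dY(x):
    """Entrywise derivative of Y with respect to x."""
    return [[1.0, 0.0], [0.0, 2.0 * x]]

x0, h = 1.0, 1e-6
Yinv = inv2(Y(x0))

# Closed form: -Y^{-1} (dY/dx) Y^{-1}
closed = [[-v for v in row] for row in matmul(matmul(Yinv, dY(x0)), Yinv)]

# Central difference of the inverse itself
plus, minus = inv2(Y(x0 + h)), inv2(Y(x0 - h))
numeric = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

max_err = max(abs(closed[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
```

If the sign were dropped, `closed` and `numeric` would differ by roughly twice the magnitude of the derivative rather than by rounding error.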
(2)
\[\frac{\partial \mathbf{a}^TX^{-1}\mathbf{b}}{\partial X} = -X^{-T}\mathbf{a}\mathbf{b}^TX^{-T} \tag{3.6.2} \]
Proof:
$$\begin{align*} \frac{\partial \mathbf{a}^TX^{-1}\mathbf{b}}{\partial X} &= \left(\frac{\partial \mathbf{a}^{T}X^{-1}\mathbf{b}}{\partial X_i^j}\right)_{n\times n} \\ &\overset{\text{by (3.6.1)}}{=} \left(-\mathbf{a}^T X^{-1}[E_i^j]X^{-1}\mathbf{b}\right)_{n\times n}\\ &= \left(-\mathbf{a}^TX^{-1} I^i\cdot I_jX^{-1}\mathbf{b}\right)_{n\times n}\\ &= \left(-\mathbf{a}^T(X^{-1})^{i}\cdot (X^{-1})_j\mathbf{b}\right)_{n\times n}\\ &= \left(-(X^{-1})_j\mathbf{b}\cdot \mathbf{a}^T(X^{-1})^i\right)_{n\times n} \\ &= -\left(X^{-1}\mathbf{b}\mathbf{a}^TX^{-1}\right)^{T}\\ &= -X^{-T}\mathbf{a}\mathbf{b}^TX^{-T} \end{align*}$$
(3)
\[\frac{\partial \det(X^{-1})}{\partial X} = -\det(X^{-1})(X^{-1})^T \tag{3.6.3} \]
Proof:
$$\begin{align*} \frac{\partial \det(X^{-1})}{\partial X} &= \left(Tr\left(\left(\frac{\partial \det(X^{-1})}{\partial X^{-1}}\right)^T\cdot \frac{\partial X^{-1}}{\partial X_i^j}\right)\right)_{n\times n} \\ &= -\det(X^{-1})\left(Tr\left(X\cdot X^{-1}\frac{\partial X}{\partial X_i^j}X^{-1}\right)\right)_{n\times n}\\ &= -\det(X^{-1})\left(Tr\left([E_i^j]X^{-1}\right)\right)_{n\times n}\\ &= -\det(X^{-1})\left(\left(X^{-1}\right)^i_j\right)_{n\times n}\\ &= -\det(X^{-1})(X^{-1})^T \end{align*}$$
(4)
\[\frac{\partial Tr(AX^{-1}B)}{\partial X} = -\left(X^{-1}BAX^{-1}\right)^{T} \tag{3.6.4} \]
Proof:
$$\begin{align*} \frac{\partial Tr(AX^{-1}B)}{\partial X} &\overset{\left(AX^{-1}B\right)_i^i = A_iX^{-1}B^{i}}{=} \sum_i \frac{\partial A_i X^{-1}B^{i}}{\partial X} \\ &= -\sum_i X^{-T}(A^T)^i (B^T)_{i}X^{-T} \\ &\overset{\sum_iA^iB_i = AB}{=} -X^{-T}A^TB^TX^{-T} \\ &= -\left(X^{-1}BAX^{-1}\right)^{T} \end{align*}$$
(5)
\[\frac{\partial Tr\left((X+A)^{-1}\right) }{\partial X} \overset{\text{by (3.6.4)}}{=} -\left((X+A)^{-1}(X+A)^{-1}\right)^T \tag{3.6.5} \]
7. Derivatives of the Trace
(1)
\[\frac{\partial Tr(X)}{\partial X} = I \tag{3.7.1} \]
Proof:
$$\begin{align*} \frac{\partial Tr(X)}{\partial X} &= \left(\frac{\partial \sum_k X_k^k}{\partial X_i^j}\right)_{n\times n} \\ &= \left(\delta_i^j\right)_{n\times n} \\ &= I \end{align*}$$
(2)
\[\frac{\partial Tr(XA)}{\partial X} = A^T \tag{3.7.2} \]
Proof:
$$\begin{align*} \frac{\partial Tr(XA)}{\partial X} &= \left(\frac{\partial \sum_k\sum_t X_k^tA_t^k}{\partial X_i^j}\right)_{m\times n} \\ &= \left(A_j^i\right)_{m\times n} \\ &= A^T \end{align*}$$
(3)
\[\frac{\partial Tr(AXB)}{\partial X} = A^TB^T \tag{3.7.3} \]
Proof:
$$\begin{align*} \frac{\partial Tr(AXB)}{\partial X} &= \left(\frac{\partial \sum_k A_kXB^k}{\partial X_i^j}\right)_{m\times n} \\ &= \left(\sum_k A_k^iB_j^k\right)_{m\times n} \\ &= A^TB^T \end{align*}$$
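Since \(Tr(AXB)\) is linear in \(X\), the formula \(\frac{\partial Tr(AXB)}{\partial X} = A^TB^T\) can be confirmed exactly with integer arithmetic: adding 1 to a single entry of \(X\) changes the trace by exactly the corresponding partial derivative. The matrices and helper names below are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def f(A, X, B):
    """Scalar Tr(A X B)."""
    return trace(matmul(matmul(A, X), B))

A = [[1, 2], [3, 4]]            # 2 x 2
B = [[1, 0], [2, 1], [0, 3]]    # 3 x 2
X = [[1, 1, 0], [2, 0, 1]]      # 2 x 3

# Closed form A^T B^T, which has the same 2 x 3 shape as X
closed = matmul(transpose(A), transpose(B))

# Exact partial derivatives via a unit step in each entry of X
numeric = []
for i in range(len(X)):
    row = []
    for j in range(len(X[0])):
        Xp = [r[:] for r in X]
        Xp[i][j] += 1
        row.append(f(A, Xp, B) - f(A, X, B))
    numeric.append(row)
```

The same unit-step trick works for \(Tr(XA)\) and \(Tr(X)\), which are also linear in \(X\).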
(4)
\[\frac{\partial Tr(A \otimes X)}{\partial X} = Tr(A)I \tag{3.7.4} \]
Proof:
$$\begin{align*} \frac{\partial Tr(A \otimes X)}{\partial X} &= \left(\frac{\partial \sum_k\sum_t A_k^k X_t^t}{\partial X_i^j}\right)_{n\times n} \\ &= \left(\sum_k A_k^k\delta_i^j\right)_{n\times n}\\ &= Tr(A)I \end{align*}$$