Definition of the trace: the trace of an $n \times n$ matrix $A$ is the sum of the entries on its main diagonal, written $\operatorname{tr}(A)$. That is,
$\operatorname{tr}(A)=\sum\limits_{i=1}^{n} a_{i i}$
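Theorem 1:
$\operatorname{tr}(A B)=\operatorname{tr}(B A) $
where $A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix, so that both products are square.
Proof: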
$\operatorname{tr}(A B)=\sum\limits_{i=1}^{n}(A B)_{i i}=\sum\limits_{i=1}^{n} \sum\limits_{j=1}^{m} A_{i j} B_{j i}=\sum\limits_{j=1}^{m} \sum\limits_{i=1}^{n} B_{j i} A_{i j}=\sum\limits_{j=1}^{m}(B A)_{j j}=\operatorname{tr}(B A) $
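As a quick numerical sanity check of Theorem 1, the sketch below compares the two traces for rectangular matrices. The helper names (`matmul`, `trace`, `random_matrix`) and the sizes $3 \times 5$ and $5 \times 3$ are illustrative choices, not part of the original notes.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)           # fixed seed so the run is reproducible
A = random_matrix(3, 5, rng)     # n x m with n = 3, m = 5
B = random_matrix(5, 3, rng)     # m x n

tr_AB = trace(matmul(A, B))      # trace of a 3 x 3 matrix
tr_BA = trace(matmul(B, A))      # trace of a 5 x 5 matrix
print(abs(tr_AB - tr_BA) < 1e-12)
```

Note that $AB$ and $BA$ have different sizes here, yet the two traces agree up to floating-point rounding.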
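Theorem 2:
$\operatorname{tr}(A B C)=\operatorname{tr}(C A B)=\operatorname{tr}(B C A) $
Proof:
Treat $AB$ or $BC$ as a single matrix and apply Theorem 1: $\operatorname{tr}((A B) C)=\operatorname{tr}(C(A B))$ and $\operatorname{tr}(A(B C))=\operatorname{tr}((B C) A)$.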
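The small example below illustrates Theorem 2 with hand-picked $2 \times 2$ integer matrices (chosen here for illustration only). It also shows that the identity covers cyclic permutations only: swapping two factors, as in $\operatorname{tr}(A C B)$, generally gives a different value.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [0, 2]]

tr_ABC = trace(matmul(matmul(A, B), C))
tr_CAB = trace(matmul(matmul(C, A), B))   # cyclic shift of ABC
tr_BCA = trace(matmul(matmul(B, C), A))   # cyclic shift of ABC
tr_ACB = trace(matmul(matmul(A, C), B))   # NOT a cyclic shift of ABC

print(tr_ABC, tr_CAB, tr_BCA)  # the three cyclic orderings
print(tr_ACB)                  # a non-cyclic ordering
```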
Theorem 3:
$\frac{\partial \operatorname{tr}(A B)}{\partial A}=\frac{\partial \operatorname{tr}(B A)}{\partial A}=B^{T} $
where $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix.
Proof:
$\operatorname{tr}(A B)=\operatorname{tr}\left[\left(\begin{array}{cccc}a_{11} & a_{12} & \cdots & a_{1 n} \\a_{21} & a_{22} & \cdots & a_{2 n} \\\vdots & \vdots & \ddots & \vdots \\a_{m 1} & a_{m 2} & \cdots & a_{m n}\end{array}\right)\left(\begin{array}{cccc}b_{11} & b_{12} & \cdots & b_{1 m} \\b_{21} & b_{22} & \cdots & b_{2 m} \\\vdots & \vdots & \ddots & \vdots \\b_{n 1} & b_{n 2} & \cdots & b_{n m}\end{array}\right)\right]$
Only the diagonal entries of the product contribute to the trace, so
$\operatorname{tr}(A B)=\sum\limits_{i=1}^{n} a_{1 i} b_{i 1}+\sum\limits_{i=1}^{n} a_{2 i} b_{i 2}+\ldots+\sum\limits_{i=1}^{n} a_{m i} b_{i m}=\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n} a_{i j} b_{j i}$
$\frac{\partial \operatorname{tr}(A B)}{\partial a_{i j}}=b_{j i} \Rightarrow \frac{\partial \operatorname{tr}(A B)}{\partial A}=B^{T}$
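Theorem 3 can be checked numerically by perturbing one entry of $A$ at a time. The sketch below uses a central-difference estimate; the helper `numerical_gradient`, the step size `h`, and the matrix sizes are assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def numerical_gradient(f, A, h=1e-6):
    """Central-difference estimate of d f / d a_ij for every entry of A."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            original = A[i][j]
            A[i][j] = original + h
            f_plus = f(A)
            A[i][j] = original - h
            f_minus = f(A)
            A[i][j] = original            # restore the entry
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad

def max_abs_diff(X, Y):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

rng = random.Random(1)
A = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # m x n = 2 x 4
B = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]  # n x m = 4 x 2

grad_numeric = numerical_gradient(lambda M: trace(matmul(M, B)), A)
grad_formula = transpose(B)               # Theorem 3: the gradient is B^T
error = max_abs_diff(grad_numeric, grad_formula)
print(error < 1e-6)
```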
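Theorem 4:
$\frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial A}=\frac{\partial \operatorname{tr}\left(B A^{T}\right)}{\partial A}=B$
where $A$ and $B$ are both $m \times n$ matrices.
Proof:
The argument follows the same steps as the proof of Theorem 3 and is not repeated in detail.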
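For reference, the entrywise computation can be sketched as follows, taking $A$ and $B$ to be $m \times n$ so that $A^{T} B$ is square:
$\operatorname{tr}\left(A^{T} B\right)=\sum\limits_{j=1}^{n}\left(A^{T} B\right)_{j j}=\sum\limits_{j=1}^{n} \sum\limits_{i=1}^{m} a_{i j} b_{i j}$
$\frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial a_{i j}}=b_{i j} \Rightarrow \frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial A}=B$
The second equality of the theorem then follows from Theorem 1, since $\operatorname{tr}\left(B A^{T}\right)=\operatorname{tr}\left(A^{T} B\right)$.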
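Theorem 5:
$\operatorname{tr}(A)=\operatorname{tr}\left(A^{T}\right) $
Proof:
Transposition leaves the main diagonal unchanged, so the sum of the diagonal entries is the same.
Theorem 6:
If $a \in \mathbb{R}$, then $\operatorname{tr}(a)=a $
Proof:
Treat $a$ as a $1 \times 1$ matrix.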
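Theorem 7:
$\frac{\partial \operatorname{tr}\left(A B A^{T} C\right)}{\partial A}=C A B+C^{T} A B^{T} $
Proof: apply the product rule, differentiating with respect to each occurrence of $A$ in turn while holding the other occurrence fixed. Writing $\bar{A}$ for the occurrence held fixed, we obtain
$\begin{aligned}\frac{\partial \operatorname{tr}\left(A B A^{T} C\right)}{\partial A} &=\frac{\partial \operatorname{tr}\left(A B \bar{A}^{T} C\right)}{\partial A}+\frac{\partial \operatorname{tr}\left(A^{T} C \bar{A} B\right)}{\partial A}\quad\quad\text{(product rule, Theorem 1)} \\&=\left(B \bar{A}^{T} C\right)^{T}+C \bar{A} B\quad\quad\text{(Theorem 3, Theorem 4)} \\&=C A B+C^{T} A B^{T}\end{aligned}$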
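Theorem 7 can also be checked numerically. The sketch below compares a central-difference gradient of $\operatorname{tr}(A B A^{T} C)$ with $C A B+C^{T} A B^{T}$. The shapes ($A$ is $2 \times 3$, $B$ is $3 \times 3$, $C$ is $2 \times 2$) and all helper names are assumptions chosen for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(col) for col in zip(*M)]

def add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numerical_gradient(f, A, h=1e-6):
    """Central-difference estimate of d f / d a_ij for every entry of A."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            original = A[i][j]
            A[i][j] = original + h
            f_plus = f(A)
            A[i][j] = original - h
            f_minus = f(A)
            A[i][j] = original            # restore the entry
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad

rng = random.Random(2)
A = random_matrix(2, 3, rng)
B = random_matrix(3, 3, rng)
C = random_matrix(2, 2, rng)

def f(M):
    """tr(M B M^T C) as a function of M."""
    return trace(matmul(matmul(matmul(M, B), transpose(M)), C))

grad_numeric = numerical_gradient(f, A)
grad_formula = add(matmul(matmul(C, A), B),
                   matmul(matmul(transpose(C), A), transpose(B)))
error = max(abs(x - y) for rx, ry in zip(grad_numeric, grad_formula)
            for x, y in zip(rx, ry))
print(error < 1e-6)
```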
Examples:
$\begin{array}{l}\operatorname{tr}(A)=\sum_{i=1}^{n} a_{i i} \\\operatorname{tr}(A B C)=\operatorname{tr}(B C A)=\operatorname{tr}(C A B) \\ \operatorname{tr}(A B)=\operatorname{tr}(B A) \\\frac{\partial \operatorname{tr}(A B)}{\partial A}=\frac{\partial \operatorname{tr}(B A)}{\partial A}=B^{T}\\\operatorname{tr}(A)=\operatorname{tr}\left(A^{T}\right) \\ \frac{\partial \operatorname{tr}\left(A^{T} B A\right)}{\partial A}=B A+B^{T} A \\ \frac{\partial \operatorname{tr}\left(A X B X C^{T}\right)}{\partial X}=A^{T} C X^{T} B^{T}+B^{T} X^{T} A^{T} C \\ \frac{\partial\operatorname{tr}\left(A B A^{T}\right)}{\partial A}=A B+A B^{T} \\\frac{\partial \operatorname{tr}(A X B X)}{\partial X}=A^{T} X^{T} B^{T}+B^{T} X^{T} A^{T} \\ \frac{\partial \operatorname{tr}\left(A X B X^{T}\right)}{\partial X}=A X B+A^{T} X B^{T}\\\frac{\partial \operatorname{tr}\left(A^{T} B\right)}{\partial A}=\frac{\partial \operatorname{tr}\left(B A^{T}\right)}{\partial A}=B \\\frac{\partial\operatorname{tr}\left(A^{T} X B^{T}\right)}{\partial X}=\frac{\partial \operatorname{tr}\left(B X^{T} A\right)}{\partial X}=A B\end{array}$
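Any identity in this list can be spot-checked with finite differences. As one illustration, the sketch below tests $\frac{\partial \operatorname{tr}(A X B X)}{\partial X}=A^{T} X^{T} B^{T}+B^{T} X^{T} A^{T}$. The shapes ($X$ is $2 \times 3$, $A$ and $B$ are $3 \times 2$) and the helper names are assumptions made here, not taken from the notes.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(col) for col in zip(*M)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of d f / d x_ij for every entry of X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            original = X[i][j]
            X[i][j] = original + h
            f_plus = f(X)
            X[i][j] = original - h
            f_minus = f(X)
            X[i][j] = original            # restore the entry
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad

rng = random.Random(3)
X = random_matrix(2, 3, rng)
A = random_matrix(3, 2, rng)
B = random_matrix(3, 2, rng)

def f(M):
    """tr(A M B M) as a function of M."""
    return trace(matmul(matmul(matmul(A, M), B), M))

At, Bt, Xt = transpose(A), transpose(B), transpose(X)
term1 = matmul(matmul(At, Xt), Bt)        # A^T X^T B^T
term2 = matmul(matmul(Bt, Xt), At)        # B^T X^T A^T
grad_formula = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(term1, term2)]

grad_numeric = numerical_gradient(f, X)
error = max(abs(x - y) for rx, ry in zip(grad_numeric, grad_formula)
            for x, y in zip(rx, ry))
print(error < 1e-6)
```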
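Derivative of the vector L2 norm
Least squares is the most basic method in regression. With the parameter vector $\theta$ written as $\vec{x}$, its objective is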
$J_{L S}(\theta)=\frac{1}{2}\|A \vec{x}-\vec{b}\|^{2}$
Definition of the vector norm
$\begin{array}{l}\vec{x} =\left[x_{1}, \cdots, x_{n}\right]^{\mathrm{T}} \\\|\vec{x}\|_{p} =\left(\sum_{i=1}^{n}\left|x_{i}\right|^{p}\right)^{\frac{1}{p}}, \quad 1 \leq p<+\infty\end{array}$
The $L_{2}$ norm is, specifically,
$\|\vec{x}\|_{2}=\left(\left|x_{1}\right|^{2}+\cdots+\left|x_{n}\right|^{2}\right)^{\frac{1}{2}}=\sqrt{\vec{x}^{\mathrm{T}} \vec{x}}$
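The norm definition translates directly into code. The function name `p_norm` and the sample vector below are illustrative; this is a minimal sketch for real-valued vectors and finite $p \geq 1$.

```python
import math

def p_norm(x, p):
    """L_p norm of a real vector x for finite p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
l1 = p_norm(x, 1)                         # |3| + |-4|
l2 = p_norm(x, 2)                         # sqrt(3^2 + 4^2)
l2_via_dot = math.sqrt(sum(v * v for v in x))   # sqrt(x^T x)
print(l1, l2, l2_via_dot)
```

The last two values illustrate the identity $\|\vec{x}\|_{2}=\sqrt{\vec{x}^{\mathrm{T}} \vec{x}}$.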
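Matrix differentiation
A partial-derivative operator arranged as a column vector is called the column-vector partial-derivative operator, or more commonly the gradient operator. For an $n \times 1$ column vector $x$ it is written $\nabla_{x}$ and defined as
$\nabla_{x}=\frac{\partial}{\partial x}=\left[\frac{\partial}{\partial x_{1}}, \cdots, \frac{\partial}{\partial x_{n}}\right]^{\mathrm{T}}$
If $\vec{x}$ is an $n \times 1$ column vector, $y$ is a $1 \times n$ row vector, and $A$ is an $n \times n$ matrix, then
$\begin{array}{l}\frac{\partial (y x)}{\partial x}=y^{T} \\ \frac{\partial\left(x^{T} A x\right)}{\partial x}=\left(A+A^{T}\right) x\end{array}$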
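Both rules can be checked against a finite-difference gradient. In the sketch below the matrix `A` is deliberately not symmetric, so that $\left(A+A^{T}\right) x$ differs from $2 A x$. The sample values and helper names are assumptions for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at the vector x."""
    grad = []
    for k in range(len(x)):
        x_plus = x[:k] + [x[k] + h] + x[k + 1:]
        x_minus = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def quad_form(A, x):
    """x^T A x for a square matrix A and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[1.0, 2.0], [0.0, 3.0]]   # deliberately not symmetric
x = [1.0, -1.0]
y = [2.0, 5.0]                 # plays the role of the row vector y

# Rule 1: d(y x)/dx = y^T
grad_lin_numeric = numerical_gradient(lambda v: sum(a * b for a, b in zip(y, v)), x)

# Rule 2: d(x^T A x)/dx = (A + A^T) x
grad_quad_numeric = numerical_gradient(lambda v: quad_form(A, v), x)
grad_quad_formula = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2))
                     for i in range(2)]

print(grad_lin_numeric, grad_quad_numeric, grad_quad_formula)
```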
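With these preliminaries in place, we can expand the objective:
$\begin{aligned} J_{L S}(\theta) &=\frac{1}{2}\|A x-b\|^{2} \\&=\frac{1}{2}(A x-b)^{T}(A x-b) \\&=\frac{1}{2}\left(x^{T} A^{T}-b^{T}\right)(A x-b) \\&=\frac{1}{2}\left(x^{T} A^{T} A x-x^{T} A^{T} b-b^{T} A x+b^{T} b\right) \\ &=\frac{1}{2}\left(x^{T} A^{T} A x-2 b^{T} A x+b^{T} b\right) \end{aligned}$
Note that $b$ and $x$ are both column vectors, so $b^{T} A x$ is a scalar. A scalar equals its own transpose, hence $b^{T} A x=x^{T} A^{T} b$, which is why the two cross terms combine.
Differentiating with respect to $\vec{x}$, using the two rules above with $A^{T} A$ in place of $A$ and $b^{T} A$ in place of $y$, gives: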
$J_{L S}^{\prime}(\theta)=A^{T} A x-A^{T} b=A^{T}(A x-b)$
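The closed-form gradient can be compared with a finite-difference gradient on a small problem. The data below (three points lying exactly on the line $1+2t$) and the helper names are made up for illustration. Because the fit is exact, the gradient should vanish at $x=(1,2)$, which is consistent with the normal equations $A^{T} A x=A^{T} b$ obtained by setting the gradient to zero.

```python
def matvec(M, v):
    """Matrix-vector product for M stored as a sequence of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def loss(A, x, b):
    """J(x) = 1/2 * ||A x - b||^2."""
    residual = [p - q for p, q in zip(matvec(A, x), b)]
    return 0.5 * sum(r * r for r in residual)

def gradient(A, x, b):
    """Closed-form gradient A^T (A x - b)."""
    residual = [p - q for p, q in zip(matvec(A, x), b)]
    A_transposed = list(zip(*A))
    return matvec(A_transposed, residual)

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at the vector x."""
    grad = []
    for k in range(len(x)):
        x_plus = x[:k] + [x[k] + h] + x[k + 1:]
        x_minus = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

# Illustrative data: fit b ~ x0 + x1 * t at t = 0, 1, 2.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]

x0 = [0.0, 0.0]
grad_formula = gradient(A, x0, b)
grad_numeric = numerical_gradient(lambda v: loss(A, v, b), x0)

x_exact = [1.0, 2.0]                       # the line 1 + 2 t fits the data exactly
grad_at_solution = gradient(A, x_exact, b)

print(grad_formula, grad_numeric, grad_at_solution)
```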