Tensor Decomposition and Applications: Study Notes [01]

Reposted article, 2019-12-15 00:21. Tags: high-dimensional computing / tensors / tensor decomposition

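0. Preface

These notes are built around the following journal article:

Tensor Decompositions and Applications (Kolda and Bader, SIAM Review, 2009)

The early installments are a selective translation: I reorganize the logic, leave out the parts that add little, and add examples of my own.

Later installments will bring in other papers for some experimental discussion.

These notes are my own first step into the field, so they should suit readers who have never touched it before.

I hope they serve as a starting point and attract more enthusiasts.

Going forward I plan to study and discuss how tensors enter deep learning and reinforcement learning. I hope I can keep this long series going.

Where a term has no established Chinese name that I could find, I use the English name directly. Thank you for your understanding. (The literature is written in English anyway.)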

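1. Introduction

What is a tensor? Put simply, it is a multidimensional array. These notes consider only this mathematical notion and set aside the tensors of physics and engineering (stress tensors, for example), which in mathematics are more properly called tensor fields. A first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are called higher-order tensors.
A simple third-order tensor is shown in the figure below. Note that the index $i$ runs along what we would usually think of as the $y$ axis, and that indices start at 1 rather than 0. It is also worth noting where index 1 sits. None of this affects any important reasoning, but it does affect the ordering in some later formulas and the results of the examples.

Figure: a simple third-order tensor

The following typographic conventions are used from here on. Lowercase letters such as $\mathrm{x}$ denote vectors and capital letters such as $\mathrm{X}$ denote matrices (both are boldface in the paper and upright here), while calligraphic letters such as $\mathcal{X}$ denote tensors.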

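2. Basic Definitions

2.1 Fibers and Slices

A fiber is what we get by fixing every index of a tensor except one. Letting only the $n$-th index vary gives a mode-$n$ fiber. For a third-order tensor $\mathcal{X}$, for example, the mode-1 fibers are $\mathrm{x}_{:jk}$: $j$ and $k$ are held fixed while the first index runs over its whole range.
Likewise, fixing every index except two gives a slice. For a third-order tensor $\mathcal{X}$, the horizontal slices are $\mathrm{X}_{i::}$. The slice used most often is the frontal slice $\mathrm{X}_{::k}$ (third panel of the second row in the figure below), which is commonly abbreviated to $\mathrm{X}_k$. Keep this abbreviation firmly in mind.

Figure: the paper's illustration of fibers and slices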
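To make the index conventions concrete, here is a minimal sketch in plain Python (no libraries), borrowing the $3 \times 4 \times 2$ example tensor from section 2.6. Storing the tensor as a dictionary keyed by 1-based index tuples, and the helper names, are illustrative assumptions chosen so that the indices of the text appear unchanged in the code.

```python
# Example tensor from section 2.6 (3 x 4 x 2), keyed by 1-based (i, j, k).
I, J, K = 3, 4, 2
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, I + 1)
     for j in range(1, J + 1)
     for k in range(1, K + 1)}


def mode1_fiber(j, k):
    """x_{:jk}: the first index varies, j and k are held fixed."""
    return [X[i, j, k] for i in range(1, I + 1)]


def horizontal_slice(i):
    """X_{i::}: a J x K matrix (rows indexed by j, columns by k)."""
    return [[X[i, j, k] for k in range(1, K + 1)] for j in range(1, J + 1)]


def frontal_slice(k):
    """X_{::k}, abbreviated X_k: an I x J matrix."""
    return [[X[i, j, k] for j in range(1, J + 1)] for i in range(1, I + 1)]


print(mode1_fiber(2, 1))   # [4, 5, 6]
print(frontal_slice(1))    # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

A third-order tensor of this size has $4 \times 2 = 8$ mode-1 fibers, 3 horizontal slices, and 2 frontal slices.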

2.2 Norm and Inner Product

The norm of a tensor is defined like the Frobenius norm of a matrix: it is the square root of the sum of the squares of all entries. For a tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$:

\[ \|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1}\sum_{i_2=1}^{I_2}\dots\sum_{i_N=1}^{I_N} x^2_{i_1 i_2 \dots i_N}} \]

Similarly, the inner product of two tensors of the same size is the sum of the products of their corresponding entries, again just as for matrices. For tensors $\mathcal{X}, \mathcal{Y} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$:

\[ \langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1}\sum_{i_2=1}^{I_2}\dots\sum_{i_N=1}^{I_N} x_{i_1 i_2 \dots i_N} \, y_{i_1 i_2 \dots i_N} \]

Clearly, $\langle\mathcal{X},\:\mathcal{X}\rangle = \|\mathcal{X}\|^2$.
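A small sketch of both definitions in plain Python. The inner product is the sum of products of corresponding entries, and the norm is the square root of a tensor's inner product with itself. The dictionary storage with 1-based index tuples and the all-ones tensor `Y` are assumptions made for illustration.

```python
import math


def inner(X, Y):
    """<X, Y>: sum of products of corresponding entries."""
    assert X.keys() == Y.keys(), "tensors must have the same size"
    return sum(X[idx] * Y[idx] for idx in X)


def norm(X):
    """||X|| = sqrt(<X, X>)."""
    return math.sqrt(inner(X, X))


# The 3 x 4 x 2 example tensor of section 2.6 holds the values 1..24.
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
Y = {idx: 1 for idx in X}   # all-ones tensor of the same size

print(inner(X, Y))   # 1 + 2 + ... + 24 = 300
print(norm(X))       # sqrt(1^2 + ... + 24^2) = sqrt(4900) = 70.0
```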

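2.3 Rank-One Tensors

Even though the norm and inner product behave just as they do for matrices, we have to be very careful with rank. As we will see shortly, it is a thorny concept: in many cases we cannot easily determine the rank of a tensor. Rank-one tensors are a special case, though, because they can be defined through the outer product of vectors.
An $N$-th order tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ has rank one when it can be written as the outer product of $N$ vectors:

\[ \mathcal{X} = a^{(1)} \circ a^{(2)} \circ \dots \circ a^{(N)} \]

Here $\circ$ denotes the outer product. Each entry of the tensor is the ordinary product of the corresponding vector entries:

\[x_{i_1i_2\dots i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} \cdots a^{(N)}_{i_N} \]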
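The entrywise formula translates directly into code. The sketch below builds a rank-one tensor from any number of vectors. The function name `outer`, the sample vectors, and the dictionary storage with 1-based index tuples are illustrative assumptions.

```python
from itertools import product


def outer(*vectors):
    """Outer product of N vectors as a dict keyed by 1-based index tuples."""
    ranges = [range(1, len(v) + 1) for v in vectors]
    tensor = {}
    for idx in product(*ranges):
        value = 1
        for v, i in zip(vectors, idx):
            value *= v[i - 1]          # multiply in the i-th entry of each vector
        tensor[idx] = value
    return tensor


a, b, c = [1, 2], [3, 4, 5], [6, 7]
X = outer(a, b, c)                     # a rank-one tensor of size 2 x 3 x 2

print(X[2, 3, 1])                      # a_2 * b_3 * c_1 = 2 * 5 * 6 = 60
```

With two vectors, `outer(a, b)` reduces to the familiar rank-one matrix $\mathrm{a}\mathrm{b}^\mathsf{T}$.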

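2.4 Cubical, Symmetric, and Supersymmetric Tensors

A tensor is called cubical when all of its modes have the same size.
A cubical tensor is called supersymmetric when its entries are unchanged under any permutation of the indices.
Example: a third-order tensor $\mathcal{X} \in\mathbb{R}^{I \times I \times I}$ is supersymmetric if

\[ x_{ijk} = x_{ikj} = x_{jik} = x_{jki} = x_{kij} = x_{kji} \quad \text{ for all $ i, j, k = 1,\dots,I.$} \]

A tensor can also be symmetric in only some of its modes, in which case we say it is symmetric in those modes. For example, a third-order tensor $\mathcal{X} \in\mathbb{R}^{I \times I \times K}$ is symmetric in modes 1 and 2 when all of its frontal slices are symmetric. (This is a bit of a mouthful. Think of it as: let modes 1 and 2 vary, hold the remaining index fixed, and the slice you obtain is a symmetric matrix.)

\[X_k = X_k^\mathsf{T} \quad \text{ for all $ k = 1,\dots,K.$} \]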
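The two notions can be told apart with a brute-force check. In the sketch below, the helper names and the two test tensors `S` and `M` are made up for illustration: `S` is an outer product of one vector with itself three times (supersymmetric), while `M` is symmetric in modes 1 and 2 only.

```python
from itertools import permutations, product


def is_supersymmetric(X, size, order):
    """True if X[idx] is the same for every reordering of idx."""
    for idx in product(range(1, size + 1), repeat=order):
        if any(X[p] != X[idx] for p in permutations(idx)):
            return False
    return True


def is_symmetric_in_modes_1_2(X, I, K):
    """True if every frontal slice X_k equals its transpose."""
    return all(X[i, j, k] == X[j, i, k]
               for i in range(1, I + 1)
               for j in range(1, I + 1)
               for k in range(1, K + 1))


a = [1, 2, 3]
indices = list(product(range(1, 4), repeat=3))
S = {(i, j, k): a[i - 1] * a[j - 1] * a[k - 1] for i, j, k in indices}
M = {(i, j, k): (i + j) * k for i, j, k in indices}

print(is_supersymmetric(S, 3, 3))          # True
print(is_supersymmetric(M, 3, 3))          # False: x_112 = 4 but x_121 = 3
print(is_symmetric_in_modes_1_2(M, 3, 3))  # True
```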

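2.5 Diagonal Tensors

A tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ is called diagonal if $x_{i_1i_2 \dots i_N} \neq 0$ only when $i_1 \: = \: i_2 \: = \: \dots \: = i_N$. Every entry whose indices are not all equal is zero.
If a diagonal tensor is also cubical, only the entries on the superdiagonal can be nonzero.
Note that the definition does not require the tensor to be cubical: it applies to tensors whose modes have any sizes.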

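2.6 Matricization

Matricization describes how to unfold a higher-order tensor into a second-order matrix. It is an extremely important concept that will appear again and again in later formulas and theorems. The definition in words is surprisingly simple, while the mathematical definition is more cumbersome.
In words: for a tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$, the mode-$n$ matricization is written $X_{(n)}$. It is obtained by taking the mode-$n$ fibers of $\mathcal{X}$ and arranging them, in order, as the columns of the matrix.
The mathematical definition has to pin down that order, which makes it slightly more complicated. We define a map from the tensor element $(i_1, i_2, \dots , i_N)$ to the matrix element $(i_n, j)$:

\[ j \: = \: 1 + \sum_{\substack{k=1 \\ k\neq n}}^N (i_k-1)J_k \quad \text{ with } \quad J_k = \prod_{\substack{m=1\\m\neq n}}^{k-1} I_m. \]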

Example: let the frontal slices of a tensor $\mathcal{X} \in \mathbb{R}^{3 \times 4 \times 2}$ be

\[ X_1 = \begin{bmatrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{bmatrix}. \]

Using the definition in words, we can quickly write down the matricization in each of the three modes. If you want to take on the mathematical definition instead, you will probably need a notebook or a piece of code.

\[ X_{(1)} = \begin{bmatrix} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{bmatrix}, \]

\[ X_{(2)} = \begin{bmatrix} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 &8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{bmatrix}, \]

\[ X_{(3)} = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & \dots & 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 & 17 & \dots & 21 & 22 & 23 & 24 \end{bmatrix}. \]

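Note: different papers sometimes use entirely different orderings when unfolding. As long as the ordering is used consistently, this generally has no effect on the theory or the computations. Incidentally, with the ordering used in these notes, the vectorization of this tensor is

\[\operatorname{vec}(\mathcal{X}) = \begin{bmatrix} 1 \\ 2 \\ \vdots \\ 24 \end{bmatrix}. \]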
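For readers who want to try the mathematical definition in code, here is a minimal sketch that applies the index map $(i_1, \dots, i_N) \mapsto (i_n, j)$ literally, with $J_k$ equal to the product of the sizes $I_m$ over $m < k$, $m \neq n$. The function name `unfold` and the dictionary storage with 1-based index tuples are assumptions made for illustration.

```python
from itertools import product


def unfold(X, shape, n):
    """Mode-n matricization X_(n) as a list of rows; n is 1-based."""
    N = len(shape)
    others = [k for k in range(1, N + 1) if k != n]
    J = {}                       # J[k] = product of I_m for m < k, m != n
    stride = 1
    for k in others:
        J[k] = stride
        stride *= shape[k - 1]
    n_cols = stride              # product of all sizes except I_n
    M = [[None] * n_cols for _ in range(shape[n - 1])]
    for idx in product(*(range(1, s + 1) for s in shape)):
        j = 1 + sum((idx[k - 1] - 1) * J[k] for k in others)
        M[idx[n - 1] - 1][j - 1] = X[idx]
    return M


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}

X1 = unfold(X, shape, 1)
print(X1[0])                     # [1, 4, 7, 10, 13, 16, 19, 22]
print(unfold(X, shape, 2)[0])    # [1, 2, 3, 13, 14, 15]

# vec(X): stack the columns of X_(1) on top of each other.
vec = [row[c] for c in range(len(X1[0])) for row in X1]
print(vec[:5])                   # [1, 2, 3, 4, 5]
```

The three unfoldings produced this way match $X_{(1)}$, $X_{(2)}$, and $X_{(3)}$ shown above.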

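2.7 Tensor Multiplication: The n-Mode Product

With the concepts above in place, we can finally define tensor multiplication. Just as matrix multiplication has its own rule, tensors have analogous rules, only more complicated. These notes do not give a complete treatment of tensor multiplication. They cover only the most useful case, the n-mode product: the product of a tensor with a matrix (or vector) in mode $n$.
The mode-$n$ product of a tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ with a matrix $\mathrm{U} \in \mathbb{R} ^{J \times I_n}$ is written $\mathcal{X} \times_n \mathrm{U}$.
Its size is $I_1 \times \dots \times I_{n-1} \times J \times I_{n+1} \times \dots \times I_N$, and its entries are

\[(\mathcal{X} \times_n \mathrm{U})_{i_1\dots i_{n-1}ji_{n+1}\dots i_N} = \sum_{i_n = 1 } ^{I_n} x_{i_1i_2\dots i_N} \: u_{ji_n}. \]

Here is how I read this formula. Matricize the tensor in mode $n$, so that each column of $\mathrm{X}_{(n)}$ is one mode-$n$ fiber of the original tensor, and multiply every such column by $\mathrm{U}$. For example, take a $5 \times 3 \times 2$ tensor and a $9 \times 2$ matrix $\mathrm{U}$ (so $J = 9$; only mode 3 has the matching size $I_3 = 2$, so this must be a mode-3 product). The mode-3 matricization $\mathrm{X}_{(3)}$ is $2 \times 15$, where $15 = 5 \times 3$ is the number of index combinations of the modes that are not being multiplied. The matrix product $\mathrm{U}\mathrm{X}_{(3)}$ is a $9 \times 15$ matrix, which is exactly the mode-3 matricization of the product we are after. Equivalently, in transposed form, the 15 index combinations form the rows of a $15 \times 2$ matrix that is multiplied by the $2 \times 9$ matrix $\mathrm{U}^\mathsf{T}$. Because the multi-indices were flattened together, we recover the tensor result by putting those indices back in their original positions, which gives the actual product $\mathcal{Y}$ of size $5 \times 3 \times 9$. In mathematical terms:

\[ \mathcal{Y} = \mathcal{X} \times_n \mathrm{U} \quad \Leftrightarrow \quad \mathrm{Y}_{(n)} = \mathrm{U}\mathrm{X}_{(n)}. \]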

Now for a concrete example. Let $\mathcal{X}$ be the tensor from the example in section 2.6 and let $\mathrm{U} = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.$ Then the product $\mathcal{Y} = \mathcal{X} \times_1 \mathrm{U} \in \mathbb{R}^{2 \times 4 \times 2}$ has the frontal slices

\[ \mathrm{Y}_1 = \begin{bmatrix}22 & 49 & 76 & 103 \\ 28 & 64 & 100 & 136 \end{bmatrix} , \quad \mathrm{Y}_2 = \begin{bmatrix} 130 & 157 & 184 & 211 \\ 172 & 208 & 244 & 280 \end{bmatrix}. \]
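The example can be reproduced with a direct implementation of the entrywise formula. This is only a sketch in plain Python: the function name `mode_n_product` and the dictionary storage with 1-based index tuples are assumptions, and a practical implementation would go through the matricized form $\mathrm{Y}_{(n)} = \mathrm{U}\mathrm{X}_{(n)}$ with a numerical library instead.

```python
from itertools import product


def mode_n_product(X, shape, U, n):
    """Y = X x_n U, where U is a list of J rows of length I_n; n is 1-based."""
    J, I_n = len(U), shape[n - 1]
    assert all(len(row) == I_n for row in U), "U must be J x I_n"
    new_shape = shape[:n - 1] + (J,) + shape[n:]
    Y = {}
    for idx in product(*(range(1, s + 1) for s in new_shape)):
        j = idx[n - 1]
        # Sum over i_n, keeping every other index of idx fixed.
        Y[idx] = sum(X[idx[:n - 1] + (i_n,) + idx[n:]] * U[j - 1][i_n - 1]
                     for i_n in range(1, I_n + 1))
    return Y, new_shape


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
U = [[1, 3, 5],
     [2, 4, 6]]

Y, y_shape = mode_n_product(X, shape, U, 1)
Y1 = [[Y[i, j, 1] for j in range(1, 5)] for i in range(1, 3)]
Y2 = [[Y[i, j, 2] for j in range(1, 5)] for i in range(1, 3)]

print(y_shape)   # (2, 4, 2)
print(Y1)        # [[22, 49, 76, 103], [28, 64, 100, 136]]
print(Y2)        # [[130, 157, 184, 211], [172, 208, 244, 280]]
```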

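2.8 Properties of the Mode-n Product

When a tensor is multiplied by several matrices in distinct modes, the result does not depend on the order of the multiplications:

\[\mathcal{X} \times_m \mathrm{A} \times_n \mathrm{B} = \mathcal{X} \times_n \mathrm{B} \times_m \mathrm{A} \quad\text{ $(m \neq n)$}. \]

When the modes are the same, the following holds instead: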

\[ \mathcal{X} \times_n \mathrm{A} \times_n \mathrm{B} = \mathcal{X} \times_n (\mathrm{B} \mathrm{A}). \]
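Both properties are easy to spot-check numerically. The sketch below does so on the example tensor of section 2.6. The matrices `A`, `B`, and `C` are arbitrary integer matrices made up for this check, and the helpers are illustrative assumptions rather than any library's API.

```python
from itertools import product


def mode_n_product(X, shape, U, n):
    """Y = X x_n U, where U is a list of J rows of length I_n; n is 1-based."""
    new_shape = shape[:n - 1] + (len(U),) + shape[n:]
    Y = {}
    for idx in product(*(range(1, s + 1) for s in new_shape)):
        Y[idx] = sum(X[idx[:n - 1] + (i_n,) + idx[n:]] * U[idx[n - 1] - 1][i_n - 1]
                     for i_n in range(1, shape[n - 1] + 1))
    return Y, new_shape


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
A = [[1, 3, 5], [2, 4, 6]]                        # 2 x 3, acts on mode 1
B = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]    # 3 x 4, acts on mode 2
C = [[1, 1], [0, 2]]                              # 2 x 2, acts on mode 1 after A

# Distinct modes: the order of the two products does not matter.
XA, shape_a = mode_n_product(X, shape, A, 1)
XAB, _ = mode_n_product(XA, shape_a, B, 2)
XB, shape_b = mode_n_product(X, shape, B, 2)
XBA, _ = mode_n_product(XB, shape_b, A, 1)
print(XAB == XBA)     # True

# Same mode: applying A and then C equals applying the single matrix CA.
lhs, _ = mode_n_product(XA, shape_a, C, 1)
rhs, _ = mode_n_product(X, shape, matmul(C, A), 1)
print(lhs == rhs)     # True
```

Note the reversed order in the second check: the matrix applied last ends up on the left of the product, exactly as in $\mathcal{X} \times_n \mathrm{A} \times_n \mathrm{B} = \mathcal{X} \times_n (\mathrm{B}\mathrm{A})$.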

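The mode-$n$ product of a tensor $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ with a vector $\mathrm{v} \in \mathbb{R}^{I_n}$ is written $\mathcal{X} \: \bar\times_n \: \mathrm{v}$.
Entrywise:

\[ (\mathcal{X} \: \bar\times_n \: \mathrm{v})_{ i_1 \dots i_{n-1} i_{n+1}\dots i_N } = \sum_{i_n = 1}^{I_n} x_{i_1 i_2 \dots i_N} \: v_{i_n}. \]

Multiplying by a vector in mode $n$ therefore amounts to taking the inner product of every mode-$n$ fiber of $\mathcal{X}$ with $\mathrm{v}$. The inner product collapses that mode to size 1, and here we choose to drop the collapsed mode altogether, so the result always has order $N-1$. (In deep learning models we often see tensors of shape (1, 64, 64, 4), whose modes correspond to the number of images, their size, the color channels, and so on. In my view one could just as well define a variant of the mode-$n$ product whose result lies in $\mathbb{R}^{I_1 \times \dots \times I_{n-1} \times 1 \times I_{n+1}\times \dots \times I_N}$. Within the scope of these notes, though, that choice is unnecessary.)
Let $\mathcal{X}$ again be the tensor from the example in section 2.6 and let $\mathrm{v} = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}^\mathsf{T}$. Then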

\[\mathcal{X} \: \bar\times_2 \: \mathrm{v} = \begin{bmatrix} 70 & 190 \\ 80 & 200 \\ 90 & 210\end{bmatrix}. \]
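A short sketch that reproduces this result from the entrywise formula. The function name `mode_n_vector_product` and the dictionary storage with 1-based index tuples are assumptions for illustration.

```python
from itertools import product


def mode_n_vector_product(X, shape, v, n):
    """X xbar_n v: contract mode n with v, dropping that mode; n is 1-based."""
    assert len(v) == shape[n - 1], "v must have length I_n"
    new_shape = shape[:n - 1] + shape[n:]
    Y = {}
    for idx in product(*(range(1, s + 1) for s in new_shape)):
        # Re-insert i_n at position n to address the original tensor.
        Y[idx] = sum(X[idx[:n - 1] + (i_n,) + idx[n - 1:]] * v[i_n - 1]
                     for i_n in range(1, shape[n - 1] + 1))
    return Y, new_shape


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
v = [1, 2, 3, 4]

Y, y_shape = mode_n_vector_product(X, shape, v, 2)
rows = [[Y[i, k] for k in range(1, 3)] for i in range(1, 4)]

print(y_shape)   # (3, 2): the order dropped from 3 to 2
print(rows)      # [[70, 190], [80, 200], [90, 210]]
```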

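Because the mode-$n$ vector product removes a mode by definition, precedence matters in a chain of vector products: once mode $m$ has been contracted away, every later mode shifts down by one. In other words, the following holds:

\[\mathcal{X} \: \bar\times_m \: \mathrm{a} \: \bar\times_n \: \mathrm{b} = (\mathcal{X} \: \bar\times_m \: \mathrm{a}) \: \bar\times_{n-1} \: \mathrm{b} = (\mathcal{X} \: \bar\times_{n} \: \mathrm{b}) \: \bar\times_m \: \mathrm{a} \quad \text{ for $\:m<n$} . \]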
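The index shift can be seen in a small check with $m = 1$ and $n = 3$ on the example tensor of section 2.6: after contracting mode 1, the former mode 3 has become mode 2. The vectors `a` and `b` and the helper function are made up for this illustration.

```python
from itertools import product


def mode_n_vector_product(X, shape, v, n):
    """X xbar_n v: contract mode n with v, dropping that mode; n is 1-based."""
    new_shape = shape[:n - 1] + shape[n:]
    Y = {}
    for idx in product(*(range(1, s + 1) for s in new_shape)):
        Y[idx] = sum(X[idx[:n - 1] + (i_n,) + idx[n - 1:]] * v[i_n - 1]
                     for i_n in range(1, shape[n - 1] + 1))
    return Y, new_shape


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
a = [1, 0, 2]     # length I_1 = 3, contracts mode m = 1
b = [1, 1]        # length I_3 = 2, contracts mode n = 3

# Mode 1 first: the old mode 3 is now mode 2, hence n - 1 = 2.
Xa, shape_a = mode_n_vector_product(X, shape, a, 1)
left, left_shape = mode_n_vector_product(Xa, shape_a, b, 2)

# Mode 3 first: mode 1 is unaffected and keeps its number.
Xb, shape_b = mode_n_vector_product(X, shape, b, 3)
right, right_shape = mode_n_vector_product(Xb, shape_b, a, 1)

print(left == right)   # True
print(left_shape)      # (4,): only mode 2 of the original tensor remains
```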

2.9 Kronecker, Khatri-Rao, and Hadamard Products of Matrices

We need to know these special matrix products because they are essential to the discussion that follows.
The Kronecker product of a matrix $\mathrm{A}\in\mathbb{R}^{I \times J}$ and a matrix $\mathrm{B}\in\mathbb{R}^{K\times L}$ is written $\mathrm{A} \otimes \mathrm{B}$. The result is the $(IK)\times (JL)$ matrix defined by

\[\begin{aligned}\mathrm{A} \otimes \mathrm{B} &= \begin{bmatrix} a_{11}\mathrm{B} & a_{12}\mathrm{B} & \dots & a_{1J}\mathrm{B} \\ a_{21}\mathrm{B} & a_{22}\mathrm{B} & \dots & a_{2J}\mathrm{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}\mathrm{B} & a_{I2}\mathrm{B} & \dots & a_{IJ}\mathrm{B} \end{bmatrix} \\ &= \begin{bmatrix} a_1 \otimes b_1 & a_1\otimes b_2 & a_1 \otimes b_3 & \dots & a_J \otimes b_{L-1} & a_J \otimes b_L \end{bmatrix}.\end{aligned} \]

The second line may look a little cryptic, but it follows if we expand the first line completely and then regroup the entries column by column, where $a_j$ and $b_l$ denote the columns of $\mathrm{A}$ and $\mathrm{B}$.
Let us work through a simple example. Let $\mathrm{A} = \begin{bmatrix} 1 & 0 & 6 \\ 3 & 4 & 2 \end{bmatrix}, \, \mathrm{B} = \begin{bmatrix} 2 & 1 \\ 1& 4 \end{bmatrix}$. Then:

\[\begin{aligned} \mathrm{A} \otimes \mathrm{B} &= \begin{bmatrix} 1\,B & 0\,B & 6\,B \\ 3\,B & 4\,B & 2\,B \end{bmatrix}\\ &= \begin{bmatrix} 2 & 1 & 0 & 0 & 12 & 6 \\ 1 & 4 & 0 & 0 & 6 & 24 \\ 6 & 3 & 8 & 4 & 4 & 2 \\ 3 & 12 & 4 & 16 & 2 & 8\\ \end{bmatrix}\\ &= \begin{bmatrix} 1\,b_1 & 1\,b_2 & 0\,b_1 & 0\,b_2 & 6\,b_1 & 6\,b_2\\ 3\,b_1 & 3\,b_2 & 4\,b_1 & 4\,b_2 & 2\,b_1 & 2\,b_2\\ \end{bmatrix}\\ &= \begin{bmatrix} a_1 \otimes b_1 & a_1 \otimes b_2 & a_2 \otimes b_1 & a_2 \otimes b_2 & a_3 \otimes b_1 & a_3 \otimes b_2 \end{bmatrix} \end{aligned} \]
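The block definition fits in a single comprehension. The sketch below, with matrices stored as plain lists of rows and helper names that are assumptions, computes $\mathrm{A} \otimes \mathrm{B}$ for the example and checks the column-wise reading $a_j \otimes b_l$.

```python
def kron(A, B):
    """Kronecker product of an I x J and a K x L matrix (lists of rows)."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in A for row_b in B]


def column(M, c):
    """Column c (0-based) of a matrix as a list."""
    return [row[c] for row in M]


A = [[1, 0, 6],
     [3, 4, 2]]
B = [[2, 1],
     [1, 4]]

AB = kron(A, B)
for row in AB:
    print(row)
# [2, 1, 0, 0, 12, 6]
# [1, 4, 0, 0, 6, 24]
# [6, 3, 8, 4, 4, 2]
# [3, 12, 4, 16, 2, 8]

# Column-wise view: the first column is a_1 (x) b_1, with a_1 = (1, 3), b_1 = (2, 1).
a1, b1 = column(A, 0), column(B, 0)
print(column(AB, 0) == [x * y for x in a1 for y in b1])   # True
```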

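The Khatri-Rao product is a columnwise Kronecker product, so the two matrices must have the same number of columns. The Khatri-Rao product of a matrix $\mathrm{A}\in\mathbb{R}^{I \times K}$ and a matrix $\mathrm{B}\in\mathbb{R}^{J\times K}$ is written $\mathrm{A} \odot \mathrm{B}$. The result is the $(IJ) \times K$ matrix

\[\mathrm{A}\odot \mathrm{B} = \begin{bmatrix} a_1 \otimes b_1 & a_2 \otimes b_2 & \dots & a_K \otimes b_K \end{bmatrix}. \]

For two vectors the Khatri-Rao product coincides with the Kronecker product, since there is only a single column.
The Hadamard product is an elementwise product. For matrices $A, B \in \mathbb{R}^{I \times J}$, the Hadamard product is written $A * B$ and is given by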

\[\mathrm{A} * \mathrm{B} = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} & \dots & a_{1J}b_{1J}\\a_{21}b_{21} & a_{22}b_{22} & \dots & a_{2J}b_{2J}\\\vdots & \vdots & \ddots & \vdots \\a_{I1}b_{I1} & a_{I2}b_{I2} & \dots & a_{IJ}b_{IJ}\end{bmatrix}. \]

These products also satisfy the following properties:

\[\begin{aligned}(\mathrm{A} \otimes \mathrm{B}) (\mathrm{C} \otimes \mathrm{D}) &= \mathrm{A}\mathrm{C} \otimes \mathrm{B}\mathrm{D},\\(\mathrm{A} \otimes \mathrm{B})^\dagger &= \mathrm{A}^\dagger \otimes \mathrm{B}^\dagger,\\\mathrm{A} \odot \mathrm{B} \odot \mathrm{C} &= (\mathrm{A} \odot \mathrm{B}) \odot \mathrm{C} = \mathrm{A} \odot (\mathrm{B} \odot \mathrm{C}),\\(\mathrm{A} \odot \mathrm{B})^\mathsf{T}(\mathrm{A} \odot \mathrm{B}) &= \mathrm{A}^\mathsf{T} \mathrm{A} * \mathrm{B}^\mathsf{T} \mathrm{B},\\(\mathrm{A} \odot \mathrm{B})^\dagger &= ((\mathrm{A}^\mathsf{T} \mathrm{A})*(\mathrm{B}^\mathsf{T} \mathrm{B}))^\dagger(\mathrm{A} \odot \mathrm{B})^\mathsf{T}.\end{aligned} \]

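For proofs of these identities, refer to the paper cited in the original post.
Here $A^\dagger$ denotes the Moore-Penrose pseudoinverse of $A$. If you are not familiar with this concept, the original post points to a set of UCLA math lecture notes.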
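As a sanity check, the identity $(\mathrm{A} \odot \mathrm{B})^\mathsf{T}(\mathrm{A} \odot \mathrm{B}) = \mathrm{A}^\mathsf{T}\mathrm{A} * \mathrm{B}^\mathsf{T}\mathrm{B}$ can be verified on small integer matrices. The matrices below are made up for the purpose, and the helper names are assumptions. This is a numerical spot check, not a proof.

```python
def khatri_rao(A, B):
    """Columnwise Kronecker product of an I x K and a J x K matrix."""
    K = len(A[0])
    assert len(B[0]) == K, "A and B need the same number of columns"
    return [[row_a[r] * row_b[r] for r in range(K)]
            for row_a in A for row_b in B]


def hadamard(A, B):
    """Elementwise product of two matrices of the same size."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4], [5, 6]]    # 3 x 2
B = [[1, 0], [2, 1]]            # 2 x 2

KR = khatri_rao(A, B)           # 6 x 2
lhs = matmul(transpose(KR), KR)
rhs = hadamard(matmul(transpose(A), A), matmul(transpose(B), B))

print(KR)           # [[1, 0], [2, 2], [3, 0], [6, 4], [5, 0], [10, 6]]
print(lhs)          # [[175, 88], [88, 56]]
print(lhs == rhs)   # True
```

The practical appeal of this identity is that the right-hand side only involves $K \times K$ matrices, so the tall matrix $\mathrm{A} \odot \mathrm{B}$ never has to be formed.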
A useful application of the Kronecker product: let $\mathcal{X} \in\mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ and $A^{(n)}\in \mathbb{R}^{J_n \times I_n}$ for all $n\in\{1,\dots,N\}$. Then, for any $n \in \{1,\dots,N\}$, we have

\[\begin{gathered}\mathcal{Y} = \mathcal{X} \times_1 \mathrm{A}^{(1)} \times_2 \mathrm{A}^{(2)}\dots \times_N \mathrm{A}^{(N)} \\ \Updownarrow \\ \mathrm{Y}_{(n)} = \mathrm{A}^{(n)}\mathrm{X}_{(n)}\Big(\mathrm{A}^{(N)}\otimes \dots \otimes \mathrm{A}^{(n+1)} \otimes \mathrm{A}^{(n-1)} \otimes \dots \otimes \mathrm{A}^{(1)}\Big)^\mathsf{T}.\end{gathered} \]
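This equivalence ties together matricization, the mode-$n$ product, and the Kronecker product, so it makes a good end-to-end check. The sketch below verifies it for every mode on the example tensor of section 2.6. The factor matrices in `A` are arbitrary integer matrices made up for the check, and all helper names are assumptions.

```python
from functools import reduce
from itertools import product


def unfold(X, shape, n):
    """Mode-n matricization with the column ordering of section 2.6."""
    others = [k for k in range(1, len(shape) + 1) if k != n]
    J, stride = {}, 1
    for k in others:
        J[k] = stride
        stride *= shape[k - 1]
    M = [[None] * stride for _ in range(shape[n - 1])]
    for idx in product(*(range(1, s + 1) for s in shape)):
        j = 1 + sum((idx[k - 1] - 1) * J[k] for k in others)
        M[idx[n - 1] - 1][j - 1] = X[idx]
    return M


def mode_n_product(X, shape, U, n):
    """Y = X x_n U, where U is a list of J rows of length I_n; n is 1-based."""
    new_shape = shape[:n - 1] + (len(U),) + shape[n:]
    Y = {}
    for idx in product(*(range(1, s + 1) for s in new_shape)):
        Y[idx] = sum(X[idx[:n - 1] + (i_n,) + idx[n:]] * U[idx[n - 1] - 1][i_n - 1]
                     for i_n in range(1, shape[n - 1] + 1))
    return Y, new_shape


def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


shape = (3, 4, 2)
X = {(i, j, k): i + 3 * (j - 1) + 12 * (k - 1)
     for i in range(1, 4) for j in range(1, 5) for k in range(1, 3)}
A = {1: [[1, 0, 2], [0, 1, 1]],            # J_1 x I_1 = 2 x 3
     2: [[1, 2, 0, 1], [0, 1, 1, 0]],      # J_2 x I_2 = 2 x 4
     3: [[1, 1], [0, 2], [3, 0]]}          # J_3 x I_3 = 3 x 2

# Left-hand side: multiply in every mode, one after another.
Y, y_shape = X, shape
for n in (1, 2, 3):
    Y, y_shape = mode_n_product(Y, y_shape, A[n], n)


def identity_holds(n):
    """Compare Y_(n) with A^(n) X_(n) (A^(N) (x) ... (x) A^(1), skipping n)^T."""
    factors = [A[k] for k in (3, 2, 1) if k != n]    # descending mode order
    big = reduce(kron, factors)
    rhs = matmul(matmul(A[n], unfold(X, shape, n)), transpose(big))
    return unfold(Y, y_shape, n) == rhs


print(y_shape)                                   # (2, 2, 3)
print([identity_holds(n) for n in (1, 2, 3)])    # [True, True, True]
```

The descending order of the Kronecker factors goes hand in hand with the column ordering chosen for matricization, in which the lowest remaining mode index varies fastest.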

The next installment will cover tensor rank and the most common decompositions. Stay tuned!
