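On the Optimization Problem in "Unsupervised Deep Embedding for Clustering Analysis"
Author: 凱魯嘎吉 - 博客園 http://www.cnblogs.com/kailugaji/
Deep Embedded Clustering (DEC) and Improved Deep Embedded Clustering (IDEC) were proposed one after the other, but neither paper spells out how the parameters are optimized. I therefore worked through the derivation myself and found that my partial derivative with respect to the cluster centers does not agree with the result stated in the two papers, and I cannot tell where the discrepancy comes from. What follows is effectively a math exercise: compute the partial derivatives of the objective function with respect to given parameters.
Problem statement
Given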
\[L=\sum\limits_{i}^{N}{\sum\limits_{j}^{c}{{{p}_{ij}}\log \frac{{{p}_{ij}}}{{{q}_{ij}}}}}\]
\[q_{ij}=\frac{(1+\left\| z_i-\mu_j \right\|^{2})^{-1}}{\sum\nolimits_{l}(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}\]
\[p_{ij}=\frac{q_{ij}^{2}/\sum\nolimits_{i'}q_{i'j}}{\sum\nolimits_{l}\left(q_{il}^{2}/\sum\nolimits_{i'}q_{i'l}\right)}\]
With $p_{ij}$ held fixed, find $\frac{\partial L}{\partial z_i}$ and $\frac{\partial L}{\partial \mu_j}$. Here $l$ and $i'$ are summation indices, kept distinct from the free indices $i$ and $j$.
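The three definitions above can be evaluated numerically, which is useful for checking the derivations. The following is a minimal NumPy sketch under assumed conditions: the function names (`soft_assignments`, `target_distribution`, `kl_loss`) and the random data are illustrative and do not come from the DEC or IDEC papers or their code.

```python
import numpy as np

def soft_assignments(z, mu):
    """q_ij for embeddings z of shape (N, d) and centers mu of shape (c, d)."""
    # Squared distances ||z_i - mu_j||^2, shape (N, c)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    k = 1.0 / (1.0 + d2)
    # Normalize over the cluster index so each row sums to 1
    return k / k.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij built from q_ij as in the problem statement."""
    w = q ** 2 / q.sum(axis=0, keepdims=True)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """L = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical toy data: N = 6 points, c = 3 centers, d = 2 dimensions
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
mu = rng.normal(size=(3, 2))

q = soft_assignments(z, mu)
p = target_distribution(q)
loss = kl_loss(p, q)
print(q.sum(axis=1), p.sum(axis=1), loss)
```

Both `q` and `p` are row-stochastic, so each row should sum to 1, and the loss, being a sum of KL divergences, should be non-negative.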
Solution
1. First, compute $\frac{\partial L}{\partial {{z}_{i}}}$
By the chain rule,
\[\frac{\partial L}{\partial {{z}_{i}}}=\sum\limits_{j}^{c}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{z}_{i}}}}\]
\[\frac{\partial L}{\partial {{q}_{ij}}}=\frac{\partial \left( {{p}_{ij}}\log \frac{{{p}_{ij}}}{{{q}_{ij}}} \right)}{\partial {{q}_{ij}}}=\frac{\partial \left( {{p}_{ij}}\log {{p}_{ij}}-{{p}_{ij}}\log {{q}_{ij}} \right)}{\partial {{q}_{ij}}}=-\frac{{{p}_{ij}}}{{{q}_{ij}}}\]
\[ \frac{\partial {{q}_{ij}}}{\partial {{z}_{i}}}=\frac{-2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}-{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}\cdot (-2)\cdot \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{l}})}}{{{\left( \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}} \right)}^{2}}} \\ =-2{{q}_{ij}}({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}+2{{q}_{ij}}\frac{\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{l}})}}{\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}} \]
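This uses the quotient rule ${{\left( \frac{A}{B} \right)}^{\prime }}=\frac{{{A}^{\prime }}B-A{{B}^{\prime }}}{{{B}^{2}}}$, together with the definition of $q_{ij}$ to rewrite the ratio ${{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}/\sum\nolimits_{l}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}$ as $q_{ij}$.
Therefore,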
\[
\begin{aligned}
\frac{\partial L}{\partial z_i}
&=\sum\limits_{j}^{c}\frac{\partial L}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial z_i}\\
&=\sum\limits_{j}^{c}-\frac{p_{ij}}{q_{ij}}\left(-2q_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}+2q_{ij}\frac{\sum\limits_{l}^{c}(1+\left\| z_i-\mu_l \right\|^{2})^{-2}(z_i-\mu_l)}{\sum\limits_{l}^{c}(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}\right)\\
&=\sum\limits_{j}^{c}2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}-2\left(\sum\limits_{j}^{c}p_{ij}\right)\frac{\sum\limits_{l}^{c}(1+\left\| z_i-\mu_l \right\|^{2})^{-2}(z_i-\mu_l)}{\sum\limits_{l}^{c}(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}\\
&=\sum\limits_{j}^{c}2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}-2\sum\limits_{l}^{c}\frac{(1+\left\| z_i-\mu_l \right\|^{2})^{-1}}{\sum\limits_{m}^{c}(1+\left\| z_i-\mu_m \right\|^{2})^{-1}}(1+\left\| z_i-\mu_l \right\|^{2})^{-1}(z_i-\mu_l)\\
&=\sum\limits_{j}^{c}2p_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}-2\sum\limits_{l}^{c}q_{il}(1+\left\| z_i-\mu_l \right\|^{2})^{-1}(z_i-\mu_l)\\
&=\sum\limits_{j}^{c}2(p_{ij}-q_{ij})(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}
\end{aligned}
\]
This uses $\sum\limits_{j}^{c}{{{p}_{ij}}}=1$ and the definition of $q_{ij}$.
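The closed form for $\frac{\partial L}{\partial z_i}$ can be sanity-checked against central finite differences. The sketch below assumes random toy data and uses illustrative function names (`grad_z_closed_form`, `grad_z_numeric`) that are not taken from any DEC implementation; $p_{ij}$ is computed once and then held fixed, as in the problem statement.

```python
import numpy as np

def kernel(z, mu):
    # k_ij = (1 + ||z_i - mu_j||^2)^-1, shape (N, c)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + d2)

def kl_loss(p, z, mu):
    k = kernel(z, mu)
    q = k / k.sum(axis=1, keepdims=True)
    return float(np.sum(p * np.log(p / q)))

def grad_z_closed_form(p, z, mu):
    # dL/dz_i = sum_j 2 (p_ij - q_ij) (z_i - mu_j) (1 + ||z_i - mu_j||^2)^-1
    k = kernel(z, mu)
    q = k / k.sum(axis=1, keepdims=True)
    coeff = 2.0 * (p - q) * k                      # shape (N, c)
    diff = z[:, None, :] - mu[None, :, :]          # shape (N, c, d)
    return (coeff[:, :, None] * diff).sum(axis=1)  # shape (N, d)

def grad_z_numeric(p, z, mu, eps=1e-6):
    # Central differences, one coordinate of z at a time
    g = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        z_plus = z.copy()
        z_plus[idx] += eps
        z_minus = z.copy()
        z_minus[idx] -= eps
        g[idx] = (kl_loss(p, z_plus, mu) - kl_loss(p, z_minus, mu)) / (2 * eps)
    return g

# Hypothetical toy data
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
mu = rng.normal(size=(3, 2))

# Build p from the initial q, then treat it as a constant
k0 = kernel(z, mu)
q0 = k0 / k0.sum(axis=1, keepdims=True)
w = q0 ** 2 / q0.sum(axis=0, keepdims=True)
p = w / w.sum(axis=1, keepdims=True)

max_err_z = np.abs(grad_z_closed_form(p, z, mu) - grad_z_numeric(p, z, mu)).max()
print("max abs error for dL/dz:", max_err_z)
```

A maximum error at round-off level indicates that the closed form agrees with the objective for this data.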
2. Next, compute $\frac{\partial L}{\partial {{\mu }_{j}}}$
By the chain rule,
\[\frac{\partial L}{\partial {{\mu }_{j}}}=\sum\limits_{i}^{N}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}}\]
\[\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}=\frac{2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})\sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}}-{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}\cdot 2{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-2}}({{z}_{i}}-{{\mu }_{j}})}{{{\left( \sum\limits_{l}^{c}{{{(1+{{\left\| {{z}_{i}}-{{\mu }_{l}} \right\|}^{2}})}^{-1}}} \right)}^{2}}}\\ =2({{z}_{i}}-{{\mu }_{j}}){{q}_{ij}}{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}-2q_{ij}^{2}{{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}}({{z}_{i}}-{{\mu }_{j}})\\ =2{{q}_{ij}}(1-{{q}_{ij}})({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \]
Therefore,
\[\frac{\partial L}{\partial {{\mu }_{j}}}=\sum\limits_{i}^{N}{\frac{\partial L}{\partial {{q}_{ij}}}\frac{\partial {{q}_{ij}}}{\partial {{\mu }_{j}}}}=\sum\limits_{i}^{N}{\left( -\frac{{{p}_{ij}}}{{{q}_{ij}}}\cdot 2{{q}_{ij}}(1-{{q}_{ij}})({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \right)}=\sum\limits_{i}^{N}{\left( 2{{p}_{ij}}({{q}_{ij}}-1)({{z}_{i}}-{{\mu }_{j}}){{(1+{{\left\| {{z}_{i}}-{{\mu }_{j}} \right\|}^{2}})}^{-1}} \right)}\]
Result in the original paper
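For reference, the gradients stated in the DEC paper [2] are, to the best of my reading, the following, where $\alpha$ is the degrees of freedom of the Student's t kernel and the problem above corresponds to $\alpha = 1$:

\[\frac{\partial L}{\partial z_i}=\frac{\alpha+1}{\alpha}\sum\limits_{j}\left(1+\frac{\left\| z_i-\mu_j \right\|^{2}}{\alpha}\right)^{-1}(p_{ij}-q_{ij})(z_i-\mu_j)\]
\[\frac{\partial L}{\partial \mu_j}=-\frac{\alpha+1}{\alpha}\sum\limits_{i}\left(1+\frac{\left\| z_i-\mu_j \right\|^{2}}{\alpha}\right)^{-1}(p_{ij}-q_{ij})(z_i-\mu_j)\]

With $\alpha = 1$ the prefactor is 2, so the first expression coincides with the result of step 1, while the second reads $\sum\limits_{i}2(q_{ij}-p_{ij})(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}$.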
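Where does the discrepancy come from? These derivations do not affect the final experimental results, since the gradients can be obtained by calling library functions directly rather than deriving them by hand. Still, I suspect that the result given in the paper may be incorrect, and I would welcome corrections from readers.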
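One way to narrow this down is a finite-difference check of $\frac{\partial L}{\partial \mu_j}$. The sketch below, on random toy data (an assumption made only for illustration), compares two candidate closed forms against central differences: the expression obtained in step 2, $\sum_i 2p_{ij}(q_{ij}-1)(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}$, and the form $\sum_i 2(q_{ij}-p_{ij})(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}$. The second form follows if the chain rule also includes the terms with $l\neq j$, since $\mu_j$ enters every $q_{il}$ through the shared normalizer:

\[\frac{\partial L}{\partial \mu_j}=\sum\limits_{i}^{N}\sum\limits_{l}^{c}\frac{\partial L}{\partial q_{il}}\frac{\partial q_{il}}{\partial \mu_j},\qquad \frac{\partial q_{il}}{\partial \mu_j}=-2q_{il}q_{ij}(z_i-\mu_j)(1+\left\| z_i-\mu_j \right\|^{2})^{-1}\quad (l\neq j)\]

The function names and the `include_cross_terms` flag are hypothetical.

```python
import numpy as np

def kernel(z, mu):
    # k_ij = (1 + ||z_i - mu_j||^2)^-1, shape (N, c)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + d2)

def kl_loss(p, z, mu):
    k = kernel(z, mu)
    q = k / k.sum(axis=1, keepdims=True)
    return float(np.sum(p * np.log(p / q)))

def grad_mu(p, z, mu, include_cross_terms):
    k = kernel(z, mu)
    q = k / k.sum(axis=1, keepdims=True)
    diff = z[:, None, :] - mu[None, :, :]          # shape (N, c, d)
    if include_cross_terms:
        # Candidate B: sum_i 2 (q_ij - p_ij) k_ij (z_i - mu_j)
        coeff = 2.0 * (q - p) * k
    else:
        # Candidate A (step 2): sum_i 2 p_ij (q_ij - 1) k_ij (z_i - mu_j)
        coeff = 2.0 * p * (q - 1.0) * k
    return (coeff[:, :, None] * diff).sum(axis=0)  # shape (c, d)

def grad_mu_numeric(p, z, mu, eps=1e-6):
    # Central differences, one coordinate of mu at a time
    g = np.zeros_like(mu)
    for idx in np.ndindex(*mu.shape):
        mu_plus = mu.copy()
        mu_plus[idx] += eps
        mu_minus = mu.copy()
        mu_minus[idx] -= eps
        g[idx] = (kl_loss(p, z, mu_plus) - kl_loss(p, z, mu_minus)) / (2 * eps)
    return g

# Hypothetical toy data
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
mu = rng.normal(size=(3, 2))

# Build p from the initial q, then treat it as a constant
k0 = kernel(z, mu)
q0 = k0 / k0.sum(axis=1, keepdims=True)
w = q0 ** 2 / q0.sum(axis=0, keepdims=True)
p = w / w.sum(axis=1, keepdims=True)

numeric = grad_mu_numeric(p, z, mu)
err_without_cross = np.abs(grad_mu(p, z, mu, False) - numeric).max()
err_with_cross = np.abs(grad_mu(p, z, mu, True) - numeric).max()
print("candidate A (step 2) max abs error:", err_without_cross)
print("candidate B (with l != j terms) max abs error:", err_with_cross)
```

If candidate B matches the numerical gradient to round-off level while candidate A does not, that suggests the omitted $l\neq j$ terms account for the mismatch, and that the expression in the paper is consistent with the objective.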
References
[1] Deep Clustering Algorithms - 凱魯嘎吉 博客園
[2] Xie J, Girshick R, Farhadi A. Unsupervised deep embedding for clustering analysis[C]//International conference on machine learning. 2016: 478-487.
[3] Guo X, Gao L, Liu X, et al. Improved deep embedded clustering with local structure preservation[C]//IJCAI. 2017: 1753-1759.