Jae Hyun Lim, Jong Chul Ye, Geometric GAN.
Overview
Interestingly, GAN training can be decomposed into three geometric steps:
- find a hyperplane that separates the real samples from the fake ones;
- train the discriminator so that real and fake are pushed further apart;
- train the generator so that the fake samples move toward the misclassified (real) side of the hyperplane; a toy sketch of this loop follows.
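As a concrete picture of these three steps, here is a minimal, hedged PyTorch sketch of the alternating loop; the linear modules, toy dimensions, and Gaussian "real" data are placeholder assumptions, not the paper's architecture.

```python
import torch

d, k, nz, n = 16, 32, 8, 64                  # toy dimensions and batch size
phi = torch.nn.Linear(d, k)                  # stand-in feature map Phi_zeta
g = torch.nn.Linear(nz, d)                   # stand-in generator g_theta
w = torch.randn(k, requires_grad=True)       # hyperplane normal
b = torch.zeros(1, requires_grad=True)       # hyperplane bias
opt_d = torch.optim.SGD([w, b, *phi.parameters()], lr=1e-2)
opt_g = torch.optim.SGD(g.parameters(), lr=1e-2)

for step in range(100):
    x = torch.randn(n, d) + 2.0              # toy "real" data
    z = torch.randn(n, nz)                   # noise batch

    # steps 1-2: fit and sharpen the separating hyperplane (hinge loss)
    s_real = phi(x) @ w + b
    s_fake = phi(g(z).detach()) @ w + b
    loss_d = (torch.clamp(1 - s_real, min=0).mean()
              + torch.clamp(1 + s_fake, min=0).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # step 3: move generated samples toward the real side of the hyperplane
    loss_g = -(phi(g(z)) @ w + b).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```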
Main content

McGAN
This paper is inspired by McGAN and builds on it as follows.
Combining with SVM
Suppose the GAN discriminator is \(D(x) = S(\langle w, \Phi_{\zeta}(x) \rangle)\), where \(S\) is an activation function, commonly a sigmoid; for now take it to be the identity, i.e. \(D(x) = \langle w, \Phi_{\zeta}(x) \rangle\).
McGAN uses \(\langle w, \Phi_{\zeta}(x)\rangle\) to construct an IPM and trains the GAN through it. Note, however, that if \(\Phi_{\zeta}(x)\) is viewed as a feature extracted from \(x\), then \(\langle w, \Phi_{\zeta}(x)\rangle\) is simply a linear classifier on those features, so it is natural to introduce the SVM into the process of training the discriminator:
\[\begin{array}{rcl} \min_{w, b} & \frac{1}{2} \|w\|^2 + C \sum_i (\xi_i + \xi_i') & \\ \mathrm{subject \: to} & \langle w, \Phi_{\zeta}(x_i) \rangle + b \ge 1-\xi_i & i=1,\ldots, n\\ & \langle w, \Phi_{\zeta}(g_{\theta}(z_i)) \rangle + b \le \xi_i'-1 & i=1,\ldots,n \\ & \xi_i, \xi_i' \ge 0, \: i=1,\ldots,n. \end{array} \]
Eliminating the slack variables (at the optimum \(\xi_i = \max(0, 1 - \langle w, \Phi_{\zeta}(x_i) \rangle - b)\) and \(\xi_i' = \max(0, 1 + \langle w, \Phi_{\zeta}(g_{\theta}(z_i)) \rangle + b)\)) and rescaling by \(1/(Cn)\), this is equivalent to
\[\tag{13} \min_{w,b} \: R_{\theta}(w,b;\zeta), \]
where
\[\tag{14} \begin{array}{ll} R_{\theta}(w,b;\zeta) = & \frac{1}{2C n} \|w\|^2 + \frac{1}{n} \sum_{i=1}^n \max (0, 1-\langle w, \Phi_{\zeta} (x_i) \rangle -b) \\ & + \frac{1}{n} \sum_{i=1}^n \max (0, 1+ \langle w, \Phi_{\zeta}(g_{\theta}(z_i))\rangle+b). \end{array} \]
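To make (14) concrete, here is a hedged sketch of evaluating it directly, regularizer included; phi, the toy dimensions, and the random batches below are assumptions for illustration only.

```python
import torch

n, d, k, C = 8, 16, 32, 1.0               # batch size, dims, margin constant
phi = torch.nn.Linear(d, k)               # stand-in for Phi_zeta
w = torch.randn(k, requires_grad=True)    # hyperplane normal
b = torch.zeros(1, requires_grad=True)    # hyperplane bias
x_real = torch.randn(n, d)                # x_i
x_fake = torch.randn(n, d)                # placeholder for g_theta(z_i)

score_real = phi(x_real) @ w + b          # <w, Phi_zeta(x_i)> + b
score_fake = phi(x_fake) @ w + b          # <w, Phi_zeta(g_theta(z_i))> + b

# R_theta(w, b; zeta) from (14): weight penalty plus the two hinge terms
R = ((w @ w) / (2 * C * n)
     + torch.clamp(1 - score_real, min=0).mean()
     + torch.clamp(1 + score_fake, min=0).mean())
R.backward()                              # gradients w.r.t. w, b, and zeta
```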
Going one step further, the same objective is used to train \(\zeta\):
\[\tag{15} \min_{w,b,\zeta} \: R_{\theta}(w,b;\zeta). \]
The SVM optimum in \(w\) has the form
\[w^{SVM} := \sum_{i=1}^n \alpha_i \Phi_{\zeta}(x_i) - \sum_{i=1}^n \beta_i \Phi_{\zeta} (g_{\theta}(z_i)), \]
where \(\alpha_i, \beta_i\) are nonzero only for the support vectors.
Define
\[\mathcal{M} = \{\phi \in \Xi : |\langle w^{SVM}, \phi \rangle + b| \le 1\} \]
as the set of feature points on or inside the margin.

The objective can then be rewritten as
\[\tag{18} R_{\theta}(w,b;\zeta) = \frac{1}{n} \sum_{i=1}^n \langle w^{SVM}, s_i \Phi_{\zeta} (g_{\theta}(z_i)) - t_i \Phi_{\zeta}(x_i) \rangle + \mathrm{constant}, \]
where
\[\tag{19} t_i = \left \{ \begin{array}{ll} 1, & \Phi_{\zeta}(x_i) \in \mathcal{M} \\ 0, & \mathrm{otherwise} \end{array} \right. , \quad s_i = \left \{ \begin{array}{ll} 1, & \Phi_{\zeta}(g_{\theta}(z_i)) \in \mathcal{M}\\ 0, & \mathrm{otherwise}. \end{array} \right. \]
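A small numeric illustration of (19), with made-up values of \(\langle w^{SVM}, \phi \rangle + b\); only points on or inside the margin get indicator 1.

```python
import torch

# assumed toy scores <w_svm, Phi(.)> + b for four real and four fake points
score_real = torch.tensor([1.4, 0.3, -0.2, 2.1])
score_fake = torch.tensor([-1.6, -0.8, 0.5, -2.3])

# (19): the indicator is 1 iff the feature lies in M, i.e. |score| <= 1
t = (score_real.abs() <= 1.0).float()     # tensor([0., 1., 1., 0.])
s = (score_fake.abs() <= 1.0).float()     # tensor([0., 1., 1., 0.])
```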
Training \(\zeta\)
Accordingly, \(\zeta\) is updated by
\[\zeta \leftarrow \zeta +\eta \frac{1}{n} \sum_{i=1}^n \langle w^{SVM}, t_i \nabla_{\zeta} \Phi_{\zeta}(x_i) - s_i \nabla_{\zeta}\Phi_{\zeta} (g_{\theta}(z_i)) \rangle . \]
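In practice this amounts to a gradient step on the hinge loss with \((w, b)\) frozen at the SVM solution; below is a hedged sketch with toy modules, relying on the fact that inactive hinge terms contribute zero gradient, which reproduces the \(t_i, s_i\) masking whenever the active points lie within the margin.

```python
import torch

phi = torch.nn.Linear(16, 32)                     # Phi_zeta, to be updated
opt = torch.optim.SGD(phi.parameters(), lr=1e-3)  # learning rate eta
w = torch.randn(32)                               # frozen w^SVM
b = torch.zeros(1)                                # frozen bias
x_real, x_fake = torch.randn(8, 16), torch.randn(8, 16)

hinge = (torch.clamp(1 - (phi(x_real) @ w + b), min=0).mean()
         + torch.clamp(1 + (phi(x_fake) @ w + b), min=0).mean())

opt.zero_grad()
hinge.backward()   # inactive hinge terms get zero gradient (the masking)
opt.step()         # descent on the hinge == the ascent update written above
```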
Training \(g_{\theta}\)
Here \(\theta\) is trained with \(w, b, \zeta\) fixed, i.e. we solve
\[\min_{\theta} \: L_{w, b, \zeta}(\theta), \]
where
\[L_{w,b,\zeta}(\theta)= -\frac{1}{n} \sum_{i=1}^n D(g_{\theta}(z_i)), \]
and the corresponding update is
\[\theta \leftarrow \theta+\eta \frac{1}{n} \sum_{i=1}^n \langle w^{SVM}, s_i \nabla_{\theta}\Phi_{\zeta} (g_{\theta}(z_i)) \rangle . \]
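A matching sketch of this generator step with \((w, b, \zeta)\) frozen; the modules are toy stand-ins, and the \(s_i\) masking from the update above is dropped for simplicity.

```python
import torch

g = torch.nn.Linear(8, 16)                      # stand-in generator g_theta
phi = torch.nn.Linear(16, 32)                   # frozen Phi_zeta
w, b = torch.randn(32), torch.zeros(1)          # frozen hyperplane (w^SVM, b)
opt = torch.optim.SGD(g.parameters(), lr=1e-3)  # eta; updates theta only

z = torch.randn(4, 8)                           # noise batch
L = -(phi(g(z)) @ w + b).mean()                 # L = -(1/n) sum_i D(g(z_i))

opt.zero_grad()
L.backward()   # dL/dtheta = -(1/n) sum <w, grad_theta Phi(g_theta(z_i))>
opt.step()     # theta <- theta + eta * (1/n) sum <w, grad Phi(g(z_i))>
```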
Theoretical analysis
As \(n \rightarrow \infty\), the empirical objectives above converge to their population counterparts (the regularizer \(\frac{1}{2Cn}\|w\|^2\) vanishes), giving the paper's (24) and (25):
\[\tag{24} R(D, g) = \mathbb{E}_{x \sim p_x} [\max(0, 1 - D(x))] + \mathbb{E}_{z \sim p_z} [\max(0, 1 + D(g(z)))], \]
\[\tag{25} L(D, g) = -\mathbb{E}_{z \sim p_z} [D(g(z))]. \]
Theorem 1: Suppose \((D^*, g^*)\) is a solution of the alternating minimization of (24) and (25). Then \(p_{g^*}(x) = p_x(x)\) almost everywhere, and in that case \(R(D^*, g^*) = 2\).
Note: alternating minimization means that, with \(g^*\) fixed, \(R(D, g^*)\) is minimized at \(D^*\), and, with \(D^*\) fixed, \(L(D^*, g)\) is minimized at \(g^*\).
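A quick sanity check of the value \(2\): when \(p_{g^*} = p_x\), both expectations in (24) run over the same distribution, and since \(\max(0, 1-t) + \max(0, 1+t) = 2\) for \(|t| \le 1\) (and is larger otherwise), any discriminator with \(|D(x)| \le 1\), e.g. \(D^* \equiv 0\), attains the minimum:
\[R(D^*, g^*) = \mathbb{E}_{x \sim p_x} [\max(0, 1 - 0)] + \mathbb{E}_{x \sim p_x} [\max(0, 1 + 0)] = 1 + 1 = 2. \]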
Proof


Note: the appendix of the paper works out hyperplane-separation interpretations of various GANs, which is quite interesting.