GAN
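Paper: Generative Adversarial Nets
Problem: the data \(x\) follow the distribution \(P_{data}(x)\), and we have samples \(\{x_1,x_2,...,x_m\}\). Given a generator \(G\), we want \(G\) to assign the highest possible probability to these samples. The likelihood is
\(L = \prod_{i=1}^{m}P_G(x_i;\theta)\), where \(\theta\) denotes the parameters of \(G\).
Maximum likelihood estimation: \(\theta^* = \arg\max_{\theta}\prod_{i=1}^{m}P_G(x_i;\theta)\)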
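The maximum likelihood objective can be expanded step by step. The following is a standard reconstruction of that argument and may differ in detail from the derivation originally displayed at this point.

```latex
\begin{aligned}
\theta^* &= \arg\max_{\theta} \prod_{i=1}^{m} P_G(x_i;\theta) \\
         &= \arg\max_{\theta} \log \prod_{i=1}^{m} P_G(x_i;\theta) \\
         &= \arg\max_{\theta} \sum_{i=1}^{m} \log P_G(x_i;\theta) \\
         &\approx \arg\max_{\theta} E_{x \sim P_{data}}\left[\log P_G(x;\theta)\right] \\
         &= \arg\max_{\theta} \int P_{data}(x) \log P_G(x;\theta)\,dx
            - \int P_{data}(x) \log P_{data}(x)\,dx \\
         &= \arg\min_{\theta} KL\left(P_{data} \,\|\, P_G\right)
\end{aligned}
```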

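The fourth line of the derivation assumes the samples are independent and identically distributed; the larger \(m\) is, the better the approximation.
The fifth line adds a term that does not depend on \(\theta\), which amounts to \(D(p||q) = H(p,q) - H(p)\); strictly speaking it is not needed.
As a result, the likelihood is maximized when \(P_{data}(x)=P_G (x)\).
Now GAN comes in: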



Fix \(G\) and solve for the optimal \(D\):


So \(G\) is optimal when \(P_{data}=P_{G}\); the original maximum likelihood estimation has been turned into a GAN.
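A small numeric check can make this result concrete. The sketch below assumes two made-up discrete distributions on three points, uses the closed form \(D^*(x) = P_{data}(x) / (P_{data}(x) + P_G(x))\) from the GAN paper, and checks that the value function at \(D^*\) equals \(-2\log 2 + 2\,JSD(P_{data}||P_G)\), which reaches its minimum \(-2\log 2\) only when the two distributions coincide. All function names are illustrative.

```python
import math

def optimal_d(p_data, p_g):
    """Pointwise optimal discriminator for a fixed generator."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value_at_optimal_d(p_data, p_g):
    """V(G, D*) = E_data[log D*] + E_G[log(1 - D*)] on a discrete support."""
    d_star = optimal_d(p_data, p_g)
    return sum(pd * math.log(d) + pg * math.log(1.0 - d)
               for pd, pg, d in zip(p_data, p_g, d_star))

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.5, 0.3, 0.2]   # assumed toy data distribution
p_g = [0.2, 0.3, 0.5]      # assumed toy generator distribution

v = value_at_optimal_d(p_data, p_g)
print(v, -2 * math.log(2) + 2 * jsd(p_data, p_g))   # the two numbers agree
print(value_at_optimal_d(p_data, p_data))           # equals -2 log 2
```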
The above is all theory; understanding it is enough.
The concrete algorithm:

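However, \(\log(1-D(G(z)))\) is too flat near \(D(G(z))=0\) and only becomes steep as \(D(G(z))\) approaches 1. Early in training \(G\) is weak, so \(D(G(z))\) is very small and \(G\) receives almost no gradient.
Replacing it with \(-\log(D(G(z)))\) fixes this.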
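To see the size of the effect, the sketch below compares the gradients of the two generator losses with respect to the discriminator's logit \(a\), where \(D = \sigma(a)\). It assumes a sigmoid output layer; the helper names are illustrative. The derivatives are \(-\sigma(a)\) for \(\log(1-\sigma(a))\) and \(-(1-\sigma(a))\) for \(-\log\sigma(a)\).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def logit(d):
    """Inverse of the sigmoid: the logit a such that sigmoid(a) == d."""
    return math.log(d / (1.0 - d))

def grad_saturating(a):
    """d/da log(1 - sigmoid(a)): gradient of the original G loss."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da -log(sigmoid(a)): gradient of the replacement G loss."""
    return -(1.0 - sigmoid(a))

for d in (0.01, 0.5, 0.99):
    a = logit(d)
    print(f"D(G(z))={d:.2f}  saturating={grad_saturating(a):+.2f}  "
          f"non-saturating={grad_non_saturating(a):+.2f}")
```

When \(D(G(z))\) is near 0, the original loss gives a gradient of magnitude close to 0, while the replacement gives a magnitude close to 1.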


An intuitive depiction of the training process
# PyTorch implementation of the loss
adversarial_loss = torch.nn.BCELoss()
g_loss = adversarial_loss(discriminator(gen_imgs), valid)  # valid: all-ones target; fake: all-zeros target
real_loss = adversarial_loss(discriminator(real_imgs), valid)
fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)  # detach() stops gradients from reaching G
d_loss = (real_loss + fake_loss) / 2
BCELoss:
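With its default mean reduction, `torch.nn.BCELoss` averages \(-[y\log p + (1-y)\log(1-p)]\) over the batch. The dependency-free sketch below reproduces that formula by hand; the input values are made up, and `bce_loss` is an illustrative name, not a library function.

```python
import math

def bce_loss(predictions, targets):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for p, y in zip(predictions, targets):
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(predictions)

d_out = [0.9, 0.6, 0.2]          # assumed discriminator outputs in (0, 1)
valid = [1.0] * len(d_out)       # all-ones target
fake = [0.0] * len(d_out)        # all-zeros target

print(bce_loss(d_out, valid))    # mean of -log D(.)
print(bce_loss(d_out, fake))     # mean of -log(1 - D(.))
```

With the `valid` target only the \(-\log D\) term survives, and with the `fake` target only the \(-\log(1-D)\) term survives, which is how one loss object serves both the G and D updates.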

ACGAN
Can control the class of the generated samples.

First layer of G: self.label_emb = nn.Embedding(opt.n_classes, opt.latent_dim)
The class label is fed in through an embedding.

\(D\) is trained to maximize \(L_S + L_C\) while \(G\) is trained to maximize \(L_C − L_S\). AC-GANs learn a representation for \(z\) that is independent of class label
validity, pred_label = discriminator(gen_imgs)
g_loss = 0.5 * (adversarial_loss(validity, valid) + auxiliary_loss(pred_label, gen_labels))
real_pred, real_aux = discriminator(real_imgs)
d_real_loss=(adversarial_loss(real_pred, valid) + auxiliary_loss(real_aux, labels))/2
# Loss for fake images
fake_pred, fake_aux = discriminator(gen_imgs.detach())
d_fake_loss=(adversarial_loss(fake_pred, fake) + auxiliary_loss(fake_aux, gen_labels))/2
# Total discriminator loss
d_loss = (d_real_loss + d_fake_loss) / 2
CrossEntropyLoss:
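`torch.nn.CrossEntropyLoss` takes raw class scores (logits) and integer class indices, and with its default mean reduction averages \(-\log \mathrm{softmax}(\text{logits})[\text{target}]\) over the batch. The sketch below writes that out without any dependency; the numbers are made up and `cross_entropy_loss` is an illustrative name.

```python
import math

def cross_entropy_loss(logits, targets):
    """Mean of -log softmax(logits)[target] over the batch."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)                                   # subtract max for stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[t]
    return total / len(logits)

logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]   # assumed raw class scores
targets = [0, 2]                               # assumed integer class labels
print(cross_entropy_loss(logits, targets))
```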
AAE

encoded_imgs = encoder(real_imgs)
decoded_imgs = decoder(encoded_imgs)
# Loss measures generator's ability to fool the discriminator
g_loss = 0.001 * adversarial_loss(discriminator(encoded_imgs), valid) + 0.999 * pixelwise_loss(decoded_imgs, real_imgs)
z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim))))
# Measure discriminator's ability to classify real from generated samples
real_loss = adversarial_loss(discriminator(z), valid)
fake_loss = adversarial_loss(discriminator(encoded_imgs.detach()), fake)
d_loss = 0.5 * (real_loss + fake_loss)
BiGAN

BGAN
def boundary_seeking_loss(y_pred, y_true):
    """
    Boundary seeking loss.
    Reference: https://wiseodd.github.io/techblog/2017/03/07/boundary-seeking-gan/
    """
    return 0.5 * torch.mean((torch.log(y_pred) - torch.log(1 - y_pred)) ** 2)
g_loss = boundary_seeking_loss(discriminator(gen_imgs), valid)
Suited to discrete data.
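The boundary seeking loss is zero exactly when the discriminator outputs 0.5, that is, when generated samples sit on the decision boundary. A dependency-free sketch of the same formula, with made-up discriminator outputs, shows this:

```python
import math

def boundary_seeking_loss(d_out):
    """0.5 * mean((log D - log(1 - D))^2); zero when every D equals 0.5."""
    return 0.5 * sum((math.log(d) - math.log(1.0 - d)) ** 2 for d in d_out) / len(d_out)

print(boundary_seeking_loss([0.5, 0.5]))    # samples on the decision boundary
print(boundary_seeking_loss([0.1, 0.9]))    # samples far from the boundary
```

The loss is symmetric around 0.5, so a generated sample is penalized equally for being judged too fake or too real.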
BEGAN
BEGAN: Boundary Equilibrium Generative Adversarial Networks
Two contributions:
1. Use an autoencoder as D
\(L : R^{N_x} \to R^+\) the loss for training a pixel-wise autoencoder as:

The Wasserstein distance is used to measure the gap between the distributions of real_loss and fake_loss.



Option (b) above is chosen because a good D should favor real samples, that is, reconstruct them with low loss.
g_loss = torch.mean(torch.abs(discriminator(gen_imgs) - gen_imgs))
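2. Use an equilibrium
At convergence we want \(E[L(x)]=E[L(G(z))]\), but this condition can be relaxed:
\(\gamma = \frac{E[L(G(z))]}{E[L(x)]}\), \(\gamma\in[0,1]\)
(Because G is not strong, the autoencoder can easily reconstruct the images G generates, so \(L(G(z))\) is small.)

Proportional control theory is used to maintain \(E[L(G(z))] = \gamma E[L(x)]\).
\(M_{global} = L(x) + |\gamma L(x) - L(G(z_G))|\) is used to measure convergence.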
d_real = discriminator(real_imgs)
d_fake = discriminator(gen_imgs.detach())
d_loss_real = torch.mean(torch.abs(d_real - real_imgs))
d_loss_fake = torch.mean(torch.abs(d_fake - gen_imgs.detach()))
d_loss = d_loss_real - k * d_loss_fake
diff = torch.mean(gamma * d_loss_real - d_loss_fake)
# Update weight term for fake samples
k = k + lambda_k * diff.item()
k = min(max(k, 0), 1) # Constraint to interval [0, 1]
# Update convergence metric
M = (d_loss_real + torch.abs(diff)).item()
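The control loop can be isolated from the networks. The sketch below takes the two reconstruction losses as plain numbers and applies the same update of \(k\) and the same convergence measure; the default values of `gamma` and `lambda_k` and the loss values in the loop are assumptions chosen for illustration, and `began_step` is a hypothetical helper.

```python
def began_step(k, loss_real, loss_fake, gamma=0.75, lambda_k=0.001):
    """One proportional-control update of k plus the convergence measure M."""
    diff = gamma * loss_real - loss_fake
    k_new = min(max(k + lambda_k * diff, 0.0), 1.0)   # keep k in [0, 1]
    m_global = loss_real + abs(diff)
    return k_new, m_global

k = 0.0
for loss_real, loss_fake in [(0.20, 0.05), (0.18, 0.10), (0.16, 0.14)]:  # made-up losses
    k, m = began_step(k, loss_real, loss_fake)
    print(f"k={k:.6f}  M={m:.4f}")
```

When the fake loss is below \(\gamma\) times the real loss, \(k\) grows and D puts more weight on rejecting fakes; when it is above, \(k\) shrinks.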
BicycleGAN

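After training, use G: given A, varying z slightly produces different images.
Nothing special here; just pay attention to the order of the parameter updates during training.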
ClusterGAN

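It samples from a mixture of discrete and continuous latent variables to balance clustering and interpolation.
The loss function \(q(x)\) can be \(q(x)=\log(x)\) or \(q(x)=x\) (WGAN).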
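One way to picture the mixed sampling is a latent vector that concatenates a small Gaussian part with a one-hot part: the one-hot part selects the cluster and the Gaussian part allows interpolation inside it. The sketch below assumes this layout and a standard deviation of 0.1 for the Gaussian part; `sample_latent` and its parameters are illustrative names.

```python
import random

def sample_latent(n_continuous, n_clusters, sigma=0.1, rng=random):
    """Return (z, cluster): Gaussian part z_n concatenated with a one-hot part z_c."""
    z_n = [rng.gauss(0.0, sigma) for _ in range(n_continuous)]
    cluster = rng.randrange(n_clusters)
    z_c = [1.0 if i == cluster else 0.0 for i in range(n_clusters)]
    return z_n + z_c, cluster

z, cluster = sample_latent(n_continuous=4, n_clusters=3)
print(z, cluster)
```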

CGAN


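D needs y as input because D must know the condition; otherwise G could generate arbitrary high-quality images that ignore it.
\(\underset{G}{\min}\, \underset{D}{\max}\, V(D, G) = E_{x\sim p_{data}(x)}[\log D(x|y)] + E_{z\sim p_z(z)}[\log(1 - D(G(z|y)))].\)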
G loss
z = Variable(FloatTensor(np.random.normal(0, 1, (batch_size, opt.latent_dim))))
gen_labels = Variable(LongTensor(np.random.randint(0, opt.n_classes, batch_size)))
# Generate a batch of images
gen_imgs = generator(z, gen_labels)
# Loss measures generator's ability to fool the discriminator
validity = discriminator(gen_imgs, gen_labels)
g_loss = adversarial_loss(validity, valid)
D loss
validity_real = discriminator(real_imgs, labels)
d_real_loss = adversarial_loss(validity_real, valid)
# Loss for fake images
validity_fake = discriminator(gen_imgs.detach(), gen_labels)
d_fake_loss = adversarial_loss(validity_fake, fake)
# Total discriminator loss
d_loss = (d_real_loss + d_fake_loss) / 2
CCGAN
SEMI-SUPERVISED LEARNING WITH CONTEXT-CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS


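The second version puts more emphasis on judging the fakes.

The third version

The idea of semi-supervised learning:
Treat D as a classifier: labeled data (x, y) are classified as usual, generated fakes are treated as class k+1, and for real unlabeled images D outputs the probability that they are not class k+1.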
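The unlabeled case can be sketched as follows: take the softmax over k+1 scores and use one minus the probability of the extra class. The sketch assumes the fake class is the last index and uses made-up scores; the function names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def prob_real(logits):
    """Probability that a sample is NOT the extra 'fake' class (last index)."""
    return 1.0 - softmax(logits)[-1]

def unlabeled_real_loss(logits):
    """Loss for a real image without a label: -log P(not fake)."""
    return -math.log(prob_real(logits))

logits = [2.0, 0.1, -0.3, -1.5]   # assumed scores: k = 3 real classes + 1 fake class
print(prob_real(logits), unlabeled_real_loss(logits))
```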
Context Encoders




Basically the same as the previous paper; nothing more to say.
CoGAN

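Through parameter sharing, the joint distribution of two domains can be learned from their two marginal distributions alone.
Alternatively, the form below can be used, or parameter sharing, cycle consistency, and so on can be added to keep the intermediate latent-space distributions consistent.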

CycleGAN

# Set model input
real_A = Variable(batch["A"].type(Tensor))
real_B = Variable(batch["B"].type(Tensor))
# Adversarial ground truths
valid = Variable(Tensor(np.ones((real_A.size(0), *D_A.output_shape))), requires_grad=False)
fake = Variable(Tensor(np.zeros((real_A.size(0), *D_A.output_shape))), requires_grad=False)
# ------------------
# Train Generators
# ------------------
G_AB.train()
G_BA.train()
optimizer_G.zero_grad()
# Identity loss
loss_id_A = criterion_identity(G_BA(real_A), real_A)
loss_id_B = criterion_identity(G_AB(real_B), real_B)
loss_identity = (loss_id_A + loss_id_B) / 2
# GAN loss
fake_B = G_AB(real_A)
loss_GAN_AB = criterion_GAN(D_B(fake_B), valid)
fake_A = G_BA(real_B)
loss_GAN_BA = criterion_GAN(D_A(fake_A), valid)
loss_GAN = (loss_GAN_AB + loss_GAN_BA) / 2
# Cycle loss
recov_A = G_BA(fake_B)
loss_cycle_A = criterion_cycle(recov_A, real_A)
recov_B = G_AB(fake_A)
loss_cycle_B = criterion_cycle(recov_B, real_B)
loss_cycle = (loss_cycle_A + loss_cycle_B) / 2
# Total loss
loss_G = loss_GAN + opt.lambda_cyc * loss_cycle + opt.lambda_id * loss_identity
loss_G.backward()
optimizer_G.step()
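G's loss has three parts: the conventional adversarial loss for fooling D, the cycle reconstruction loss, and the identity loss that keeps the domain mapping consistent.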
DCGAN

DiscoGAN
Essentially the same as CycleGAN; it also uses two Ds.

DualGAN
Identical to DiscoGAN.
EBGAN
ENERGY-BASED GENERATIVE ADVERSARIAL NETWORKS


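\([\cdot]^+= \max(0, \cdot).\)
Once the energy \(D(G(z))\) of a generated sample exceeds the margin \(m\), the hinge is zero and that sample no longer contributes to D's loss.
EBGAN-PT: \(L_G(z)=D_{img}(G(z))+\mathrm{pullaway}(D_{embedding}(G(z)))\)

The pulling-away term aims at keeping the model from producing samples that are clustered in one or only few modes of \(p_{data}\).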
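The pulling-away term can be written as the mean squared cosine similarity between the embeddings of different samples in a batch, so it is smallest when the embeddings point in different directions. The sketch below implements that definition on plain lists; the embeddings are made up and `pulling_away_term` is an illustrative name.

```python
import math

def pulling_away_term(embeddings):
    """Mean squared cosine similarity over all ordered pairs of distinct samples."""
    n = len(embeddings)
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                total += (dot / (norms[i] * norms[j])) ** 2
    return total / (n * (n - 1))

print(pulling_away_term([[1.0, 0.0], [0.0, 1.0]]))   # orthogonal embeddings
print(pulling_away_term([[1.0, 2.0], [2.0, 4.0]]))   # parallel embeddings
```

Orthogonal embeddings give a term of zero, while parallel embeddings give a term of about one, which is the case the generator is penalized for.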
ESRGAN
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks




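The generator loss has three parts. The first is the perceptual loss, computed on the hidden features from the first 35 layers of VGG (features before the activation layers carry more information), which measures the gap between fake and real. The second is the adversarial loss for fooling D. The third is the L1 loss between fake and real.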
InfoGAN

Similar to ACGAN.
Maximize the mutual information between c and x, so that the generated image can be controlled through c.


An auxiliary distribution is used; \(H(c)\) can be treated as a constant.
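The role of the auxiliary distribution can be written out as a variational lower bound on the mutual information. This is the standard derivation from the InfoGAN paper, with \(Q(c|x)\) the auxiliary distribution that approximates the posterior \(P(c|x)\) and \(\lambda\) a weighting hyperparameter.

```latex
\begin{aligned}
I(c; G(z,c)) &= H(c) - H(c \mid G(z,c)) \\
  &= E_{x \sim G(z,c)}\Big[ E_{c' \sim P(c|x)}\big[\log P(c'|x)\big] \Big] + H(c) \\
  &= E_{x \sim G(z,c)}\Big[ KL\big(P(\cdot|x) \,\|\, Q(\cdot|x)\big)
     + E_{c' \sim P(c|x)}\big[\log Q(c'|x)\big] \Big] + H(c) \\
  &\ge E_{x \sim G(z,c)}\Big[ E_{c' \sim P(c|x)}\big[\log Q(c'|x)\big] \Big] + H(c)
   = L_I(G, Q) \\[4pt]
\min_{G,Q} \max_{D} \; & V(D,G) - \lambda L_I(G,Q)
\end{aligned}
```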

LSGAN
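The cross-entropy loss is replaced by a least-squares loss. The benefit is that generated samples are pulled as close as possible to the decision boundary.
BGAN uses a least-squares loss only for G, whereas LSGAN uses it for both G and D.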
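A minimal sketch of the least-squares losses, assuming the common target coding of 1 for real and 0 for fake, and assuming D outputs an unbounded score rather than a probability. The function names are illustrative.

```python
def lsgan_d_loss(d_real, d_fake):
    """0.5 * E[(D(x) - 1)^2] + 0.5 * E[D(G(z))^2] with targets real=1, fake=0."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def lsgan_g_loss(d_fake):
    """0.5 * E[(D(G(z)) - 1)^2]: G pulls fake scores toward the 'real' target."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

print(lsgan_g_loss([1.0]))   # fake scored exactly at the real target
print(lsgan_g_loss([3.0]))   # fake scored far beyond the real target
```

Unlike the cross-entropy loss, the quadratic penalty also applies to fakes that D already scores well past the target, which is what pulls them back toward the boundary.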
MUNIT

This appears to be an extension of CycleGAN, with multiple models and multiple Gs and Ds......
pixel2pixel


PixelDA


RGAN
The relativistic discriminator: a key element missing from standard GAN


Semi-Supervised GAN
Semi-Supervised Learning with Generative Adversarial Networks
D is a classifier; fake data form the (N+1)-th class.

Softmax GAN

StarGAN


RSGAN


UNIT
Same as CoGAN.

VAEGAN


Wasserstein GAN
Problems with GAN





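- The better the discriminator, the more severely the generator's gradient vanishes.
- Theorem: when the supports of \(p_{data}\) and \(p_g\) are low-dimensional manifolds in a high-dimensional space, the measure of their overlap is 0 with probability 1.
- Support: the subset on which a function is non-zero; the support of a probability distribution is the set of all points with non-zero probability density.
- Manifold: the generalization of curves and surfaces to higher dimensions. A surface in three-dimensional space is a two-dimensional manifold because its intrinsic dimension is only 2; likewise, a curve in three-dimensional or two-dimensional space is a one-dimensional manifold.
- Measure: hypervolume.
- The second, modified loss is not reasonable.
- Formula 7 shows that KL and JS both measure the distance between the distributions but enter with opposite signs, so the objective pulls in contradictory directions.
- KL divergence is asymmetric: as \(p_g\rightarrow 0, p_{data}\rightarrow 1\), the contribution to \(KL(p_g||p_{data})\) tends to 0, whereas in the opposite case (\(p_g\rightarrow 1, p_{data}\rightarrow 0\)) it tends to \(\infty\). Intuitively, generating an unrealistic sample is penalized enormously, while failing to generate a real sample is penalized very little. The GAN therefore prefers to produce repeated, low-penalty samples rather than diverse samples that risk a high penalty.
- Replace the KL divergence with the Wasserstein distance.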
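The asymmetry of the KL divergence can be checked numerically by looking at the contribution \(p_g \log(p_g / p_{data})\) of a single point in the two extreme cases. The probabilities below are made up, and `kl_term` is an illustrative helper.

```python
import math

def kl_term(p_g, p_data):
    """Contribution p_g * log(p_g / p_data) of a single point to KL(p_g || p_data)."""
    return p_g * math.log(p_g / p_data)

eps = 1e-6
# Mode dropping: G puts almost no mass where the data has a lot.
print(kl_term(p_g=eps, p_data=1.0 - eps))
# Unrealistic sample: G puts a lot of mass where the data has almost none.
print(kl_term(p_g=1.0 - eps, p_data=eps))
```

The first contribution is close to zero and the second is large, which matches the intuition that dropping a mode costs almost nothing while producing an unrealistic sample is expensive.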




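This opens up the problem of enforcing the Lipschitz constraint; the weight clipping used in the paper is an ad hoc workaround.
Wasserstein GAN GP
Uses a gradient penalty (GP).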


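Since the gradient of D(x) cannot be evaluated over the whole sample space, points are sampled on the interpolation between real samples and generated samples, and the constraint is applied there, pushing the gradient norm as close to 1 as possible. It works well in practice but lacks theoretical support, so it has problems of its own.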
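The interpolation and the penalty can be illustrated in one dimension. The sketch below assumes scalar inputs and replaces automatic differentiation with a central finite difference, so it shows the idea only and is not how a tensor-based implementation would compute the gradient. The sample values and the name `gradient_penalty` are illustrative.

```python
import random

def gradient_penalty(critic, real, fake, rng=random, h=1e-5):
    """Mean of (|dD/dx| - 1)^2 at random interpolates between real and fake points.

    Inputs are scalars and the derivative is a central finite difference; a
    practical implementation would use automatic differentiation on tensors.
    """
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()                               # uniform in [0, 1)
        x_hat = eps * x_real + (1.0 - eps) * x_fake      # point on the segment
        grad = (critic(x_hat + h) - critic(x_hat - h)) / (2.0 * h)
        total += (abs(grad) - 1.0) ** 2
    return total / len(real)

real = [1.0, 1.2, 0.8]        # made-up real samples
fake = [-1.0, -0.7, -1.1]     # made-up generated samples
print(gradient_penalty(lambda x: x, real, fake))        # slope 1: penalty near 0
print(gradient_penalty(lambda x: 3.0 * x, real, fake))  # slope 3: penalty near 4
```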
Wasserstein GAN DIV

