[Paper Reading Notes] "Conditional Generative Adversarial Nets"

Reposted article · 2020-08-11 17:05 · Literature Reading / Computer Vision / Deep Learning

Paper: "Conditional Generative Adversarial Nets"

Year: 2014

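Introduction

The original GAN is too unconstrained: training easily loses direction, which makes it unstable and hurts sample quality. Take a GAN trained on MNIST as an example. It can generate digits, but which digit comes out is random (the image is generated from random input noise), so there is no way to control the specific digit the model produces.

cGAN adds prior conditions to the original GAN model to make it more controllable. Concretely, a condition y is fed to both the generative model G and the discriminative model D to guide the data generation process. The condition can be any auxiliary information, such as a class label, so that when generating a new sample we can also control exactly what type of sample it is.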

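cGAN Architecture

cGAN stands for Conditional Generative Adversarial Networks. It gives both the generator and the discriminator an additional condition y, which in practice is the label we want the generated sample to have.

The generator G must produce samples that match the condition y, and the discriminator must judge not only whether an image is real but also whether the image matches the condition y. The inputs and outputs of cGAN are:

Generator G: takes a noise vector z and a condition y, and outputs an image G(z|y) that satisfies the condition.
Discriminator D: takes an image x and a condition y, and outputs the probability D(x|y) that the image is real under that condition.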
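
One common way to feed the condition to a network is to encode the label as a vector and concatenate it with the other input. The sketch below is a minimal pure-Python illustration, assuming a one-hot label encoding; the helper names `one_hot` and `condition_input` are hypothetical and not part of any library. The PyTorch code later in this post uses a learned embedding in place of the one-hot vector, but the resulting input sizes are the same.

```python
import random

def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def condition_input(vector, label, num_classes=10):
    """Concatenate an input vector (noise z or flattened image x) with its condition y."""
    return list(vector) + one_hot(label, num_classes)

# Generator input: 100-dim noise z plus a 10-dim condition
z = [random.gauss(0.0, 1.0) for _ in range(100)]
g_in = condition_input(z, label=3)

# Discriminator input: flattened 28x28 image plus the same condition
x = [0.0] * (28 * 28)
d_in = condition_input(x, label=3)

print(len(g_in), len(d_in))  # 110 794
```

These two lengths, 110 and 794, are the input sizes of the first linear layers in the generator and discriminator defined below.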

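Optimization Objective

In the original GAN, the objective is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

In cGAN, the condition y is added to both terms, so the objective becomes:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))]$$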

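Taking MNIST as an example, the inputs and outputs of the generator G and the discriminator D are:

G takes a noise vector z and a digit label y (y ranges over 0 to 9), and outputs an image G(z|y) that matches the digit label.
D takes an image x and a digit label y, and outputs the probability D(x|y) that the image matches the digit.

Clearly, once training is finished, feeding G a digit label together with noise produces an image of the corresponding digit.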
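
To make the objective concrete, the sketch below estimates the cGAN value function V(D, G) from a batch of discriminator outputs, as the mean of log D(x|y) over real pairs plus the mean of log(1 - D(G(z|y))) over generated pairs. The function name `cgan_value` and the toy probabilities are hypothetical and chosen only for illustration.

```python
import math

def cgan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: D(x|y) for real (image, label) pairs
    d_fake: D(G(z|y)|y) for generated images paired with their labels
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well ...
confident = cgan_value(d_real=[0.9, 0.95, 0.85], d_fake=[0.1, 0.05, 0.2])
# ... versus one that is fooled and outputs 0.5 everywhere
fooled = cgan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(confident > fooled)  # True: D tries to maximize V, G tries to push it down
```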

PyTorch Implementation

cGAN Generator

Define the generator and its forward pass:

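import torch
import torch.nn as nn

class Generator(nn.Module):
  def __init__(self):
    super().__init__()
    # Learnable 10-dim embedding for each of the 10 digit labels
    self.label_emb = nn.Embedding(10, 10)
    self.model = nn.Sequential(
      nn.Linear(110, 256),  # 100-dim noise + 10-dim label embedding
      nn.LeakyReLU(0.2, inplace=True),
      nn.Linear(256, 512),
      nn.LeakyReLU(0.2, inplace=True),
      nn.Linear(512, 1024),
      nn.LeakyReLU(0.2, inplace=True),
      nn.Linear(1024, 784),  # 784 = 28 * 28 pixels
      nn.Tanh()  # outputs in [-1, 1]
    )

  def forward(self, z, labels):
    z = z.view(z.size(0), 100)
    c = self.label_emb(labels)
    # Condition the generator by concatenating noise and label embedding
    x = torch.cat([z, c], 1)
    out = self.model(x)
    return out.view(x.size(0), 28, 28)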

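Here, torch.nn.Embedding is described as follows:

nn.Embedding(num_embeddings, embedding_dim)
"""
params:
- num_embeddings - size of the embedding dictionary, i.e. how many entries (words) the dictionary holds.
- embedding_dim - size of each embedding vector.
"""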
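
Conceptually, an embedding layer is a lookup table with num_embeddings rows of embedding_dim values each, indexed by an integer. The sketch below shows that idea in plain Python. It is not PyTorch's implementation: `ToyEmbedding` is a hypothetical class, and its table holds fixed random numbers, whereas the rows of nn.Embedding are learnable parameters updated during training.

```python
import random

class ToyEmbedding:
    """A plain lookup table: row i is the vector for index i."""

    def __init__(self, num_embeddings, embedding_dim, seed=0):
        rng = random.Random(seed)
        self.table = [
            [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)
        ]

    def __call__(self, indices):
        # One output vector per input index
        return [self.table[i] for i in indices]

emb = ToyEmbedding(num_embeddings=10, embedding_dim=10)
vectors = emb([3, 7, 3])

print(len(vectors), len(vectors[0]))  # 3 10
print(vectors[0] == vectors[2])       # True: the same label maps to the same vector
```

With 10 rows of 10 values, each digit label 0 to 9 maps to its own 10-dimensional vector, which is what self.label_emb = nn.Embedding(10, 10) provides in the generator and discriminator.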

cGAN Discriminator

Define the discriminator and its forward pass:

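class Discriminator(nn.Module):
  def __init__(self):
    super().__init__()
    # Learnable 10-dim embedding for each of the 10 digit labels
    self.label_emb = nn.Embedding(10, 10)
    self.model = nn.Sequential(
      nn.Linear(794, 1024),  # 784 pixels + 10-dim label embedding
      nn.LeakyReLU(0.2, inplace=True),
      nn.Dropout(0.4),
      nn.Linear(1024, 512),
      nn.LeakyReLU(0.2, inplace=True),
      nn.Dropout(0.4),
      nn.Linear(512, 256),
      nn.LeakyReLU(0.2, inplace=True),
      nn.Dropout(0.4),
      nn.Linear(256, 1),
      nn.Sigmoid()  # probability that the (image, label) pair is real
    )

  # forward must be a method of the class, not nested inside __init__
  def forward(self, x, labels):
    x = x.view(x.size(0), 784)
    c = self.label_emb(labels)
    # Condition the discriminator by concatenating image and label embedding
    x = torch.cat([x, c], 1)
    out = self.model(x)
    return out.squeeze()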

cGAN Loss Function

Define the discriminator's loss on real and fake images:

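# Assumed here: binary cross-entropy, which matches the Sigmoid output of D
criterion = nn.BCELoss()

# Discriminator loss on real images (target 1)
real_labels = torch.ones(batch_size).to(device)  # assumed definition: all-ones targets
real_validity = D(images, labels)
d_loss_real = criterion(real_validity, real_labels)

# Discriminator loss on fake images (generated from points in the latent space, target 0)
z = torch.randn(batch_size, 100).to(device)
fake_labels = torch.randint(0, 10, (batch_size,)).to(device)
fake_images = G(z, fake_labels)
fake_validity = D(fake_images, fake_labels)
d_loss_fake = criterion(fake_validity, torch.zeros(batch_size).to(device))

# Total discriminator loss
d_loss = d_loss_real + d_loss_fake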
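
The snippet above covers only the discriminator. As a rough numeric illustration, the sketch below computes binary cross-entropy by hand on toy probabilities and also shows the commonly used non-saturating generator loss, in which generated samples are scored against the target 1. That generator loss is an assumption, since the original post does not show it. `bce` is a hypothetical helper written for this example, not torch.nn.BCELoss, and the probabilities are made up.

```python
import math

def bce(probs, targets):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

# Toy discriminator outputs (hypothetical numbers)
real_validity = [0.9, 0.8]   # D(x|y) on real images
fake_validity = [0.3, 0.1]   # D(G(z|y)|y) on generated images

# Discriminator: real images should score 1, fake images 0
d_loss = bce(real_validity, [1, 1]) + bce(fake_validity, [0, 0])

# Generator (assumed non-saturating form): fake images should score 1
g_loss = bce(fake_validity, [1, 1])

print(round(d_loss, 4), round(g_loss, 4))
```

In this toy case the discriminator is doing well, so its loss is small while the generator's loss is large. During training the two losses pull against each other.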

cGAN Visualization

Display images generated under specified conditions as a 10×10 grid:

import matplotlib.pyplot as plt
from torchvision.utils import make_grid

z = torch.randn(100, 100).to(device)
# Ten copies of each digit 0..9, so each grid row shows one digit
labels = torch.LongTensor([i for i in range(10) for _ in range(10)]).to(device)
images = G(z, labels).unsqueeze(1)  # add a channel dimension: (100, 1, 28, 28)
grid = make_grid(images, nrow=10, normalize=True)
fig, ax = plt.subplots(figsize=(10,10))
ax.imshow(grid.permute(1, 2, 0).detach().cpu().numpy(), cmap='binary')
ax.axis('off')
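
The label layout used in this grid can be checked without a trained model. The short sketch below reuses the same list comprehension in plain Python and groups the labels the way make_grid with nrow=10 arranges images, that is, 10 images per row, filled left to right. The variable `rows` is introduced here only for illustration.

```python
# Same comprehension as in the visualization code: 10 copies of each digit
labels = [i for i in range(10) for _ in range(10)]

# With 10 images per row, row r of the grid holds labels[10*r : 10*r + 10]
rows = [labels[r * 10:(r + 1) * 10] for r in range(10)]

for r, row in enumerate(rows):
    print(r, row)
```

Each row contains a single repeated digit, so row r of the plotted grid should show ten different samples of digit r, each drawn from a different noise vector.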

Viewing a Specific Label

Visualize a digit generated under a single specified digit condition:

from torchvision import transforms

def generate_digit(generator, digit):
  z = torch.randn(1, 100).to(device)
  label = torch.LongTensor([digit]).to(device)
  img = generator(z, label).detach().cpu()
  img = 0.5 * img + 0.5  # map Tanh output from [-1, 1] to [0, 1]
  return transforms.ToPILImage()(img)

# Usage
generate_digit(G, 8)

Visualizing the Loss

Record how the discriminator and generator losses change:

# writer is assumed to be a TensorBoard SummaryWriter (torch.utils.tensorboard) created elsewhere
writer.add_scalars('scalars', {'g_loss': g_loss, 'd_loss': d_loss}, step)
