GAN TensorFlow Implementation


Since Ian Goodfellow introduced GANs (generative adversarial networks) in 2014, they have become one of the hottest research topics in deep learning. The ability to generate data artificially has fired the imagination: researchers can already automatically generate remarkably realistic images of bedrooms, album covers, and faces, and have built interesting applications on top of them. That kind of work can be quite difficult, so below we implement a simple example: a GAN that generates handwritten digits.

GAN architecture

First, a quick review of the GAN architecture.
A generative adversarial network has two components: a generator and a discriminator. The discriminator estimates how close a given image is to the real images, or equivalently, how likely it is that the image was artificially generated. The discriminator is essentially a binary classifier; in our example it is a CNN. The generator produces an image from a randomly sampled input vector; in our example it is a deconvolutional neural network. Throughout training, the weights and biases of both networks are learned via backpropagation. The discriminator learns to tell real images from the fake images the generator produces, while the generator uses the discriminator's feedback to learn to produce images so realistic that the discriminator can no longer tell them apart.
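The adversarial game described above is usually written as a minimax objective; for reference, the standard formulation from the original GAN paper is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator D maximizes this value by assigning high probability to real images x and low probability to generated images G(z), while the generator G minimizes it by making its samples indistinguishable from real data.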

Loading MNIST data

First, import TensorFlow and the other libraries we need. TensorFlow ships with a read_data_sets function that makes loading the MNIST data very convenient.

import tensorflow as tf
import numpy as np
import datetime
import matplotlib.pyplot as plt
%matplotlib inline

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")

Each MNIST image initially comes as a 784-dimensional vector; we can reshape it back into a 28x28 image.

sample_image = mnist.train.next_batch(1)[0]
print(sample_image.shape)

sample_image = sample_image.reshape([28, 28])
plt.imshow(sample_image, cmap='Greys')

Discriminator network

The discriminator network is a fairly standard CNN, with two convolutional layers and two fully connected layers.

def discriminator(images, reuse_variables=None):
    with tf.variable_scope(tf.get_variable_scope(), reuse=reuse_variables) as scope:
        # First convolutional layer
        # 32 filters of size 5 x 5
        d_w1 = tf.get_variable('d_w1', [5, 5, 1, 32], initializer=tf.truncated_normal_initializer(stddev=0.02))
        d_b1 = tf.get_variable('d_b1', [32], initializer=tf.constant_initializer(0))
        d1 = tf.nn.conv2d(input=images, filter=d_w1, strides=[1, 1, 1, 1], padding='SAME')
        d1 = d1 + d_b1
        d1 = tf.nn.relu(d1)
        d1 = tf.nn.avg_pool(d1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        # Second convolutional layer
        # 64 filters of size 5 x 5, each spanning 32 input channels
        d_w2 = tf.get_variable('d_w2', [5, 5, 32, 64], initializer=tf.truncated_normal_initializer(stddev=0.02))
        d_b2 = tf.get_variable('d_b2', [64], initializer=tf.constant_initializer(0))
        d2 = tf.nn.conv2d(input=d1, filter=d_w2, strides=[1, 1, 1, 1], padding='SAME')
        d2 = d2 + d_b2
        d2 = tf.nn.relu(d2)
        d2 = tf.nn.avg_pool(d2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        # First fully connected layer
        d_w3 = tf.get_variable('d_w3', [7 * 7 * 64, 1024], initializer=tf.truncated_normal_initializer(stddev=0.02))
        d_b3 = tf.get_variable('d_b3', [1024], initializer=tf.constant_initializer(0))
        d3 = tf.reshape(d2, [-1, 7 * 7 * 64])
        d3 = tf.matmul(d3, d_w3)
        d3 = d3 + d_b3
        d3 = tf.nn.relu(d3)

        # Second fully connected layer
        d_w4 = tf.get_variable('d_w4', [1024, 1], initializer=tf.truncated_normal_initializer(stddev=0.02))
        d_b4 = tf.get_variable('d_b4', [1], initializer=tf.constant_initializer(0))
        d4 = tf.matmul(d3, d_w4) + d_b4

        # Return the final unscaled value (a logit)
        return d4

Generator network


The generator takes a random d-dimensional vector as input and outputs a 28 x 28 image (represented in practice as a 784-dimensional vector). Each generator layer uses a ReLU activation and batch normalization.
Batch normalization has two likely benefits: faster training and better overall accuracy.
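As a quick illustration of what batch normalization computes, here is a plain NumPy sketch (this is not the tf.contrib.layers.batch_norm implementation, which additionally maintains learned scale/shift parameters and running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(50, 3136) * 4 + 2   # skewed, high-variance activations
normed = batch_norm(batch)
print(normed.mean())  # per-feature means are ~0 after normalization
print(normed.std())   # per-feature stds are ~1 after normalization
```

Keeping each layer's activations in this normalized range is what makes the deeper layers easier to train.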

def generator(z, batch_size, z_dim):
    g_w1 = tf.get_variable('g_w1', [z_dim, 3136], dtype=tf.float32, 
                           initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b1 = tf.get_variable('g_b1', [3136], initializer=tf.truncated_normal_initializer(stddev=0.02))
    g1 = tf.matmul(z, g_w1) + g_b1
    g1 = tf.reshape(g1, [-1, 56, 56, 1])
    g1 = tf.contrib.layers.batch_norm(g1, epsilon=1e-5, scope='bn1')
    g1 = tf.nn.relu(g1)

    g_w2 = tf.get_variable('g_w2', [3, 3, 1, z_dim//2], dtype=tf.float32,
                       initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b2 = tf.get_variable('g_b2', [z_dim//2], initializer=tf.truncated_normal_initializer(stddev=0.02))
    g2 = tf.nn.conv2d(g1, g_w2, strides=[1, 2, 2, 1], padding='SAME')
    g2 = g2 + g_b2
    g2 = tf.contrib.layers.batch_norm(g2, epsilon=1e-5, scope='bn2')
    g2 = tf.nn.relu(g2)
    g2 = tf.image.resize_images(g2, [56, 56])

    g_w3 = tf.get_variable('g_w3', [3, 3, z_dim//2, z_dim//4], dtype=tf.float32,
                       initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b3 = tf.get_variable('g_b3', [z_dim//4], initializer=tf.truncated_normal_initializer(stddev=0.02))
    g3 = tf.nn.conv2d(g2, g_w3, strides=[1, 2, 2, 1], padding='SAME')
    g3 = g3 + g_b3
    g3 = tf.contrib.layers.batch_norm(g3, epsilon=1e-5, scope='bn3')
    g3 = tf.nn.relu(g3)
    g3 = tf.image.resize_images(g3, [56, 56])

    g_w4 = tf.get_variable('g_w4', [1, 1, z_dim//4, 1], dtype=tf.float32,
                       initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b4 = tf.get_variable('g_b4', [1], initializer=tf.truncated_normal_initializer(stddev=0.02))
    g4 = tf.nn.conv2d(g3, g_w4, strides=[1, 2, 2, 1], padding='SAME')
    g4 = g4 + g_b4
    g4 = tf.sigmoid(g4)
    
    # g4 has shape batch_size x 28 x 28 x 1
    return g4

Training a GAN

# Clear the default graph stack and reset the global default graph
tf.reset_default_graph()
batch_size = 50
z_dimensions = 100  # dimensionality of the noise vector z (value assumed; any reasonable size works)

z_placeholder = tf.placeholder(tf.float32, [None, z_dimensions], name='z_placeholder') 

x_placeholder = tf.placeholder(tf.float32, shape = [None,28,28,1], name='x_placeholder') 

Gz = generator(z_placeholder, batch_size, z_dimensions) 
Dx = discriminator(x_placeholder) 
Dg = discriminator(Gz, reuse_variables=True)

# The discriminator's loss has two parts
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = Dx, labels = tf.ones_like(Dx)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = Dg, labels = tf.zeros_like(Dg)))
d_loss=d_loss_real + d_loss_fake  
# The generator's goal is to produce images as realistic as possible, so we compute the loss between Dg and 1
g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = Dg, labels = tf.ones_like(Dg)))  
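To make these losses concrete, here is a small NumPy sketch of the per-element formula behind tf.nn.sigmoid_cross_entropy_with_logits (the numerically stable form documented by TensorFlow), checked against the naive definition:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy_with_logits(logits, labels):
    # Numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

logits = np.array([2.0, -1.0, 0.5])
labels = np.ones_like(logits)  # "real" labels, as in d_loss_real and g_loss
# Naive definition: -[z*log(sigmoid(x)) + (1-z)*log(1 - sigmoid(x))]
naive = -(labels * np.log(sigmoid(logits)) + (1 - labels) * np.log(1 - sigmoid(logits)))
print(np.allclose(sigmoid_cross_entropy_with_logits(logits, labels), naive))  # True
```

The stable form matters for GANs: a confident discriminator produces large-magnitude logits, where the naive formula would overflow or lose precision.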

With the loss functions defined above, we next define the optimizers. The generator's optimizer should only update the generator's weights; when training the discriminator, the generator's weights stay fixed and only the discriminator's weights are updated.

tvars = tf.trainable_variables()

# Collect the discriminator's and generator's variables separately
d_vars = [var for var in tvars if 'd_' in var.name]
g_vars = [var for var in tvars if 'g_' in var.name]

print([v.name for v in d_vars])
print([v.name for v in g_vars])

Adam tends to work well for GANs because it uses adaptive learning rates and momentum. We call Adam's minimize function to minimize each loss, passing var_list to restrict which variables get updated.
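For intuition, a single Adam update can be sketched in NumPy as follows. This is a simplified version of the published update rule; tf.train.AdamOptimizer manages the slot variables and graph execution for us:

```python
import numpy as np

def adam_step(w, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update: momentum (m) plus per-parameter adaptive scaling (v).
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad**2
    m_hat = state['m'] / (1 - beta1**state['t'])  # bias correction for zero-initialized m
    v_hat = state['v'] / (1 - beta2**state['t'])  # bias correction for zero-initialized v
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([1.0, -2.0])
state = {'m': np.zeros_like(w), 'v': np.zeros_like(w), 't': 0}
w = adam_step(w, grad=2 * w, state=state)  # gradient of ||w||^2, so w moves toward 0
print(w)
```

Note that on the first step the update magnitude is roughly the learning rate regardless of gradient scale, which is part of why Adam is forgiving about hyperparameter choices.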


d_trainer = tf.train.AdamOptimizer(0.0003).minimize(d_loss, var_list=d_vars)
g_trainer = tf.train.AdamOptimizer(0.0001).minimize(g_loss, var_list=g_vars)

We can use TensorBoard to monitor training. In a terminal, run
tensorboard --logdir=tensorboard/

then open TensorBoard at http://localhost:6006

# Reuse the existing variables when generator is called again below
tf.get_variable_scope().reuse_variables()

tf.summary.scalar('Generator_loss', g_loss)
tf.summary.scalar('Discriminator_loss_real', d_loss_real)
tf.summary.scalar('Discriminator_loss_fake', d_loss_fake)

images_for_tensorboard = generator(z_placeholder, batch_size, z_dimensions)
tf.summary.image('Generated_images', images_for_tensorboard, 5)
merged = tf.summary.merge_all()
logdir = "tensorboard/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + "/"
writer = tf.summary.FileWriter(logdir, tf.get_default_graph())  # the session is created below, so log the default graph

Now we run the training loop, updating the parameters iteratively. We pre-train the discriminator first, which helps the generator's training get off the ground.

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Pre-train the discriminator
for i in range(300):
    z_batch = np.random.normal(0, 1, size=[batch_size, z_dimensions])
    real_image_batch = mnist.train.next_batch(batch_size)[0].reshape([batch_size, 28, 28, 1])
    _, dLossReal, dLossFake = sess.run([d_trainer, d_loss_real, d_loss_fake],
                                       {x_placeholder: real_image_batch, z_placeholder: z_batch})

    if i % 100 == 0:
        print("dLossReal:", dLossReal, "dLossFake:", dLossFake)

# Train the generator and discriminator alternately
for i in range(100000):
    real_image_batch = mnist.train.next_batch(batch_size)[0].reshape([batch_size, 28, 28, 1])
    z_batch = np.random.normal(0, 1, size=[batch_size, z_dimensions])

    # Train the discriminator on both real and fake images
    _, dLossReal, dLossFake = sess.run([d_trainer, d_loss_real, d_loss_fake],
                                      {x_placeholder: real_image_batch, z_placeholder: z_batch})

    # Train the generator
    z_batch = np.random.normal(0, 1, size=[batch_size, z_dimensions])
    _ = sess.run(g_trainer, feed_dict={z_placeholder: z_batch})

    if i % 10 == 0:
        # Update the TensorBoard statistics
        z_batch = np.random.normal(0, 1, size=[batch_size, z_dimensions])
        summary = sess.run(merged, {z_placeholder: z_batch, x_placeholder: real_image_batch})
        writer.add_summary(summary, i)

    if i % 100 == 0:
        # Every 100 iterations, show a generated image
        print("Iteration:", i, "at", datetime.datetime.now())
        z_batch = np.random.normal(0, 1, size=[1, z_dimensions])
        # Reuse the existing generator node Gz instead of building new ops every iteration
        images = sess.run(Gz, {z_placeholder: z_batch})
        plt.imshow(images[0].reshape([28, 28]), cmap='Greys')
        plt.show()
        # Show the discriminator's estimate for the generated image, reusing the Dx node
        im = images[0].reshape([1, 28, 28, 1])
        estimate = sess.run(Dx, {x_placeholder: im})
        print("Estimate:", estimate)

More

GANs are known to be extremely expressive, able to approximate almost any probability distribution, and that is exactly why training them is so difficult (it easily goes off the rails). Without suitable hyperparameters, a suitable network architecture, and a sensible training procedure, either the discriminator or the generator tends to overwhelm the other.
One common failure mode is the discriminator overwhelming the generator: the discriminator classifies nearly every generated image as fake, leaving the generator almost no gradient to descend. This is why the discriminator's output is not passed through a sigmoid function (a sigmoid pushes the output toward 0 or 1, saturating the gradient).
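That saturation effect is easy to check numerically. The NumPy sketch below, assuming the standard logistic function, shows how the sigmoid's gradient collapses as its input moves away from zero, which is exactly the regime a confident discriminator pushes its logits into:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 5.0, 10.0]:
    print("x = %4.1f  sigmoid'(x) = %.2e" % (x, sigmoid_grad(x)))
# The gradient at x = 10 is thousands of times smaller than at x = 0,
# so a saturated sigmoid passes almost no gradient back to the generator.
```

Working with raw logits (and letting sigmoid_cross_entropy_with_logits handle the sigmoid internally) avoids compounding this saturation.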
Another failure mode is "mode collapse", in which the generator discovers and exploits a weakness in the discriminator. For example, if the generator finds one image a that the discriminator judges to be real, it may learn to output an image nearly identical to a for every input noise vector z.
Researchers have catalogued a number of GAN hacks that help in building more stable GANs.

Resources

Ian Goodfellow's recent GAN tutorial
...

