1. Introduction to GAN
In recent years, the application of deep neural networks to image recognition, speech recognition, and natural language processing has grown explosively, reaching very high accuracy and in some cases even surpassing human performance. Human abilities, however, go far beyond image and speech recognition: many tasks that require creativity are still very hard for machines. GANs make it possible for machines to tackle such tasks.
Yann LeCun, a leading figure in deep learning, once said:
Generative adversarial networks (GANs) and their variants have become the most important idea in machine learning in the last 10 years.
To get a better feel for GANs, consider an analogy drawn from real life: a counterfeiter and the police.
- To be a successful counterfeiter, the criminal must fool the police so that they cannot tell which banknotes are fake and which are real.
- The police, in turn, must detect counterfeit notes as effectively as possible.
This whole process is called an adversarial process. GAN, proposed by Ian Goodfellow in 2014, is a special adversarial process in which two neural networks compete with each other. The first network generates data, while the second tries to distinguish real data from the fake data created by the first. The second network outputs a scalar in [0, 1] representing the probability that its input is real data.
2. The Purpose of GAN
A GAN is a kind of generative model: it mainly draws samples from the model's distribution. It can only produce data; it does not provide an explicit predictive density function.
Here are some reasons for studying generative models:
- Generating samples, which is the most direct reason.
- Training does not involve maximum likelihood estimation.
- Since the generator never sees the training data directly, the risk of overfitting is lower.
- GANs are very good at capturing the modes of a distribution.
3. The Components of GAN
The computational flow and structure of a GAN are shown in the figure.
A GAN has two parts: a generator and a discriminator. Taking image generation as an example, the generator learns the distribution of real images so that the images it produces look increasingly realistic, to the point where the discriminator cannot tell whether a generated sample is real. The discriminator, in turn, must judge whether each image it receives is real or fake. The whole process can be viewed as a game between the generator and the discriminator. Over time the two networks reach a dynamic equilibrium: the generator's images approximate the real image distribution, and the discriminator's output for any given image is about 0.5, no better than a random guess.
Let the real data distribution be $p_{data}(x)$ and let $p_g$ be the data distribution learned by the generator G. Let $z$ be random noise drawn from a prior $p_z(z)$, and let $G(z)$ be the generator mapping that transforms this noise into a data point $x$. $D(x)$ is the discriminator mapping, whose output is the probability that $x$ comes from the real data rather than from the generated data. The discriminator D is trained to maximize the probability of assigning the correct label, while the generator G is trained to minimize $\log(1 - D(G(z)))$. This optimization can be summarized as a two-player minimax game, with the objective function defined as

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
From the discriminator's point of view, D wants to separate real data from generated data as well as possible, i.e. it wants $D(x)$ to be as large as possible and $D(G(z))$ as small as possible, which makes $V(D, G)$ as large as possible. From the generator's point of view, G wants its generated data to be as close to the real data as possible, i.e. it wants $D(G(z))$ to be as large as possible, which makes $V(D, G)$ as small as possible. The two models play against each other and eventually reach the global optimum.
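To make the objective concrete, here is a minimal sketch (my own illustration, not from the original text) that estimates V(D, G) on a batch of samples, assuming the discriminator outputs probabilities in (0, 1):

import numpy as np

def value_function(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) from one batch of discriminator outputs.
    d_real: D(x) for samples x drawn from the real data.
    d_fake: D(G(z)) for samples generated from noise z.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident, correct discriminator drives V(D, G) up (towards 0) ...
print(value_function(np.array([0.9, 0.8]), np.array([0.1, 0.2])))
# ... while at the theoretical equilibrium D(x) = D(G(z)) = 0.5,
# V(D, G) = -2 * log(2), roughly -1.386.
print(value_function(np.array([0.5, 0.5]), np.array([0.5, 0.5])))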
4. Implementing DCGAN
After the original GAN was published, many related applications and methods were built on the DCGAN architecture. DCGAN stands for "Deep Convolutional GAN", and it follows a number of widely adopted conventions:
- Batch normalization is used in most layers of both the discriminator and the generator, but usually not in the last layer, so that the model can learn the correct mean and variance of the data;
- Because images are generated from a random distribution, the spatial dimensions usually need to be enlarged, e.g. 7×7 -> 14×14; this is typically done with a stride-2 deconvolution (transposed convolution), as illustrated in the sketch after this list;
- Adam is usually used as the optimizer in DCGAN instead of SGD.
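The following sketch (my own example, not from the original) just verifies the shape arithmetic behind the second convention: a stride-2 transposed convolution with 'SAME' padding doubles the spatial dimensions.

import tensorflow as tf

# A 7x7 feature map with 16 channels, batch size 1.
x = tf.placeholder(tf.float32, (1, 7, 7, 16))
# Stride-2 transposed convolution with 'SAME' padding doubles height and width.
y = tf.layers.conv2d_transpose(x, filters=8, kernel_size=[5, 5],
                               strides=(2, 2), padding='SAME')
print(y.shape)   # (1, 14, 14, 8)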
The implementation is roughly as follows:
4.1 Loading the Data

import os
import sys
import tensorflow as tf
from tensorflow import logging
from tensorflow import gfile
import pprint
import pickle
import numpy as np
import random
import math
from PIL import Image
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

output_dir = './local_run'
if not gfile.Exists(output_dir):
    gfile.MakeDirs(output_dir)

def get_default_params():
    """Set the default hyperparameters."""
    return tf.contrib.training.HParams(
        z_dim=100,
        init_conv_size=4,
        g_channels=[128, 64, 32, 1],
        d_channels=[32, 64, 128, 256],
        batch_size=128,
        learning_rate=0.002,
        beta1=0.5,
        img_size=32,
    )

hps = get_default_params()

class MnistData(object):
    """Preprocessing for the MNIST dataset."""

    def __init__(self, mnist_train, z_dim, img_size):
        self._data = mnist_train
        self._example_num = len(self._data)
        self._z_data = np.random.standard_normal((self._example_num, z_dim))
        self._indicator = 0
        self._resize_mnist_img(img_size)
        self._random_shuffle()

    def _random_shuffle(self):
        """Shuffle all images so that the data is randomly ordered."""
        p = np.random.permutation(self._example_num)
        self._z_data = self._z_data[p]
        self._data = self._data[p]

    def _resize_mnist_img(self, img_size):
        """
        Resize mnist image to goal img_size.
        1. numpy -> PIL img
        2. PIL img -> resize
        3. PIL img -> numpy
        """
        data = np.asarray(self._data * 255, np.uint8)
        # [example_num, 784] -> [example_num, 1, 28, 28]
        data = data.reshape((self._example_num, 1, 28, 28))
        data = data.transpose((0, 2, 3, 1))
        new_data = []
        for i in range(self._example_num):
            img = data[i].reshape((28, 28))
            img = Image.fromarray(img)
            img = img.resize((img_size, img_size))
            img = np.asarray(img)
            img = img.reshape((img_size, img_size, 1))
            new_data.append(img)
        new_data = np.asarray(new_data, dtype=np.float32)
        # Scale pixel values to [-1, 1] to match the tanh output of the generator.
        new_data = new_data / 127.5 - 1
        # self._data: [num_example, img_size, img_size, 1]
        self._data = new_data

    def next_batch(self, batch_size):
        """Return the next mini-batch of images and noise vectors."""
        end_indicator = self._indicator + batch_size
        if end_indicator > self._example_num:
            self._random_shuffle()
            self._indicator = 0
            end_indicator = self._indicator + batch_size
        assert end_indicator < self._example_num
        batch_data = self._data[self._indicator: end_indicator]
        batch_z = self._z_data[self._indicator: end_indicator]
        self._indicator = end_indicator
        return batch_data, batch_z

mnist_data = MnistData(mnist.train.images, hps.z_dim, hps.img_size)
batch_data, batch_z = mnist_data.next_batch(5)
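As a quick sanity check (my own addition, not in the original text), the mini-batch fetched above should have the following shapes and value range:

print(batch_data.shape)   # (5, 32, 32, 1) -- images resized to img_size, scaled to [-1, 1]
print(batch_z.shape)      # (5, 100)       -- one z_dim-dimensional noise vector per image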
4.2 Defining the Model

def conv2d_transpose(inputs, out_channel, name, training, with_bn_relu=True):
    """Wrap the transposed convolution layer used by the generator,
    optionally followed by batch normalization and ReLU."""
    with tf.variable_scope(name):
        conv2d_trans = tf.layers.conv2d_transpose(inputs, out_channel, [5, 5],
                                                  strides=(2, 2), padding='SAME')
        if with_bn_relu:
            bn = tf.layers.batch_normalization(conv2d_trans, training=training)
            relu = tf.nn.relu(bn)
            return relu
        else:
            return conv2d_trans

def conv2d(inputs, out_channel, name, training):
    """Wrap the convolution layer used by the discriminator,
    with batch normalization and a leaky ReLU activation."""
    def leaky_relu(x, leak=0.2, name=''):
        return tf.maximum(x, x * leak, name=name)
    with tf.variable_scope(name):
        conv2d_output = tf.layers.conv2d(inputs, out_channel, [5, 5],
                                         strides=(2, 2), padding='SAME')
        bn = tf.layers.batch_normalization(conv2d_output, training=training)
        return leaky_relu(bn, name='outputs')
class Generator(object):
    """The generator network."""

    def __init__(self, channels, init_conv_size):
        assert len(channels) > 1
        self._channels = channels
        self._init_conv_size = init_conv_size
        self._reuse = False

    def __call__(self, inputs, training):
        inputs = tf.convert_to_tensor(inputs)
        with tf.variable_scope('generator', reuse=self._reuse):
            with tf.variable_scope('inputs'):
                # Project the noise vector and reshape it into a small feature map.
                fc = tf.layers.dense(
                    inputs,
                    self._channels[0] * self._init_conv_size * self._init_conv_size)
                conv0 = tf.reshape(fc,
                                   [-1,
                                    self._init_conv_size,
                                    self._init_conv_size,
                                    self._channels[0]])
                bn0 = tf.layers.batch_normalization(conv0, training=training)
                relu0 = tf.nn.relu(bn0)
            deconv_inputs = relu0
            # Stack of transposed convolutions; the last one has no BN/ReLU.
            for i in range(1, len(self._channels)):
                with_bn_relu = (i != len(self._channels) - 1)
                deconv_inputs = conv2d_transpose(deconv_inputs,
                                                 self._channels[i],
                                                 'deconv-%d' % i,
                                                 training,
                                                 with_bn_relu)
            img_inputs = deconv_inputs
            with tf.variable_scope('generate_imgs'):
                # imgs value scope: [-1, 1]
                imgs = tf.tanh(img_inputs, name='imgs')
        self._reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                           scope='generator')
        return imgs
class Discriminator(object):
    """The discriminator network."""

    def __init__(self, channels):
        self._channels = channels
        self._reuse = False

    def __call__(self, inputs, training):
        inputs = tf.convert_to_tensor(inputs, dtype=tf.float32)
        conv_inputs = inputs
        with tf.variable_scope('discriminator', reuse=self._reuse):
            for i in range(len(self._channels)):
                conv_inputs = conv2d(conv_inputs,
                                     self._channels[i],
                                     'conv-%d' % i,
                                     training)
            fc_inputs = conv_inputs
            with tf.variable_scope('fc'):
                flatten = tf.layers.flatten(fc_inputs)
                # Two logits per image: class 0 = fake, class 1 = real.
                logits = tf.layers.dense(flatten, 2, name="logits")
        self._reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                           scope='discriminator')
        return logits
4.3 Defining the Loss Functions
In the paper, the discriminator is trained to maximize $\log D(x) + \log(1 - D(G(z)))$, while the generator is trained to minimize $\log(1 - D(G(z)))$. Since TensorFlow can only minimize, the loss functions can be written as follows:
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
Another way is to use TensorFlow's built-in tf.nn.sigmoid_cross_entropy_with_logits function:
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))
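The implementation in section 4.4 below actually takes a third, equivalent route: the discriminator emits two logits per image (class 0 = fake, class 1 = real) and the losses use tf.nn.sparse_softmax_cross_entropy_with_logits. A two-class softmax reduces to a sigmoid of the logit difference, so the formulations match; here is a small NumPy check of that identity (my own sketch):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([0.3, 1.7])            # [logit for "fake", logit for "real"]
p_real_softmax = softmax(logits)[1]       # softmax probability of the "real" class
p_real_sigmoid = sigmoid(logits[1] - logits[0])
print(np.isclose(p_real_softmax, p_real_sigmoid))   # True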
4.4 Building the Model
class DCGAN(object):
    """Build the DCGAN model."""

    def __init__(self, hps):
        g_channels = hps.g_channels
        d_channels = hps.d_channels
        self._batch_size = hps.batch_size
        self._init_conv_size = hps.init_conv_size
        self._z_dim = hps.z_dim
        self._img_size = hps.img_size
        self._generator = Generator(g_channels, self._init_conv_size)
        self._discriminator = Discriminator(d_channels)

    def build(self):
        self._z_placeholder = tf.placeholder(tf.float32,
                                             (self._batch_size, self._z_dim))
        self._img_placeholder = tf.placeholder(tf.float32,
                                               (self._batch_size,
                                                self._img_size,
                                                self._img_size,
                                                1))
        generated_imgs = self._generator(self._z_placeholder, training=True)
        fake_img_logits = self._discriminator(generated_imgs, training=True)
        real_img_logits = self._discriminator(self._img_placeholder, training=True)
        # Generator loss: generated images should be classified as real (label 1).
        loss_on_fake_to_real = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.ones([self._batch_size], dtype=tf.int64),
                logits=fake_img_logits))
        # Discriminator losses: generated images are fake (label 0),
        # real images are real (label 1).
        loss_on_fake_to_fake = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.zeros([self._batch_size], dtype=tf.int64),
                logits=fake_img_logits))
        loss_on_real_to_real = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.ones([self._batch_size], dtype=tf.int64),
                logits=real_img_logits))
        tf.add_to_collection('g_losses', loss_on_fake_to_real)
        tf.add_to_collection('d_losses', loss_on_fake_to_fake)
        tf.add_to_collection('d_losses', loss_on_real_to_real)
        loss = {
            'g': tf.add_n(tf.get_collection('g_losses'), name='total_g_loss'),
            'd': tf.add_n(tf.get_collection('d_losses'), name='total_d_loss'),
        }
        return (self._z_placeholder, self._img_placeholder, generated_imgs, loss)

    def build_train(self, losses, learning_rate, beta1):
        g_opt = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
        d_opt = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
        g_opt_op = g_opt.minimize(losses['g'], var_list=self._generator.variables)
        d_opt_op = d_opt.minimize(losses['d'], var_list=self._discriminator.variables)
        # One run of the returned op performs one generator step and one
        # discriminator step.
        with tf.control_dependencies([g_opt_op, d_opt_op]):
            return tf.no_op(name='train')

dcgan = DCGAN(hps)
z_placeholder, img_placeholder, generated_imgs, losses = dcgan.build()
train_op = dcgan.build_train(losses, hps.learning_rate, hps.beta1)
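One caveat worth mentioning (my own observation, not from the original): tf.layers.batch_normalization registers its moving-average update ops in the tf.GraphKeys.UPDATE_OPS collection, and they are not run automatically. This only matters if the model is later used with training=False, but if that is intended, the updates could be grouped with the training op, for example:

# Collect the moving-average update ops created by tf.layers.batch_normalization
# and run them together with the usual training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = tf.group(train_op, *update_ops)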
4.5 Training the Model
# Start training.
init_op = tf.global_variables_initializer()
train_steps = 10000

with tf.Session() as sess:
    sess.run(init_op)
    for step in range(train_steps):
        batch_img, batch_z = mnist_data.next_batch(hps.batch_size)
        fetches = [train_op, losses['g'], losses['d']]
        should_sample = (step + 1) % 50 == 0
        if should_sample:
            fetches += [generated_imgs]
        out_values = sess.run(fetches,
                              feed_dict={
                                  z_placeholder: batch_z,
                                  img_placeholder: batch_img,
                              })
        _, g_loss_val, d_loss_val = out_values[0:3]
        logging.info('step: %d, g_loss: %4.3f, d_loss: %4.3f'
                     % (step, g_loss_val, d_loss_val))
        if should_sample:
            # combine_and_show_imgs is defined in section 4.6 below.
            gen_imgs_val = out_values[3]
            gen_img_path = os.path.join(output_dir, '%05d-gen.jpg' % (step + 1))
            gt_img_path = os.path.join(output_dir, '%05d-gt.jpg' % (step + 1))
            gen_img = combine_and_show_imgs(gen_imgs_val, hps.img_size)
            gt_img = combine_and_show_imgs(batch_img, hps.img_size)
            print(gen_img_path)
            print(gt_img_path)
            gen_img.save(gen_img_path)
            gt_img.save(gt_img_path)
4.6 Displaying the Saved Images
def combine_and_show_imgs(batch_imgs, img_size, rows=8, cols=16):
    """Stitch individual images together into one large grid image."""
    # batch_imgs: [batch_size, img_size, img_size, 1]
    result_big_img = []
    for i in range(rows):
        row_imgs = []
        for j in range(cols):
            img = batch_imgs[cols * i + j]
            img = img.reshape((img_size, img_size))
            # Map pixel values back from [-1, 1] to [0, 255].
            img = (img + 1) * 127.5
            row_imgs.append(img)
        row_imgs = np.hstack(row_imgs)
        result_big_img.append(row_imgs)
    result_big_img = np.vstack(result_big_img)
    result_big_img = np.asarray(result_big_img, np.uint8)
    result_big_img = Image.fromarray(result_big_img)
    return result_big_img
Below are the results generated after 10,000 training steps:
5. Summary
Training a GAN is essentially a competition between the generator network G(z) and the discriminator network D(x), driven towards an optimum where neither network can improve further unless the other changes. Ideally, we want the two networks to improve at the same rate. In the ideal case the discriminator's output is close to 0.5, at which point it can no longer tell generated images apart from real ones.
To overcome common problems when training GAN models, here are some frequently used techniques:
- Feature matching
- Mini-batch discrimination
- Moving averages
- One-sided label smoothing (see the sketch after this list)
- Input normalization
- Batch normalization
- Avoiding the sparse gradients caused by ReLU and MaxPool
- Optimizer choice and added noise
- Not balancing the losses purely based on statistics
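As an example of one of these tricks (my own sketch, not from the original text), one-sided label smoothing keeps the loss of section 4.3 but replaces the hard real label 1.0 with a softer target such as 0.9, while leaving the fake label at 0.0:

# One-sided label smoothing: smooth only the "real" targets (1.0 -> 0.9),
# keep the "fake" targets at 0.0 (D_logit_real / D_logit_fake as in section 4.3).
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real, labels=0.9 * tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake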
The above is a record and summary of my study of GANs.
References:
[1] Kuntal Ganguly, GAN實戰生成對抗網絡 (Learning Generative Adversarial Networks)
[2] imooc course: 深度學習實戰 CNN RNN GAN
[3] https://blog.csdn.net/u012223913/article/details/75051516