1. Introduction to GAN
In recent years, the application of deep neural networks to image recognition, speech recognition, and natural language processing has grown explosively, reaching very high accuracy and in some cases even surpassing human performance. Human abilities, however, go far beyond image and speech recognition: many tasks that require creativity are still very hard for machines. GANs make it possible for machines to tackle such tasks.
Yann LeCun, a leading figure in deep learning, once said:
Generative adversarial networks (GANs) and their variants have become the most important idea in machine learning in the last 10 years.
To get a better feel for GANs, consider an analogy drawn from real life: a counterfeiter and the police.
- To be a successful counterfeiter, the criminal must fool the police so that they cannot tell which banknotes are fake and which are real.
- The police, in turn, must detect counterfeit notes as effectively as possible.
This whole process is called an adversarial process. GAN, proposed by Ian Goodfellow in 2014, is a special adversarial process in which two neural networks compete with each other. The first network generates data, while the second tries to distinguish real data from the fake data created by the first. The second network outputs a scalar in [0, 1] representing the probability that its input is real data.
2. The Purpose of GAN
A GAN is a kind of generative model: it mainly draws samples from the model's distribution. It can only produce data; it does not provide an explicit predictive density function.
Here are some reasons for studying generative models:
- Generating samples, which is the most direct reason.
- Training does not involve maximum likelihood estimation.
- Since the generator never sees the training data directly, the risk of overfitting is lower.
- GANs are very good at capturing the modes of a distribution.
3. The Components of GAN
The computational flow and structure of a GAN are shown in the figure.
A GAN has two parts: a generator and a discriminator. Taking image generation as an example, the generator learns the distribution of real images so that the images it produces look increasingly realistic, to the point where the discriminator cannot tell whether a generated sample is real. The discriminator, in turn, must judge whether each image it receives is real or fake. The whole process can be viewed as a game between the generator and the discriminator. Over time the two networks reach a dynamic equilibrium: the generator's images approximate the real image distribution, and the discriminator's output for any given image is about 0.5, no better than a random guess.
Let the real data distribution be $p_{data}(x)$ and let $p_g$ be the data distribution learned by the generator G. Let $z$ be random noise drawn from a prior $p_z(z)$, and let $G(z)$ be the generator mapping that transforms this noise into a data point $x$. $D(x)$ is the discriminator mapping, whose output is the probability that $x$ comes from the real data rather than from the generated data. The discriminator D is trained to maximize the probability of assigning the correct label, while the generator G is trained to minimize $\log(1 - D(G(z)))$. This optimization can be summarized as a two-player minimax game, with the objective function defined as

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
From the discriminator's point of view, D wants to separate real data from generated data as well as possible, i.e. it wants $D(x)$ to be as large as possible and $D(G(z))$ as small as possible, which makes $V(D, G)$ as large as possible. From the generator's point of view, G wants its generated data to be as close to the real data as possible, i.e. it wants $D(G(z))$ to be as large as possible, which makes $V(D, G)$ as small as possible. The two models play against each other and eventually reach the global optimum.
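To make the objective concrete, here is a minimal sketch (my own illustration, not from the original text) that estimates V(D, G) on a batch of samples, assuming the discriminator outputs probabilities in (0, 1):

import numpy as np

def value_function(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) from one batch of discriminator outputs.
    d_real: D(x) for samples x drawn from the real data.
    d_fake: D(G(z)) for samples generated from noise z.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident, correct discriminator drives V(D, G) up (towards 0) ...
print(value_function(np.array([0.9, 0.8]), np.array([0.1, 0.2])))
# ... while at the theoretical equilibrium D(x) = D(G(z)) = 0.5,
# V(D, G) = -2 * log(2), roughly -1.386.
print(value_function(np.array([0.5, 0.5]), np.array([0.5, 0.5])))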
4. Implementing DCGAN
After the original GAN was published, many related applications and methods were built on the DCGAN architecture. DCGAN stands for "Deep Convolutional GAN", and it follows a number of widely adopted conventions:
- Batch normalization is used in most layers of both the discriminator and the generator, but usually not in the last layer, so that the model can learn the correct mean and variance of the data;
- Because images are generated from a random distribution, the spatial dimensions usually need to be enlarged, e.g. 7×7 -> 14×14; this is typically done with a stride-2 deconvolution (transposed convolution), as illustrated in the sketch after this list;
- Adam is usually used as the optimizer in DCGAN instead of SGD.
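The following sketch (my own example, not from the original) just verifies the shape arithmetic behind the second convention: a stride-2 transposed convolution with 'SAME' padding doubles the spatial dimensions.

import tensorflow as tf

# A 7x7 feature map with 16 channels, batch size 1.
x = tf.placeholder(tf.float32, (1, 7, 7, 16))
# Stride-2 transposed convolution with 'SAME' padding doubles height and width.
y = tf.layers.conv2d_transpose(x, filters=8, kernel_size=[5, 5],
                               strides=(2, 2), padding='SAME')
print(y.shape)   # (1, 14, 14, 8)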
The implementation is roughly as follows:
4.1 Loading the Data

import os
import sys
import tensorflow as tf
from tensorflow import logging
from tensorflow import gfile
import pprint
import pickle
import numpy as np
import random
import math
from PIL import Image
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

output_dir = './local_run'
if not gfile.Exists(output_dir):
    gfile.MakeDirs(output_dir)

def get_default_params():
    """Set the default hyperparameters."""
    return tf.contrib.training.HParams(
        z_dim=100,
        init_conv_size=4,
        g_channels=[128, 64, 32, 1],
        d_channels=[32, 64, 128, 256],
        batch_size=128,
        learning_rate=0.002,
        beta1=0.5,
        img_size=32,
    )

hps = get_default_params()

class MnistData(object):
    """Preprocessing for the MNIST dataset."""

    def __init__(self, mnist_train, z_dim, img_size):
        self._data = mnist_train
        self._example_num = len(self._data)
        self._z_data = np.random.standard_normal((self._example_num, z_dim))
        self._indicator = 0
        self._resize_mnist_img(img_size)
        self._random_shuffle()

    def _random_shuffle(self):
        """Shuffle all images so that the data is randomly ordered."""
        p = np.random.permutation(self._example_num)
        self._z_data = self._z_data[p]
        self._data = self._data[p]

    def _resize_mnist_img(self, img_size):
        """
        Resize mnist image to goal img_size.
        1. numpy -> PIL img
        2. PIL img -> resize
        3. PIL img -> numpy
        """
        data = np.asarray(self._data * 255, np.uint8)
        # [example_num, 784] -> [example_num, 1, 28, 28]
        data = data.reshape((self._example_num, 1, 28, 28))
        data = data.transpose((0, 2, 3, 1))
        new_data = []
        for i in range(self._example_num):
            img = data[i].reshape((28, 28))
            img = Image.fromarray(img)
            img = img.resize((img_size, img_size))
            img = np.asarray(img)
            img = img.reshape((img_size, img_size, 1))
            new_data.append(img)
        new_data = np.asarray(new_data, dtype=np.float32)
        # Scale pixel values to [-1, 1] to match the tanh output of the generator.
        new_data = new_data / 127.5 - 1
        # self._data: [num_example, img_size, img_size, 1]
        self._data = new_data

    def next_batch(self, batch_size):
        """Return the next mini-batch of images and noise vectors."""
        end_indicator = self._indicator + batch_size
        if end_indicator > self._example_num:
            self._random_shuffle()
            self._indicator = 0
            end_indicator = self._indicator + batch_size
        assert end_indicator < self._example_num
        batch_data = self._data[self._indicator: end_indicator]
        batch_z = self._z_data[self._indicator: end_indicator]
        self._indicator = end_indicator
        return batch_data, batch_z

mnist_data = MnistData(mnist.train.images, hps.z_dim, hps.img_size)
batch_data, batch_z = mnist_data.next_batch(5)
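As a quick sanity check (my own addition, not in the original text), the mini-batch fetched above should have the following shapes and value range:

print(batch_data.shape)   # (5, 32, 32, 1) -- images resized to img_size, scaled to [-1, 1]
print(batch_z.shape)      # (5, 100)       -- one z_dim-dimensional noise vector per image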
4.2 Defining the Model

def conv2d_transpose(inputs, out_channel, name, training, with_bn_relu=True):
    """Wrap the transposed convolution layer used by the generator,
    optionally followed by batch normalization and ReLU."""
    with tf.variable_scope(name):
        conv2d_trans = tf.layers.conv2d_transpose(inputs, out_channel, [5, 5],
                                                  strides=(2, 2), padding='SAME')
        if with_bn_relu:
            bn = tf.layers.batch_normalization(conv2d_trans, training=training)
            relu = tf.nn.relu(bn)
            return relu
        else:
            return conv2d_trans

def conv2d(inputs, out_channel, name, training):
    """Wrap the convolution layer used by the discriminator,
    with batch normalization and a leaky ReLU activation."""
    def leaky_relu(x, leak=0.2, name=''):
        return tf.maximum(x, x * leak, name=name)
    with tf.variable_scope(name):
        conv2d_output = tf.layers.conv2d(inputs, out_channel, [5, 5],
                                         strides=(2, 2), padding='SAME')
        bn = tf.layers.batch_normalization(conv2d_output, training=training)
        return leaky_relu(bn, name='outputs')
class Generator(object):
    """The generator network."""

    def __init__(self, channels, init_conv_size):
        assert len(channels) > 1
        self._channels = channels
        self._init_conv_size = init_conv_size
        self._reuse = False

    def __call__(self, inputs, training):
        inputs = tf.convert_to_tensor(inputs)
        with tf.variable_scope('generator', reuse=self._reuse):
            with tf.variable_scope('inputs'):
                # Project the noise vector and reshape it into a small feature map.
                fc = tf.layers.dense(
                    inputs,
                    self._channels[0] * self._init_conv_size * self._init_conv_size)
                conv0 = tf.reshape(fc,
                                   [-1,
                                    self._init_conv_size,
                                    self._init_conv_size,
                                    self._channels[0]])
                bn0 = tf.layers.batch_normalization(conv0, training=training)
                relu0 = tf.nn.relu(bn0)
            deconv_inputs = relu0
            # Stack of transposed convolutions; the last one has no BN/ReLU.
            for i in range(1, len(self._channels)):
                with_bn_relu = (i != len(self._channels) - 1)
                deconv_inputs = conv2d_transpose(deconv_inputs,
                                                 self._channels[i],
                                                 'deconv-%d' % i,
                                                 training,
                                                 with_bn_relu)
            img_inputs = deconv_inputs
            with tf.variable_scope('generate_imgs'):
                # imgs value scope: [-1, 1]
                imgs = tf.tanh(img_inputs, name='imgs')
        self._reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                           scope='generator')
        return imgs
class Discriminator(object):
    """The discriminator network."""

    def __init__(self, channels):
        self._channels = channels
        self._reuse = False

    def __call__(self, inputs, training):
        inputs = tf.convert_to_tensor(inputs, dtype=tf.float32)
        conv_inputs = inputs
        with tf.variable_scope('discriminator', reuse=self._reuse):
            for i in range(len(self._channels)):
                conv_inputs = conv2d(conv_inputs,
                                     self._channels[i],
                                     'conv-%d' % i,
                                     training)
            fc_inputs = conv_inputs
            with tf.variable_scope('fc'):
                flatten = tf.layers.flatten(fc_inputs)
                # Two logits per image: class 0 = fake, class 1 = real.
                logits = tf.layers.dense(flatten, 2, name="logits")
        self._reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                           scope='discriminator')
        return logits
4.3 Defining the Loss Functions
In the paper, the discriminator is trained to maximize $\log D(x) + \log(1 - D(G(z)))$, while the generator is trained to minimize $\log(1 - D(G(z)))$. Since TensorFlow can only minimize, the loss functions can be written as follows:
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
Another way is to use TensorFlow's built-in tf.nn.sigmoid_cross_entropy_with_logits function:
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real, labels=tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))
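The implementation in section 4.4 below actually takes a third, equivalent route: the discriminator emits two logits per image (class 0 = fake, class 1 = real) and the losses use tf.nn.sparse_softmax_cross_entropy_with_logits. A two-class softmax reduces to a sigmoid of the logit difference, so the formulations match; here is a small NumPy check of that identity (my own sketch):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([0.3, 1.7])            # [logit for "fake", logit for "real"]
p_real_softmax = softmax(logits)[1]       # softmax probability of the "real" class
p_real_sigmoid = sigmoid(logits[1] - logits[0])
print(np.isclose(p_real_softmax, p_real_sigmoid))   # True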
4.4 Building the Model
class DCGAN(object):
    """Build the DCGAN model."""

    def __init__(self, hps):
        g_channels = hps.g_channels
        d_channels = hps.d_channels
        self._batch_size = hps.batch_size
        self._init_conv_size = hps.init_conv_size
        self._z_dim = hps.z_dim
        self._img_size = hps.img_size
        self._generator = Generator(g_channels, self._init_conv_size)
        self._discriminator = Discriminator(d_channels)

    def build(self):
        self._z_placeholder = tf.placeholder(tf.float32,
                                             (self._batch_size, self._z_dim))
        self._img_placeholder = tf.placeholder(tf.float32,
                                               (self._batch_size,
                                                self._img_size,
                                                self._img_size,
                                                1))
        generated_imgs = self._generator(self._z_placeholder, training=True)
        fake_img_logits = self._discriminator(generated_imgs, training=True)
        real_img_logits = self._discriminator(self._img_placeholder, training=True)
        # Generator loss: generated images should be classified as real (label 1).
        loss_on_fake_to_real = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.ones([self._batch_size], dtype=tf.int64),
                logits=fake_img_logits))
        # Discriminator losses: generated images are fake (label 0),
        # real images are real (label 1).
        loss_on_fake_to_fake = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.zeros([self._batch_size], dtype=tf.int64),
                logits=fake_img_logits))
        loss_on_real_to_real = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.ones([self._batch_size], dtype=tf.int64),
                logits=real_img_logits))
        tf.add_to_collection('g_losses', loss_on_fake_to_real)
        tf.add_to_collection('d_losses', loss_on_fake_to_fake)
        tf.add_to_collection('d_losses', loss_on_real_to_real)
        loss = {
            'g': tf.add_n(tf.get_collection('g_losses'), name='total_g_loss'),
            'd': tf.add_n(tf.get_collection('d_losses'), name='total_d_loss'),
        }
        return (self._z_placeholder, self._img_placeholder, generated_imgs, loss)

    def build_train(self, losses, learning_rate, beta1):
        g_opt = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
        d_opt = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
        g_opt_op = g_opt.minimize(losses['g'], var_list=self._generator.variables)
        d_opt_op = d_opt.minimize(losses['d'], var_list=self._discriminator.variables)
        # One run of the returned op performs one generator step and one
        # discriminator step.
        with tf.control_dependencies([g_opt_op, d_opt_op]):
            return tf.no_op(name='train')

dcgan = DCGAN(hps)
z_placeholder, img_placeholder, generated_imgs, losses = dcgan.build()
train_op = dcgan.build_train(losses, hps.learning_rate, hps.beta1)
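One caveat worth mentioning (my own observation, not from the original): tf.layers.batch_normalization registers its moving-average update ops in the tf.GraphKeys.UPDATE_OPS collection, and they are not run automatically. This only matters if the model is later used with training=False, but if that is intended, the updates could be grouped with the training op, for example:

# Collect the moving-average update ops created by tf.layers.batch_normalization
# and run them together with the usual training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = tf.group(train_op, *update_ops)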
4.5 Training the Model
# Start training.
init_op = tf.global_variables_initializer()
train_steps = 10000

with tf.Session() as sess:
    sess.run(init_op)
    for step in range(train_steps):
        batch_img, batch_z = mnist_data.next_batch(hps.batch_size)
        fetches = [train_op, losses['g'], losses['d']]
        should_sample = (step + 1) % 50 == 0
        if should_sample:
            fetches += [generated_imgs]
        out_values = sess.run(fetches,
                              feed_dict={
                                  z_placeholder: batch_z,
                                  img_placeholder: batch_img,
                              })
        _, g_loss_val, d_loss_val = out_values[0:3]
        logging.info('step: %d, g_loss: %4.3f, d_loss: %4.3f'
                     % (step, g_loss_val, d_loss_val))
        if should_sample:
            # combine_and_show_imgs is defined in section 4.6 below.
            gen_imgs_val = out_values[3]
            gen_img_path = os.path.join(output_dir, '%05d-gen.jpg' % (step + 1))
            gt_img_path = os.path.join(output_dir, '%05d-gt.jpg' % (step + 1))
            gen_img = combine_and_show_imgs(gen_imgs_val, hps.img_size)
            gt_img = combine_and_show_imgs(batch_img, hps.img_size)
            print(gen_img_path)
            print(gt_img_path)
            gen_img.save(gen_img_path)
            gt_img.save(gt_img_path)
4.6 Displaying the Saved Images
def combine_and_show_imgs(batch_imgs, img_size, rows=8, cols=16):
    """Stitch individual images together into one large grid image."""
    # batch_imgs: [batch_size, img_size, img_size, 1]
    result_big_img = []
    for i in range(rows):
        row_imgs = []
        for j in range(cols):
            img = batch_imgs[cols * i + j]
            img = img.reshape((img_size, img_size))
            # Map pixel values back from [-1, 1] to [0, 255].
            img = (img + 1) * 127.5
            row_imgs.append(img)
        row_imgs = np.hstack(row_imgs)
        result_big_img.append(row_imgs)
    result_big_img = np.vstack(result_big_img)
    result_big_img = np.asarray(result_big_img, np.uint8)
    result_big_img = Image.fromarray(result_big_img)
    return result_big_img
Below are the results generated after 10,000 training steps:
5. Summary
Training a GAN is essentially a competition between the generator network G(z) and the discriminator network D(x), driven towards an optimum where neither network can improve further unless the other changes. Ideally, we want the two networks to improve at the same rate. In the ideal case the discriminator's output is close to 0.5, at which point it can no longer tell generated images apart from real ones.
To overcome common problems when training GAN models, here are some frequently used techniques:
- Feature matching
- Mini-batch discrimination
- Moving averages
- One-sided label smoothing (see the sketch after this list)
- Input normalization
- Batch normalization
- Avoiding the sparse gradients caused by ReLU and MaxPool
- Optimizer choice and added noise
- Not balancing the losses purely based on statistics
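As an example of one of these tricks (my own sketch, not from the original text), one-sided label smoothing keeps the loss of section 4.3 but replaces the hard real label 1.0 with a softer target such as 0.9, while leaving the fake label at 0.0:

# One-sided label smoothing: smooth only the "real" targets (1.0 -> 0.9),
# keep the "fake" targets at 0.0 (D_logit_real / D_logit_fake as in section 4.3).
D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real, labels=0.9 * tf.ones_like(D_logit_real)))
D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))
D_loss = D_loss_real + D_loss_fake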
The above is a record and summary of my study of GANs.
References:
[1] Kuntal Ganguly, GAN實戰生成對抗網絡 (Learning Generative Adversarial Networks)
[2] imooc course: 深度學習實戰 CNN RNN GAN
[3] https://blog.csdn.net/u012223913/article/details/75051516