Style Transfer Algorithm

Reposted article · 2019-09-23 11:22 · Python / TensorFlow

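Lately I have been deriving some introductory machine learning algorithms, and grinding through all that math has worn down both my confidence and my interest. Today I want to play with something fun instead.

Artistic style transfer for images is a simple, entertaining algorithm whose results anyone can see for themselves. Put simply, you pick a photo as the content and then redraw it in a chosen style, such as Van Gogh's or Picasso's. For the history and background of artistic style transfer, see the article 圖像風格遷移(Neural Style)簡史 (A Brief History of Neural Style Transfer).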

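Approach

The rough idea is this: we prepare two images. One is the content image, which supplies what the output should depict; the other is the style image, whose style we want to imitate. We then produce an output image whose content is close to the content image and whose style is close to the style image.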

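Measuring content similarity

Content similarity is the easier part. The simplest reading is to compare the two images pixel by pixel and measure the difference. Alternatively, you can measure the distance between the feature maps produced by some intermediate convolution layer of a CNN.

While experimenting I found that taking the content layer too early in the network gives poor results. The earlier the layer, the finer the detail being compared, whereas style transfer looks best when the output resembles the content image at the macro level and expresses the style in the details.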
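As a minimal sketch of the "distance between features" idea, the function below computes a mean squared difference. It uses plain Python lists to stay dependency-free; the name `content_loss` and the two small feature lists are assumptions made up for illustration, standing in for the flattened activations of one convolution layer.

```python
def content_loss(content_features, generated_features):
    """Mean squared difference between two equally sized feature lists."""
    if len(content_features) != len(generated_features):
        raise ValueError("feature lists must have the same length")
    total = sum((c - g) ** 2
                for c, g in zip(content_features, generated_features))
    return total / len(content_features)


# Hypothetical flattened activations of one layer for the two images.
content = [0.0, 1.0, 2.0, 3.0]
generated = [0.0, 1.0, 2.0, 5.0]

print(content_loss(content, content))    # identical features: 0.0
print(content_loss(content, generated))  # one value differs by 2: 4 / 4 = 1.0
```

The same formula applies whether the lists hold raw pixels or deep-layer activations; only the choice of layer changes how coarse the comparison is.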

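Measuring style similarity

Comparing style is the hardest part to understand. To follow it you need a concept called the Gram matrix. Dr. Chang tells me it belongs to matrix analysis, which I have never studied systematically and do not know well. What I do understand is this: given N convolution kernels, a layer produces N feature maps (matrices). Flatten each of them into a vector to get N vectors; the matrix of all pairwise inner products of those N vectors is the Gram matrix. The distance between the Gram matrix of the generated image and that of the style image is the style difference.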
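The description above can be sketched in a few lines. This is an illustrative version on plain Python lists, not the TensorFlow code used later in the article; the function names and the tiny two-channel feature maps are assumptions, and the sketch skips the normalization by the number of spatial positions that the later code applies.

```python
def gram_matrix(feature_maps):
    """feature_maps: N flattened channels, each a list of equal length.

    Returns the N x N matrix of pairwise inner products.
    """
    n = len(feature_maps)
    return [[sum(a * b for a, b in zip(feature_maps[i], feature_maps[j]))
             for j in range(n)]
            for i in range(n)]


def style_distance(gram_a, gram_b):
    """Mean squared difference between two Gram matrices of equal size."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(gram_a, gram_b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Two hypothetical channels (N = 2), each a 2x2 map flattened to 4 values.
style_maps = [[1, 0, 1, 0], [0, 1, 0, 1]]
generated_maps = [[1, 1, 0, 0], [1, 1, 0, 0]]
# The style maps with their spatial positions reversed.
reversed_maps = [[0, 1, 0, 1], [1, 0, 1, 0]]

print(gram_matrix(style_maps))       # [[2, 0], [0, 2]]
print(gram_matrix(generated_maps))   # [[2, 2], [2, 2]]
print(style_distance(gram_matrix(style_maps), gram_matrix(generated_maps)))  # 2.0
print(gram_matrix(reversed_maps) == gram_matrix(style_maps))                 # True
```

The last line shows why the Gram matrix suits style: rearranging spatial positions leaves it unchanged, so it records which channels fire together rather than where they fire.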

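Putting it together

The final recipe, then, is to create a generated image, define a loss function \(loss = loss_{content} + loss_{style}\) on it, and use gradient descent on the image's pixels to minimize that loss.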
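To make the optimization loop concrete, here is a deliberately tiny sketch of gradient descent on a sum of two losses. Everything in it is an assumption chosen for illustration: the "image" is a two-element list, and the style term is a plain squared distance standing in for the Gram-matrix loss, so that the gradient can be written by hand.

```python
def total_loss(x, content, style, w_content, w_style):
    """Weighted sum of two mean squared distances."""
    n = len(x)
    c = sum((xi - ci) ** 2 for xi, ci in zip(x, content)) / n
    s = sum((xi - si) ** 2 for xi, si in zip(x, style)) / n
    return w_content * c + w_style * s


def gradient(x, content, style, w_content, w_style):
    """Analytic gradient of total_loss with respect to each element of x."""
    n = len(x)
    return [2.0 * (w_content * (xi - ci) + w_style * (xi - si)) / n
            for xi, ci, si in zip(x, content, style)]


def optimize(content, style, w_content=1.0, w_style=1.0, lr=0.1, steps=200):
    x = list(content)  # start from the content image
    for _ in range(steps):
        g = gradient(x, content, style, w_content, w_style)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


content = [0.0, 4.0]
style = [2.0, 0.0]
result = optimize(content, style)
print(result)  # approaches [1.0, 2.0], the equal-weight compromise
```

With equal weights the minimum sits halfway between the two targets; raising `w_content` or `w_style` pulls the result toward that side, which is the role the two weighting factors play in a real style-transfer loss.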

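Implementation details

Before writing my own code I consulted the algorithm on the official TensorFlow site. That implementation uses tf's newest API. The upside is that it is simple; the downside is that everything is wrapped up so tightly that much of the low-level machinery stays hidden. I wanted to do it the clumsier way and learn a bit about VGG19 along the way. The overall approach is much the same.

Implementing VGG19

First I downloaded a VGG19 model and wrote a simple implementation of loading it and computing its convolutions, leaving out the fully connected layers. Because a convolution layer shares its parameters across spatial positions, the input image does not have to be the size VGG19 was trained on; the fully connected layers, by contrast, would require that fixed size.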

import tensorflow as tf
import scipy.io
import numpy as np

DEFAULT_PATH ='E:\\project\\ChangeStyle\\model\\imagenet-vgg-verydeep-19.mat'
VGG19_LAYERS=('conv1_1','relu1_1','conv1_2','relu1_2','pool1',
              'conv2_1','relu2_1','conv2_2','relu2_2','pool2',
              'conv3_1','relu3_1','conv3_2','relu3_2','conv3_3','relu3_3','conv3_4','relu3_4','pool3',
              'conv4_1','relu4_1','conv4_2','relu4_2','conv4_3','relu4_3','conv4_4','relu4_4','pool4',
              'conv5_1','relu5_1','conv5_2','relu5_2','conv5_3','relu5_3','conv5_4','relu5_4','pool5')
              #,'fc6','relu6','fc7','relu7','fc8','softmax'
    
VGG19_index_Map = {'conv1_1':0,'conv1_2':2,'conv2_1':5,'conv2_2':7,'conv3_1':10,'conv3_2':12,'conv3_3':14,
                   'conv3_4':16,'conv4_1':19,'conv4_2':21,'conv4_3':23,
            'conv4_4':25,'conv5_1':28,'conv5_2':30,'conv5_3':32,'conv5_4':34,'fc6':37,'fc7':39,'fc8':41}

class VGG19:
    
    
    def __init__(self, model_path=None):
        # Load the pretrained weights; fall back to DEFAULT_PATH if no path is given.
        if model_path is None:
            model_path = DEFAULT_PATH
        layers = scipy.io.loadmat(model_path)
        # 'layers' holds one entry per network layer; keep that row.
        self.vgg_layers = layers['layers'][0]


    def _compute_(self, layer_name, inputs):
        # Look up the pretrained weights and bias for conv / fc layers.
        w = b = None
        if layer_name in VGG19_index_Map:
            i = VGG19_index_Map[layer_name]
            w = self.vgg_layers[i][0][0][0][0][0]
            b = self.vgg_layers[i][0][0][0][0][1]

        # The first three characters of the name identify the layer kind.
        kind = layer_name[:3]
        if kind == 'con':
            output = tf.nn.conv2d(inputs,w,strides=[1,1,1,1],padding='SAME')
            output = tf.add(output, b)
        elif kind == 'rel':
            output = tf.nn.relu(inputs)
        elif kind == 'poo':
            output = tf.nn.max_pool(inputs,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
        elif kind == 'fc6':
            # Fully connected branches are unused while the fc layers
            # stay commented out of VGG19_LAYERS.
            b = np.reshape(b,-1)
            inputs = tf.reshape(inputs,[inputs.shape[0],-1])
            w = tf.reshape(w,(-1,w.shape[-1]))
            output = tf.nn.bias_add(tf.matmul(inputs,w),b)
        elif kind == 'fc7':
            w = np.reshape(w,(-1,w.shape[-1]))
            output = tf.add(tf.matmul(inputs,w),b)
        elif kind == 'fc8':
            w = np.reshape(w,(-1,w.shape[-1]))
            output = tf.add(tf.matmul(inputs,w),b)
        else:
            output = tf.nn.softmax(inputs)
        return output

    
    def build_model(self, image):
        
        net={}
        net['input'] = image

        input = image
        for layer in VGG19_LAYERS:
            input = self._compute_(layer,input)
            net[layer] = input
        
        return net

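The code above is fairly simple and, I think, needs little explanation.

Now that the VGG19 scaffolding is in place, we need to run images through it to obtain the feature maps produced by its convolution layers, and then compute the losses from those values.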
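To see why any input size works once the fully connected layers are gone, the sketch below traces the spatial size of the feature maps through the 37 layers. It is an illustration rather than part of the original code: `layer_names` and `trace_sizes` are hypothetical helpers, and the size rules assume the `SAME` padding, stride-1 convolutions and 2x2 stride-2 pooling used in the class above.

```python
import math

BLOCKS = (2, 2, 4, 4, 4)  # number of conv layers in each of the 5 VGG19 blocks


def layer_names():
    """Rebuild the conv/relu/pool layer order with the fc layers left out."""
    names = []
    for block, n_convs in enumerate(BLOCKS, start=1):
        for i in range(1, n_convs + 1):
            names += ['conv%d_%d' % (block, i), 'relu%d_%d' % (block, i)]
        names.append('pool%d' % block)
    return names


def trace_sizes(height, width):
    """Spatial size after each layer, assuming SAME padding everywhere."""
    sizes = {}
    for name in layer_names():
        if name.startswith('pool'):
            # 2x2 max pool with stride 2 and SAME padding gives ceil(n / 2).
            height, width = math.ceil(height / 2), math.ceil(width / 2)
        # Stride-1 SAME convolutions and ReLU keep the spatial size.
        sizes[name] = (height, width)
    return sizes


names = layer_names()
print(len(names))                       # 37
print(names.index('conv5_4'))           # 34, matching VGG19_index_Map
print(trace_sizes(512, 512)['conv5_2']) # (32, 32)
print(trace_sizes(300, 200)['pool5'])   # (10, 7)
```

A 300x200 input passes through just as well as a 512x512 one; only a fully connected layer, whose weight matrix expects a fixed number of inputs, would pin the size down.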

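IMAGE_SIZE = 512  # images are resized to IMAGE_SIZE x IMAGE_SIZE
feature_layers_w = [0.1,0.1,0.4,0.3,0.1]  # per-layer style weights, aligned with STYLE_LAYERS
STYLE_LAYERS =['conv1_1','conv2_1','conv3_1','conv4_1','conv5_1']
CONTENT_LAYERS =['conv5_2']  # a deep layer, so content is compared at a coarse level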

import tensorflow as tf
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt


def get_content_loss(p, x):
    
    loss = tf.reduce_mean(tf.pow(p - x,2))
    return loss

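def gram_matrix(input_tensor):
    # Flatten to [positions, channels]; a^T a is then the channels x channels
    # matrix of pairwise inner products, averaged over the positions.
    channels = int(input_tensor.shape[-1])
    a = tf.reshape(input_tensor, [-1, channels])
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a=True) / tf.to_float(n)
    return gram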

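def get_style_loss(style_features, generated_features, index):
    # Compare the Gram matrices of the style and generated feature maps,
    # scaled by the per-layer weight feature_layers_w[index].
    gram_style = gram_matrix(style_features)
    gram_generated = gram_matrix(generated_features)

    return feature_layers_w[index] * tf.reduce_mean(tf.pow(gram_style - gram_generated,2))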

def get_compute_loss(generate, content, style):
    # generate: layer name -> tensor for the image being optimized
    # content / style: layer name -> precomputed feature values
    c_loss = 0
    s_loss = 0

    # Style loss: weighted sum over the style layers.
    for i,s_name in enumerate(STYLE_LAYERS):
        g_data = generate[s_name]
        s_data = style[s_name]

        g_data = tf.reshape(g_data,(-1,g_data.shape[3]))
        s_data = tf.reshape(s_data,(-1,s_data.shape[3]))

        s_loss = s_loss + get_style_loss(s_data,g_data,i)

    # Content loss: mean squared difference on the content layers.
    for c_name in CONTENT_LAYERS:
        g_data = generate[c_name]
        c_data = content[c_name]

        g_data = tf.reshape(g_data,(-1,g_data.shape[3]))
        c_data = tf.reshape(c_data,(-1,c_data.shape[3]))

        c_loss = c_loss + get_content_loss(c_data, g_data)

    # Average the style loss over its layers, then weight the two terms.
    return 1e-2 * s_loss / float(len(STYLE_LAYERS)) + 1e3 * c_loss

import tensorflow as tf
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt


def deprocess_img(processed_img):
    x = processed_img.copy()
    if len(x.shape) == 4:
        x = np.squeeze(x, 0)
    if len(x.shape) != 3:
        raise ValueError("Input to deprocess image must be an image of "
                         "dimension [1, height, width, channel] or [height, width, channel]")

    # Perform the inverse of the preprocessing step: add back the
    # per-channel means (BGR order), then flip BGR back to RGB.
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x

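def load_img(img_path):
    # Force 3 channels so the reshape below also works for grayscale or RGBA files.
    img = Image.open(img_path).convert('RGB')
    img = img.resize((IMAGE_SIZE,IMAGE_SIZE))
    img = img.tobytes()
    img = tf.decode_raw(img,tf.uint8)
    img = tf.cast(img,tf.float32)
    img = tf.reshape(img,(1,IMAGE_SIZE,IMAGE_SIZE,3))
    # VGG19 preprocessing: RGB -> BGR and subtract the ImageNet channel means.
    img = tf.keras.applications.vgg19.preprocess_input(img)
    return img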





content_img = load_img('E:\\project\\ChangeStyle\\img\\nst\\Tuebingen_Neckarfront.jpg')
style_img = load_img('E:\\project\\ChangeStyle\\img\\nst\\1024px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg')
generate_img = tf.Variable(content_img, dtype=tf.float32)

with tf.Session() as sess:
    vgg = VGG19()

    # Graph for the image being optimized; the content and style features
    # are evaluated once and then treated as constants.
    g_model = vgg.build_model(generate_img)
    c_model = sess.run(vgg.build_model(content_img))
    s_model = sess.run(vgg.build_model(style_img))

    loss = get_compute_loss(g_model,c_model,s_model)
    optimizer = tf.train.AdamOptimizer(learning_rate=5, beta1=0.99, epsilon=1e-1)
    op = optimizer.minimize(loss)
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        _,l = sess.run((op,loss))
        if i % 100 == 0:
            print('step', i, 'loss', l)  # report progress every 100 steps

    img = sess.run(generate_img)

    # Undo the VGG preprocessing before displaying the result.
    img = deprocess_img(img)
    plt.imshow(img)
    plt.show()

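This code has a couple of problems:

Generating any single image takes too much time.
With the parameters above, oil-painting styles come out well in my tests, but sketch styles come out rather poorly; tuning the parameters can of course give somewhat better results.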
