Paper Notes: Generative Adversarial Text to Image Synthesis

Reposted on 2016-10-31 13:17. Tags: Deep Learning, Generative Adversarial Networks.

Generative Adversarial Text to Image Synthesis

ICML 2016

　　Abstract: This paper connects text and images. It generates images from text descriptions by combining CNN-based encoders with GANs.

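　　Attribute representation is a very interesting research direction. Mapping an image to text can be treated as a recognition problem. Mapping text to an image is far less straightforward.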

　　Text-to-image synthesis requires solving two sub-problems:

　　1. learning a text feature representation that captures the important visual details;

　　2. using these features to synthesize a compelling image that a human might mistake for real.

　　Fortunately, deep learning has made good progress on both problems: natural language representation and image synthesis.

　　One difficulty remains, however: the distribution of images conditioned on a text description is highly multimodal. There are very many plausible configurations of pixels that correctly illustrate the same description.

　　Background:

　　1. GANs.

　　　　Details omitted here; see related blog posts.
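　　　　For reference, the standard GAN objective (Goodfellow et al., 2014) is a two-player minimax game. D learns to tell real samples from generated ones, and G learns to fool D:

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

　　　　In practice G is often trained to maximize $\log D(G(z))$ instead, which gives stronger gradients early in training. In the text-to-image setting, both G and D are additionally conditioned on the text.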

　　2. Deep symmetric structured joint embedding.

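　　To obtain a visually discriminative vector representation of text descriptions, we follow a CVPR 2016 paper (Reed et al., "Learning Deep Representations of Fine-Grained Visual Descriptions"). That paper uses deep convolutional and recurrent text encoders to learn a correspondence function with images. The resulting classifiers are trained by optimizing a structured loss.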

　　Here, $\{ v_n, t_n, y_n \}$ is the training set, $\delta$ is the 0-1 loss, $v_n$ are the images, $t_n$ the text descriptions, and $y_n$ the class labels.
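　　For reference, here is the structured objective from the CVPR 2016 paper, restated in this note's notation. It averages the image-side and text-side 0-1 losses over the $N$ training triples, where $f_v$ and $f_t$ are the image and text classifiers:

$$
\frac{1}{N} \sum_{n=1}^{N} \Big[ \delta\big(y_n, f_v(v_n)\big) + \delta\big(y_n, f_t(t_n)\big) \Big]
$$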

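　　The classifiers $f_v$ and $f_t$ are parameterized through a compatibility score $\phi(v)^\top \varphi(t)$, where $\phi$ is an image encoder and $\varphi$ is a text encoder. $f_v$ assigns an image to the class whose descriptions are, on average, most compatible with it, and $f_t$ does the same for a description with respect to the images of each class. The intuition is that a text encoding should have a higher compatibility score with images of the corresponding class than with images of other classes, and vice versa.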
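　　The two classifiers can be sketched in NumPy. In the original formulation, $f_v(v) = \arg\max_{y} \mathbb{E}_{t \sim \mathcal{T}(y)}[\phi(v)^\top \varphi(t)]$ and $f_t(t) = \arg\max_{y} \mathbb{E}_{v \sim \mathcal{V}(y)}[\phi(v)^\top \varphi(t)]$, where $\mathcal{T}(y)$ and $\mathcal{V}(y)$ are the descriptions and images of class $y$. This sketch rests on a few assumptions:

- the encoders are replaced by small hand-made embedding matrices;
- the expectations become per-class sample means;
- the helper names are hypothetical.

Because the 0-1 loss is not differentiable, the encoders are trained in practice on a continuous surrogate of it. The function below only evaluates the 0-1 version.

```python
import numpy as np

def _classify(query, gallery, gallery_labels, num_classes):
    """Assign each query embedding to the class whose gallery embeddings have the
    highest mean compatibility (inner product) with it."""
    scores = query @ gallery.T  # (n_query, n_gallery) compatibility scores
    # A per-class sample mean stands in for the expectation over T(y) or V(y).
    per_class = np.stack(
        [scores[:, gallery_labels == y].mean(axis=1) for y in range(num_classes)],
        axis=1,
    )
    return per_class.argmax(axis=1)

def f_v(img_emb, txt_emb, txt_labels, num_classes):
    """Image classifier: compare phi(v) with the text embeddings of each class."""
    return _classify(img_emb, txt_emb, txt_labels, num_classes)

def f_t(txt_emb, img_emb, img_labels, num_classes):
    """Text classifier: compare varphi(t) with the image embeddings of each class."""
    return _classify(txt_emb, img_emb, img_labels, num_classes)

def structured_loss(labels, pred_v, pred_t):
    """Mean over n of delta(y_n, f_v(v_n)) + delta(y_n, f_t(t_n)) with a 0-1 delta."""
    errors = (pred_v != labels).astype(float) + (pred_t != labels).astype(float)
    return float(np.mean(errors))

# Toy encoder outputs: two classes, row n holds image n and caption n.
img_emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
txt_emb = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])

pred_v = f_v(img_emb, txt_emb, labels, num_classes=2)
pred_t = f_t(txt_emb, img_emb, labels, num_classes=2)
loss = structured_loss(labels, pred_v, pred_t)
print(pred_v, pred_t, loss)  # [0 0 1 1] [0 0 1 1] 0.0
```

　　Sharing `_classify` between the two directions reflects the symmetry of the joint embedding. Only the roles of query and gallery are swapped.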

　　Method:

　　Our approach is to train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features.

　　1. Network architecture.

　　Basic components: a generator G and a discriminator D.

　　Together, these two networks make up the overall framework proposed in the paper.

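　　First, the generator G. The text is encoded into a feature representation. This representation is concatenated with a noise vector and fed into a deconvolutional network, which produces an image.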
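　　The input side of G can be sketched as follows. The paper reports two steps before the noise is added: the text embedding is compressed by a fully connected layer to a small dimension (128), then passed through a leaky ReLU. The result is concatenated with $z$. Everything else here is an assumption for illustration:

- the 1024-dimensional text embedding and the 100-dimensional noise;
- the leaky-ReLU slope of 0.2;
- the random, untrained weights.

The deconvolution stack that would turn `g_in` into an image is omitted.

```python
import numpy as np

TEXT_DIM, PROJ_DIM, NOISE_DIM = 1024, 128, 100  # assumed sizes

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def generator_input(text_emb, w_proj, b_proj, z):
    """Compress the text embedding (FC + leaky ReLU), then concatenate it with noise z."""
    compressed = leaky_relu(text_emb @ w_proj + b_proj)  # (batch, PROJ_DIM)
    return np.concatenate([z, compressed], axis=1)        # (batch, NOISE_DIM + PROJ_DIM)

rng = np.random.default_rng(0)
batch = 4
text_emb = rng.standard_normal((batch, TEXT_DIM))          # stand-in for encoder output
w_proj = rng.standard_normal((TEXT_DIM, PROJ_DIM)) * 0.01  # untrained FC weights
b_proj = np.zeros(PROJ_DIM)
z = rng.standard_normal((batch, NOISE_DIM))                # z ~ N(0, I)

g_in = generator_input(text_emb, w_proj, b_proj, z)
print(g_in.shape)  # (4, 228)
```

　　Compressing the 1024-dimensional embedding before concatenation keeps the text features from dominating the noise vector in size.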

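　　Next, the discriminator. The image first passes through convolutional layers. The text features are then concatenated with the resulting feature maps along the depth (channel) dimension, and the network outputs a single real/fake score.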
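　　The depth concatenation in D can be sketched in NumPy, assuming channels-last `(batch, H, W, C)` feature maps. Following the paper's description, the compressed text embedding is replicated over the 4x4 spatial grid and concatenated along the channel axis. A 1x1 convolution and a final 4x4 convolution then produce the score. The channel counts (512 and 128), the leaky-ReLU nonlinearity and the random weights are assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def depth_concat_text(feature_map, text_proj):
    """Tile a (batch, C_t) text vector over the H x W grid of a (batch, H, W, C)
    feature map and concatenate along the channel (depth) axis."""
    b, h, w, _ = feature_map.shape
    tiled = np.broadcast_to(text_proj[:, None, None, :], (b, h, w, text_proj.shape[1]))
    return np.concatenate([feature_map, tiled], axis=-1)

def discriminator_head(fused, w_1x1, w_final):
    """1x1 conv + leaky ReLU, then a 4x4 'valid' conv (one dot product on a 4x4
    map), then a sigmoid giving the probability that the (image, text) pair is real."""
    x = leaky_relu(np.einsum("bhwc,cd->bhwd", fused, w_1x1))
    logits = x.reshape(x.shape[0], -1) @ w_final
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
features = rng.standard_normal((2, 4, 4, 512))  # conv features of two images
text_proj = rng.standard_normal((2, 128))       # compressed text embeddings
fused = depth_concat_text(features, text_proj)  # (2, 4, 4, 640)
scores = discriminator_head(
    fused,
    rng.standard_normal((640, 512)) * 0.01,
    rng.standard_normal(4 * 4 * 512) * 0.01,
)
print(fused.shape, scores.shape)  # (2, 4, 4, 640) (2,)
```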

　　2. Matching-aware discriminator (GAN-CLS):

　　The most straightforward way to train a conditional GAN is to treat (text, image) pairs as joint observations and train the discriminator to judge each pair as real or fake. This kind of conditioning is naive: the discriminator has no explicit notion of whether real training images match the text embedding context.

　　In the naive GAN, the discriminator observes two kinds of input: real images with matching text, and synthetic images with arbitrary text. It must therefore implicitly separate two sources of error:

　　unrealistic images (for any text), and realistic images of the wrong class that mismatch the conditioning information.

　　Since this may complicate the learning dynamics, we modified the GAN training algorithm to separate these error sources.

　　During training, the discriminator still receives the usual real and fake inputs. We add a third type of input: real images with mismatched text, which the discriminator must learn to score as fake. By learning to judge image/text matching in addition to image realism, the discriminator can provide an additional signal to the generator.
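　　Here is a sketch of how the three kinds of discriminator input could be assembled for one minibatch. It assumes images and text embeddings are NumPy arrays in matching batch order. The paper encodes a separately chosen mismatching description. The shortcut used here, rolling the caption batch by one position, is a hypothetical simplification and does not guarantee that the paired caption belongs to a different class.

```python
import numpy as np

def gan_cls_inputs(real_images, text_emb, fake_images):
    """Return (images, text, target) triples for the three GAN-CLS input types,
    with target 1.0 = real and 0.0 = fake."""
    # Hypothetical shortcut: pair image i with caption i - 1 from the same batch.
    mismatched = np.roll(text_emb, shift=1, axis=0)
    return [
        (real_images, text_emb, 1.0),    # real image + matching text   -> real
        (real_images, mismatched, 0.0),  # real image + mismatched text -> fake
        (fake_images, text_emb, 0.0),    # generated image + its text   -> fake
    ]

real_images = np.ones((3, 8, 8, 3))
fake_images = np.zeros((3, 8, 8, 3))  # would come from G(z, text_emb)
text_emb = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])

inputs = gan_cls_inputs(real_images, text_emb, fake_images)
print([target for _, _, target in inputs])  # [1.0, 0.0, 0.0]
print(inputs[1][1][0])                      # [4. 5.]
```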

　　Algorithm 1 in the paper summarizes the training procedure.
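　　The per-step objectives can be sketched from the three discriminator scores: $s_r$ (real image, matching text), $s_w$ (real image, mismatched text) and $s_f$ (generated image, matching text).

- The discriminator should maximize $\log s_r + \tfrac{1}{2}\big(\log(1 - s_w) + \log(1 - s_f)\big)$.
- The generator should maximize $\log s_f$.

Writing both as negated losses to minimize is an implementation convention assumed here, not the paper's notation. The `EPS` guard is also an assumption.

```python
import numpy as np

EPS = 1e-8  # keeps log() finite if a score reaches exactly 0 or 1

def gan_cls_losses(s_r, s_w, s_f):
    """Batch-averaged discriminator and generator losses for one GAN-CLS step.

    s_r: D(x, h)          real image, matching text
    s_w: D(x, h_hat)      real image, mismatched text
    s_f: D(G(z, h), h)    generated image, matching text
    """
    s_r, s_w, s_f = (np.asarray(s, dtype=float) for s in (s_r, s_w, s_f))
    # Objective D maximizes; the two "fake" terms share half the weight.
    d_objective = np.log(s_r + EPS) + (np.log(1 - s_w + EPS) + np.log(1 - s_f + EPS)) / 2
    # Objective G maximizes (the non-saturating form).
    g_objective = np.log(s_f + EPS)
    return float(-d_objective.mean()), float(-g_objective.mean())

d_loss, g_loss = gan_cls_losses([0.9], [0.1], [0.1])
print(round(d_loss, 4), round(g_loss, 4))  # 0.2107 2.3026
```

　　In this example D is doing well, so its loss is small, while G, whose sample is scored 0.1, receives a large loss.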

　　3. Learning with manifold interpolation (GAN-INT)

　　Deep networks have been shown to learn representations in which interpolations between embedding pairs tend to lie near the data manifold.

　　Inspired by this finding, we can generate a large number of additional text embeddings simply by interpolating between the embeddings of training-set captions.

　　Crucially, these interpolated text embeddings need not correspond to any actual human-written text, so they incur no additional labeling cost.
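　　Here is a minimal sketch of drawing interpolated text embeddings. It assumes the caption embeddings are precomputed and stored as rows of a NumPy array, and the function names are hypothetical. The paper reports that fixing $\beta = 0.5$ works well, so that is the default here.

```python
import numpy as np

def interpolate_embeddings(t1, t2, beta=0.5):
    """Return beta * t1 + (1 - beta) * t2, elementwise (broadcasts over a batch)."""
    return beta * t1 + (1.0 - beta) * t2

def sample_interpolated_batch(caption_embs, batch_size, beta=0.5, rng=None):
    """Pick random pairs of training-caption embeddings and interpolate them.
    Only existing embeddings are mixed, so no new captions or labels are needed.
    A pair may occasionally be the same caption twice."""
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(0, len(caption_embs), size=batch_size)
    j = rng.integers(0, len(caption_embs), size=batch_size)
    return interpolate_embeddings(caption_embs[i], caption_embs[j], beta)

caption_embs = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 0.0]])  # toy embeddings
print(interpolate_embeddings(caption_embs[0], caption_embs[1]))  # [1. 2.]
batch = sample_interpolated_batch(caption_embs, batch_size=8, rng=np.random.default_rng(0))
print(batch.shape)  # (8, 2)
```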

　　This can be viewed as adding an extra term to the generator objective.
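　　Written out, let $t_1, t_2$ denote the embeddings of two training captions and $\beta$ the interpolation weight. The extra term that G minimizes is (as given in the paper):

$$
\mathbb{E}_{t_1, t_2 \sim p_{\text{data}}}\left[\log\left(1 - D\big(G(z, \beta t_1 + (1 - \beta) t_2)\big)\right)\right]
$$

　　The text input that conditions D is left implicit in this notation.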

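　　Because the interpolated embeddings are synthetic, the discriminator has no corresponding real image-text pairs to train on. However, D has learned to predict whether an image and a text match. If it does this well, G can learn to fill in gaps on the data manifold between training points simply by satisfying D on the interpolated embeddings.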

　　4. Inverting the generator for style transfer.

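　　If the text encoding captures the image content (e.g. flower shape and colors), then the noise sample $z$ should capture style factors such as background color and pose, so that the generated image is realistic. With a trained GAN, one may wish to transfer the style of a query image onto the content of a particular text description. To do this, we can train a convolutional network $S$, the style encoder, to invert $G$ by regressing from generated samples back to $z$. $S$ is trained with a simple squared loss.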
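　　Written out, with $\varphi(t)$ the text embedding of description $t$, the squared loss used to train the style encoder $S$ is:

$$
\mathcal{L}_{\text{style}} = \mathbb{E}_{t,\; z \sim \mathcal{N}(0, 1)} \left\| z - S\big(G(z, \varphi(t))\big) \right\|_2^2
$$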

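　　Given a trained generator and style encoder, style transfer from a query image $x$ onto the content of a text description $t$ takes two steps. First predict the style, $s \leftarrow S(x)$. Then generate $\hat{x} \leftarrow G(s, \varphi(t))$, where $\varphi(t)$ is the text embedding of $t$. Here $\hat{x}$ is the resulting image and $s$ is the predicted style.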

　　Experiments.
