Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Published at CVPR 2018.
Abstract:
Despite that convolutional neural networks (CNN) have recently demonstrated high-quality reconstruction for single-image super-resolution (SR), recovering natural and realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps. This is made possible through a novel Spatial Feature Transform (SFT) layer that generates affine transformation parameters for spatial-wise feature modulation. SFT layers can be trained end-to-end together with the SR network using the same loss function. During testing, it accepts an input image of arbitrary size and generates a high-resolution image with just a single forward pass conditioned on the categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and visually pleasing textures in comparison to state-of-the-art SRGAN [27] and EnhanceNet [38].
Conclusion:
We have explored the use of semantic segmentation maps as categorical prior for constraining the plausible solution space in SR. A novel Spatial Feature Transform (SFT) layer has been proposed to efficiently incorporate the categorical conditions into a CNN-based SR network. Thanks to the SFT layers, our SFT-GAN is capable of generating distinct and rich textures for multiple semantic regions in a super-resolved image in just a single forward pass. Extensive comparisons and a user study demonstrate the capability of SFT-GAN in generating realistic and visually pleasing textures, outperforming previous GAN-based methods [27, 38]. Our work currently focuses on SR of outdoor scenes.
Despite robust to out-of-category images, it does not consider priors of finer categories, especially for indoor scenes, e.g., furniture, appliance and silk. In such a case, it puts forward challenging requirements for segmentation tasks from an LR image. Future work aims at addressing these shortcomings. Furthermore, segmentation and SR may benefit from each other and jointly improve the performance.
Key points:
- The main point of this paper is to recover natural texture information better during SR.
- Specifically, semantic segmentation probability maps are fed in as categorical priors for the CNN, so that recovered textures correspond to their semantic classes.
- The layer that implements this is called the spatial feature transform (SFT) layer. It generates parameters for a spatial affine transformation of features, and is trained jointly with the SR network.
- Although SFT-GAN is robust to images of unknown categories, unknown categories do remain a problem.
Highlights:
- This is a super-resolution work that leverages semantic segmentation information; the idea is logical and the experimental results are good. Fig. 1 of the paper gives an illustration.
- The idea could also be extended to other priors, e.g., depth maps, to enhance texture granularity.
- Analogous to BN, the features are modulated by a learned transform, and this is how the categorical prior is embedded.
Limitations:
- The segmentation map is obtained by bicubically upsampling the LR image and feeding it to a pretrained segmentation network [31], which is independent of the SR network.
- The authors embed the categorical prior via an affine transformation of features. This works, but better mechanisms may exist.
Background
As the figure above shows, without a categorical prior the solution space is hard to constrain, especially for two visually similar scenes, such as the plants and bricks in the figure.
Earlier work trained separate models for different classes. Here, instead, the authors want to feed the semantic segmentation map into the CNN; the key question is how. Simply concatenating the segmentation map at the input, or at an intermediate layer, does not work well.
Spatial feature transform
To make the segmentation maps an effective input, the spatial feature transform (SFT) layer is introduced.
In fact, the idea of SFT originates from BN. BN applies an affine transformation to features; conditional normalization (CN) replaces BN's affine transformation with a function learned under some condition. So what does SFT do?
Specifically, conditioned on the prior, SFT outputs a modulation parameter pair \((\gamma, \beta)\), which applies an affine transformation to an intermediate feature map \(F\): \(\mathrm{SFT}(F \mid \gamma, \beta) = \gamma \odot F + \beta\), where \(\odot\) is the Hadamard (element-wise) product. In other words, with SFT the categorical prior is converted into modulation parameters.
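As a minimal NumPy sketch of this modulation (shapes and values are placeholder assumptions; in the paper \(\gamma\) and \(\beta\) are predicted from the segmentation prior rather than sampled):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8                       # assumed feature-map dimensions
F = rng.standard_normal((C, H, W))      # intermediate feature map
gamma = rng.standard_normal((C, H, W))  # modulation parameters; in the paper
beta = rng.standard_normal((C, H, W))   # they come from the segmentation prior

def sft(F, gamma, beta):
    # SFT(F | gamma, beta) = gamma ⊙ F + beta (⊙ = Hadamard product)
    return gamma * F + beta

out = sft(F, gamma, beta)
assert out.shape == F.shape
```

Unlike BN's per-channel scale and shift, \(\gamma\) and \(\beta\) here have full spatial extent, which is what allows different semantic regions of one image to be modulated differently.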
In the network, this is implemented as follows:
Let us first look at the SFT structure.
- As shown in the figure, the segmentation probability maps are not fed into the network directly; they first pass through a shallow CNN, called the condition network.
- The output of the condition network (the conditions) is shared by all SFT layers across the whole network. Inside each SFT layer, as shown in the figure, the conditions pass through two convolutional layers to produce the parameter pair, and the affine transformation is then applied.
The experiments in Section 4.3 show that directly concatenating the segmentation maps performs poorly.
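A hedged sketch of this flow, with 1×1 convolutions standing in for the small conv stacks (the single-layer condition network and all layer sizes are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.einsum('oc,chw->ohw', w, x)

K, C, H, W = 8, 32, 16, 16           # segmentation classes, feature channels
seg_probs = rng.random((K, H, W))    # segmentation probability maps

# Shared condition network: one shallow CNN, computed once per image.
w_shared = 0.1 * rng.standard_normal((C, K))
conditions = np.maximum(conv1x1(seg_probs, w_shared), 0)  # ReLU

# Each SFT layer has its own small heads on top of the shared conditions.
w_gamma = 0.1 * rng.standard_normal((C, C))
w_beta = 0.1 * rng.standard_normal((C, C))
gamma = conv1x1(conditions, w_gamma)
beta = conv1x1(conditions, w_beta)

F = rng.standard_normal((C, H, W))   # intermediate SR features
modulated = gamma * F + beta         # the SFT affine transform
assert modulated.shape == (C, H, W)
```

The design point this illustrates: the segmentation prior is encoded once by the shared condition network, while each SFT layer cheaply derives its own \((\gamma, \beta)\) from the shared conditions.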
Super-resolution network
First, a look at the segmentation network.
- The LR image is first upsampled by bicubic interpolation and then passed through the segmentation network [31] to obtain the semantic segmentation probability maps. This network is trained separately and is independent of the present work.
- Experiments show that segmentation still works reasonably well even after downsampling by a factor of 4 (see Fig. 4). Objects of unknown categories fall into the background class.
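A small sketch of what this prior looks like: per-pixel class probabilities that sum to one, with class 0 acting as the background bucket for out-of-category pixels. The softmax here is a generic stand-in for illustration, not the actual network of [31]:

```python
import numpy as np

def to_probability_maps(logits):
    # Numerically stable softmax over the class axis; logits: (K, H, W).
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
K, H, W = 8, 16, 16                   # assume class 0 is "background"
logits = rng.standard_normal((K, H, W))
probs = to_probability_maps(logits)   # input to the condition network
assert probs.shape == (K, H, W)
assert np.allclose(probs.sum(axis=0), 1.0)
```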
The overall architecture is a GAN; see Section 3.2.
Experiments omitted.