[Paper] Image Inpainting must-read papers (personal notes)


Image Inpainting must-read papers

The goal is to build, in your own head, a tree of the Image Inpainting literature, to know the insight of each paper, and to understand as much as possible.

2016

The pioneering work: "Context Encoders: Feature Learning by Inpainting"

  • link

  • CVPR 2016. Authors: Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

  • We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders -- a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s). When training context encoders, we have experimented with both a standard pixel-wise reconstruction loss, as well as a reconstruction plus an adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.

  • Image inpainting is achieved with an encoder-decoder network plus a generative adversarial network. Both the encoder and the decoder are fully convolutional; at the last layer of the encoder, information is passed across the bottleneck by a channel-wise fully connected layer, and the missing region is then generated by five deconvolution (transposed convolution) layers. The network structure corresponds to the model code given below:

  • Analysis: the encoder's input is the image with the mask applied, and the decoder outputs an inpainted image. The discriminator idea from GANs is then used to train on this output image. An L2 loss and an adversarial loss are used as the loss functions: the L2 loss drives the content inside the mask to be recovered, while the adversarial loss makes the recovered content look more realistic. 【In essence this is still a GAN, except that the generator is replaced by an encoder plus decoder (the context encoder-decoder); the input is at the original imageSize and the output is at the inpainted region's imageSize.】

  • Insight

    1. Between the encoder features and the decoder features there is a channel-wise fully connected layer, which allows each unit in the decoder to reason about the entire image content. Using an ordinary fully connected layer here would cause a parameter explosion; the channel-wise version connects each channel's feature map only to itself, which keeps the parameter count manageable (see the sketch at the end of this Insight section).
    2. From the paper: "rather, the network is trained for context prediction 'from scratch' with randomly initialized weights." The encoder structure is derived from the AlexNet architecture (up to the 512x4x4 layer), but instead of being pretrained for classification it is trained with randomly initialized weights to predict the contextual content. 【My understanding: this is so that the middle layer carries better contextual information. If the encoder in front were removed, this would just be a decoder applied to a random vector; adding the encoder injects the context.】

    3. Loss functions

      L2 loss:

      L_rec(x) = || M ⊙ ( x − F( (1−M) ⊙ x ) ) ||² , where:

      x is the original input image (without any processing);

      M is the occlusion (mask) matrix: it has the same size as x and contains only 0s and 1s, where 1 marks an occluded pixel and 0 an unoccluded one;

      (1−M) ⊙ x, the argument of F(·), is the surrounding context, i.e. x with the occluded region dropped; the output of F(·) is the predicted content of the occluded region (the 3x64x64 center patch);

      M ⊙ ( x − F(·) ) is the difference between the original content of the occluded region of x and the generated content;

      taking its square (the squared L2 norm) gives the L2 loss.

      Adversarial loss:

      The adversarial loss is simply the standard GAN loss, L_adv = max_D E_x[ log D(x) + log( 1 − D( F( (1−M) ⊙ x ) ) ) ]: the discriminator learns to tell real patches from generated ones, and the generator learns to fool it.

      Joint loss:

      The final joint loss simply assigns the two losses different weights: L = λ_rec · L_rec + λ_adv · L_adv. 【In the code the two weights sum to 1: errG = (1−wtl2)·errG_D + wtl2·errG_l2.】

      【Judging from the code, training the discriminator optimizes only the adversarial loss, while training the generator, i.e. the context encoder-decoder, optimizes the joint loss.】
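
  • To make the channel-wise fully connected layer from point 1 concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's official code). It implements the layer as a grouped convolution whose kernel covers the whole 4x4 feature map, so every output unit of a channel sees all spatial locations of that same channel but none of the other channels; this is also one answer to the "grouped convolution" question further down.

import torch
import torch.nn as nn

class ChannelWiseFC(nn.Module):
    """Channel-wise fully connected layer: each channel's n x n map is fully
    connected to its own n x n output map, with no weights across channels.
    Implemented as a grouped convolution with groups = number of channels."""
    def __init__(self, channels, size):
        super().__init__()
        self.channels, self.size = channels, size
        # Each group sees 1 input channel and produces size*size outputs,
        # one value per output spatial location of that channel.
        self.fc = nn.Conv2d(channels, channels * size * size,
                            kernel_size=size, groups=channels, bias=False)

    def forward(self, x):                      # x: (B, C, n, n)
        out = self.fc(x)                       # (B, C*n*n, 1, 1)
        return out.view(x.size(0), self.channels, self.size, self.size)

# For the 512 x 4 x 4 encoder output of Context Encoders:
# y = ChannelWiseFC(512, 4)(torch.randn(8, 512, 4, 4))   # -> (8, 512, 4, 4)
# Parameter count: 512 * 4^4, versus (512*4*4)^2 for a full FC layer.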

  • Example results (figure omitted).

  • Cropping the center of the image (the rectangular hole):

# imageSize = 128
# overlapPred = 4 : width in pixels of the overlapping border left around the
#                   hole (it stays visible and gets a higher L2 weight)
w = int(imageSize/4+overlapPred)
w_ = int(imageSize/4+imageSize/2-overlapPred)

# input_cropped is the generator's input; its size is 3*128*128.
# The center hole (minus the overlap border) is filled with constant
# per-channel mean values rescaled from [0,255] to [-1,1].
input_cropped.data[:,0,w:w_,w:w_] = 2*117.0/255.0 - 1.0
input_cropped.data[:,1,w:w_,w:w_] = 2*104.0/255.0 - 1.0
input_cropped.data[:,2,w:w_,w:w_] = 2*123.0/255.0 - 1.0


crop_size = int(imageSize/4)
crop_size_w = int(imageSize/4)+int(imageSize/2)
# real_center_cpu is the discriminator's real input; its size is 3*64*64
real_center_cpu = real_cpu[:,:,crop_size:crop_size_w,crop_size:crop_size_w]
  • Model architecture
import torch
import torch.nn as nn

nc = 3              # number of image channels
nef = 64            # encoder filters in the first conv layer
ngf = 64            # decoder (generator) filters
ndf = 64            # discriminator filters in the first conv layer
nBottleneck = 4000  # dimension of the encoder bottleneck

class _netG(nn.Module):
    def __init__(self, opt):
        super(_netG, self).__init__()
        self.ngpu = opt.ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, nef, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (nef) x 64 x 64
            nn.Conv2d(nef, nef, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nef),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (nef) x 32 x 32
            nn.Conv2d(nef, nef*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nef*2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (nef*2) x 16 x 16
            nn.Conv2d(nef*2, nef*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nef*4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (nef*4) x 8 x 8
            nn.Conv2d(nef*4, nef*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nef*8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (nef*8) x 4 x 4
            nn.Conv2d(nef*8, nBottleneck, 4, bias=False),
            # state size: (nBottleneck) x 1 x 1
            nn.BatchNorm2d(nBottleneck),
            nn.LeakyReLU(0.2, inplace=True),
            # decoder: the bottleneck goes into a transposed convolution
            nn.ConvTranspose2d(nBottleneck, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size: (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size: (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size: (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size: (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size: (nc) x 64 x 64 -- the predicted center patch
        )

    def forward(self, input):
        if isinstance(input.data, torch.cuda.FloatTensor) and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output
  
class _netlocalD(nn.Module):
    def __init__(self, opt):
        super(_netlocalD, self).__init__()
        self.ngpu = opt.ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64 -- the real or generated center patch
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size: (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
            # state size: 1 x 1 x 1 -- probability that the patch is real
        )

    def forward(self, input):
        if isinstance(input.data, torch.cuda.FloatTensor) and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output.view(-1, 1)
  • Loss functions
# wtl2: weight of the L2 reconstruction loss (0 means do not use it)
criterion = nn.BCELoss()   # binary cross-entropy, the standard GAN loss

#------- Discriminator D loss ----------------------------------------
# (here output/label come from D applied to the real center patch, label = 1)
errD_real = criterion(output, label)
errD_real.backward()

# (here output/label come from D applied to the detached fake patch, label = 0)
errD_fake = criterion(output, label)
errD_fake.backward()
errD = errD_real + errD_fake

#------- Generator G loss --------------------------------------------
# adversarial term: D applied to the fake patch, with label = 1 (fool D)
errG_D = criterion(output, label)
# Per-pixel weights for the L2 term: the 4-pixel overlap border gets the
# higher weight wtl2*overlapL2Weight, the inner hole region gets wtl2.
wtl2Matrix = real_center_cpu.clone()
wtl2Matrix.data.fill_(wtl2*overlapL2Weight)
x = int(opt.imageSize/2 - opt.overlapPred)
wtl2Matrix.data[:,:,int(opt.overlapPred):x,int(opt.overlapPred):x] = wtl2

# errG_l2 = criterionMSE(fake, real_center)  # an unweighted MSE also works
errG_l2 = (fake-real_center).pow(2)
errG_l2 = errG_l2 * wtl2Matrix
errG_l2 = errG_l2.mean()

# Joint loss: the two weights sum to 1, as described above.
errG = (1-wtl2) * errG_D + wtl2 * errG_l2
errG.backward()
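
  • For context, here is a minimal sketch of one full training step that these loss fragments come from (names such as optimizerD/optimizerG are illustrative assumptions, not necessarily those of the original repository); it shows which output/label pair each term uses and the order of the D and G updates.

# Assumes: netG, netD, criterion, wtl2, wtl2Matrix, input_cropped and the
# 3 x 64 x 64 ground-truth center patch real_center as defined above.
real_label, fake_label = 1.0, 0.0

# --- update D: push D(real_center) towards 1 and D(G(input_cropped)) towards 0
netD.zero_grad()
output = netD(real_center)
errD_real = criterion(output, torch.full_like(output, real_label))
errD_real.backward()

fake = netG(input_cropped)                   # predicted 3 x 64 x 64 center
output = netD(fake.detach())                 # detach: no gradient into G here
errD_fake = criterion(output, torch.full_like(output, fake_label))
errD_fake.backward()
optimizerD.step()

# --- update G: fool D and reconstruct the center (the joint loss) ---
netG.zero_grad()
output = netD(fake)
errG_D = criterion(output, torch.full_like(output, real_label))
errG_l2 = ((fake - real_center).pow(2) * wtl2Matrix).mean()
errG = (1 - wtl2) * errG_D + wtl2 * errG_l2
errG.backward()
optimizerG.step()
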
  • A few points I do not understand yet:

    1. Grouped convolution? What exactly does overlapPred mean?
    2. The parameters of nn.Conv2d(), and what nn.BatchNorm2d() and nn.ConvTranspose2d() do and what their parameters mean.

2017

High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis

  • link
  • CVPR 2017. Authors: Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang, Hao Li
  • Recent advances in deep learning have shown exciting promise in filling large holes in natural images with semantically plausible and context aware details, impacting fundamental image manipulation tasks such as object removal. While these learning-based methods are significantly more effective in capturing high-level features than prior techniques, they can only handle very low-resolution inputs due to memory limitations and difficulty in training. Even for slightly larger images, the inpainted regions would appear blurry and unpleasant boundaries become visible. We propose a multi-scale neural patch synthesis approach based on joint optimization of image content and texture constraints, which not only preserves contextual structures but also produces high-frequency details by matching and adapting patches with the most similar mid-layer feature correlations of a deep classification network. We evaluate our method on the ImageNet and Paris Streetview datasets and achieved state-of-the-art inpainting accuracy. We show our approach produces sharper and more coherent results than prior methods, especially for high-resolution images.

Globally and Locally Consistent Image Completion

  • ACM Transactions on Graphics, 2017. Authors: (Waseda University) Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa
  • We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess if it is coherent as a whole, while the local discriminator looks only at a small area centered at the completed region to ensure the local consistency of the generated patches. The image completion network is then trained to fool both the context discriminator networks, which requires it to generate images that are indistinguishable from real ones with regard to overall consistency as well as in details. We show that our approach can be used to complete a wide variety of scenes. Furthermore, in contrast with the patch-based approaches such as PatchMatch, our approach can generate fragments that do not appear elsewhere in the image, which allows us to naturally complete the images of objects with familiar and highly specific structures, such as faces.
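
  • As a rough illustration of the two-discriminator idea (a sketch under the assumption of a 256x256 completed image and a 128x128 crop around the hole; layer widths are illustrative, not the authors' exact architecture):

import torch
import torch.nn as nn

def conv_stack(in_ch, width, n_layers):
    """Strided 5x5 conv feature extractor shared in spirit by both branches."""
    layers, ch = [], in_ch
    for _ in range(n_layers):
        layers += [nn.Conv2d(ch, width, 5, stride=2, padding=2),
                   nn.LeakyReLU(0.2, inplace=True)]
        ch, width = width, width * 2
    return nn.Sequential(*layers), ch

class GlobalLocalDiscriminator(nn.Module):
    """The global branch sees the whole completed image, the local branch sees
    a crop centered on the filled region; their features are concatenated and
    mapped to a single real/fake score."""
    def __init__(self):
        super().__init__()
        self.global_net, g_ch = conv_stack(3, 64, 6)   # 256 -> 4
        self.local_net,  l_ch = conv_stack(3, 64, 5)   # 128 -> 4
        self.global_fc = nn.Linear(g_ch * 4 * 4, 1024)
        self.local_fc  = nn.Linear(l_ch * 4 * 4, 1024)
        self.out = nn.Linear(2048, 1)

    def forward(self, full_img, local_patch):
        g = self.global_fc(self.global_net(full_img).flatten(1))
        l = self.local_fc(self.local_net(local_patch).flatten(1))
        return self.out(torch.cat([g, l], dim=1))      # real/fake logit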

2018

Contextual-based Image Inpainting: Infer, Match, and Translate

  • link
  • ECCV 2018. Authors: Yuhang Song, Chao Yang, Zhe Lin, Xiaofeng Liu, Qin Huang, Hao Li
  • We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty to directly learn the distribution of high-dimensional image data, we divide the task into inference and translation as two separate steps and model each step with a deep neural network. We also use simple heuristics to guide the propagation of local textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions in much smaller space and hence easier to train. We evaluate our method on several public datasets and show that we generate results of better visual quality than previous state-of-the-art methods.

Image Inpainting for Irregular Holes Using Partial Convolutions

  • link
  • ECCV 2018. Authors: Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao
  • Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, but is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.
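
  • A minimal sketch of the partial convolution idea (my simplification with a single-channel mask, not NVIDIA's released layer): the convolution only sees valid pixels, the response is renormalized by how many valid pixels fall under the window, and the mask is updated for the next layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer('ones', torch.ones(1, 1, kernel_size, kernel_size))
        self.window = kernel_size * kernel_size

    def forward(self, x, mask):                 # mask: (B,1,H,W) float, 1 = valid
        out = self.conv(x * mask)               # ignore values inside the hole
        valid = F.conv2d(mask, self.ones, stride=self.conv.stride,
                         padding=self.conv.padding)
        scale = self.window / valid.clamp(min=1)        # renormalization factor
        out = out * scale * (valid > 0).float()         # zero where no valid pixel
        new_mask = (valid > 0).float()                  # updated mask for next layer
        return out, new_mask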

2019

SinGAN: Learning a Generative Model from a Single Natural Image

  • link

  • ICCV 2019. Authors: Tamar Rott Shaham, Tali Dekel, Tomer Michaeli

  • We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.

Free-Form Image Inpainting with Gated Convolution

  • link

  • ICCV 2019. Authors: Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang

  • We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps user quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting
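
  • The gated convolution itself is small enough to sketch (a minimal version of the idea, not the authors' release): one convolution produces features, a second one produces a soft gate in [0,1] per channel and per spatial location, and the two are multiplied, so the network learns where to trust the input instead of relying on a hard 0/1 mask.

import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.act = nn.ELU()

    def forward(self, x):
        # Soft, learnable per-pixel and per-channel gating of the features.
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))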

Coherent Semantic Attention for Image Inpainting

  • link
  • ICCV 2019. Authors: Hongyu Liu, Bin Jiang, Yi Xiao, Chao Yang
  • The latest deep learning-based approaches have shown promising results for the challenging task of inpainting missing regions of an image. However, the existing methods often generate contents with blurry textures and distorted structures due to the discontinuity of the local pixels. From a semantic-level perspective, the local pixel discontinuity is mainly because these methods ignore the semantic relevance and feature continuity of hole regions. To handle this problem, we investigate the human behavior in repairing pictures and propose a fined deep generative model-based approach with a novel coherent semantic attention (CSA) layer, which can not only preserve contextual structure but also make more effective predictions of missing parts by modeling the semantic relevance between the holes features. The task is divided into rough, refinement as two steps and model each step with a neural network under the U-Net architecture, where the CSA layer is embedded into the encoder of refinement step. To stabilize the network training process and promote the CSA layer to learn more effective parameters, we propose a consistency loss to enforce both the CSA layer and the corresponding layer of the CSA in decoder to be close to the VGG feature layer of a ground truth image simultaneously. The experiments on CelebA, Places2, and Paris StreetView datasets have validated the effectiveness of our proposed methods in image inpainting tasks and can obtain images with a higher quality as compared with the existing state-of-the-art approaches.

EdgeConnect Generative Image Inpainting with Adversarial Edge Learning

  • link
  • 2019. Authors: Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, Mehran Ebrahimi
  • Over the last few years, deep learning techniques have yielded significant improvements in image inpainting. However, many of these techniques fail to reconstruct reasonable structures as they are commonly over-smoothed and/or blurry. This paper develops a new approach for image inpainting that does a better job of reproducing filled regions exhibiting fine details. We propose a two-stage adversarial model EdgeConnect that comprises of an edge generator followed by an image completion network. The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using hallucinated edges as a priori. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively. Code and models available at: https://github.com/knazeri/edge-connect
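
  • The two-stage flow can be summarized in a few lines (a sketch of the inference path only; edge_generator and inpaint_generator stand in for the two trained networks, and mask is 1 inside the missing region):

import torch

def edgeconnect_inpaint(image, gray, edges, mask, edge_generator, inpaint_generator):
    # Stage 1: hallucinate edges inside the hole from the known edges/grayscale.
    pred_edges = edge_generator(
        torch.cat([gray * (1 - mask), edges * (1 - mask), mask], dim=1))
    full_edges = edges * (1 - mask) + pred_edges * mask

    # Stage 2: fill the RGB hole, conditioned on the hallucinated edge map.
    pred_image = inpaint_generator(
        torch.cat([image * (1 - mask), full_edges], dim=1))
    return image * (1 - mask) + pred_image * mask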

2020

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

  • link
  • ECCV 2020. Authors: Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang
  • Paper link, code
  • Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features limit the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch and the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we reweigh channel attentions first and propose a bilateral propagation activation function to enable spatial equalization. To this end, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on the benchmark datasets show the proposed method is effective to recover structures and textures and performs favorably against state-of-the-art approaches.

Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

  • link
  • CVPR 2020. Authors: Zili Yi, Qiang Tang, Shekoofeh Azizi, Daesik Jang, Zhan Xu
  • Recently data-driven image inpainting methods have made inspiring progress, impacting fundamental image editing tasks such as object removal and damaged image repairing. These methods are more effective than classic approaches, however, due to memory limitations they can only handle low-resolution inputs, typically smaller than 1K. Meanwhile, the resolution of photos captured with mobile devices increases up to 8K. Naive up-sampling of the low-resolution inpainted result can merely yield a large yet blurry result. Whereas, adding a high-frequency residual image onto the large blurry image can generate a sharp result, rich in details and textures. Motivated by this, we propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents by weighted aggregating residuals from contextual patches, thus only requiring a low-resolution prediction from the network. Since convolutional layers of the neural network only need to operate on low-resolution inputs and outputs, the cost of memory and computing power is thus well suppressed. Moreover, the need for high-resolution training datasets is alleviated. In our experiments, we train the proposed model on small images with resolutions 512x512 and perform inference on high-resolution images, achieving compelling inpainting quality. Our model can inpaint images as large as 8K with considerable hole sizes, which is intractable with previous learning-based approaches. We further elaborate on the light-weight design of the network architecture, achieving real-time performance on 2K images on a GTX 1080 Ti GPU. Codes are available at: Atlas200dk/sample-imageinpainting-HiFill
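
  • The core trick can be sketched with non-overlapping patches (a simplification of CRA, not the HiFill code; the attention weights attn are assumed to come from patch similarities computed at low resolution inside the network):

import torch
import torch.nn.functional as F

def aggregate_residual(hr_image, mask, lr_inpainted, attn, patch=32):
    """hr_image: (B,3,H,W) high-res input with a hole; mask: (B,1,H,W), 1 = hole;
    lr_inpainted: the network's low-res completion; attn: (B,N,N) weights of each
    of the N non-overlapping patches over all context patches."""
    B, C, H, W = hr_image.shape
    # Blurry high-res base: upsample the low-res completion.
    blurry = F.interpolate(lr_inpainted, size=(H, W), mode='bilinear',
                           align_corners=False)
    # The high-frequency residual is only known outside the hole.
    residual = (hr_image - blurry) * (1 - mask)
    # Split the residual into non-overlapping patches: (B, N, C*patch*patch).
    patches = F.unfold(residual, kernel_size=patch, stride=patch).transpose(1, 2)
    # Fill hole patches as attention-weighted sums of context-patch residuals.
    filled = torch.bmm(attn, patches).transpose(1, 2)
    agg = F.fold(filled, output_size=(H, W), kernel_size=patch, stride=patch)
    # Keep the original pixels outside the hole, sharpened prediction inside.
    return hr_image * (1 - mask) + (blurry + agg) * mask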

2021

Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

  • link
  • CVPR 2021. Authors: Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li
  • Paper link, code
  • Given an incomplete image without additional constraint, image inpainting natively allows for multiple solutions as long as they appear plausible. Recently, multiple-solution inpainting methods have been proposed and shown the potential of generating diverse results. However, these methods have difficulty in ensuring the quality of each solution, e.g. they produce distorted structure and/or blurry texture. We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture. The proposed model is inspired by the hierarchical vector quantized variational auto-encoder (VQ-VAE), whose hierarchical architecture disentangles structural and textural information. In addition, the vector quantization in VQ-VAE enables autoregressive modeling of the discrete distribution over the structural information. Sampling from the distribution can easily generate diverse and high-quality structures, making up the first stage of our model. In the second stage, we propose a structural attention module inside the texture generation network, where the module utilizes the structural information to capture distant correlations. We further reuse the VQ-VAE to calculate two feature losses, which help improve structure coherence and texture realism, respectively. Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images. Code and models are available at: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.

Image Inpainting with External-internal Learning and Monochromic Bottleneck

  • link
  • CVPR 2021. Authors: Tengfei Wang, Hao Ouyang, Qifeng Chen
  • Paper link, code
  • Although recent inpainting approaches have demonstrated significant improvements with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions. To address these issues, we propose an external-internal inpainting scheme with a monochromic bottleneck that helps image inpainting models remove these artifacts. In the external learning stage, we reconstruct missing structures and details in the monochromic space to reduce the learning dimension. In the internal learning stage, we propose a novel internal color propagation method with progressive learning strategies for consistent color restoration. Extensive experiments demonstrate that our proposed scheme helps image inpainting models produce more structure-preserved and visually compelling results.

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

  • link
  • CVPR 2021. Authors: Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao
  • We propose PD-GAN, a probabilistic diverse GAN for image inpainting. Given an input image with arbitrary hole regions, PD-GAN produces multiple inpainting results with diverse and visually realistic content. Our PD-GAN is built upon a vanilla GAN which generates images based on random noise. During image generation, we modulate deep features of input random noise from coarse-to-fine by injecting an initially restored image and the hole regions in multiple scales. We argue that during hole filling, the pixels near the hole boundary should be more deterministic (i.e., with higher probability trusting the context and initially restored image to create natural inpainting boundary), while those pixels lie in the center of the hole should enjoy more degrees of freedom (i.e., more likely to depend on the random noise for enhancing diversity). To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information. SPDNorm dynamically balances the realism and diversity inside the hole region, making the generated content more diverse towards the hole center and resemble neighboring image content more towards the hole boundary. Meanwhile, we propose a perceptual diversity loss to further empower PD-GAN for diverse content generation. Experiments on benchmark datasets including CelebA-HQ, Places2 and Paris Street View indicate that PD-GAN is effective for diverse and visually realistic image restoration.

High-Fidelity Pluralistic Image Completion with Transformers

  • link
  • 2021. Authors: Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao
  • Image completion has made tremendous progress with convolutional neural networks (CNNs), because of their powerful texture modeling capacity. However, due to some inherent properties (e.g., local inductive prior, spatial-invariant kernels), CNNs do not perform well in understanding global structures or naturally support pluralistic completion. Recently, transformers demonstrate their power in modeling the long-term relationship and generating diverse results, but their computation complexity is quadratic to input length, thus hampering the application in processing high-resolution images. This paper brings the best of both worlds to pluralistic image completion: appearance prior reconstruction with transformer and texture replenishment with CNN. The former transformer recovers pluralistic coherent structures together with some coarse textures, while the latter CNN enhances the local texture details of coarse priors guided by the high-resolution masked images. The proposed method vastly outperforms state-of-the-art methods in terms of three aspects: 1) large performance boost on image fidelity even compared to deterministic completion methods; 2) better diversity and higher fidelity for pluralistic completion; 3) exceptional generalization ability on large masks and generic dataset, like ImageNet.

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

  • link
  • ICLR 2021. Authors: Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric I Chang, Yan Xu
  • Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.

Image inpainting surveys

Image inpainting: A review

  • link

  • 2019. Authors: Omar Elharrouss, Noor Almaadeed, Somaya Al-Maadeed, Younes Akbari

  • Although image inpainting, or the art of repairing the old and deteriorated images, has been around for many years, it has gained even more popularity because of the recent development in image processing techniques. With the improvement of image processing tools and the flexibility of digital image editing, automatic image inpainting has found important applications in computer vision and has also become an important and challenging topic of research in image processing. This paper is a brief review of the existing image inpainting approaches: we first present a global vision of the existing methods for image inpainting. We attempt to collect most of the existing approaches and classify them into three categories, namely, sequential-based, CNN-based and GAN-based methods. In addition, for each category, a list of methods for the different types of distortion on the images is presented. Furthermore, we collect a list of the available datasets and discuss these in our paper. This is a contribution for digital image inpainting researchers trying to look for the available datasets because there is a lack of datasets available for image inpainting. As the final step in this overview, we present the results of real evaluations of the three categories of image inpainting methods performed on the datasets used, for the different types of image distortion. In the end, we also present the evaluation metrics and discuss the performance of these methods in terms of these metrics. This overview can be used as a reference for image inpainting researchers, and it can also facilitate the comparison of the methods as well as the datasets used. The main contribution of this paper is the presentation of the three categories of image inpainting methods along with a list of available datasets that the researchers can use to evaluate their proposed methodology against.

---------- dividing line ----------


Generative Image Inpainting with Contextual Attention

Jiahui Yu, Zhe L. Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang
Published 2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

  • Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feedforward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.

