Iterative facial image inpainting based on an encoder-generator architecture

Dogan Y., Keles H.

Neural Computing and Applications, vol.34, no.12, pp.10001-10021, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 34 Issue: 12
  • Publication Date: 2022
  • Doi Number: 10.1007/s00521-022-06987-y
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Page Numbers: pp.10001-10021
  • Keywords: Facial image inpainting, Image completion, Generative adversarial networks, Deep learning, Convolutional neural networks, ADVERSARIAL NETWORK
  • Hacettepe University Affiliated: No


© 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

Facial image inpainting is a challenging problem, as it requires generating new pixels that carry semantic information for masked key components of a face, e.g., eyes and nose. Recently, remarkable methods have been proposed in this field. Most of these approaches use encoder–decoder architectures and have different limitations, such as producing only a single result for a given image and a particular mask. Alternatively, some optimization-based approaches generate promising results using different masks with generator networks. However, these approaches are computationally more expensive. In this paper, we propose an efficient solution to the facial image inpainting problem using the Cyclic Reverse Generator (CRG) architecture, which provides an encoder-generator model. We use the encoder to embed a given image into the generator space and incrementally inpaint the masked regions until a plausible image is generated; we trained a discriminator model to assess the quality of the generated images during the iterations and determine convergence. After the generation process, as post-processing, we utilize a U-Net model that we trained specifically for this task to remedy the artifacts close to the mask boundaries. We empirically observed that even in the absence of important facial features, the encoder model is capable of embedding images in semantically rich regions of the latent space, utilizing the surrounding context in the images. Exploiting the feedback loop between the encoder and generator gradually improves the missing content in the images in an iterative fashion, and only a few iterations are sufficient to generate realistic content. Since the models are not trained for particular mask types, our method allows applying sketch-based inpainting, using a variety of mask types, and producing multiple and diverse results.
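The encoder-generator feedback loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the blending rule (keep known pixels, fill masked pixels from the generator output), the mask convention (1 = known, 0 = missing), the threshold, and every name below (`iterative_inpaint`, `make_toy_models`) are assumptions made for illustration. The toy "generator" is a random linear map, the toy "encoder" is its transpose, and the toy "score" stands in for the trained discriminator; the U-Net post-processing step is omitted.

```python
import numpy as np


def iterative_inpaint(image, mask, encode, generate, score,
                      threshold=0.99, max_iters=50):
    """Fill masked pixels by repeatedly encoding and regenerating the image.

    mask uses the (assumed) convention 1 = known pixel, 0 = missing pixel.
    Returns the final image and the per-iteration quality scores.
    """
    current = image * mask              # missing region starts out empty
    history = []
    for _ in range(max_iters):
        latent = encode(current)        # embed the current estimate
        proposal = generate(latent)     # map the latent back to image space
        # Assumed blending rule: trust known pixels, take the rest from the proposal.
        current = mask * image + (1.0 - mask) * proposal
        quality = score(current)        # stand-in for the discriminator's judgement
        history.append(quality)
        if quality >= threshold:        # treat a high score as convergence
            break
    return current, history


def make_toy_models(dim=64, latent_dim=4, seed=0):
    """Hypothetical stand-ins: a linear generator with orthonormal columns."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((dim, latent_dim)))

    def generate(z):
        return basis @ z

    def encode(x):
        return basis.T @ x

    def score(x):
        # Close to 1 when x is well reproduced by generate(encode(x)).
        residual = np.linalg.norm(x - generate(encode(x)))
        return 1.0 - residual / (np.linalg.norm(x) + 1e-12)

    return encode, generate, score


encode, generate, score = make_toy_models()
truth = generate(np.random.default_rng(1).standard_normal(4))
mask = np.ones(64)
mask[:16] = 0.0                          # hide the first 16 "pixels"
result, history = iterative_inpaint(truth, mask, encode, generate, score)
print(len(history), np.linalg.norm(result - truth))
```

In this toy setting the loop reduces to alternating projections between the generator's output space and the set of images that agree with the known pixels, which is why the masked region improves step by step. With trained networks the behaviour depends on the models themselves, so the iteration count and threshold here carry no meaning beyond the example.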
We compared our method with the state-of-the-art models both quantitatively and qualitatively, and observed that our method can compete with the other models in all mask types; it is particularly better in images where larger masks are utilized. Our code, dataset and models are available at: