Text-to-image synthesis with self-supervised learning

Yong Xuan Tan*, Chin Poo Lee, Mai Neo, Kian Ming Lim

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

16 Citations (Scopus)

Abstract

Text-to-image synthesis extracts the meaning from the text description and converts it into an image correspondingly. Text-to-image synthesis is widely leveraged in many applications, such as graphic design, image editing, etc. Text-to-image synthesis approaches are mainly built on the basis of generative adversarial networks. One of the main challenges in text-to-image synthesis is to generate images that are visually realistic. Not only that, the text-to-image synthesis model is inherently susceptible to overconfidence and training instability issues. To address these challenges, this paper proposes a self-supervised text-to-image synthesis with some enhancements, including self-supervised learning, feature matching, L1 distance loss, and one-sided label smoothing. The self-supervised learning offers more image variations thus improving the classification power of the discriminator. The feature matching and L1 distance functions motivate the generator to synthesize images that are visually more similar to the real images based on the given text description. The one-sided label smoothing adds a penalty value when the discriminator makes a correct classification to alleviate the overconfidence problem and to improve the training stability. The performance of the proposed self-supervised text-to-image synthesis is evaluated on the Oxford-102 and CUB datasets. The empirical results demonstrate that the proposed self-supervised text-to-image synthesis generates images with richer image content diversity, more visually realistic, and more semantically consistent with the given text description. The proposed self-supervised text-to-image synthesis also outshines the methods in comparison in terms of the inception score and Structural Similarity Index.

Original languageEnglish
Pages (from-to)119-126
Number of pages8
JournalPattern Recognition Letters
Volume157
DOIs
Publication statusPublished - May 2022
Externally publishedYes

Keywords

  • Generative adversarial network
  • Self-supervised learning
  • Text-to-image-synthesis

Fingerprint

Dive into the research topics of 'Text-to-image synthesis with self-supervised learning'. Together they form a unique fingerprint.

Cite this