Enhanced Text-to-Image Synthesis Conditional Generative Adversarial Networks

Yong Xuan Tan*, Chin Poo Lee, Mai Neo, Kian Ming Lim, Jit Yan Lim

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Text-to-image synthesis aims to synthesize an image from a given text description, which is especially useful for applications such as image editing and graphic design. The main challenges of text-to-image synthesis are generating images that are both visually realistic and semantically consistent with the given text description. In this paper, we propose several enhancements to the conditional generative model widely used for text-to-image synthesis. The enhancements comprise text conditioning augmentation, feature matching, and an L1 distance loss function. Text conditioning augmentation expands the text embedding feature space to improve the semantic consistency of the model. Feature matching motivates the model to synthesize more photo-realistic images and enriches the variation in image content. In addition, the L1 distance loss allows the model to generate images with high visual resemblance to real images. Empirical results on the CUB-200-2011 dataset demonstrate that the text-to-image synthesis conditional generative model with the proposed enhancements yields the highest Inception Score and Structural Similarity Index.
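To illustrate how the three enhancements fit into a conditional GAN training objective, below is a minimal PyTorch sketch. It is an assumption-laden reconstruction, not the authors' implementation: the module names, dimensions, and loss weights (`lambda_fm`, `lambda_l1`, `lambda_kl`) are hypothetical, and the conditioning augmentation follows the common StackGAN-style formulation.

```python
# Hypothetical sketch of the three enhancements described in the abstract.
# Dimensions and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditioningAugmentation(nn.Module):
    """Maps a text embedding to a Gaussian and samples a conditioning
    vector from it, expanding the text embedding feature space."""
    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=1)
        std = torch.exp(0.5 * logvar)
        c = mu + std * torch.randn_like(std)  # reparameterization trick
        # KL term keeps the learned distribution close to N(0, I)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl

def feature_matching_loss(feat_real, feat_fake):
    """Matches batch statistics of an intermediate discriminator layer,
    pushing the generator toward more photo-realistic, varied images."""
    return F.mse_loss(feat_fake.mean(dim=0), feat_real.mean(dim=0).detach())

def generator_loss(d_logits_fake, feat_real, feat_fake, fake_img, real_img,
                   kl, lambda_fm=1.0, lambda_l1=1.0, lambda_kl=1.0):
    """Adversarial loss plus the three enhancements from the abstract."""
    adv = F.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))
    fm = feature_matching_loss(feat_real, feat_fake)
    # L1 distance encourages pixel-level resemblance to real images
    l1 = F.l1_loss(fake_img, real_img)
    return adv + lambda_kl * kl + lambda_fm * fm + lambda_l1 * l1
```

In this sketch the generator would be conditioned on the sampled vector `c` together with a noise vector, and `feat_real`/`feat_fake` would be taken from the same intermediate layer of the discriminator on real and generated batches.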

Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: IAENG International Journal of Computer Science
Volume: 49
Issue number: 1
Publication status: Published - 2022
Externally published: Yes

Keywords

  • cGANs
  • conditional generative adversarial networks
  • GANs
  • generative adversarial network
  • text-to-image synthesis
