Abstract
Text-to-image synthesis aims to generate an image from a given text description, which is especially useful for applications such as image editing and graphic design. The main challenge of text-to-image synthesis is to generate images that are both visually realistic and semantically consistent with the given text description. In this paper, we propose several enhancements to the conditional generative model that is widely used for text-to-image synthesis: text conditioning augmentation, feature matching, and an L1 distance loss function. Text conditioning augmentation expands the text embedding feature space to improve the semantic consistency of the model. Feature matching encourages the model to synthesize more photo-realistic images and increases the variation in image content. In addition, the L1 distance loss enables the model to generate images with high visual resemblance to the real images. Empirical results on the CUB-200-2011 dataset demonstrate that the conditional generative model with the proposed enhancements yields the highest Inception Score and Structural Similarity Index (SSIM) among the compared models.
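The abstract names three additions to the generator's training objective but does not give their formulas. As an illustration only, the sketch below assumes the common formulations of each: conditioning augmentation in the StackGAN style (a conditioning vector sampled from a Gaussian whose mean and log-variance are derived from the text embedding, plus a KL-divergence regularizer), feature matching as the squared distance between the mean intermediate discriminator features of a real batch and a generated batch, and an L1 loss as the mean absolute pixel difference. All function names and the unit loss weights are hypothetical, and the paper's exact formulation may differ.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps, eps ~ N(0, I) (reparameterization trick).

    Assumption: mu and log_var come from a learned layer applied to the
    text embedding (not shown). Returns the sample and the KL regularizer
    KL(N(mu, diag(sigma^2)) || N(0, I)).
    """
    c = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return c, kl


def feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between the mean discriminator features of a
    real batch and a generated batch (each a list of feature vectors)."""
    dim = len(real_feats[0])
    mean_real = [sum(f[j] for f in real_feats) / len(real_feats) for j in range(dim)]
    mean_fake = [sum(f[j] for f in fake_feats) / len(fake_feats) for j in range(dim)]
    return sum((r - f) ** 2 for r, f in zip(mean_real, mean_fake))


def l1_loss(real_img, fake_img):
    """Mean absolute difference between a real and a generated image,
    both given as flattened lists of pixel values."""
    return sum(abs(r - f) for r, f in zip(real_img, fake_img)) / len(real_img)


def generator_loss(adv_loss, kl, fm, l1, lambda_kl=1.0, lambda_fm=1.0, lambda_l1=1.0):
    """Hypothetical weighted sum; the abstract does not state how the
    terms are combined or weighted."""
    return adv_loss + lambda_kl * kl + lambda_fm * fm + lambda_l1 * l1


if __name__ == "__main__":
    rng = random.Random(0)
    c, kl = conditioning_augmentation([1.0, 0.0], [0.0, 0.0], rng)
    fm = feature_matching_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
    l1 = l1_loss([0.0, 1.0, 0.5], [0.5, 1.0, 0.0])
    print(c, kl, fm, l1, generator_loss(1.0, kl, fm, l1))
```

Sampling the conditioning vector rather than using the text embedding directly means nearby points in embedding space also condition the generator, which is one way to read the abstract's claim that the augmentation "expands the text embedding feature space".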
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1-7 |
Number of pages | 7 |
Journal | IAENG International Journal of Computer Science |
Volume | 49 |
Issue number | 1 |
Publication status | Published - 2022 |
Externally published | Yes |
Keywords
- cGANs
- conditional generative adversarial networks
- GANs
- generative adversarial network
- text-to-image synthesis