TY - JOUR
T1 - Text-to-image synthesis with self-supervised bi-stage generative adversarial network
AU - Tan, Yong Xuan
AU - Lee, Chin Poo
AU - Neo, Mai
AU - Lim, Kian Ming
AU - Lim, Jit Yan
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/5
Y1 - 2023/5
N2 - Text-to-image synthesis is challenging because generating images that are visually realistic and semantically consistent with a given text description involves multi-modal learning over text and images. To address these challenges, this paper presents a text-to-image synthesis model that combines self-supervision with a bi-stage image distribution architecture, referred to as the Self-Supervised Bi-Stage Generative Adversarial Network (SSBi-GAN). The self-supervision diversifies the learned representations, thereby improving the quality of the synthesized images. In addition, the bi-stage architecture with a residual network enables the generation of larger images with finer visual content. Furthermore, enhancements including L1 distance, one-sided label smoothing, and feature matching are incorporated to improve the visual realism and semantic consistency of the images as well as the training stability of the model. Empirical results on the Oxford-102 and CUB datasets corroborate the ability of the proposed SSBi-GAN to generate visually realistic and semantically consistent images.
AB - Text-to-image synthesis is challenging because generating images that are visually realistic and semantically consistent with a given text description involves multi-modal learning over text and images. To address these challenges, this paper presents a text-to-image synthesis model that combines self-supervision with a bi-stage image distribution architecture, referred to as the Self-Supervised Bi-Stage Generative Adversarial Network (SSBi-GAN). The self-supervision diversifies the learned representations, thereby improving the quality of the synthesized images. In addition, the bi-stage architecture with a residual network enables the generation of larger images with finer visual content. Furthermore, enhancements including L1 distance, one-sided label smoothing, and feature matching are incorporated to improve the visual realism and semantic consistency of the images as well as the training stability of the model. Empirical results on the Oxford-102 and CUB datasets corroborate the ability of the proposed SSBi-GAN to generate visually realistic and semantically consistent images.
KW - GAN
KW - Generative adversarial network
KW - Self-supervised learning
KW - Text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85151670285&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2023.03.023
DO - 10.1016/j.patrec.2023.03.023
M3 - Article
AN - SCOPUS:85151670285
SN - 0167-8655
VL - 169
SP - 43
EP - 49
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -