Enhanced Text-to-Image Synthesis With Self-Supervision

Yong Xuan Tan; Chin Poo Lee; Mai Neo; Kian Ming Lim; Jit Yan Lim

doi:10.1109/ACCESS.2023.3268869

Yong Xuan Tan, Chin Poo Lee^*, Mai Neo, Kian Ming Lim, Jit Yan Lim

^*Corresponding author for this work

Multimedia University

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.
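The rotation-variant idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which rotation angles SS-TiGAN uses, so the four 90-degree rotations below are an assumption borrowed from common rotation-prediction self-supervision setups, and the function name `make_rotation_variants` is hypothetical.

```python
import numpy as np


def make_rotation_variants(images):
    """Return rotated copies of each image plus a rotation-class label.

    images: array of shape (N, H, W, C) with H == W.
    Assumption: four rotations (0, 90, 180, 270 degrees); the source
    abstract does not specify the angles.
    """
    # Rotate the whole batch in the (H, W) plane, k quarter-turns at a time.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    # Label every rotated copy with its rotation index k in {0, 1, 2, 3}.
    labels = [np.full(len(images), k, dtype=np.int64) for k in range(4)]
    return np.concatenate(rotated, axis=0), np.concatenate(labels, axis=0)


if __name__ == "__main__":
    batch = np.random.rand(8, 64, 64, 3)  # stand-in for a batch of training images
    variants, rot_labels = make_rotation_variants(batch)
    print(variants.shape, rot_labels.shape)
```

Under these assumptions, each training image yields four samples, and the rotation label gives a discriminator an auxiliary prediction target that needs no extra annotation.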

Original language	English
Pages (from-to)	39508-39519
Number of pages	12
Journal	IEEE Access
Volume	11
DOIs	https://doi.org/10.1109/ACCESS.2023.3268869
Publication status	Published - 2023
Externally published	Yes

Keywords

GAN
generative adversarial networks
generative model
self-supervised learning
Text-to-image synthesis

Access to Document

10.1109/ACCESS.2023.3268869

Cite this

@article{9efbee8e18874b619c4cd41405d0ece8,

title = "Enhanced Text-to-Image Synthesis With Self-Supervision",

abstract = "The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.",

keywords = "GAN, generative adversarial networks, generative model, self-supervised learning, Text-to-image synthesis",

author = "Tan, {Yong Xuan} and Lee, {Chin Poo} and Neo, Mai and Lim, {Kian Ming} and Lim, {Jit Yan}",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2023",

doi = "10.1109/ACCESS.2023.3268869",

language = "English",

volume = "11",

pages = "39508--39519",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - Enhanced Text-to-Image Synthesis With Self-Supervision

AU - Tan, Yong Xuan

AU - Lee, Chin Poo

AU - Neo, Mai

AU - Lim, Kian Ming

AU - Lim, Jit Yan

PY - 2023

Y1 - 2023

N2 - The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.

KW - GAN

KW - generative adversarial networks

KW - generative model

KW - self-supervised learning

KW - Text-to-image synthesis

UR - http://www.scopus.com/inward/record.url?scp=85153801523&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2023.3268869

DO - 10.1109/ACCESS.2023.3268869

M3 - Article

AN - SCOPUS:85153801523

SN - 2169-3536

VL - 11

SP - 39508

EP - 39519

JO - IEEE Access

JF - IEEE Access

ER -
