TY - JOUR
T1 - Text-to-image synthesis with self-supervised learning
AU - Tan, Yong Xuan
AU - Lee, Chin Poo
AU - Neo, Mai
AU - Lim, Kian Ming
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/5
Y1 - 2022/5
AB - Text-to-image synthesis extracts the meaning of a text description and converts it into a corresponding image. It is widely used in applications such as graphic design and image editing. Text-to-image synthesis approaches are mainly built on generative adversarial networks. One of the main challenges is generating images that are visually realistic. In addition, text-to-image synthesis models are inherently susceptible to overconfidence and training instability. To address these challenges, this paper proposes a self-supervised text-to-image synthesis model with several enhancements: self-supervised learning, feature matching, L1 distance loss, and one-sided label smoothing. Self-supervised learning offers more image variations, thus improving the classification power of the discriminator. The feature matching and L1 distance losses encourage the generator to synthesize images that are visually more similar to the real images for the given text description. One-sided label smoothing adds a penalty when the discriminator makes a correct classification, which alleviates the overconfidence problem and improves training stability. The proposed model is evaluated on the Oxford-102 and CUB datasets. The empirical results demonstrate that it generates images that are more diverse in content, more visually realistic, and more semantically consistent with the given text description. It also outperforms the compared methods in terms of Inception Score and Structural Similarity Index.
KW - Generative adversarial network
KW - Self-supervised learning
KW - Text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85128379543&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2022.04.010
DO - 10.1016/j.patrec.2022.04.010
M3 - Article
AN - SCOPUS:85128379543
SN - 0167-8655
VL - 157
SP - 119
EP - 126
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -