TY - JOUR
T1 - Recent Advances in Text-to-Image Synthesis: Approaches, Datasets and Future Research Prospects
AU - Tan, Yong Xuan
AU - Lee, Chin Poo
AU - Neo, Mai
AU - Lim, Kian Ming
AU - Lim, Jit Yan
AU - Alqahtani, Ali
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2023
Y1 - 2023
AB - Text-to-image synthesis is a fascinating area of research that aims to generate images from textual descriptions. The main goal of this field is to generate images that match the given textual description in terms of both semantic consistency and image realism. Although text-to-image synthesis has made remarkable progress in recent years, it still faces several challenges, mainly related to image realism and semantic consistency. Various approaches have been proposed to address these challenges, most of which rely on Generative Adversarial Networks (GANs) to achieve the best performance. This paper reviews the existing text-to-image synthesis approaches and categorizes them into four groups: image realism, multiple scene, semantic enhancement, and style transfer. In addition to the existing approaches, this paper reviews the widely used datasets for text-to-image synthesis, including Oxford-102, CUB-200-2011, and COCO. It also discusses the evaluation metrics used in this field, including Inception Score, Fréchet Inception Distance, Structural Similarity Index, R-precision, Visual-Semantic Similarity, and Semantic Object Accuracy. Finally, the paper compiles the reported performance of existing works in the field.
KW - GAN
KW - generative adversarial networks
KW - generative model
KW - review
KW - survey
KW - Text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85168719624&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3306422
DO - 10.1109/ACCESS.2023.3306422
M3 - Article
AN - SCOPUS:85168719624
SN - 2169-3536
VL - 11
SP - 88099
EP - 88115
JO - IEEE Access
JF - IEEE Access
ER -