TY - JOUR
T1 - Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation
AU - Tan, Zhaorui
AU - Yang, Xi
AU - Ye, Zihan
AU - Wang, Qiu-Feng
AU - Yan, Yuyao
AU - Nguyen, Anh
AU - Huang, Kaizhu
PY - 2023
Y1 - 2023
N2 - Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. Particularly, the most popular metric R-precision may not accurately reflect the text-image consistency, leading to misleading semantics in generated images. Albeit its significance, designing a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we make a further step forward to develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicate that, under the guidance
AB - Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. Particularly, the most popular metric R-precision may not accurately reflect the text-image consistency, leading to misleading semantics in generated images. Albeit its significance, designing a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we make a further step forward to develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicate that, under the guidance
KW - text-to-image generation
KW - text-image consistency metric
M3 - Article
SN - 0031-3203
JO - Pattern Recognition
JF - Pattern Recognition
ER -