TY - JOUR
T1 - Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning
AU - Ye, Zihan
AU - Hu, Fuyuan
AU - Lyu, Fan
AU - Li, Linyan
AU - Huang, Kaizhu
PY - 2022
Y1 - 2022
N2 - Using generative models to synthesize visual features from semantic distributions has become one of the most popular solutions to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot search for reliable disentangled representations of unseen classes, because those classes are unavailable during training in ZSL. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that utilizes multi-modal information to search a disentangled representation space. As such, all classes can interact, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that our proposed model achieves superior performance over state-of-the-art methods on four benchmark datasets.
AB - Using generative models to synthesize visual features from semantic distributions has become one of the most popular solutions to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot search for reliable disentangled representations of unseen classes, because those classes are unavailable during training in ZSL. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that utilizes multi-modal information to search a disentangled representation space. As such, all classes can interact, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that our proposed model achieves superior performance over state-of-the-art methods on four benchmark datasets.
KW - Zero-shot learning
KW - deep learning
KW - generative adversarial network
KW - representation learning
UR - http://www.scopus.com/inward/record.url?scp=85112194488&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3089017
DO - 10.1109/TMM.2021.3089017
M3 - Article
AN - SCOPUS:85112194488
SN - 1520-9210
VL - 24
SP - 2828
EP - 2840
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -