Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Zihan Ye, Fuyuan Hu*, Fan Lyu, Linyan Li, Kaizhu Huang

*Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

11 Citations (Scopus)


Using generative models to synthesize visual features from semantic distributions has been one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot find reliable disentangled representations for unseen classes, because unseen classes are unavailable during training in ZSL. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL), which uses multi-modal information to search for a disentangled representation space. As a result, all classes can interact, which benefits the learning of disentangled class representations in the searched space. Furthermore, we develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that the proposed model outperforms the state of the art on four benchmark datasets.
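For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of the standard single-modality form, max(0, d(a, p) - d(a, n) + margin). It is not the MMTL proposed in the paper, whose formulation is not given in this abstract. The function names, the Euclidean distance, and the default margin are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss (assumed form, not the paper's MMTL).

    Penalizes the anchor for being closer to the negative than to the
    positive by less than `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D features: the positive is near the anchor, the negative is far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Well-separated triplet: the margin is already satisfied.
loss_easy = triplet_loss(anchor, positive, negative)
# Swapping positive and negative violates the margin.
loss_hard = triplet_loss(anchor, negative, positive)
```

In this toy case the well-separated triplet incurs no loss, while the swapped triplet is penalized. In ZSL no visual samples exist for unseen classes, so triplets like these cannot be formed for them, which is the limitation the abstract attributes to the traditional TL.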

Original language: English
Pages (from-to): 2828-2840
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Publication status: Published - 2022


  • Zero-shot learning
  • deep learning
  • generative adversarial network
  • representation learning

