MMKNet: A Multi-Modal Knowledge Network for Predicting Both Seen and Unseen Classes in Medical Imaging

Wenqi Xu, Hong Seng Gan*, Shengen Wu, Zimu Wang, Muhammad Hanif Ramlee, Wan Mahani Hafizah

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Multi-label generalized zero-shot learning (ML-GZSL) has demonstrated significant potential in medical diagnostics, where doctors must process large volumes of medical images. Vision Transformers (ViTs), owing to their Transformer architecture, offer strong feature-generation capabilities in cross text-image tasks. BioMedBERT, a BERT-based model pre-trained on biomedical corpora for biomedical natural language processing, provides strong label-embedding capabilities in the same setting. In this paper, we propose MMKNet, a novel method that employs a ViT to construct global and local image features as visual knowledge, while using BioMedBERT with prompt tuning to embed class labels and thereby draw on textual knowledge from biomedical corpora. To integrate the multi-modal information, we design a combined decision layer that outputs similarity scores between images and class labels, yielding the predicted classifications. Because inference is class-independent, the model can predict unseen classes. Experiments on the NIH-ChestXray14, Kaggle retina, and Multi-Label Retinal Diseases (MuReD) datasets show that our method outperforms baseline models across multiple performance metrics. By addressing the limited dataset sizes and incomplete disease coverage common in medical imaging, the approach can help streamline doctors' workflows, allowing them to focus on diagnosing complex cases.
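The paper itself is not reproduced here, so the following is only a minimal PyTorch sketch of a class-agnostic combined decision layer of the kind the abstract describes: image features from a ViT and label embeddings from a text encoder are projected into a shared space and scored against each other. The module name CombinedDecisionLayer, the additive fusion of global and local features, the learnable temperature, and the random stand-ins for ViT and BioMedBERT outputs are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch, not MMKNet's actual code. Assumes:
# - ViT image features: one global (CLS-style) vector plus local patch tokens
# - BioMedBERT label embeddings: one vector per class-name prompt
# Random tensors stand in for both encoders' outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CombinedDecisionLayer(nn.Module):
    """Projects image and label features into a shared space and scores them."""

    def __init__(self, img_dim: int, txt_dim: int, shared_dim: int = 256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)        # fused image features -> shared space
        self.txt_proj = nn.Linear(txt_dim, shared_dim)        # label embeddings -> shared space
        self.logit_scale = nn.Parameter(torch.tensor(10.0))   # learnable temperature (assumption)

    def forward(self, global_feat, local_feats, label_embs):
        # Fuse global and mean-pooled local visual features (illustrative choice).
        fused = global_feat + local_feats.mean(dim=1)                # (B, img_dim)
        img = F.normalize(self.img_proj(fused), dim=-1)              # (B, D)
        txt = F.normalize(self.txt_proj(label_embs), dim=-1)        # (C, D)
        # Cosine similarity between every image and every class label.
        return self.logit_scale * img @ txt.t()                      # (B, C)


if __name__ == "__main__":
    B, P, IMG_DIM, TXT_DIM, NUM_CLASSES = 2, 196, 768, 768, 14
    global_feat = torch.randn(B, IMG_DIM)           # stand-in for ViT global feature
    local_feats = torch.randn(B, P, IMG_DIM)        # stand-in for ViT patch-token features
    label_embs = torch.randn(NUM_CLASSES, TXT_DIM)  # stand-in for BioMedBERT prompt embeddings

    layer = CombinedDecisionLayer(IMG_DIM, TXT_DIM)
    scores = layer(global_feat, local_feats, label_embs)   # (B, NUM_CLASSES) similarity scores
    print(scores.shape, torch.sigmoid(scores)[0])           # multi-label probabilities for image 0
```

Because the scores are computed against whatever label embeddings are supplied, embeddings for previously unseen class names can be scored at inference without retraining, which is the class-independence property the abstract relies on.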

Original language: English
Title of host publication: Proceedings of the 2025 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025
Editors: Weiming Shen, Marie-Helene Abel, Nada Matta, Jean-Paul Barthes, Junzhou Luo, Jinghui Zhang, Haibin Zhu, Kunkun Peng
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1599-1604
Number of pages: 6
Edition: 2025
ISBN (Electronic): 9798331513054
Publication status: Published - 2025
Event: 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025 - Compiegne, France
Duration: 5 May 2025 - 7 May 2025

Conference

Conference: 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025
Country/Territory: France
City: Compiegne
Period: 5/05/25 - 7/05/25

Keywords

  • medical image processing
  • multi-label classification
  • multi-modality
