Abstract
Multi-label generalized zero-shot learning (ML-GZSL) has demonstrated significant potential in medical diagnostics, where doctors must process large volumes of medical images. Vision Transformers (ViTs), owing to their Transformer-based architecture, are considered to have superior feature-generation capabilities in cross-modal text-image tasks. BioMedBERT, which builds on the BERT architecture with domain-specific pre-training for biomedical natural language processing, is considered to have significant label-embedding capabilities in the same tasks. In this paper, we propose MMKNet, a novel method that uses ViTs to construct global and local image features as visual knowledge, and uses BioMedBERT with prompt tuning to produce label embeddings that capture textual knowledge from biomedical corpora. To integrate the multi-modal information, we design a combined decision layer that outputs similarity scores between images and class labels, from which the predicted classifications are obtained. Our method is class-independent during inference, enabling the model to predict unseen classes. Experiments on the NIH-ChestXray14, Kaggle retina, and Multi-Label Retinal Diseases (MuReD) datasets demonstrate that our method outperforms baseline models across multiple performance metrics. These results suggest that the method can help optimize doctors' workflows by allowing them to focus on diagnosing complex cases, while addressing the challenges of limited dataset sizes and incomplete disease coverage in the medical imaging domain.
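The abstract does not specify how the decision layer computes its image-label similarity scores. As a rough illustration only, the sketch below assumes cosine similarity between an image feature vector and each label embedding, followed by a sigmoid so that every label is scored independently (multi-label). The function names, the `temperature` parameter, the 0.5 threshold, and the random vectors standing in for ViT and BioMedBERT outputs are all assumptions for this example, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def similarity_scores(image_features, label_embeddings, temperature=0.1):
    """Score every (image, label) pair independently.

    image_features:   (batch, dim) array, a stand-in for ViT image features.
    label_embeddings: (num_labels, dim) array, a stand-in for text-encoder
                      label embeddings.
    Returns a (batch, num_labels) array of scores in (0, 1).
    """
    img = l2_normalize(image_features)
    lab = l2_normalize(label_embeddings)
    cosine = img @ lab.T  # cosine similarity, values in [-1, 1]
    return 1.0 / (1.0 + np.exp(-cosine / temperature))  # per-label sigmoid


# Random placeholders for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
image_features = rng.normal(size=(2, 8))
seen_labels = rng.normal(size=(3, 8))    # labels available during training
unseen_labels = rng.normal(size=(2, 8))  # labels introduced only at inference

# The scoring function has no per-class parameters, so the label set can be
# extended at inference time by appending new label embeddings.
all_labels = np.concatenate([seen_labels, unseen_labels], axis=0)
scores = similarity_scores(image_features, all_labels)
predicted = scores > 0.5  # assumed decision threshold
```

Because the scorer only compares embeddings, nothing in it is tied to a fixed number of classes, which is one simple way to read the abstract's "class-independent during inference" property.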
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2025 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025 |
| Editors | Weiming Shen, Marie-Helene Abel, Nada Matta, Jean-Paul Barthes, Junzhou Luo, Jinghui Zhang, Haibin Zhu, Kunkun Peng |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 1599-1604 |
| Number of pages | 6 |
| Edition | 2025 |
| ISBN (Electronic) | 9798331513054 |
| Publication status | Published - 2025 |
| Event | 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025 - Compiegne, France; Duration: 5 May 2025 → 7 May 2025 |
Conference
| Conference | 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025 |
|---|---|
| Country/Territory | France |
| City | Compiegne |
| Period | 5/05/25 → 7/05/25 |
Keywords
- medical image processing
- multi-label classification
- multi-modality
Fingerprint
Dive into the research topics of 'MMKNet: A Multi-Modal Knowledge Network for Predicting Both Seen and Unseen Classes in Medical Imaging'. Together they form a unique fingerprint.