TY - JOUR
T1 - Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering
AU - Liu, Yishu
AU - Chen, Bingzhi
AU - Wang, Shuihua
AU - Lu, Guangming
AU - Zhang, Zheng
N1 - Publisher Copyright: IEEE
PY - 2024
Y1 - 2024
N2 - Medical visual question answering (Medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Existing methods commonly leverage vision-and-language pre-training paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still face two challenges that call for further research: 1) Limited research focuses on distilling representations from a complete modality to guide the representation learning of masked data in other modalities. 2) Multi-modal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, in this paper, we propose a novel Deep Fuzzy Multi-teacher Distillation (DFMD) Network for medical visual question answering, which takes advantage of fuzzy logic to model the uncertainties in vision-language representations across modalities in a multi-teacher framework. Specifically, a multi-teacher knowledge distillation (MKD) module is designed to assist in reconstructing the missing semantics under the supervision signal generated by teachers from the other complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we propose a noise-robust encoder called FuzBERT that enables our DFMD model to reduce the imprecision and ambiguity in feature representation during the multi-modal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multi-modal representations for medical visual question answering. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of our proposed DFMD method over state-of-the-art baselines.
AB - Medical visual question answering (Medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Existing methods commonly leverage vision-and-language pre-training paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still face two challenges that call for further research: 1) Limited research focuses on distilling representations from a complete modality to guide the representation learning of masked data in other modalities. 2) Multi-modal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, in this paper, we propose a novel Deep Fuzzy Multi-teacher Distillation (DFMD) Network for medical visual question answering, which takes advantage of fuzzy logic to model the uncertainties in vision-language representations across modalities in a multi-teacher framework. Specifically, a multi-teacher knowledge distillation (MKD) module is designed to assist in reconstructing the missing semantics under the supervision signal generated by teachers from the other complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we propose a noise-robust encoder called FuzBERT that enables our DFMD model to reduce the imprecision and ambiguity in feature representation during the multi-modal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multi-modal representations for medical visual question answering. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of our proposed DFMD method over state-of-the-art baselines.
KW - Fuzzy deep learning
KW - fuzzy logic
KW - knowledge distillation
KW - medical visual question answering
UR - http://www.scopus.com/inward/record.url?scp=85195386543&partnerID=8YFLogxK
U2 - 10.1109/TFUZZ.2024.3402086
DO - 10.1109/TFUZZ.2024.3402086
M3 - Article
AN - SCOPUS:85195386543
SN - 1063-6706
SP - 1
EP - 15
JO - IEEE Transactions on Fuzzy Systems
JF - IEEE Transactions on Fuzzy Systems
ER -