Scale-Selectable Global Information and Discrepancy Learning Network for Multimodal Sentiment Analysis

Xiaojiang He, Yushan Pan*, Xinfei Guo, Zhijie Xu, Chenguang Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal sentiment analysis and depression detection are pivotal for advancing human-computer interaction, yet significant challenges remain. First, the limited extraction of global contextual information within individual modalities risks the loss of modality-specific features. Second, existing methods often prioritize unaligned textual interactions while neglecting critical inter-modal discrepancies. To address these issues, we propose the Scale-Selectable Global Information and Discrepancy Learning Network (SSGDL), a framework that integrates two core modules: the Cross-Shaped Dynamic Scale Attention Module (CS-DSA) and the Primary-Secondary Modal Discrepancy Learning Module (PS-MDL). The CS-DSA dynamically selects scales and employs cross-shaped attention to capture comprehensive global context and intricate internal correlations, producing a fused multimodal representation. The PS-MDL then designates the fused modality as primary and uses cross-attention to learn discrepancy representations between it and each of the other modalities (textual, acoustic, and visual). By leveraging these inter-modal discrepancies, SSGDL achieves a more nuanced and holistic understanding of emotional content. Extensive experiments on three benchmark multimodal sentiment analysis datasets (MOSI, MOSEI, SIMS) and a depression detection dataset (AVEC2019) demonstrate that SSGDL consistently outperforms state-of-the-art approaches, setting a new benchmark for multimodal affective computing.
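The abstract does not give the PS-MDL equations, so the following is only a rough, hypothetical illustration of the general idea. It shows scaled dot-product cross-attention in which the fused (primary) sequence supplies the queries and a secondary modality supplies the keys and values. The "discrepancy" is taken here, purely as an assumption, to be the primary features minus the attended secondary features. Learned projections, multiple heads, and the CS-DSA fusion step are omitted, and every function name is invented for this sketch.

```python
import math
from typing import Dict, List

# A sequence is a list of time steps; each step is a feature vector.
# ASSUMPTION: every modality has already been projected to the same
# feature dimension d, as is common before cross-attention.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(primary: Matrix, secondary: Matrix) -> Matrix:
    """Scaled dot-product cross-attention.

    Queries come from the primary (fused) sequence; keys and values come
    from a secondary modality. Learned Q/K/V projections and multiple heads
    are omitted (identity projections) to keep the sketch short.
    """
    d = len(primary[0])
    scores = matmul(primary, transpose(secondary))  # T_primary x T_secondary
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)  # each row sums to 1
    return matmul(weights, secondary)  # T_primary x d


def discrepancy(primary: Matrix, secondary: Matrix) -> Matrix:
    """HYPOTHETICAL discrepancy: primary features minus the part of the
    secondary modality that the primary retrieves via cross-attention."""
    attended = cross_attention(primary, secondary)
    return [[p - a for p, a in zip(p_row, a_row)]
            for p_row, a_row in zip(primary, attended)]


def discrepancy_representations(fused: Matrix,
                                others: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """One discrepancy representation per secondary modality."""
    return {name: discrepancy(fused, feats) for name, feats in others.items()}


if __name__ == "__main__":
    fused = [[1.0, 0.0], [0.0, 1.0]]             # 2 time steps
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps (unaligned)
    audio = [[0.5, 0.5]]                         # 1 time step
    reps = discrepancy_representations(fused, {"text": text, "audio": audio})
    for name, rep in reps.items():
        print(name, [[round(v, 3) for v in row] for row in rep])
```

Because the queries come from the primary sequence, each output keeps the fused sequence's length even when a secondary modality has a different number of time steps, which is one way cross-attention can accommodate unaligned modalities. The subtraction is just one simple choice; the paper may define the discrepancy representation differently.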

Original language: English
Journal: IEEE Transactions on Affective Computing
Publication status: Accepted/In press - 2025

Keywords

  • depression detection
  • Inter-modal Discrepancy Learning
  • Multimodal Sentiment Analysis
  • Neuro-scientific theories
  • Scale-Selectable Global Information
