
Scale-Selectable Global Information and Discrepancy Learning Network for Multimodal Sentiment Analysis

  • Xiaojiang He
  • Yushan Pan*
  • Xinfei Guo
  • Zhijie Xu
  • Chenguang Yang
  • *Corresponding author for this work
  • University of Liverpool
  • Xi'an Jiaotong-Liverpool University
  • Shanghai Jiao Tong University

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)
251 Downloads (Pure)

Abstract

Multimodal sentiment analysis and depression detection are pivotal for advancing human-computer interaction, yet significant challenges remain. First, the limited extraction of global contextual information within individual modalities risks the loss of modality-specific features. Second, existing methods often prioritize unaligned textual interactions, neglecting critical inter-modal discrepancies. To address these issues, we propose the Scale-Selectable Global and Discrepancy Learning Network (SSGDL), a framework that integrates two core modules: the Cross-Shaped Dynamic Scale Attention Module (CS-DSA) and the Primary-Secondary Modal Discrepancy Learning Module (PS-MDL). The CS-DSA dynamically selects scales and employs cross-shaped attention to capture comprehensive global context and intricate internal correlations, producing a fused modal representation. Meanwhile, the PS-MDL designates the fused modality as primary and uses cross-attention mechanisms to learn discrepancy representations between it and the other modalities (textual, acoustic, and visual). By leveraging inter-modal discrepancies, SSGDL achieves a more nuanced and holistic understanding of emotional content. Extensive experiments on three benchmark multimodal sentiment analysis datasets (MOSI, MOSEI, SIMS) and a depression detection dataset (AVEC2019) demonstrate that SSGDL consistently outperforms state-of-the-art approaches, setting a new benchmark for multimodal affective computing.
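The abstract does not say how PS-MDL turns cross-attention into a discrepancy representation, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the fused (primary) sequence supplies the queries, one secondary modality supplies the keys and values, and the discrepancy is the element-wise difference between the primary rows and what they retrieve. The function names, the subtraction step, and the omission of learned projections are all illustrative choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    Each query row attends over every key row and returns a weighted
    average of the corresponding value rows.
    """
    dim = len(queries[0])
    width = len(values[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return attended


def discrepancy(primary, secondary):
    """Hypothetical discrepancy: primary minus its cross-attended view
    of the secondary modality (an assumption, not taken from the paper)."""
    attended = cross_attention(primary, secondary, secondary)
    return [
        [p - a for p, a in zip(p_row, a_row)]
        for p_row, a_row in zip(primary, attended)
    ]


# Toy inputs: two fused time steps and two secondary-modality time steps.
fused = [[1.0, 0.0], [0.0, 1.0]]
acoustic = [[0.5, 0.5], [0.5, 0.5]]
result = discrepancy(fused, acoustic)
```

In this sketch the same function would be called once per secondary modality (textual, acoustic, visual), always with the fused representation as the primary input. A trained model would normally add learned query, key and value projections and multiple heads, which are left out here for brevity.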

Original language: English
Pages (from-to): 3169-3182
Number of pages: 14
Journal: IEEE Transactions on Affective Computing
Volume: 16
Issue number: 4
Early online date: Jun 2025
DOIs
Publication status: Published - Dec 2025

Keywords

  • Multimodal sentiment analysis
  • depression detection
  • inter-modal discrepancy learning
  • neuro-scientific theories
  • scale-selectable global information
