A multi-view and multi-granularity emotional semantic interaction framework combining graph and attention mechanism for multimodal sentiment analysis

  • Zuhe Li
  • Xiang Guo
  • Huaiguang Wu
  • Jun Yu
  • Haoran Chen
  • Yifan Gao
  • Xiaowei Huang
  • Yushan Pan*
  • *Corresponding author for this work

Research output: Contribution to journal · Article · peer-review

Abstract

Multimodal sentiment analysis aims to identify emotional leanings in data by examining information across multiple modalities. The primary challenge is to effectively harness both the internal and external emotional correlations among modalities while thoroughly extracting implicit emotional cues across a wide range of semantic contexts. We present MMIF, a Multi-view and Multi-granularity Sentiment Semantic Interaction Framework that tackles these challenges in four stages: the Quantum-inspired Temporal Feature Extraction (QTFE) module enriches each modality's temporal dynamics with a quantum-structured LSTM; the Graph Neural Networks Enhanced Intra-modal Representation (GEIR) module builds modality-specific graphs and leverages diverse GNN variants to strengthen intra-modal reasoning; the Inter-modal Deep Interaction Fusion (IDIF) module fuses cross-modal cues using a bidirectionally enhanced attention mechanism; and the Emotion Representation Distribution Matching (ERDM) module refines the final emotion predictions by capturing multi-granularity distributions of sentiment intensity. Experimental results on the public datasets CMU-MOSI, CMU-MOSEI, and CH-SIMS show that the proposed model outperforms the compared methods, achieving the best accuracy scores of 89.12%, 86.79%, and 80.53%, and F1 scores of 89.13%, 86.80%, and 80.71%, respectively.
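The four-stage pipeline sketched in the abstract (QTFE → GEIR → IDIF → ERDM) can be illustrated in miniature. Everything below is a hypothetical NumPy stand-in, not the authors' implementation: a plain recurrent cell substitutes for the quantum-structured LSTM, one round of neighbourhood averaging substitutes for the GNN variants, and scaled dot-product attention in both directions substitutes for the bidirectionally enhanced attention; all function names and shapes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def qtfe(x, W, U):
    """Temporal feature extraction: a vanilla recurrent pass standing in
    for the paper's quantum-structured LSTM."""
    h, out = np.zeros(W.shape[1]), []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W + h @ U)   # recurrent state update
        out.append(h)
    return np.stack(out)                 # (T, d)

def geir(h, adj):
    """Intra-modal graph step: mean aggregation over a modality-specific
    graph, standing in for the diverse GNN variants."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-9
    return (adj @ h) / deg               # (T, d)

def idif(a, b):
    """Inter-modal fusion: scaled dot-product attention in both
    directions, concatenated (stand-in for the bidirectional attention)."""
    def attend(q, k):
        s = q @ k.T / np.sqrt(q.shape[1])
        w = np.exp(s - s.max(axis=1, keepdims=True))
        return (w / w.sum(axis=1, keepdims=True)) @ k
    return np.concatenate([attend(a, b), attend(b, a)], axis=1)  # (T, 2d)

def erdm(fused):
    """Prediction head: pool the fused features and squash to a sentiment
    score in [-3, 3], the intensity range used by CMU-MOSI/MOSEI."""
    return 3.0 * np.tanh(fused.mean())

# Toy run over two modalities (e.g. text and audio), T=8 steps.
T, d_in, d = 8, 16, 32
W = rng.normal(size=(d_in, d)) * 0.1
U = rng.normal(size=(d, d)) * 0.1
text  = qtfe(rng.normal(size=(T, d_in)), W, U)
audio = qtfe(rng.normal(size=(T, d_in)), W, U)
adj = (rng.random((T, T)) > 0.5).astype(float)   # toy modality graph
score = erdm(idif(geir(text, adj), geir(audio, adj)))
print(round(float(score), 3))
```

The sketch only shows how the four stages compose; the real model would learn all parameters jointly and use far richer cells, graphs, and attention than these placeholders.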

Original language: English
Article number: 131380
Journal: Neurocomputing
Volume: 654
DOIs
Publication status: Published - 14 Nov 2025

Keywords

  • Emotion representation decoupling
  • Graph neural networks
  • Multimodal deep fusion
  • Quantum computing

