Text-guided deep correlation mining and self-learning feature fusion framework for multimodal sentiment analysis
Abstract
Multimodal sentiment analysis has garnered widespread attention due to its applications in fields such as human–robot interaction, offering a 10% to 20% improvement in binary accuracy over unimodal sentiment analysis. However, existing methods still face significant challenges: (1) insufficient utilization of textual information, which impacts the effectiveness of modality fusion and correlation mining; (2) excessive focus on modality fusion, with a lack of in-depth exploration of the correlations between individual modalities; and (3) the absence of unimodal labels in most multimodal sentiment analysis datasets, leading to challenges in co-learning scenarios. To address these issues, we propose a text-guided deep correlation mining and self-learning feature fusion framework using a multi-task learning strategy. This framework divides sentiment analysis into a multimodal task and three unimodal tasks (linguistic, acoustic, and visual). For the unimodal tasks, we design the Text-Guided Deep Information Correlation Mining Module (TUDCM), which fully explores the correlations between modalities under the guidance of textual information. For the multimodal task, we introduce a Self-Learning Text-Guided Multimodal Fusion Attention (SLTG-Attention) mechanism to enhance the role of textual information and adaptively learn relationships between modalities for efficient fusion. Additionally, we design a Multi-Distance Label Generation Module (MDLGM) to generate more accurate unimodal labels for co-learning scenarios. Extensive experiments on the MOSI, MOSEI, and SIMS datasets demonstrate that our framework significantly outperforms existing methods, achieving an approximate 1% improvement in accuracy. On the MOSI dataset, our method achieves 0.672 MAE, 0.816 correlation, 86.46% binary accuracy, and an 86.52% F1 score, with similarly strong results on the other datasets.
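The abstract does not spell out how SLTG-Attention is implemented. As a rough, illustrative sketch of the general idea it names — text features guiding the fusion of acoustic and visual features via cross-attention — the snippet below uses plain scaled dot-product attention with text tokens as queries. All names, shapes, and the uniform fusion weights are assumptions for illustration, not the paper's actual module (which learns its fusion weights).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """Scaled dot-product attention: queries from one modality,
    keys/values from another (text-guided when q is the text)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # (Lq, Lk) similarity of each text token to each frame
    return softmax(scores, axis=-1) @ v    # (Lq, d) text-aligned summary of the other modality

rng = np.random.default_rng(0)
d = 8
text   = rng.standard_normal((5, d))   # 5 text tokens
audio  = rng.standard_normal((12, d))  # 12 acoustic frames
visual = rng.standard_normal((7, d))   # 7 visual frames

# Text-guided step: text queries attend over each non-text modality,
# so acoustic/visual information is re-expressed per text token.
t_a = cross_attention(text, audio, audio)
t_v = cross_attention(text, visual, visual)

# Fusion step: the paper learns adaptive modality weights; here they
# are simply uniform, followed by mean pooling to an utterance vector.
fused = (text + t_a + t_v).mean(axis=0) / 3.0
print(fused.shape)  # (8,)
```

In the real framework this utterance-level vector would feed a sentiment regression head; the sketch only shows how text-as-query attention lets the textual modality steer which acoustic and visual frames contribute to the fused representation.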
| Original language | English |
|---|---|
| Article number | 113249 |
| Journal | Knowledge-Based Systems |
| Volume | 315 |
| DOIs | |
| Publication status | Published - 22 Apr 2025 |
Keywords
- Multi-Distance Label Generation
- Multimodal sentiment analysis
- Self-supervised multimodal fusion
- Text-guided correlation mining