Abstract
Multimodal sentiment analysis aims to extract the sentiment expressed by users from multimodal data, including linguistic, acoustic, and visual cues. However, the heterogeneity of multimodal data leads to disparities in modal distribution, impairing the model's ability to integrate the complementary and redundant information across modalities. Additionally, existing approaches often merge modalities directly after obtaining their representations, overlooking potential emotional correlations between them. To tackle these challenges, we propose a Multiview Collaborative Perception (MVCP) framework for multimodal sentiment analysis. The framework consists of two main modules: Multimodal Disentangled Representation Learning (MDRL) and Cross-Modal Context Association Mining (CMCAM). The MDRL module employs a joint learning layer comprising a common encoder and an exclusive encoder. This layer maps multimodal data onto a hypersphere and learns common and exclusive representations for each modality, thereby mitigating the semantic gap arising from modal heterogeneity. To further bridge semantic gaps and capture complex inter-modal correlations, the CMCAM module uses multiple attention mechanisms to mine cross-modal and contextual sentiment associations, yielding joint representations with rich multimodal semantic interactions. At this stage, CMCAM mines correlations only among the common representations, so that the exclusive representations of the individual modalities are preserved. Finally, a multitask learning framework is adopted to share parameters across the single-modal tasks and improve sentiment prediction. Experimental results on the MOSI and MOSEI datasets demonstrate the effectiveness of the proposed method.
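The two-stage design described in the abstract lends itself to a compact sketch. The following PyTorch snippet is a minimal, hypothetical illustration, not the authors' released code: each modality is projected into a shared space, a common encoder and per-modality exclusive encoders map features onto the unit hypersphere via L2 normalization, and multihead attention mines associations only among the common representations before fusion. All module names, dimensions, and the fusion head are assumptions made for this example.

```python
# Minimal sketch of the MVCP idea (assumed architecture, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HypersphereEncoder(nn.Module):
    """MLP followed by L2 normalization onto the unit hypersphere."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


class MVCPSketch(nn.Module):
    def __init__(self, dims: dict, hidden: int = 128, heads: int = 4):
        super().__init__()
        # Per-modality projections into a shared feature space (assumed step).
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # One common encoder shared by all modalities; one exclusive encoder each.
        self.common = HypersphereEncoder(hidden, hidden)
        self.exclusive = nn.ModuleDict(
            {m: HypersphereEncoder(hidden, hidden) for m in dims}
        )
        # Attention is applied only among the common representations,
        # mirroring the abstract's description of the CMCAM stage.
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden * (1 + len(dims)), 1)  # sentiment score

    def forward(self, feats: dict) -> torch.Tensor:
        projected = {m: self.proj[m](x) for m, x in feats.items()}
        # Common views stacked as a "sequence" of modalities: (batch, M, hidden).
        common = torch.stack([self.common(h) for h in projected.values()], dim=1)
        exclusive = [self.exclusive[m](h) for m, h in projected.items()]
        # Cross-modal association mining over the common views only.
        joint, _ = self.cross_attn(common, common, common)
        # Fuse the joint common representation with the untouched exclusive ones.
        fused = torch.cat([joint.mean(dim=1)] + exclusive, dim=-1)
        return self.head(fused)


# Usage with one pooled feature vector per modality; the input sizes below are
# typical of MOSI-style features but are assumptions for this sketch.
model = MVCPSketch({"text": 768, "audio": 74, "vision": 35})
batch = {
    "text": torch.randn(8, 768),
    "audio": torch.randn(8, 74),
    "vision": torch.randn(8, 35),
}
print(model(batch).shape)  # torch.Size([8, 1])
```

Keeping the attention restricted to the stacked common views is the detail that matches the abstract: the exclusive representations bypass the attention block entirely and only rejoin at the fusion step, so modality-specific information is not homogenized away.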
| Original language | English |
|---|---|
| Article number | 128940 |
| Journal | Neurocomputing |
| Volume | 617 |
| DOIs | |
| Publication status | Published - 7 Feb 2025 |
Keywords
- Linguistic-guided multihead attention
- Multimodal association mining
- Multimodal fusion
- Multimodal representation learning
- Multimodal sentiment analysis
Projects
- Research on Multimodal Robotic Arm Response Technology in Environment-Enabled Scenarios
Pan, Y. (PI), Wang, Y. (Team member), Xiang, N. (Team member), Zhang, H. (Team member), Xu, Z. (Team member), Ji, C. (Team member) & Chen, Y. (CoPI)
1/03/25 → 28/02/29
Project: Collaborative Research Project