Abstract
Multimodal sentiment analysis (MSA) aims to identify the emotional information that users express in the multimodal data they upload to social media platforms. Most previous studies treated the three modalities (audio A, visual V, and text T) equally, overlooking the lower representation quality inherent in the audio and visual modalities. This oversight often yields inaccurate interaction information when the audio or visual modality serves as the primary input, which degrades the model’s sentiment predictions. In this paper, we propose a text-dominant multimodal perception network with cross-modal transformer-based semantic enhancement. The network consists mainly of a text-dominant multimodal perception (TDMP) module and a cross-modal transformer-based semantic enhancement (TSE) module. TDMP lets the text modality dominate intermodal interactions and extracts highly correlated and distinguishing features from each modality, thereby obtaining a more accurate representation of each modality. The TSE module uses a transformer architecture to translate the audio and visual modality features into text features. By applying KL divergence constraints, it ensures that the translated modality representations capture as much emotional information as possible while remaining highly similar to the original text modality representations. This enhances the semantics of the original text modality while mitigating the negative impact of the input. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate the effectiveness of the proposed model.
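The KL divergence constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax normalisation of feature vectors, the direction of the divergence (text as the reference distribution), and the toy feature values are all assumptions made for illustration only.

```python
import math


def softmax(values):
    """Turn a raw feature vector into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def translation_kl_loss(text_features, translated_features):
    """Hypothetical constraint: penalise translated (audio or visual)
    features whose distribution drifts away from the text features."""
    p = softmax(text_features)        # reference: text modality (assumed)
    q = softmax(translated_features)  # audio/visual translated into text space
    return kl_divergence(p, q)


# Toy feature vectors, invented for illustration.
text = [0.2, 1.5, -0.7, 0.9]
translated_close = [0.25, 1.4, -0.6, 1.0]  # stays near the text features
translated_far = [1.5, -0.7, 0.9, 0.2]     # same values, shuffled

loss_close = translation_kl_loss(text, translated_close)
loss_far = translation_kl_loss(text, translated_far)
print(loss_close < loss_far)
```

Under these assumptions, a translated representation that stays close to the text representation incurs a smaller penalty than one that diverges from it, which is the behaviour such a constraint is meant to encourage during training.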
| Original language | English |
|---|---|
| Article number | 188 |
| Journal | Applied Intelligence |
| Volume | 55 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Jan 2025 |
Keywords
- Modality translation
- Semantic enhancement
- Text dominance
Fingerprint
Dive into the research topics of 'Text-dominant multimodal perception network for sentiment analysis based on cross-modal semantic enhancements'. Together they form a unique fingerprint.
Projects
- 1 Active
- Research on Multimodal Robotic Arm Response Technology in Environment-Enabled Scenarios/环境赋使场景下的多模态机械臂响应技术研究
Pan, Y. (PI), Wang, Y. (Team member), Xiang, N. (Team member), Zhang, H. (Team member), Xu, Z. (Team member), Ji, C. (Team member) & Chen, Y. (CoPI)
1/03/25 → 28/02/29
Project: Collaborative Research Project