TY - JOUR
T1 - Multimodal sentiment analysis based on disentangled representation learning and cross-modal-context association mining
AU - Li, Zuhe
AU - Liu, Panbo
AU - Pan, Yushan
AU - Ding, Weiping
AU - Yu, Jun
AU - Chen, Haoran
AU - Liu, Weihua
AU - Luo, Yiming
AU - Wang, Hao
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2025/2/7
Y1 - 2025/2/7
N2 - Multimodal sentiment analysis aims to extract the sentiment information expressed by users from multimodal data, including linguistic, acoustic, and visual cues. However, the heterogeneity of multimodal data leads to disparities in modal distributions, hindering the model's ability to effectively integrate complementary and redundant information across modalities. Additionally, existing approaches often merge modalities directly after obtaining their representations, overlooking potential emotional correlations between them. To tackle these challenges, we propose a Multiview Collaborative Perception (MVCP) framework for multimodal sentiment analysis. The framework consists of two main modules: Multimodal Disentangled Representation Learning (MDRL) and Cross-Modal Context Association Mining (CMCAM). The MDRL module employs a joint learning layer comprising a common encoder and an exclusive encoder. This layer maps multimodal data onto a hypersphere and learns common and exclusive representations for each modality, thus mitigating the semantic gap arising from modal heterogeneity. To further bridge semantic gaps and capture complex inter-modal correlations, the CMCAM module uses multiple attention mechanisms to mine cross-modal and contextual sentiment associations, yielding joint representations with rich multimodal semantic interactions. At this stage, the CMCAM module mines correlations only among the common representations, so that the exclusive representations of the different modalities are preserved. Finally, a multitask learning framework is adopted to share parameters across single-modal tasks and improve sentiment prediction performance. Experimental results on the MOSI and MOSEI datasets demonstrate the effectiveness of the proposed method.
AB - Multimodal sentiment analysis aims to extract the sentiment information expressed by users from multimodal data, including linguistic, acoustic, and visual cues. However, the heterogeneity of multimodal data leads to disparities in modal distributions, hindering the model's ability to effectively integrate complementary and redundant information across modalities. Additionally, existing approaches often merge modalities directly after obtaining their representations, overlooking potential emotional correlations between them. To tackle these challenges, we propose a Multiview Collaborative Perception (MVCP) framework for multimodal sentiment analysis. The framework consists of two main modules: Multimodal Disentangled Representation Learning (MDRL) and Cross-Modal Context Association Mining (CMCAM). The MDRL module employs a joint learning layer comprising a common encoder and an exclusive encoder. This layer maps multimodal data onto a hypersphere and learns common and exclusive representations for each modality, thus mitigating the semantic gap arising from modal heterogeneity. To further bridge semantic gaps and capture complex inter-modal correlations, the CMCAM module uses multiple attention mechanisms to mine cross-modal and contextual sentiment associations, yielding joint representations with rich multimodal semantic interactions. At this stage, the CMCAM module mines correlations only among the common representations, so that the exclusive representations of the different modalities are preserved. Finally, a multitask learning framework is adopted to share parameters across single-modal tasks and improve sentiment prediction performance. Experimental results on the MOSI and MOSEI datasets demonstrate the effectiveness of the proposed method.
KW - Linguistic-guided multihead attention
KW - Multimodal association mining
KW - Multimodal fusion
KW - Multimodal representation learning
KW - Multimodal sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85210365078&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2024.128940
DO - 10.1016/j.neucom.2024.128940
M3 - Article
AN - SCOPUS:85210365078
SN - 0925-2312
VL - 617
JO - Neurocomputing
JF - Neurocomputing
M1 - 128940
ER -