TY - JOUR
T1 - Representation distribution matching and dynamic routing interaction for multimodal sentiment analysis
AU - Li, Zuhe
AU - Huang, Zhenwei
AU - He, Xiaojiang
AU - Yu, Jun
AU - Chen, Haoran
AU - Yang, Chenguang
AU - Pan, Yushan
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/5/12
Y1 - 2025/5/12
N2 - To address distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a multimodal sentiment analysis (MSA) framework called representation distribution matching interaction, which extracts and interprets emotional cues from video data. The framework includes a representation distribution matching module that uses an adversarial cyclic translation network to align the representation distributions of the nontextual modalities with that of the textual modality, reducing distribution gaps while preserving semantic information. We also develop a dynamic routing interaction module, which combines four distinct components to form a routing interaction space; this design uses modality representations more efficiently and supports more effective emotional learning. To combat homogenization, we propose a cross-modal interaction optimization mechanism that maximizes the differences among fused representations and enhances their mutual information with the target modalities, yielding more discriminative fused representations. Extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.
AB - To address distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a multimodal sentiment analysis (MSA) framework called representation distribution matching interaction, which extracts and interprets emotional cues from video data. The framework includes a representation distribution matching module that uses an adversarial cyclic translation network to align the representation distributions of the nontextual modalities with that of the textual modality, reducing distribution gaps while preserving semantic information. We also develop a dynamic routing interaction module, which combines four distinct components to form a routing interaction space; this design uses modality representations more efficiently and supports more effective emotional learning. To combat homogenization, we propose a cross-modal interaction optimization mechanism that maximizes the differences among fused representations and enhances their mutual information with the target modalities, yielding more discriminative fused representations. Extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.
KW - Cross-modal interaction optimization
KW - Distribution matching
KW - Multimodal sentiment analysis
KW - Route network
UR - http://www.scopus.com/inward/record.url?scp=105001570158&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2025.113376
DO - 10.1016/j.knosys.2025.113376
M3 - Article
AN - SCOPUS:105001570158
SN - 0950-7051
VL - 316
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 113376
ER -