Representation distribution matching and dynamic routing interaction for multimodal sentiment analysis

Zuhe Li; Zhenwei Huang; Xiaojiang He; Jun Yu; Haoran Chen; Chenguang Yang; Yushan Pan

doi:10.1016/j.knosys.2025.113376

Zuhe Li, Zhenwei Huang, Xiaojiang He, Jun Yu, Haoran Chen, Chenguang Yang, Yushan Pan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To address the challenges of distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a cutting-edge multimodal sentiment analysis (MSA) framework called representation distribution matching interaction to extract and interpret emotional cues from video data. This framework includes a representation distribution matching module that uses an adversarial cyclic translation network. This aligns the representation distributions of nontextual modalities with those of textual modalities, preserving semantic information while reducing distribution gaps. We also developed the dynamic routing interaction module, which combines four distinct components to form a routing interaction space. This setup efficiently uses modality representations for more effective emotional learning. To combat homogenization, we propose the cross-modal interaction optimization mechanism. It maximizes differences in fused representations and enhances mutual information with target modalities, yielding more discriminative fused representations. Our extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.

Original language	English
Article number	113376
Journal	Knowledge-Based Systems
Volume	316
DOIs	https://doi.org/10.1016/j.knosys.2025.113376
Publication status	Published - 12 May 2025

Keywords

Cross-modal interaction optimization
Distribution matching
Multimodal sentiment analysis
Route network

Cite this

@article{ee4cf02e14994e0ebef27f33b01ba38f,
title = "Representation distribution matching and dynamic routing interaction for multimodal sentiment analysis",
abstract = "To address the challenges of distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a cutting-edge multimodal sentiment analysis (MSA) framework called representation distribution matching interaction to extract and interpret emotional cues from video data. This framework includes a representation distribution matching module that uses an adversarial cyclic translation network. This aligns the representation distributions of nontextual modalities with those of textual modalities, preserving semantic information while reducing distribution gaps. We also developed the dynamic routing interaction module, which combines four distinct components to form a routing interaction space. This setup efficiently uses modality representations for a more effective emotional learning. To combat homogenization, we propose the cross-modal interaction optimization mechanism. It maximizes differences in fused representations and enhances mutual information with target modalities, yielding more discriminative fused representations. Our extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.",
keywords = "Cross-modal interaction optimization, Distribution matching, Multimodal sentiment analysis, Route network",
author = "Zuhe Li and Zhenwei Huang and Xiaojiang He and Jun Yu and Haoran Chen and Chenguang Yang and Yushan Pan",
note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",
year = "2025",
month = may,
day = "12",
doi = "10.1016/j.knosys.2025.113376",
language = "English",
volume = "316",
journal = "Knowledge-Based Systems",
issn = "0950-7051",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Representation distribution matching and dynamic routing interaction for multimodal sentiment analysis
AU  - Li, Zuhe
AU  - Huang, Zhenwei
AU  - He, Xiaojiang
AU  - Yu, Jun
AU  - Chen, Haoran
AU  - Yang, Chenguang
AU  - Pan, Yushan
PY  - 2025/5/12
Y1  - 2025/5/12
N2  - To address the challenges of distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a cutting-edge multimodal sentiment analysis (MSA) framework called representation distribution matching interaction to extract and interpret emotional cues from video data. This framework includes a representation distribution matching module that uses an adversarial cyclic translation network. This aligns the representation distributions of nontextual modalities with those of textual modalities, preserving semantic information while reducing distribution gaps. We also developed the dynamic routing interaction module, which combines four distinct components to form a routing interaction space. This setup efficiently uses modality representations for a more effective emotional learning. To combat homogenization, we propose the cross-modal interaction optimization mechanism. It maximizes differences in fused representations and enhances mutual information with target modalities, yielding more discriminative fused representations. Our extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.
AB  - To address the challenges of distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a cutting-edge multimodal sentiment analysis (MSA) framework called representation distribution matching interaction to extract and interpret emotional cues from video data. This framework includes a representation distribution matching module that uses an adversarial cyclic translation network. This aligns the representation distributions of nontextual modalities with those of textual modalities, preserving semantic information while reducing distribution gaps. We also developed the dynamic routing interaction module, which combines four distinct components to form a routing interaction space. This setup efficiently uses modality representations for a more effective emotional learning. To combat homogenization, we propose the cross-modal interaction optimization mechanism. It maximizes differences in fused representations and enhances mutual information with target modalities, yielding more discriminative fused representations. Our extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.
KW  - Cross-modal interaction optimization
KW  - Distribution matching
KW  - Multimodal sentiment analysis
KW  - Route network
UR  - http://www.scopus.com/inward/record.url?scp=105001570158&partnerID=8YFLogxK
U2  - 10.1016/j.knosys.2025.113376
DO  - 10.1016/j.knosys.2025.113376
M3  - Article
AN  - SCOPUS:105001570158
SN  - 0950-7051
VL  - 316
JO  - Knowledge-Based Systems
JF  - Knowledge-Based Systems
M1  - 113376
ER  - 
