Hierarchical denoising representation disentanglement and dual-channel cross-modal-context interaction for multimodal sentiment analysis

Zuhe Li; Zhenwei Huang; Yushan Pan; Jun Yu; Weihua Liu; Haoran Chen; Yiming Luo; Di Wu; Hao Wang

doi:10.1016/j.eswa.2024.124236

Corresponding author: Yushan Pan

Department of Computing

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Multimodal sentiment analysis aims to extract sentiment cues from multiple modalities, such as textual, acoustic, and visual data, and to use them to determine the sentiment polarity of the data. Despite significant progress in the field, challenges remain in handling noisy features in modal representations, closing the substantial gaps in sentiment information among modal representations, and exploring contextual information that expresses different sentiments across modalities. To tackle these challenges, we propose a new Multimodal Sentiment Analysis (MSA) framework. First, we introduce the Hierarchical Denoising Representation Disentanglement (HDRD) module, which applies hierarchical disentanglement to extract both common and private sentiment information while removing interfering noise from modal representations. Second, to address the uneven distribution of sentiment information among modalities, our Inter-Modal Representation Enhancement (IMRE) module enhances non-textual representations with the related sentiment information it extracts from textual representations. Third, we introduce a new interaction mechanism, the Dual-Channel Cross-Modal Context Interaction (DCCMCI) module, which mines correlated contextual sentiment information within modalities and also explores positively and negatively correlated contextual sentiment information between modalities. Extensive experiments on two benchmark datasets, MOSI and MOSEI, indicate that the proposed method achieves state-of-the-art performance.

Original language	English
Article number	124236
Journal	Expert Systems with Applications
Volume	252
DOIs	https://doi.org/10.1016/j.eswa.2024.124236
Publication status	Published - 15 Oct 2024

Keywords

Cross-modal context interaction
Hierarchical disentanglement
Inter-modal enhancement
Multimodal sentiment analysis

Cite this

@article{f9597512df084e2a80de1ece23b31c50,

title = "Hierarchical denoising representation disentanglement and dual-channel cross-modal-context interaction for multimodal sentiment analysis",

abstract = "Multimodal sentiment analysis aims to extract sentiment cues from various modalities, such as textual, acoustic, and visual data, and manipulate them to determine the inherent sentiment polarity in the data. Despite significant achievements in multimodal sentiment analysis, challenges persist in addressing noise features in modal representations, eliminating substantial gaps in sentiment information among modal representations, and exploring contextual information that expresses different sentiments between modalities. To tackle these challenges, our paper proposes a new Multimodal Sentiment Analysis (MSA) framework. Firstly, we introduce the Hierarchical Denoising Representation Disentanglement module (HDRD), which employs hierarchical disentanglement techniques. This ensures the extraction of both common and private sentiment information while eliminating interference noise from modal representations. Furthermore, to address the uneven distribution of sentiment information among modalities, our Inter-Modal Representation Enhancement module (IMRE) enhances non-textual representations by extracting sentiment information related to non-textual representations from textual representations. Next, we introduce a new interaction mechanism, the Dual-Channel Cross-Modal Context Interaction module (DCCMCI). This module not only mines correlated contextual sentiment information within modalities but also explores positive and negative correlation contextual sentiment information between modalities. We conducted extensive experiments on two benchmark datasets, MOSI and MOSEI, and the results indicate that our proposed method offers state-of-the-art approaches.",

keywords = "Cross-modal context interaction, Hierarchical disentanglement, Inter-modal enhancement, Multimodal sentiment analysis",

author = "Zuhe Li and Zhenwei Huang and Yushan Pan and Jun Yu and Weihua Liu and Haoran Chen and Yiming Luo and Di Wu and Hao Wang",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",

year = "2024",

month = oct,

day = "15",

doi = "10.1016/j.eswa.2024.124236",

language = "English",

volume = "252",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier",

}

TY - JOUR

T1 - Hierarchical denoising representation disentanglement and dual-channel cross-modal-context interaction for multimodal sentiment analysis

AU - Li, Zuhe

AU - Huang, Zhenwei

AU - Pan, Yushan

AU - Yu, Jun

AU - Liu, Weihua

AU - Chen, Haoran

AU - Luo, Yiming

AU - Wu, Di

AU - Wang, Hao

PY - 2024/10/15

Y1 - 2024/10/15

AB - Multimodal sentiment analysis aims to extract sentiment cues from various modalities, such as textual, acoustic, and visual data, and manipulate them to determine the inherent sentiment polarity in the data. Despite significant achievements in multimodal sentiment analysis, challenges persist in addressing noise features in modal representations, eliminating substantial gaps in sentiment information among modal representations, and exploring contextual information that expresses different sentiments between modalities. To tackle these challenges, our paper proposes a new Multimodal Sentiment Analysis (MSA) framework. Firstly, we introduce the Hierarchical Denoising Representation Disentanglement module (HDRD), which employs hierarchical disentanglement techniques. This ensures the extraction of both common and private sentiment information while eliminating interference noise from modal representations. Furthermore, to address the uneven distribution of sentiment information among modalities, our Inter-Modal Representation Enhancement module (IMRE) enhances non-textual representations by extracting sentiment information related to non-textual representations from textual representations. Next, we introduce a new interaction mechanism, the Dual-Channel Cross-Modal Context Interaction module (DCCMCI). This module not only mines correlated contextual sentiment information within modalities but also explores positive and negative correlation contextual sentiment information between modalities. We conducted extensive experiments on two benchmark datasets, MOSI and MOSEI, and the results indicate that our proposed method offers state-of-the-art approaches.

KW - Cross-modal context interaction

KW - Hierarchical disentanglement

KW - Inter-modal enhancement

KW - Multimodal sentiment analysis

UR - http://www.scopus.com/inward/record.url?scp=85193903516&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2024.124236

DO - 10.1016/j.eswa.2024.124236

M3 - Article

AN - SCOPUS:85193903516

SN - 0957-4174

VL - 252

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 124236

ER -
