Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis

Zuhe Li, Qingbing Guo, Yushan Pan*, Weiping Ding, Jun Yu, Yazhou Zhang, Weihua Liu, Haoran Chen, Hao Wang, Ying Xie

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

Fusion and co-learning are major challenges in multimodal sentiment analysis. Most existing methods either ignore the basic relationships among modalities or fail to exploit their potential correlations, and they do not transfer knowledge from resource-rich modalities to the analysis of resource-poor ones. To address these challenges, we propose a multimodal sentiment analysis method based on multi-level correlation mining and self-supervised multi-task learning. First, we propose a multi-level correlation mining framework, built on unimodal feature fusion and a linguistics-guided Transformer, to overcome the difficulty of multimodal information fusion; the framework exploits correlation information between modalities from low to high levels. Second, we divide the multimodal sentiment analysis task into one multimodal task and three unimodal tasks (linguistic, acoustic, and visual), and design a self-supervised label generation module (SLGM) to generate sentiment labels for the unimodal tasks. SLGM-based multi-task learning overcomes the lack of unimodal labels in co-learning. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate the superiority of the proposed multi-level correlation mining framework over state-of-the-art methods.
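To make the self-supervised multi-task setup described above concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' released code. The simple linear encoders, the concatenation-based fusion (standing in for the paper's multi-level correlation mining / linguistics-guided Transformer), the feature dimensions, and the pseudo-label rule (shrinking the multimodal label toward neutral in proportion to a unimodal feature's distance from the fused feature) are all assumptions for illustration; the actual SLGM rule differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSentimentModel(nn.Module):
    """Sketch: one multimodal head plus three unimodal heads.
    Encoders and fusion are placeholders for the paper's multi-level
    correlation mining framework."""

    def __init__(self, d_l=768, d_a=74, d_v=35, d_h=128):
        super().__init__()
        # Per-modality projections standing in for the real encoders
        # (dimensions are assumed, BERT/COVAREP/Facet-style).
        self.enc_l = nn.Linear(d_l, d_h)
        self.enc_a = nn.Linear(d_a, d_h)
        self.enc_v = nn.Linear(d_v, d_h)
        # Fusion placeholder: concatenation + MLP instead of the
        # linguistics-guided Transformer fusion.
        self.fuse = nn.Sequential(nn.Linear(3 * d_h, d_h), nn.ReLU())
        # One regression head per task (sentiment score, e.g. in [-3, 3]).
        self.head_m = nn.Linear(d_h, 1)
        self.head_l = nn.Linear(d_h, 1)
        self.head_a = nn.Linear(d_h, 1)
        self.head_v = nn.Linear(d_h, 1)

    def forward(self, x_l, x_a, x_v):
        h_l, h_a, h_v = self.enc_l(x_l), self.enc_a(x_a), self.enc_v(x_v)
        h_m = self.fuse(torch.cat([h_l, h_a, h_v], dim=-1))
        preds = {
            "m": self.head_m(h_m).squeeze(-1),
            "l": self.head_l(h_l).squeeze(-1),
            "a": self.head_a(h_a).squeeze(-1),
            "v": self.head_v(h_v).squeeze(-1),
        }
        feats = {"m": h_m, "l": h_l, "a": h_a, "v": h_v}
        return preds, feats

def generate_unimodal_labels(y_m, feats, scale=0.5):
    """Hypothetical stand-in for the paper's SLGM: shrink the multimodal
    label toward neutral in proportion to how far each unimodal feature
    sits from the fused feature, and detach so labels act as targets."""
    labels = {}
    for k in ("l", "a", "v"):
        dist = F.mse_loss(feats[k], feats["m"], reduction="none").mean(-1)
        labels[k] = (y_m * (1.0 - scale * torch.tanh(dist))).detach()
    return labels

def multitask_loss(preds, y_m, uni_labels, w_uni=0.3):
    loss = F.l1_loss(preds["m"], y_m)           # supervised multimodal task
    for k in ("l", "a", "v"):                   # self-supervised unimodal tasks
        loss = loss + w_uni * F.l1_loss(preds[k], uni_labels[k])
    return loss
```

A single training step under these assumptions would look like the following; the batch size and CMU-MOSI-style score range are likewise illustrative.

```python
model = MultiTaskSentimentModel()
x_l, x_a, x_v = torch.randn(8, 768), torch.randn(8, 74), torch.randn(8, 35)
y_m = torch.empty(8).uniform_(-3, 3)        # multimodal sentiment labels
preds, feats = model(x_l, x_a, x_v)
uni_labels = generate_unimodal_labels(y_m, feats)
loss = multitask_loss(preds, y_m, uni_labels)
loss.backward()
```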

Original language: English
Article number: 101891
Journal: Information Fusion
Volume: 99
Publication status: Published - Nov 2023

Keywords

  • Linguistic-guided transformer
  • Multimodal sentiment analysis
  • Self-supervised label generation
  • Unimodal feature fusion
