TY - JOUR
T1 - Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis
AU - Li, Zuhe
AU - Guo, Qingbing
AU - Pan, Yushan
AU - Ding, Weiping
AU - Yu, Jun
AU - Zhang, Yazhou
AU - Liu, Weihua
AU - Chen, Haoran
AU - Wang, Hao
AU - Xie, Ying
N1 - Funding Information:
This study was supported by the National Natural Science Foundation of China under Grant 61702462, 62276146, and 61906175; the XJTLU RDF-21-02-008; the Henan Provincial Science and Technology Research Project under Grant 222102210010, 222102210064, 232102211006, and 232102210044; the Research and Practice Project of Higher Education Teaching Reform in Henan Province under Grant 2019SJGLX320 and 2019SJGLX020; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29; and the Academic Degrees & Graduate Education Reform Project of Henan Province under Grant 2021SJGLX115Y.
Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/11
Y1 - 2023/11
N2 - Fusion and co-learning are major challenges in multimodal sentiment analysis. Most existing methods either ignore the basic relationships among modalities or fail to maximize their potential correlations, and they do not leverage knowledge from resource-rich modalities when analyzing resource-poor ones. To address these challenges, we propose a multimodal sentiment analysis method based on multi-level correlation mining and self-supervised multi-task learning. First, we propose a unimodal feature fusion- and linguistics-guided Transformer-based framework, the multi-level correlation mining framework, to overcome the difficulty of multimodal information fusion. The module exploits correlation information between modalities from low to high levels. Second, we divide the multimodal sentiment analysis task into one multimodal task and three unimodal tasks (linguistic, acoustic, and visual) and design a self-supervised label generation module (SLGM) to generate sentiment labels for the unimodal tasks. SLGM-based multi-task learning overcomes the lack of unimodal labels in co-learning. Through extensive experiments on the CMU-MOSI and CMU-MOSEI datasets, we demonstrate the superiority of the proposed multi-level correlation mining framework over state-of-the-art methods.
AB - Fusion and co-learning are major challenges in multimodal sentiment analysis. Most existing methods either ignore the basic relationships among modalities or fail to maximize their potential correlations, and they do not leverage knowledge from resource-rich modalities when analyzing resource-poor ones. To address these challenges, we propose a multimodal sentiment analysis method based on multi-level correlation mining and self-supervised multi-task learning. First, we propose a unimodal feature fusion- and linguistics-guided Transformer-based framework, the multi-level correlation mining framework, to overcome the difficulty of multimodal information fusion. The module exploits correlation information between modalities from low to high levels. Second, we divide the multimodal sentiment analysis task into one multimodal task and three unimodal tasks (linguistic, acoustic, and visual) and design a self-supervised label generation module (SLGM) to generate sentiment labels for the unimodal tasks. SLGM-based multi-task learning overcomes the lack of unimodal labels in co-learning. Through extensive experiments on the CMU-MOSI and CMU-MOSEI datasets, we demonstrate the superiority of the proposed multi-level correlation mining framework over state-of-the-art methods.
KW - Linguistic-guided transformer
KW - Multimodal sentiment analysis
KW - Self-supervised label generation
KW - Unimodal feature fusion
UR - http://www.scopus.com/inward/record.url?scp=85162851536&partnerID=8YFLogxK
U2 - 10.1016/j.inffus.2023.101891
DO - 10.1016/j.inffus.2023.101891
M3 - Article
AN - SCOPUS:85162851536
SN - 1566-2535
VL - 99
JO - Information Fusion
JF - Information Fusion
M1 - 101891
ER -