TY - JOUR
T1 - AL-HCL: Active Learning and Hierarchical Contrastive Learning for Multimodal Sentiment Analysis with Fusion Guidance
AU - He, Xiaojiang
AU - Pan, Yushan
AU - Xu, Zhijie
AU - Li, Zuhe
AU - Guo, Xinfei
AU - Yang, Chenguang
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Multimodal sentiment analysis (MSA) is a rapidly advancing field in artificial intelligence (AI). However, it faces two major challenges: (1) deep learning-based MSA models often rely on large multimodal datasets but struggle with suboptimal data utilization, and (2) inconsistencies across modalities hinder the effective fusion of diverse information sources. To address these challenges, we propose the Active Learning and Hierarchical Contrastive Learning (AL-HCL) model for MSA. This model incorporates active learning techniques to balance prediction uncertainty with sample diversity, selectively identifying and labeling high-value samples from an unlabeled pool. This approach reduces annotation costs while maintaining robust performance. Additionally, we introduce a three-tier contrastive learning framework. The first layer addresses heterogeneity within unimodal data, the second resolves discrepancies between unimodal and fused modalities, and the third employs a Matrix-Based Fusion (MBF) module to extract high-level semantic features, enabling deeper feature-level fusion. A novel modal fusion strategy further enhances cross-modal interactions, optimizing the fusion process. Extensive experiments on benchmark MSA datasets (CMU-MOSI, CMU-MOSEI, and CH-SIMS) demonstrate that AL-HCL outperforms state-of-the-art models, validating the effectiveness of the proposed active learning strategy.
AB - Multimodal sentiment analysis (MSA) is a rapidly advancing field in artificial intelligence (AI). However, it faces two major challenges: (1) deep learning-based MSA models often rely on large multimodal datasets but struggle with suboptimal data utilization, and (2) inconsistencies across modalities hinder the effective fusion of diverse information sources. To address these challenges, we propose the Active Learning and Hierarchical Contrastive Learning (AL-HCL) model for MSA. This model incorporates active learning techniques to balance prediction uncertainty with sample diversity, selectively identifying and labeling high-value samples from an unlabeled pool. This approach reduces annotation costs while maintaining robust performance. Additionally, we introduce a three-tier contrastive learning framework. The first layer addresses heterogeneity within unimodal data, the second resolves discrepancies between unimodal and fused modalities, and the third employs a Matrix-Based Fusion (MBF) module to extract high-level semantic features, enabling deeper feature-level fusion. A novel modal fusion strategy further enhances cross-modal interactions, optimizing the fusion process. Extensive experiments on benchmark MSA datasets (CMU-MOSI, CMU-MOSEI, and CH-SIMS) demonstrate that AL-HCL outperforms state-of-the-art models, validating the effectiveness of the proposed active learning strategy.
KW - active learning
KW - contrastive learning
KW - Multimodal sentiment analysis
UR - https://ieeexplore.ieee.org/abstract/document/11180049
UR - https://www.scopus.com/pages/publications/105017807443
U2 - 10.1109/TAFFC.2025.3614159
DO - 10.1109/TAFFC.2025.3614159
M3 - Article
AN - SCOPUS:105017807443
SN - 1949-3045
VL - 17
SP - 303
EP - 316
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 1
ER -