TY - GEN
T1 - Multi-modal Contextual Prompt Learning for Multi-label Classification with Partial Labels
AU - Wang, Rui
AU - Pan, Zhengxin
AU - Wu, Fangyu
AU - Lv, Yifan
AU - Zhang, Bailing
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/2/2
Y1 - 2024/2/2
N2 - Multi-label classification is a task with diverse applications, but current algorithms rely heavily on accurately labeled data, making data collection time-consuming and labor-intensive. Multi-label classification with partial labels alleviates this annotation burden but presents significant challenges. In this study, we propose Multi-modal Contextual Prompt Learning (MCPL), a novel approach that leverages large-scale visual-language models and exploits the strong image-text alignment in CLIP to address the scarcity of label annotations. The visual-language model's encoders are pre-trained on a large number of image-text pairs. We introduce multi-modal contextual prompt learning on both the image and label-text modalities to better utilize the image-label correspondence within CLIP, yielding enhanced multi-label classification performance even when only partial labels are available. We further employ a coupling function to link the two modalities and enable interaction between their prompts. Extensive experiments on the MS-COCO and VOC2007 datasets demonstrate the superiority of MCPL and its competitive performance.
AB - Multi-label classification is a task with diverse applications, but current algorithms rely heavily on accurately labeled data, making data collection time-consuming and labor-intensive. Multi-label classification with partial labels alleviates this annotation burden but presents significant challenges. In this study, we propose Multi-modal Contextual Prompt Learning (MCPL), a novel approach that leverages large-scale visual-language models and exploits the strong image-text alignment in CLIP to address the scarcity of label annotations. The visual-language model's encoders are pre-trained on a large number of image-text pairs. We introduce multi-modal contextual prompt learning on both the image and label-text modalities to better utilize the image-label correspondence within CLIP, yielding enhanced multi-label classification performance even when only partial labels are available. We further employ a coupling function to link the two modalities and enable interaction between their prompts. Extensive experiments on the MS-COCO and VOC2007 datasets demonstrate the superiority of MCPL and its competitive performance.
KW - Multi-label classification
KW - Partial label
KW - Prompt learning
UR - http://www.scopus.com/inward/record.url?scp=85196215440&partnerID=8YFLogxK
U2 - 10.1145/3651671.3651674
DO - 10.1145/3651671.3651674
M3 - Conference Proceeding
AN - SCOPUS:85196215440
T3 - ACM International Conference Proceeding Series
SP - 517
EP - 524
BT - Proceedings of the 2024 16th International Conference on Machine Learning and Computing, ICMLC 2024
PB - Association for Computing Machinery
T2 - 16th International Conference on Machine Learning and Computing, ICMLC 2024
Y2 - 2 February 2024 through 5 February 2024
ER -