TY - JOUR
T1 - Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation
AU - Wu, Wangyu
AU - Dai, Tianhong
AU - Chen, Zhenhong
AU - Huang, Xiaowei
AU - Ma, Fei
AU - Xiao, Jimin
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/7/14
Y1 - 2025/7/14
N2 - Weakly supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has received significant attention. Existing approaches concentrate mainly on creating high-quality pseudo labels from the available images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing Generative Prompt Controlled Diffusion (GPCD), a novel approach to data augmentation. GPCD enlarges the existing labeled dataset with diverse images generated through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts: the existing images and image-level labels supply the control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data-source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. The proposed GPCD approach clearly surpasses existing state-of-the-art methods, and its advantages are more pronounced when available data is scarce, demonstrating the effectiveness of our method. Our source code will be released.
AB - Weakly supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has received significant attention. Existing approaches concentrate mainly on creating high-quality pseudo labels from the available images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing Generative Prompt Controlled Diffusion (GPCD), a novel approach to data augmentation. GPCD enlarges the existing labeled dataset with diverse images generated through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts: the existing images and image-level labels supply the control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data-source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. The proposed GPCD approach clearly surpasses existing state-of-the-art methods, and its advantages are more pronounced when available data is scarce, demonstrating the effectiveness of our method. Our source code will be released.
KW - Diffusion model
KW - Prompt generation
KW - Semantic segmentation
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=105001800792&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2025.130103
DO - 10.1016/j.neucom.2025.130103
M3 - Article
AN - SCOPUS:105001800792
SN - 0925-2312
VL - 638
JO - Neurocomputing
JF - Neurocomputing
M1 - 130103
ER -