Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation

Wangyu Wu; Tianhong Dai; Zhenhong Chen; Xiaowei Huang; Fei Ma; Jimin Xiao

doi:10.1016/j.neucom.2025.130103

Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Fei Ma^*, Jimin Xiao

^*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Contribution to journal › Article › peer-review

Abstract

Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.

Original language	English
Article number	130103
Journal	Neurocomputing
Volume	638
DOIs	https://doi.org/10.1016/j.neucom.2025.130103
Publication status	Published - 14 Jul 2025

Keywords

Diffusion model
Prompt generation
Semantic segmentation
Weakly-supervised learning

Cite this

@article{f6e2e73f487142d8a4351681d0727620,
title = "Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation",

abstract = "Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.",

keywords = "Diffusion model, Prompt generation, Semantic segmentation, Weakly-supervised learning",
author = "Wangyu Wu and Tianhong Dai and Zhenhong Chen and Xiaowei Huang and Fei Ma and Jimin Xiao",
note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",
year = "2025",
month = jul,
day = "14",
doi = "10.1016/j.neucom.2025.130103",
language = "English",
volume = "638",
journal = "Neurocomputing",
issn = "0925-2312",
}

TY - JOUR
T1 - Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation
AU - Wu, Wangyu
AU - Dai, Tianhong
AU - Chen, Zhenhong
AU - Huang, Xiaowei
AU - Ma, Fei
AU - Xiao, Jimin
PY - 2025/7/14
Y1 - 2025/7/14

N2 - Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.

AB - Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.

KW - Diffusion model
KW - Prompt generation
KW - Semantic segmentation
KW - Weakly-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=105001800792&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2025.130103
DO - 10.1016/j.neucom.2025.130103
M3 - Article
AN - SCOPUS:105001800792
SN - 0925-2312
VL - 638
JO - Neurocomputing
JF - Neurocomputing
M1 - 130103
ER -
