TY - JOUR
T1 - Image augmentation agent for weakly supervised semantic segmentation
AU - Wu, Wangyu
AU - Qiu, Xianglin
AU - Song, Siqi
AU - Chen, Zhenhong
AU - Huang, Xiaowei
AU - Ma, Fei
AU - Xiao, Jimin
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/11/14
Y1 - 2025/11/14
N2 - Weakly Supervised Semantic Segmentation (WSSS), which utilizes only image-level annotations, has gained considerable attention for its efficiency and reduced cost. However, most existing WSSS methods focus on designing new network structures and loss functions to generate more accurate dense labels, overlooking the limitations imposed by fixed datasets, which can constrain performance improvements. We argue that more diverse training images provide WSSS with richer information and help the model understand more comprehensive semantic patterns. Therefore, in this paper, we introduce a novel approach called Image Augmentation Agent (IAA), which shows that it is possible to enhance WSSS from a data generation perspective. IAA centers on an augmentation agent that leverages large language models (LLMs) and diffusion models to automatically generate additional images for WSSS. In practice, to address the instability in prompt generation by LLMs, we develop a prompt self-refinement mechanism that allows LLMs to re-evaluate the rationality of generated prompts and thereby produce more coherent ones. Additionally, we insert an online filter into the diffusion generation process to dynamically ensure the quality and balance of generated images. Experimental results show that our method significantly surpasses state-of-the-art WSSS approaches on the PASCAL VOC 2012 and MS COCO 2014 datasets. Our source code will be released.
AB - Weakly Supervised Semantic Segmentation (WSSS), which utilizes only image-level annotations, has gained considerable attention for its efficiency and reduced cost. However, most existing WSSS methods focus on designing new network structures and loss functions to generate more accurate dense labels, overlooking the limitations imposed by fixed datasets, which can constrain performance improvements. We argue that more diverse training images provide WSSS with richer information and help the model understand more comprehensive semantic patterns. Therefore, in this paper, we introduce a novel approach called Image Augmentation Agent (IAA), which shows that it is possible to enhance WSSS from a data generation perspective. IAA centers on an augmentation agent that leverages large language models (LLMs) and diffusion models to automatically generate additional images for WSSS. In practice, to address the instability in prompt generation by LLMs, we develop a prompt self-refinement mechanism that allows LLMs to re-evaluate the rationality of generated prompts and thereby produce more coherent ones. Additionally, we insert an online filter into the diffusion generation process to dynamically ensure the quality and balance of generated images. Experimental results show that our method significantly surpasses state-of-the-art WSSS approaches on the PASCAL VOC 2012 and MS COCO 2014 datasets. Our source code will be released.
KW - Diffusion model
KW - Large language model
KW - Semantic segmentation
KW - Weakly-supervised learning
UR - https://www.scopus.com/pages/publications/105014115080
U2 - 10.1016/j.neucom.2025.131314
DO - 10.1016/j.neucom.2025.131314
M3 - Article
AN - SCOPUS:105014115080
SN - 0925-2312
VL - 654
JO - Neurocomputing
JF - Neurocomputing
M1 - 131314
ER -