TY - GEN
T1 - A Training-free Synthetic Data Selection Method for Semantic Segmentation
AU - Tang, Hao
AU - Yu, Siyue
AU - Pang, Jian
AU - Zhang, Bingfeng
N1 - Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - Training semantic segmenter with synthetic data has been attracting great attention due to its easy accessibility and huge quantities. Most previous methods focused on producing large-scale synthetic image-annotation samples and then training the segmenter with all of them. However, such a solution remains a main challenge in that the poor-quality samples are unavoidable, and using them to train the model will damage the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of synthetic image, thus removing samples with low-quality images. Then we propose a class-balance Annotation Similarity Filter (ASF) by comparing the synthetic annotation with the response of CLIP to remove the samples related to low-quality annotations. The experimental results show that using our method significantly reduces the data size by half, while the trained segmenter achieves higher performance.
AB - Training semantic segmenter with synthetic data has been attracting great attention due to its easy accessibility and huge quantities. Most previous methods focused on producing large-scale synthetic image-annotation samples and then training the segmenter with all of them. However, such a solution remains a main challenge in that the poor-quality samples are unavoidable, and using them to train the model will damage the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of synthetic image, thus removing samples with low-quality images. Then we propose a class-balance Annotation Similarity Filter (ASF) by comparing the synthetic annotation with the response of CLIP to remove the samples related to low-quality annotations. The experimental results show that using our method significantly reduces the data size by half, while the trained segmenter achieves higher performance.
UR - https://www.scopus.com/pages/publications/105004030775
U2 - 10.1609/aaai.v39i7.32777
DO - 10.1609/aaai.v39i7.32777
M3 - Conference Proceeding
AN - SCOPUS:105004030775
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7229
EP - 7237
BT - Special Track on AI Alignment
A2 - Walsh, Toby
A2 - Shah, Julie
A2 - Kolter, Zico
PB - Association for the Advancement of Artificial Intelligence
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Y2 - 25 February 2025 through 4 March 2025
ER -