CostDiff: Residual Diffusion-Based Cost Map Refinement for Open-Vocabulary Semantic Segmentation

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

Abstract

Open-Vocabulary Semantic Segmentation (OVSS) enables models to recognize novel classes beyond a predefined category set. Contrastive Vision-Language Models (VLMs) such as CLIP make open-vocabulary learning possible, but because they are pretrained at the image level, they struggle with pixel-level semantic localization. We propose a residual diffusion-based cost map refinement strategy to address this limitation. Treating CLIP’s coarse-grained classification maps as initial cost maps, our method iteratively refines them through a multi-step diffusion process, bridging the gap between high-level semantics and low-level spatial details. This improves pixel-wise discrimination without retraining the VLM. Experiments on standard benchmarks show improvements in both quantitative accuracy and qualitative boundary precision, supporting the effectiveness of integrating diffusion into OVSS. Our approach offers a new paradigm for advancing open-vocabulary visual understanding through foundation model refinement.
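
The abstract does not specify the network or noise schedule, so the following is only a rough sketch of the general idea of refining a coarse cost map through repeated residual updates, not the authors' method. It assumes per-pixel image features and per-class text embeddings are already available (random arrays stand in for CLIP outputs here), builds an initial cost map from their cosine similarity, and applies a few residual steps. The names `initial_cost_map`, `toy_residual`, and `refine` are hypothetical, and `toy_residual` is a simple neighbourhood-smoothing rule (heat-diffusion style) swapped in for whatever learned denoiser the paper uses.

```python
import numpy as np

def initial_cost_map(pixel_feats, text_embeds):
    """Cosine similarity between pixel features (H, W, D) and
    class text embeddings (C, D); returns a cost map of shape (H, W, C)."""
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return p @ t.T

def toy_residual(cost):
    """Hypothetical stand-in for a learned denoiser (assumption, not from
    the paper): the residual pulls each pixel toward its 4-neighbour mean."""
    padded = np.pad(cost, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neighbour_mean = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]
        + padded[1:-1, :-2] + padded[1:-1, 2:]
    ) / 4.0
    return neighbour_mean - cost

def refine(cost, steps=4, step_size=0.5):
    """Multi-step refinement: each step adds a predicted residual
    to the current cost map instead of regenerating it from scratch."""
    for _ in range(steps):
        cost = cost + step_size * toy_residual(cost)
    return cost

# Random arrays stand in for CLIP dense features and text embeddings.
rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(8, 8, 16))   # H=8, W=8, D=16
text_embeds = rng.normal(size=(3, 16))      # C=3 candidate class names

cost0 = initial_cost_map(pixel_feats, text_embeds)
refined = refine(cost0)
labels = refined.argmax(axis=-1)            # per-pixel class index
```

In this sketch the vision-language features are never updated; only the cost map changes, which mirrors the abstract's point that refinement happens without retraining the VLM.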

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
Editors: Josef Kittler, Hongkai Xiong, Weiyao Lin, Jian Yang, Xilin Chen, Jiwen Lu, Jingyi Yu, Weishi Zheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 120-134
Number of pages: 15
ISBN (Print): 9789819557608
DOIs
Publication status: Published - 2026
Event: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China
Duration: 15 Oct 2025 to 18 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16283 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
Country/Territory: China
City: Shanghai
Period: 15/10/25 to 18/10/25

Keywords

  • Cost Map Refinement
  • Open-Vocabulary Semantic Segmentation
  • Residual Diffusion
