Boosting remote semantic segmentation using vision-and-language foundation model

Qiuyue Zhang, Zhiwang Zhang, Shiting Wen*, Chaoyi Pang, Fangyu Wu

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

Abstract

In recent years, visual analysis and processing of remote sensing images have become increasingly popular. Vision-language foundation models such as RemoteCLIP embed rich prior knowledge from large collections of remote sensing images through extensive pre-training. Although these models perform well on image-level tasks, their prior knowledge has not been fully exploited for pixel-level segmentation. To address this issue, we propose a lightweight fusion framework named Remote Foundation Model for Segmentation (RFM-Seg). The framework trains V-branch connectors, VL-branch connectors, and a VL-map module while keeping both the foundation model and the remote sensing segmentation model frozen. These modules integrate multi-scale, multi-modal prior knowledge from remote sensing images into mainstream remote sensing segmentation models, thereby improving pixel-level segmentation performance. We validate the effectiveness of the framework on four challenging aerial image segmentation benchmarks: ISPRS Vaihingen, ISPRS Potsdam, Aerial, and LoveDA Urban. Experimental results demonstrate that RFM-Seg achieves state-of-the-art performance while maintaining highly efficient training and inference. The source code will be released at https://github.com/NBTAILAB/RFM-Seg.
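The paper's implementation details are not reproduced here. As a non-authoritative PyTorch sketch of the pattern the abstract describes (freeze two large models, train only small connector modules that inject foundation-model features into the segmentation pipeline), the code below illustrates one plausible realization. All names (Connector, RFMSegSketch, v_connector), channel widths, and the simple additive fusion step are assumptions for illustration, not the authors' RFM-Seg design; the VL branch and VL-map module are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Connector(nn.Module):
    # Lightweight 1x1-conv adapter projecting frozen foundation-model
    # features into the segmentation model's channel space (the paper's
    # connectors may differ in structure).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.proj(x)

class RFMSegSketch(nn.Module):
    def __init__(self, foundation, seg_backbone, found_ch, seg_ch, num_classes):
        super().__init__()
        self.foundation = foundation
        self.seg_backbone = seg_backbone
        # Freeze both large models; only the connector and head receive gradients.
        for p in self.foundation.parameters():
            p.requires_grad_(False)
        for p in self.seg_backbone.parameters():
            p.requires_grad_(False)
        self.v_connector = Connector(found_ch, seg_ch)
        self.head = nn.Conv2d(seg_ch, num_classes, kernel_size=1)

    def forward(self, image):
        with torch.no_grad():                # frozen feature extraction
            prior = self.foundation(image)   # foundation-model prior features
            feat = self.seg_backbone(image)  # segmentation features
        prior = self.v_connector(prior)
        prior = F.interpolate(prior, size=feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        fused = feat + prior                 # additive fusion (an assumption)
        logits = self.head(fused)
        return F.interpolate(logits, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)

# Toy stand-ins for the real encoders (RemoteCLIP's visual encoder and a
# remote sensing segmentation backbone would replace these):
foundation = nn.Conv2d(3, 512, kernel_size=16, stride=16)
seg_backbone = nn.Conv2d(3, 256, kernel_size=4, stride=4)
model = RFMSegSketch(foundation, seg_backbone, 512, 256, num_classes=6)
out = model(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 6, 256, 256])

# Only the small modules are trainable, which is what keeps such a
# fusion framework lightweight:
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # connector and head parameters only

Because gradients flow only through the connector and head, the optimizer touches a small fraction of the total parameters, which is consistent with the abstract's claim of highly efficient training.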

Original language: English
Pages (from-to): 6687-6700
Number of pages: 14
Journal: Visual Computer
Volume: 41
Issue number: 9
DOIs
Publication status: Published - Jul 2025

Keywords

  • Deep learning
  • Foundation model
  • Image processing and analysis
  • Prior knowledge
  • Semantic segmentation

