SBSS: Stacking-Based Semantic Segmentation Framework for Very High-Resolution Remote Sensing Image

Yuanzhi Cai; Lei Fan; Yuan Fang

doi:10.1109/TGRS.2023.3234549

Yuanzhi Cai, Lei Fan^*, Yuan Fang

^*Corresponding author for this work

Department of Civil Engineering

Xi'an Jiaotong-Liverpool University

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

Semantic segmentation of very high-resolution (VHR) remote sensing images is a fundamental task for many applications. However, the large variation in object scales within VHR images makes accurate semantic segmentation challenging. Existing semantic segmentation networks can analyze an input image at up to four resizing scales, which may be insufficient given the diversity of object scales. Therefore, multiscale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results; it makes equal use of the segmentation results obtained at the different resizing scales. However, this study found that different classes of objects have their own preferred resizing scales for more accurate semantic segmentation. To exploit this behavior, a stacking-based semantic segmentation (SBSS) framework is proposed that learns it in order to improve the segmentation results. SBSS contains a learnable error correction module (ECM) for fusing segmentation results and an error correction scheme (ECS) for controlling computational complexity. Two ECSs, i.e., ECS-MS and ECS-single-scale (SS), are proposed and investigated in this study. The floating-point operations (FLOPs) required by ECS-MS and ECS-SS are similar to those of the commonly used MS test and SS test, respectively. Extensive experiments on four datasets (i.e., Cityscapes, UAVid, LoveDA, and Potsdam) show that SBSS is an effective and flexible framework. It achieved higher accuracy than the MS test when using ECS-MS, and accuracy similar to the SS test with a quarter of the memory footprint when using ECS-SS.
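The abstract contrasts conventional MS testing, which fuses per-scale predictions with equal weights, with fusion that exploits each class's preferred resizing scale. The toy Python sketch below illustrates that contrast for a single pixel. It is not the paper's error correction module (ECM), which is a learnable network. The class names, scales, probabilities and per-class weights are all hypothetical, and the per-scale predictions are assumed to have already been resized back to the original image resolution.

```python
from typing import Dict, List

Probs = List[float]              # class probabilities for one pixel, indexed by class id
ScaleProbs = Dict[float, Probs]  # resizing scale -> probabilities predicted at that scale


def ms_average(per_scale: ScaleProbs) -> Probs:
    """Conventional MS test-time augmentation: every scale gets equal weight."""
    n_scales = len(per_scale)
    n_classes = len(next(iter(per_scale.values())))
    return [
        sum(probs[c] for probs in per_scale.values()) / n_scales
        for c in range(n_classes)
    ]


def class_weighted_fusion(
    per_scale: ScaleProbs, scale_weights: List[Dict[float, float]]
) -> Probs:
    """Score class c as a weighted sum over scales using that class's own
    scale weights (hypothetical here; a learned fusion module would supply them)."""
    n_classes = len(next(iter(per_scale.values())))
    return [
        sum(scale_weights[c][s] * probs[c] for s, probs in per_scale.items())
        for c in range(n_classes)
    ]


def argmax(values: Probs) -> int:
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical pixel that truly belongs to class 0 ("car", a small object);
# class 1 is "road", a large object. All scores are invented for illustration.
pixel = {
    0.5: [0.20, 0.80],  # the small car is barely visible after downscaling
    1.0: [0.45, 0.55],
    1.5: [0.80, 0.20],  # upscaling makes the car easier to recognise
}

# Hypothetical per-class scale preferences (each class's weights sum to 1):
# cars favour the enlarged input, roads favour the reduced one.
weights = [
    {0.5: 0.1, 1.0: 0.2, 1.5: 0.7},  # class 0: car
    {0.5: 0.5, 1.0: 0.3, 1.5: 0.2},  # class 1: road
]

print("MS average:", argmax(ms_average(pixel)))                         # 1 (road)
print("Class-weighted:", argmax(class_weighted_fusion(pixel, weights)))  # 0 (car)
```

Equal weighting lets the confident but wrong low-resolution prediction dominate, whereas the class-aware weights let the car score rely mostly on the enlarged input. In a stacking setup, such weights (or a richer fusion function) would be learned from held-out predictions rather than set by hand.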

Original language	English
Article number	5600514
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOIs	https://doi.org/10.1109/TGRS.2023.3234549
Publication status	Published - 2023

Keywords

Convolutional neural network
deep learning
ensemble learning
semantic segmentation
stacking


Cite this

@article{4f3c335a9e204dc187432bd9c6669e7c,

title = "{SBSS}: Stacking-Based Semantic Segmentation Framework for Very High-Resolution Remote Sensing Image",

abstract = "Semantic segmentation of very high-resolution (VHR) remote sensing images is a fundamental task for many applications. However, large variations in the scales of objects in those VHR images pose a challenge for performing accurate semantic segmentation. Existing semantic segmentation networks are able to analyze an input image at up to four resizing scales, but this may be insufficient given the diversity of object scales. Therefore, multiscale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results, which makes equal use of the segmentation results obtained at the different resizing scales. However, it was found in this study that different classes of objects had their preferred resizing scale for more accurate semantic segmentation. Based on this behavior, a stacking-based semantic segmentation (SBSS) framework is proposed to improve the segmentation results by learning this behavior, which contains a learnable error correction module (ECM) for segmentation result fusion and an error correction scheme (ECS) for computational complexity control. Two ECS, i.e., ECS-MS and ECS-single-scale (SS), are proposed and investigated in this study. The floating-point operations (Flops) required for ECS-MS and ECS-SS are similar to the commonly used MS test and the SS test, respectively. Extensive experiments on four datasets (i.e., Cityscapes, UAVid, LoveDA, and Potsdam) show that SBSS is an effective and flexible framework. It achieved higher accuracy than MS when using ECS-MS, and similar accuracy as SS with a quarter of the memory footprint when using ECS-SS.",

keywords = "Convolutional neural network, deep learning, ensemble learning, semantic segmentation, stacking",

author = "Yuanzhi Cai and Lei Fan and Yuan Fang",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2023.3234549",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

}

TY - JOUR

T1 - SBSS: Stacking-Based Semantic Segmentation Framework for Very High-Resolution Remote Sensing Image

AU - Cai, Yuanzhi

AU - Fan, Lei

AU - Fang, Yuan

PY - 2023

Y1 - 2023

N2 - Semantic segmentation of very high-resolution (VHR) remote sensing images is a fundamental task for many applications. However, large variations in the scales of objects in those VHR images pose a challenge for performing accurate semantic segmentation. Existing semantic segmentation networks are able to analyze an input image at up to four resizing scales, but this may be insufficient given the diversity of object scales. Therefore, multiscale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results, which makes equal use of the segmentation results obtained at the different resizing scales. However, it was found in this study that different classes of objects had their preferred resizing scale for more accurate semantic segmentation. Based on this behavior, a stacking-based semantic segmentation (SBSS) framework is proposed to improve the segmentation results by learning this behavior, which contains a learnable error correction module (ECM) for segmentation result fusion and an error correction scheme (ECS) for computational complexity control. Two ECS, i.e., ECS-MS and ECS-single-scale (SS), are proposed and investigated in this study. The floating-point operations (Flops) required for ECS-MS and ECS-SS are similar to the commonly used MS test and the SS test, respectively. Extensive experiments on four datasets (i.e., Cityscapes, UAVid, LoveDA, and Potsdam) show that SBSS is an effective and flexible framework. It achieved higher accuracy than MS when using ECS-MS, and similar accuracy as SS with a quarter of the memory footprint when using ECS-SS.

AB - Semantic segmentation of very high-resolution (VHR) remote sensing images is a fundamental task for many applications. However, large variations in the scales of objects in those VHR images pose a challenge for performing accurate semantic segmentation. Existing semantic segmentation networks are able to analyze an input image at up to four resizing scales, but this may be insufficient given the diversity of object scales. Therefore, multiscale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results, which makes equal use of the segmentation results obtained at the different resizing scales. However, it was found in this study that different classes of objects had their preferred resizing scale for more accurate semantic segmentation. Based on this behavior, a stacking-based semantic segmentation (SBSS) framework is proposed to improve the segmentation results by learning this behavior, which contains a learnable error correction module (ECM) for segmentation result fusion and an error correction scheme (ECS) for computational complexity control. Two ECS, i.e., ECS-MS and ECS-single-scale (SS), are proposed and investigated in this study. The floating-point operations (Flops) required for ECS-MS and ECS-SS are similar to the commonly used MS test and the SS test, respectively. Extensive experiments on four datasets (i.e., Cityscapes, UAVid, LoveDA, and Potsdam) show that SBSS is an effective and flexible framework. It achieved higher accuracy than MS when using ECS-MS, and similar accuracy as SS with a quarter of the memory footprint when using ECS-SS.

KW - Convolutional neural network

KW - deep learning

KW - ensemble learning

KW - semantic segmentation

KW - stacking

UR - http://www.scopus.com/inward/record.url?scp=85147216565&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2023.3234549

DO - 10.1109/TGRS.2023.3234549

M3 - Article

AN - SCOPUS:85147216565

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5600514

ER -
