TY - JOUR
T1 - SBSS: Stacking-Based Semantic Segmentation Framework for Very High-Resolution Remote Sensing Image
AU - Cai, Yuanzhi
AU - Fan, Lei
AU - Fang, Yuan
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Semantic segmentation of very high-resolution (VHR) remote sensing images is a fundamental task for many applications. However, large variations in the scales of objects in such VHR images pose a challenge for accurate semantic segmentation. Existing semantic segmentation networks can analyze an input image at up to four resizing scales, which may be insufficient given the diversity of object scales. Therefore, multiscale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results; it makes equal use of the segmentation results obtained at the different resizing scales. However, this study found that different classes of objects have preferred resizing scales at which they are segmented more accurately. Based on this behavior, a stacking-based semantic segmentation (SBSS) framework is proposed that learns this behavior to improve segmentation results; it contains a learnable error correction module (ECM) for fusing segmentation results and an error correction scheme (ECS) for controlling computational complexity. Two ECSs, i.e., ECS-MS and ECS-single-scale (SS), are proposed and investigated in this study. The floating-point operations (FLOPs) required for ECS-MS and ECS-SS are similar to those of the commonly used MS test and SS test, respectively. Extensive experiments on four datasets (i.e., Cityscapes, UAVid, LoveDA, and Potsdam) show that SBSS is an effective and flexible framework: it achieves higher accuracy than MS when using ECS-MS, and accuracy similar to SS with a quarter of the memory footprint when using ECS-SS.
KW - convolutional neural network
KW - deep learning
KW - ensemble learning
KW - semantic segmentation
KW - stacking
UR - http://www.scopus.com/inward/record.url?scp=85147216565&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3234549
DO - 10.1109/TGRS.2023.3234549
M3 - Article
AN - SCOPUS:85147216565
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5600514
ER -