TY - JOUR
T1 - Samba
T2 - Semantic segmentation of remotely sensed images with state space model
AU - Zhu, Qinfeng
AU - Cai, Yuanzhi
AU - Fang, Yuan
AU - Yang, Yihan
AU - Chen, Cheng
AU - Fan, Lei
AU - Nguyen, Anh
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2024/10/15
Y1 - 2024/10/15
N2 - High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.
AB - High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.
KW - Images
KW - Mamba
KW - Remote sensing
KW - Semantic segmentation
KW - State space model
UR - http://www.scopus.com/inward/record.url?scp=85204893136&partnerID=8YFLogxK
U2 - 10.1016/j.heliyon.2024.e38495
DO - 10.1016/j.heliyon.2024.e38495
M3 - Article
AN - SCOPUS:85204893136
SN - 2405-8440
VL - 10
JO - Heliyon
JF - Heliyon
IS - 19
M1 - e38495
ER -