Samba: Semantic segmentation of remotely sensed images with state space model

Qinfeng Zhu; Yuanzhi Cai; Yuan Fang; Yihan Yang; Cheng Chen; Lei Fan; Anh Nguyen

doi:10.1016/j.heliyon.2024.e38495

Corresponding author: Lei Fan

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)

Abstract

High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.
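The abstract names mIoU and mF1 as the evaluation metrics without defining them. The sketch below shows one common way to compute both from a pixel-level confusion matrix, using the standard per-class definitions IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), averaged over classes. It is an illustration only: the function names are hypothetical, and the paper's own evaluation protocol (for example, which classes are ignored) is not described in this record and may differ.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels for every (true class, predicted class) pair."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, pred) pair as a single index, then histogram it.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def miou_and_mf1(cm):
    """Return (mIoU, mF1) averaged over classes that occur at least once."""
    cm = cm.astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c, labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as class c, predicted otherwise
    iou_den = tp + fp + fn
    f1_den = 2 * tp + fp + fn
    # Assumption: classes absent from both labels and predictions are skipped.
    present = iou_den > 0
    iou = tp[present] / iou_den[present]
    f1 = 2 * tp[present] / f1_den[present]
    return iou.mean(), f1.mean()


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    preds = [0, 1, 1, 1]
    cm = confusion_matrix(labels, preds, num_classes=2)
    miou, mf1 = miou_and_mf1(cm)
    print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

For a whole dataset, the confusion matrices of all images would typically be summed before calling `miou_and_mf1`, so that the scores reflect pixel counts over the full test set rather than an average of per-image scores.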

Original language	English
Article number	e38495
Journal	Heliyon
Volume	10
Issue number	19
DOIs	https://doi.org/10.1016/j.heliyon.2024.e38495
Publication status	Published - 15 Oct 2024

Keywords

Images
Mamba
Remote sensing
Semantic segmentation
State space model

Cite this

@article{7aa521b4e07441c18fc8db6b8b051822,
title = "Samba: Semantic segmentation of remotely sensed images with state space model",
abstract = "High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.",
keywords = "Images, Mamba, Remote sensing, Semantic segmentation, State space model",
author = "Qinfeng Zhu and Yuanzhi Cai and Yuan Fang and Yihan Yang and Cheng Chen and Lei Fan and Anh Nguyen",
note = "Publisher Copyright: {\textcopyright} 2024 The Authors",
year = "2024",
month = oct,
day = "15",
doi = "10.1016/j.heliyon.2024.e38495",
language = "English",
volume = "10",
journal = "Heliyon",
issn = "2405-8440",
publisher = "Elsevier",
number = "19",
}

TY - JOUR
T1 - Samba
T2 - Semantic segmentation of remotely sensed images with state space model
AU - Zhu, Qinfeng
AU - Cai, Yuanzhi
AU - Fang, Yuan
AU - Yang, Yihan
AU - Chen, Cheng
AU - Fan, Lei
AU - Nguyen, Anh
PY - 2024/10/15
Y1 - 2024/10/15

N2 - High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.

AB - High-resolution remotely sensed images pose challenges to traditional semantic segmentation networks, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle high-resolution images due to their limited receptive field, while ViT-based methods, despite having a global receptive field, face challenges when processing long sequences. Inspired by the Mamba network, which is based on a state space model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed imagery, named Samba. Samba utilizes an encoder-decoder architecture, with multiple Samba blocks serving as the encoder to efficiently extract multi-level semantic information, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets using the mIoU and mF1 metrics, and compare it with top-performing CNN-based and ViT-based methods. The results demonstrate that Samba achieves unparalleled performance on commonly used remotely sensed datasets for semantic segmentation. Samba is the first to demonstrate the effectiveness of SSM in segmenting remotely sensed imagery, setting a new performance benchmark for Mamba-based techniques in this domain of semantic segmentation. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.

KW - Images
KW - Mamba
KW - Remote sensing
KW - Semantic segmentation
KW - State space model
UR - http://www.scopus.com/inward/record.url?scp=85204893136&partnerID=8YFLogxK
U2 - 10.1016/j.heliyon.2024.e38495
DO - 10.1016/j.heliyon.2024.e38495
M3 - Article
AN - SCOPUS:85204893136
SN - 2405-8440
VL - 10
JO - Heliyon
JF - Heliyon
IS - 19
M1 - e38495
ER -
