TY - GEN
T1 - MSCrackMamba
T2 - 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
AU - Zhu, Qinfeng
AU - Fang, Yuan
AU - Fan, Lei
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/8/22
Y1 - 2025/8/22
AB - Crack detection is a critical task in structural health monitoring, aimed at assessing the structural integrity of bridges, buildings, and roads to prevent potential failures. Vision-based crack detection has become the mainstream approach due to its ease of implementation and effectiveness. Fusing infrared (IR) channels with red, green, and blue (RGB) channels can enhance feature representation and thus improve crack detection. However, IR and RGB channels often differ in resolution. To align them, higher-resolution RGB images typically need to be downsampled to match the IR image resolution, which leads to the loss of fine details. Moreover, crack detection performance is restricted by the limited receptive fields and high computational complexity of traditional image segmentation networks. Inspired by the recently proposed Mamba neural architecture, this study introduces a two-stage paradigm called MSCrackMamba, which leverages Vision Mamba along with a super-resolution network to address these challenges. Specifically, to align IR and RGB channels, we first apply super-resolution to IR channels to match the resolution of RGB channels for data fusion. Vision Mamba is then adopted as the backbone network, while UperNet is employed as the decoder for crack detection. Our approach is validated on the large-scale crack detection dataset Crack900, demonstrating an improvement of 3.55% in mIoU compared to the best-performing baseline methods.
KW - Crack detection
KW - Mamba
KW - Segmentation
KW - Semantic
KW - Super-resolution
UR - https://www.scopus.com/pages/publications/105033340783
U2 - 10.1109/BDAI66031.2025.11325509
DO - 10.1109/BDAI66031.2025.11325509
M3 - Conference Proceeding
AN - SCOPUS:105033340783
T3 - 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
SP - 290
EP - 295
BT - 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 August 2025 through 24 August 2025
ER -