TY - JOUR
T1 - Enhanced multi-scale feature adaptive fusion sparse convolutional network for large-scale scenes semantic segmentation
AU - Shen, Lingfeng
AU - Cao, Yanlong
AU - Zhu, Wenbin
AU - Ren, Kai
AU - Shou, Yejun
AU - Wang, Haocheng
AU - Xu, Zhijie
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/2
Y1 - 2025/2
AB - Semantic segmentation has made notable strides in analyzing homogeneous large-scale 3D scenes, yet its application to varied scenes with diverse characteristics poses considerable challenges. Traditional methods have been hampered by their dependence on resource-intensive neighborhood search algorithms, leading to elevated computational demands. To overcome these limitations, we introduce MFAF-SCNet, a novel and computationally streamlined voxel-based sparse convolutional network. Our key innovation is the multi-scale feature adaptive fusion (MFAF) module, which applies a spectrum of convolution kernel sizes at the network's entry point, enabling the extraction of multi-scale features, and adaptively calibrates the feature weighting to achieve optimal scale representation for different objects. Further augmenting our methodology is LKSNet, an original sparse convolutional backbone designed to tackle the inherent inconsistencies in point cloud distribution by integrating inverted bottleneck structures with large kernel convolutions, significantly bolstering the network's feature extraction and spatial correlation proficiency. The efficacy of MFAF-SCNet was rigorously tested on three large-scale benchmark datasets: ScanNet and S3DIS for indoor scenes, and SemanticKITTI for outdoor scenes. The experimental results underscore our method's competitive edge, achieving strong segmentation performance while ensuring computational efficiency.
KW - Multi-scale feature
KW - Point cloud
KW - Semantic segmentation
KW - Sparse convolution
UR - http://www.scopus.com/inward/record.url?scp=85211615861&partnerID=8YFLogxK
U2 - 10.1016/j.cag.2024.104105
DO - 10.1016/j.cag.2024.104105
M3 - Article
AN - SCOPUS:85211615861
SN - 0097-8493
VL - 126
JO - Computers and Graphics (Pergamon)
JF - Computers and Graphics (Pergamon)
M1 - 104105
ER -