Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

Chaolong Yang, Yuyao Yan, Weiguang Zhao, Jianan Ye, Xi Yang, Amir Hussain, Bin Dong, Kaizhu Huang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review


3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it still remains elusive how to fuse and process cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align cross-dimensional features and realize both 2D and 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, limiting the network's flexibility. Meanwhile, such dual-task settings may easily distract the network and lead to over-fitting in the 3D segmentation task. Constrained by the network's inflexibility, fused features can only pass through a decoder network, which hurts model performance due to insufficient depth. To alleviate these drawbacks, in this paper we argue that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into the 3D space and aligning them with 3D deep semantic features can lead to better feature fusion. On the one hand, the unidirectional projection forces our model to focus more on the core task, i.e., 3D segmentation; on the other hand, relaxing bidirectional projection to unidirectional projection enables deeper cross-domain semantic alignment and offers the flexibility to fuse richer and more complicated features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
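The abstract's core idea, unidirectionally lifting multi-view 2D feature maps onto 3D points and fusing them with per-point 3D features, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (pinhole cameras, nearest-pixel sampling, view averaging, channel concatenation), not the authors' actual implementation; all function names are hypothetical.

```python
import numpy as np

def project_points(points, K, Rt):
    """Project Nx3 world points to pixel coordinates with a pinhole camera.
    K: 3x3 intrinsics, Rt: 3x4 extrinsics [R|t]."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # Nx4 homogeneous
    cam = (Rt @ pts_h.T).T                                      # Nx3 camera-space
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-8, None)          # perspective divide
    return uv, cam[:, 2]                                        # pixels, depths

def fuse_multiview(points, feats_3d, view_feats, cams):
    """Unidirectionally lift 2D features onto 3D points and fuse.
    view_feats: list of HxWxC 2D feature maps; cams: list of (K, Rt) pairs."""
    N = points.shape[0]
    C = view_feats[0].shape[2]
    acc = np.zeros((N, C))
    cnt = np.zeros((N, 1))
    for fmap, (K, Rt) in zip(view_feats, cams):
        H, W, _ = fmap.shape
        uv, z = project_points(points, K, Rt)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        # keep only points in front of the camera and inside the image
        vis = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        acc[vis] += fmap[v[vis], u[vis]]
        cnt[vis] += 1
    lifted = acc / np.clip(cnt, 1, None)   # average over views that see each point
    # concatenate lifted 2D features with native 3D features per point
    return np.concatenate([feats_3d, lifted], axis=1)
```

After this one-way lifting, the fused per-point features can be fed into an arbitrarily deep 3D decoder, which is the flexibility the abstract contrasts with symmetric bidirectional designs.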

Original language: English
Title of host publication: Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 13
ISBN (Print): 9789819981830
Publication status: Published - 2024
Event30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China
Duration: 20 Nov 2023 – 23 Nov 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 1969 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937


Conference: 30th International Conference on Neural Information Processing, ICONIP 2023


Keywords
  • Multi-view fusion
  • Point cloud
  • Semantic segmentation


