ATV3D: 3D Object Detection from Attention-based Three-view Representation

Yu Han*, Yaran Chen*, Haoran Li, Yunzhen Zhao, Zhe Zhao, Pengfei Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

In the fields of autonomous driving and robot perception, the majority of methods are designed for onboard camera object detection, while fewer methods are specifically tailored to environmental cameras. However, environmental cameras can capture a significant amount of road geometry and vehicle position information, which can enhance the safety of autonomous driving. Nevertheless, there is a difference in perspective between environmental cameras and onboard cameras, so many methods designed for onboard-camera 3D object detection perform poorly when applied to environmental cameras. In this paper, we propose a 3D Object Detection Algorithm from Attention-based Three-view Representation (ATV3D). The algorithm projects the 2D image features onto three orthogonal views (left view, front view, bird's eye view) to achieve a representation of the 3D information. Compared to voxel-based 3D detection methods, our proposed approach retains the ability to capture 3D features while reducing computational complexity. For the three-view representation, we design an attention-based feature projection module. Unlike inverse perspective mapping, which requires precise camera parameters, the attention mechanism can implicitly learn the mapping relationship from 2D images to the three view planes. This enables the extraction and transformation of image features without calibrated camera parameters, effectively addressing the difficulty of obtaining camera parameters for environmental cameras and their susceptibility to natural factors. The experimental results on the DAIR-V2X dataset demonstrate that our method achieves a 3D detection mean average precision (mAP) of 73.6%, surpassing the performance of previous calibration-free environmental camera methods.
Furthermore, our method achieves the highest detection accuracy on the indoor multi-view robot dataset Neurons Perception, providing evidence of its outstanding detection performance.
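The core idea in the abstract — cross-attention that lets learned queries on each view plane pool 2D image features without calibrated camera parameters — can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function and variable names, feature dimensions, and view-plane sizes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def project_to_view(img_feats, view_queries, Wq, Wk, Wv):
    """Cross-attention from view-plane queries to flattened image features.

    img_feats:    (N, C) flattened 2D feature map, N = H*W pixels
    view_queries: (M, C) learned embeddings, one per cell of a view plane
    Returns:      (M, C) features pooled onto the view plane.
    The attention weights play the role of the 2D-to-plane mapping that
    inverse perspective mapping would derive from camera parameters.
    """
    q = view_queries @ Wq                                   # (M, d)
    k = img_feats @ Wk                                      # (N, d)
    v = img_feats @ Wv                                      # (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) # (M, N)
    return attn @ v

rng = np.random.default_rng(0)
C, d, H, W = 32, 16, 8, 8          # illustrative channel/attention dims and map size
img = rng.normal(size=(H * W, C))  # stand-in for a backbone feature map

# One query set per orthogonal plane: bird's-eye, front, and left view,
# each discretized into a hypothetical 10x10 grid of cells.
views = {name: rng.normal(size=(10 * 10, C)) for name in ("bev", "front", "left")}
Wq, Wk, Wv = (rng.normal(size=s) for s in ((C, d), (C, d), (C, C)))

tri = {name: project_to_view(img, q, Wq, Wk, Wv) for name, q in views.items()}
```

In a trained model the query embeddings and projection matrices would be learned end-to-end, which is what lets the mapping adapt to each environmental camera without explicit calibration.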

Original language: English
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359312
DOIs
Publication status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 - 5 Jul 2024

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/24 - 5/07/24

Keywords

  • 3D object detection
  • Attention
  • Autonomous driving
  • Calibration-free
  • Three-view representation
