A streamlined framework for BEV-based 3D object detection with prior masking

Qinglin Tong, Junjie Zhang, Chenggang Yan, Dan Zeng*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

In autonomous driving, perception based on the Bird's-Eye-View (BEV) representation has attracted considerable research attention. Despite recent advances in performance, efficiency remains a challenge for real-world deployment. In this study, we propose an efficient and effective framework that constructs a spatio-temporal BEV feature from multi-camera inputs and uses it for 3D object detection. The success of our network is primarily attributed to two designs: the lifting strategy and a tailored BEV encoder. The lifting strategy converts 2D image features into 3D representations. Because the images carry no depth information, we introduce a prior mask for the BEV feature, which assesses the significance of the feature along the camera ray at low cost. We also design a lightweight BEV encoder that significantly boosts the capacity of this physically interpretable representation. In the encoder, we model the spatial relationships of the BEV feature and retain rich residual information from upstream. To further enhance performance, we add an auxiliary 2D object detection head to exploit the cues offered by 2D object detection, and we leverage 4D information to exploit the cues within the sequence. With these designs, our network captures abundant semantic information from 3D scenes and strikes a balanced trade-off between efficiency and performance.
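
The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea of depth-free lifting followed by a mask on the BEV feature. It is not the authors' method. The function names `lift_to_bev` and `apply_prior_mask`, the nearest-neighbour sampling, the single fixed sampling height, and the per-cell sigmoid mask with weights `w` and bias `b` are all assumptions made for illustration.

```python
import numpy as np

def lift_to_bev(feat, K, xs, zs, height=0.0):
    """Depth-free lifting (hypothetical sketch, not the paper's code).

    feat   : (C, H, W) image feature map from one camera
    K      : (3, 3) camera intrinsics; camera frame is x right, y down, z forward
    xs, zs : 1D arrays of BEV cell centres (zs must be > 0, i.e. in front of the camera)
    height : assumed y coordinate at which every BEV cell is sampled

    Returns bev (C, len(zs), len(xs)) and a boolean visibility map (len(zs), len(xs)).
    """
    C, H, W = feat.shape
    X, Z = np.meshgrid(np.asarray(xs, float), np.asarray(zs, float))  # (nz, nx)
    Y = np.full_like(X, height)
    pts = np.stack([X, Y, Z], axis=0).reshape(3, -1)                  # (3, N)

    # Pinhole projection of each BEV cell centre into the image.
    uvw = K @ pts
    u = np.rint(uvw[0] / uvw[2]).astype(int)
    v = np.rint(uvw[1] / uvw[2]).astype(int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Without depth, every cell on the same camera ray receives the same feature.
    bev = np.zeros((C, pts.shape[1]), dtype=feat.dtype)
    bev[:, valid] = feat[:, v[valid], u[valid]]
    return bev.reshape(C, *X.shape), valid.reshape(X.shape)

def apply_prior_mask(bev, w, b=0.0):
    """Weight each BEV cell by a scalar in (0, 1) (hypothetical mask form).

    w : (C,) weights and b : scalar bias, standing in for learned parameters.
    """
    logits = np.tensordot(w, bev, axes=1) + b   # (nz, nx)
    mask = 1.0 / (1.0 + np.exp(-logits))        # sigmoid
    return bev * mask, mask
```

In this sketch, two BEV cells that lie on the same camera ray project to the same pixel and therefore receive identical features, which is the ambiguity a per-cell mask is meant to reduce by down-weighting cells unlikely to contain the observed content.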

Original language: English
Article number: 105229
Journal: Image and Vision Computing
Volume: 150
Publication status: Published - Oct 2024
Externally published: Yes

Keywords

  • 3D object detection
  • Autonomous driving
  • Bird's-eye-view (BEV) representation
  • Multi-camera
