ATV3D: 3D Object Detection from Attention-based Three-view Representation

Yu Han*, Yaran Chen*, Haoran Li, Yunzhen Zhao, Zhe Zhao, Pengfei Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

In the fields of autonomous driving and robot perception, the majority of methods are designed for onboard camera object detection, while fewer methods are specifically tailored to environmental cameras. However, environmental cameras can capture a significant amount of road geometry and vehicle position information, which can enhance the safety of autonomous driving. Nevertheless, there is a difference in perspective between environmental cameras and onboard cameras, so many methods designed for onboard-camera 3D object detection perform poorly when applied to environmental cameras. In this paper, we propose a 3D Object Detection Algorithm from Attention-based Three-view Representation (ATV3D). The algorithm projects the 2D image features onto three orthogonal views (left view, front view, bird's eye view) to achieve a representation of the 3D information. Compared to voxel-based 3D detection methods, our proposed approach retains the ability to capture 3D features while reducing computational complexity. For the three-view representation, we design an attention-based feature projection module. Unlike inverse perspective mapping, which requires precise camera parameters, the attention mechanism can implicitly learn the mapping relationship from 2D images to the three view planes. This enables the extraction and transformation of image features without calibrated camera parameters, effectively addressing the difficulty of obtaining camera parameters for environmental cameras and their susceptibility to natural factors. The experimental results on the DAIR-V2X dataset demonstrate that our method achieves a 3D detection mean average precision (mAP) of 73.6%, surpassing the performance of previous calibration-free environmental camera methods.
Furthermore, our method achieves the highest detection accuracy on the indoor multi-view robot dataset Neurons Perception, providing evidence of its outstanding detection performance.
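The core idea in the abstract — cross-attention that lets learned queries on each view plane pool 2D image features without calibrated camera parameters — can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function and variable names, feature dimensions, and view-plane sizes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def project_to_view(img_feats, view_queries, Wq, Wk, Wv):
    """Cross-attention from view-plane queries to flattened image features.

    img_feats:    (N, C) flattened 2D feature map, N = H*W pixels
    view_queries: (M, C) learned embeddings, one per cell of a view plane
    Returns:      (M, C) features pooled onto the view plane.
    The attention weights play the role of the 2D-to-plane mapping that
    inverse perspective mapping would derive from camera parameters.
    """
    q = view_queries @ Wq                                   # (M, d)
    k = img_feats @ Wk                                      # (N, d)
    v = img_feats @ Wv                                      # (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) # (M, N)
    return attn @ v

rng = np.random.default_rng(0)
C, d, H, W = 32, 16, 8, 8          # illustrative channel/attention dims and map size
img = rng.normal(size=(H * W, C))  # stand-in for a backbone feature map

# One query set per orthogonal plane: bird's-eye, front, and left view,
# each discretized into a hypothetical 10x10 grid of cells.
views = {name: rng.normal(size=(10 * 10, C)) for name in ("bev", "front", "left")}
Wq, Wk, Wv = (rng.normal(size=s) for s in ((C, d), (C, d), (C, C)))

tri = {name: project_to_view(img, q, Wq, Wk, Wv) for name, q in views.items()}
```

In a trained model the query embeddings and projection matrices would be learned end-to-end, which is what lets the mapping adapt to each environmental camera without explicit calibration.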

Original language: English
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359312
DOIs
Publication status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 - 5 Jul 2024

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/24 - 5/07/24

Keywords

  • 3D object detection
  • Attention
  • Autonomous driving
  • Calibration-free
  • Three-view representation
