TY - GEN
T1 - ATV3D: 3D Object Detection Algorithm from Attention-based Three-view Representation
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - Han, Yu
AU - Chen, Yaran
AU - Li, Haoran
AU - Zhao, Yunzhen
AU - Zhao, Zhe
AU - Hu, Pengfei
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - In the fields of autonomous driving and robot perception, most methods are designed for onboard-camera object detection, while few are tailored to environmental cameras. However, environmental cameras can capture a significant amount of road geometry and vehicle position information, which can enhance the safety of autonomous driving. Because the perspective of environmental cameras differs from that of onboard cameras, many methods designed for onboard-camera 3D object detection perform poorly when applied to environmental cameras. In this paper, we propose a 3D Object Detection Algorithm from Attention-based Three-view Representation (ATV3D). The algorithm projects 2D image features onto three orthogonal views (left view, front view, and bird's eye view) to represent the 3D information. Compared to voxel-based 3D detection methods, our approach retains the ability to capture 3D features while reducing computational complexity. For the three-view representation, we design an attention-based feature projection module. Unlike inverse perspective mapping, which requires precise camera parameters, the attention mechanism implicitly learns the mapping from 2D images to the three-view planes. This enables the extraction and transformation of image features without calibrated camera parameters, effectively addressing the difficulty of obtaining camera parameters for environmental cameras and their susceptibility to natural factors. Experimental results on the DAIR-V2X dataset demonstrate that our method achieves a 3D detection mean average precision (mAP) of 73.6%, surpassing previous calibration-free environmental camera methods. Furthermore, our method achieves the highest detection accuracy on the indoor multi-view robot dataset Neurons Perception, providing further evidence of its outstanding detection performance.
KW - 3D object detection
KW - Attention
KW - Autonomous driving
KW - Calibration-free
KW - Three-view representation
UR - http://www.scopus.com/inward/record.url?scp=85205009214&partnerID=8YFLogxK
U2 - 10.1109/IJCNN60899.2024.10651302
DO - 10.1109/IJCNN60899.2024.10651302
M3 - Conference Proceeding
AN - SCOPUS:85205009214
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -