Abstract
Because millimeter-wave (MMW) radar can directly acquire the spatial position and velocity of objects, and because it performs robustly in adverse weather, it has been widely adopted in autonomous driving. However, radar lacks detailed semantic information. To address this limitation, we leverage the complementary strengths of camera and radar through feature-level fusion and propose a fully Transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on the Swin Transformer. We name our proposed model CR-DINO and train and test it on the nuScenes dataset. We conduct several ablation experiments, and our best result achieves an mAP of 38.0%, surpassing other state-of-the-art camera-radar fusion object detection models.
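The abstract describes feature-level camera-radar fusion but does not include code. As a rough illustration of what such a fusion step can look like, the PyTorch sketch below cross-attends flattened camera feature-map tokens to encoded radar point tokens. All module names, dimensions, radar attributes, and the fusion scheme itself are assumptions for illustration; this is not the CR-DINO architecture proposed in the paper.

```python
# Illustrative sketch only -- NOT the CR-DINO implementation from the paper.
# Assumes camera features from a backbone (e.g. a Swin Transformer stage) and
# radar points with hypothetical (x, y, z, velocity, RCS) attributes.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse camera feature-map tokens with radar point tokens via cross-attention."""

    def __init__(self, cam_channels: int = 256, radar_dims: int = 5,
                 embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cam_proj = nn.Linear(cam_channels, embed_dim)   # camera tokens -> queries
        self.radar_proj = nn.Linear(radar_dims, embed_dim)   # radar points -> keys/values
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, cam_feat: torch.Tensor, radar_pts: torch.Tensor) -> torch.Tensor:
        # cam_feat: (B, C, H, W) backbone feature map; radar_pts: (B, N, radar_dims)
        b, c, h, w = cam_feat.shape
        cam_tokens = cam_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        q = self.cam_proj(cam_tokens)                         # queries from camera
        kv = self.radar_proj(radar_pts)                       # keys/values from radar
        fused, _ = self.cross_attn(q, kv, kv)                 # camera attends to radar
        fused = self.norm(q + fused)                          # residual + layer norm
        # Restore (B, C, H, W) so the fused map can feed a DETR/DINO-style head.
        return fused.transpose(1, 2).reshape(b, -1, h, w)


if __name__ == "__main__":
    model = CrossAttentionFusion()
    cam = torch.randn(2, 256, 32, 32)    # dummy camera feature map
    radar = torch.randn(2, 128, 5)       # dummy radar point features (128 points)
    print(model(cam, radar).shape)       # torch.Size([2, 256, 32, 32])
```

The sketch keeps the camera feature map as the query side so the fused output preserves the image-plane layout expected by a DETR-style detection head; whether CR-DINO fuses in this direction or elsewhere in the network is not stated in the abstract.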
| Original language | English |
| --- | --- |
| Pages (from-to) | 1 |
| Number of pages | 1 |
| Journal | IEEE Sensors Journal |
| Volume | 24 |
| Issue number | 7 |
| DOIs | |
| Publication status | Accepted/In press - 2024 |
| Externally published | Yes |
Keywords
- Autonomous vehicle
- Cameras
- deep learning
- Feature extraction
- multi-sensor fusion
- Object detection
- Radar
- Radar detection
- Radar imaging
- transformer
- Transformers