CR-DINO: A Novel Camera-Radar Fusion 2D Object Detection Model Based On Transformer

Yuhao Jin, Xiaohui Zhu, Yong Yue, Eng Gee Lim, Wei Wang

Research output: Contribution to journal › Article › peer-review


Because millimeter-wave (MMW) radar can directly measure the spatial positions and velocities of objects and performs robustly in adverse weather, it is widely used in autonomous driving. However, radar lacks detailed semantic information. To address this limitation, we exploit the complementary strengths of camera and radar through feature-level fusion and propose a fully Transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on the Swin Transformer. We name the proposed model CR-DINO and train and evaluate it on the nuScenes dataset. Across several ablation experiments, our best configuration achieved an mAP of 38.0%, surpassing other state-of-the-art camera-radar fusion object detection models.
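The abstract does not specify CR-DINO's radar representation or fusion design. For orientation only, a common baseline for camera-radar feature-level fusion projects radar points onto the image plane, rasterizes them into extra channels (here depth and radial velocity) at the resolution of a backbone feature map, and concatenates those channels with the camera features. This is a minimal sketch under stated assumptions, not the method proposed in the paper: it assumes radar points are already transformed into the camera frame, a pinhole intrinsic matrix `K`, and NumPy; the helper names `rasterize_radar` and `concat_fusion` are hypothetical.

```python
import numpy as np


def rasterize_radar(points, K, img_h, img_w, stride=1):
    """Hypothetical helper (not CR-DINO's representation): project radar
    points onto the image plane as a 2-channel map.

    points: (N, 4) array of [x, y, z, v_radial] in the camera frame
            (z pointing forward in metres, v in m/s); assumed pre-transformed.
    K:      (3, 3) pinhole intrinsic matrix.
    stride: downsampling factor so the map matches a backbone feature level.
    Returns float32 (2, img_h // stride, img_w // stride):
    channel 0 = depth z, channel 1 = radial velocity; empty cells stay 0.
    """
    h, w = img_h // stride, img_w // stride
    out = np.zeros((2, h, w), dtype=np.float32)
    pts = points[points[:, 2] > 0]          # drop points behind the camera
    uvw = pts[:, :3] @ K.T                  # homogeneous pixel coords, (M, 3)
    u = np.floor(uvw[:, 0] / uvw[:, 2] / stride).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2] / stride).astype(int)
    # Visit points far-to-near so the nearest point wins a shared cell.
    for i in np.argsort(-pts[:, 2]):
        if 0 <= u[i] < w and 0 <= v[i] < h:
            out[0, v[i], u[i]] = pts[i, 2]
            out[1, v[i], u[i]] = pts[i, 3]
    return out


def concat_fusion(cam_feat, radar_map):
    """Hypothetical helper: fuse a (C, h, w) camera feature map with a
    radar map by channel concatenation, the simplest feature-level fusion."""
    if cam_feat.shape[1:] != radar_map.shape[1:]:
        raise ValueError("spatial sizes must match")
    return np.concatenate([cam_feat, radar_map], axis=0)


# Usage with toy numbers (64 x 32 image, feature map at stride 4).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 16.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0, 2.5],    # lands on pixel (u=32, v=16)
                   [1.0, 0.0, 5.0, -1.0],    # lands on pixel (u=52, v=16)
                   [0.0, 0.0, -3.0, 0.0]])   # behind the camera, dropped
radar = rasterize_radar(points, K, img_h=32, img_w=64, stride=4)
cam = np.random.rand(8, 8, 16).astype(np.float32)  # stand-in camera features
fused = concat_fusion(cam, radar)
print(fused.shape)  # (10, 8, 16)
```

Concatenation is used here only to make the data flow concrete; it is not meant to reflect the Swin Transformer-based fusion architectures the abstract describes.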

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Sensors Journal
Issue number: 7
Publication status: Accepted/In press - 2024
Externally published: Yes


  • Autonomous vehicle
  • Cameras
  • Deep learning
  • Feature extraction
  • Multi-sensor fusion
  • Object detection
  • Radar
  • Radar detection
  • Radar imaging
  • Transformers

