Abstract
Because millimeter-wave (MMW) radar can directly acquire the spatial position and velocity of objects, and because it performs robustly in adverse weather, it has been widely adopted in autonomous driving. However, radar lacks detailed semantic information. To address this limitation, we leverage the complementary strengths of camera and radar through feature-level fusion and propose a fully Transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on the Swin Transformer. We name our proposed model CR-DINO and train and test it on the nuScenes dataset. We conduct several ablation experiments, and our best result achieves an mAP of 38.0%, surpassing other state-of-the-art camera-radar fusion object detection models.
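The abstract describes feature-level camera-radar fusion but does not include code. As a rough illustration of what such a fusion step can look like, the PyTorch sketch below cross-attends flattened camera feature-map tokens to encoded radar point tokens. All module names, dimensions, radar attributes, and the fusion scheme itself are assumptions for illustration; this is not the CR-DINO architecture proposed in the paper.

```python
# Illustrative sketch only -- NOT the CR-DINO implementation from the paper.
# Assumes camera features from a backbone (e.g. a Swin Transformer stage) and
# radar points with hypothetical (x, y, z, velocity, RCS) attributes.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse camera feature-map tokens with radar point tokens via cross-attention."""

    def __init__(self, cam_channels: int = 256, radar_dims: int = 5,
                 embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cam_proj = nn.Linear(cam_channels, embed_dim)   # camera tokens -> queries
        self.radar_proj = nn.Linear(radar_dims, embed_dim)   # radar points -> keys/values
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, cam_feat: torch.Tensor, radar_pts: torch.Tensor) -> torch.Tensor:
        # cam_feat: (B, C, H, W) backbone feature map; radar_pts: (B, N, radar_dims)
        b, c, h, w = cam_feat.shape
        cam_tokens = cam_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        q = self.cam_proj(cam_tokens)                         # queries from camera
        kv = self.radar_proj(radar_pts)                       # keys/values from radar
        fused, _ = self.cross_attn(q, kv, kv)                 # camera attends to radar
        fused = self.norm(q + fused)                          # residual + layer norm
        # Restore (B, C, H, W) so the fused map can feed a DETR/DINO-style head.
        return fused.transpose(1, 2).reshape(b, -1, h, w)


if __name__ == "__main__":
    model = CrossAttentionFusion()
    cam = torch.randn(2, 256, 32, 32)    # dummy camera feature map
    radar = torch.randn(2, 128, 5)       # dummy radar point features (128 points)
    print(model(cam, radar).shape)       # torch.Size([2, 256, 32, 32])
```

The sketch keeps the camera feature map as the query side so the fused output preserves the image-plane layout expected by a DETR-style detection head; whether CR-DINO fuses in this direction or elsewhere in the network is not stated in the abstract.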
| Original language | English |
| --- | --- |
| Pages (from-to) | 1 |
| Number of pages | 1 |
| Journal | IEEE Sensors Journal |
| Volume | 24 |
| Issue number | 7 |
| DOIs | |
| Publication status | Accepted/In press - 2024 |
| Externally published | Yes |
Keywords
- Autonomous vehicle
- Cameras
- deep learning
- Feature extraction
- multi-sensor fusion
- Object detection
- Radar
- Radar detection
- Radar imaging
- transformer
- Transformers