TY - JOUR
T1 - Mask-VRDet: A robust riverway panoptic perception model based on dual graph fusion of vision and 4D mmWave radar
AU - Guan, Runwei
AU - Yao, Shanliang
AU - Liu, Lulu
AU - Zhu, Xiaohui
AU - Man, Ka Lok
AU - Yue, Yong
AU - Smith, Jeremy S.
AU - Lim, Eng Gee
AU - Yue, Yutao
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - With the development of Unmanned Surface Vehicles (USVs), the perception of inland waterways has become significant for autonomous navigation. RGB cameras capture images with rich semantic features, but they fail in adverse weather and at night. 4D millimeter-wave radar (4D mmWave radar), a perception sensor that has emerged in recent years, works in all weather and provides richer point-cloud features than ordinary radar, but it suffers severely from water-surface clutter. Furthermore, the shape and outline of the dense point clouds captured by 4D mmWave radar are irregular. CNN-based neural networks treat features as 2D rectangular grids, which excessively favor the image modality and are unfriendly to the radar modality. We therefore transform both image and radar features into non-Euclidean space as graph structures. In this paper, we focus on robust panoptic perception in inland waterways. Firstly, we propose the first Clutter-Point-Removal (CPR) algorithm for 4D mmWave radar, which removes water-surface clutter and improves the recall of radar targets. Secondly, we propose a high-performance panoptic perception model based on graph neural networks, called Mask-VRDet, which fuses vision and radar features to perform object detection and semantic segmentation simultaneously. To the best of our knowledge, Mask-VRDet is the first riverway panoptic perception model based on vision-radar graphical fusion. It outperforms other single-modal and fusion models and achieves state-of-the-art performance on our collected dataset. We release our code at https://github.com/GuanRunwei/Mask-VRDet-Official.
AB - With the development of Unmanned Surface Vehicles (USVs), the perception of inland waterways has become significant for autonomous navigation. RGB cameras capture images with rich semantic features, but they fail in adverse weather and at night. 4D millimeter-wave radar (4D mmWave radar), a perception sensor that has emerged in recent years, works in all weather and provides richer point-cloud features than ordinary radar, but it suffers severely from water-surface clutter. Furthermore, the shape and outline of the dense point clouds captured by 4D mmWave radar are irregular. CNN-based neural networks treat features as 2D rectangular grids, which excessively favor the image modality and are unfriendly to the radar modality. We therefore transform both image and radar features into non-Euclidean space as graph structures. In this paper, we focus on robust panoptic perception in inland waterways. Firstly, we propose the first Clutter-Point-Removal (CPR) algorithm for 4D mmWave radar, which removes water-surface clutter and improves the recall of radar targets. Secondly, we propose a high-performance panoptic perception model based on graph neural networks, called Mask-VRDet, which fuses vision and radar features to perform object detection and semantic segmentation simultaneously. To the best of our knowledge, Mask-VRDet is the first riverway panoptic perception model based on vision-radar graphical fusion. It outperforms other single-modal and fusion models and achieves state-of-the-art performance on our collected dataset. We release our code at https://github.com/GuanRunwei/Mask-VRDet-Official.
KW - Fusion of vision and radar
KW - Graph convolution network
KW - Radar clutter removal
KW - Riverway panoptic perception
UR - http://www.scopus.com/inward/record.url?scp=85176149010&partnerID=8YFLogxK
U2 - 10.1016/j.robot.2023.104572
DO - 10.1016/j.robot.2023.104572
M3 - Article
SN - 0921-8890
VL - 171
JO - Robotics and Autonomous Systems
JF - Robotics and Autonomous Systems
M1 - 104572
ER -