3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

Rui Qiu; Ming Xu; Yuyao Yan; Jeremy S. Smith; Xi Yang

doi:10.48550/arXiv.2207.10895

Rui Qiu, Ming Xu^*, Yuyao Yan, Jeremy S. Smith, Xi Yang

^*Corresponding author for this work

Research output: Contribution to journal › Article

Abstract

Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.
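The two geometric ideas in the abstract (projecting features onto parallel planes at several heights, and projecting a pedestrian-sized 3D cylinder into each view) can be illustrated with a minimal NumPy sketch. This is not the authors' 3DROM implementation. It assumes each camera is described by a known 3x4 pinhole projection matrix `P`, that all projected points lie in front of the camera, and that lens distortion is ignored. The function names, the toy camera, and the cylinder size and layer heights are hypothetical values chosen for illustration.

```python
import numpy as np


def plane_homography(P, height):
    """3x3 homography taking plane coordinates (X, Y, 1) on the world
    plane Z = height to homogeneous image pixels, for a 3x4 matrix P."""
    P = np.asarray(P, dtype=float)
    # P @ [X, Y, h, 1]^T = X*p1 + Y*p2 + (h*p3 + p4)
    return np.column_stack([P[:, 0], P[:, 1], height * P[:, 2] + P[:, 3]])


def project_points(P, points_xyz):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    pts = np.asarray(points_xyz, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    uvw = homog @ np.asarray(P, dtype=float).T
    return uvw[:, :2] / uvw[:, 2:3]


def cylinder_bbox(P, center_xy, radius, height, n=32):
    """Axis-aligned image bounding box of a vertical cylinder standing on
    the ground plane (Z = 0), approximated by n points on each rim."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.stack([center_xy[0] + radius * np.cos(angles),
                     center_xy[1] + radius * np.sin(angles)], axis=1)
    bottom = np.hstack([ring, np.zeros((n, 1))])
    top = np.hstack([ring, np.full((n, 1), height)])
    uv = project_points(P, np.vstack([bottom, top]))
    return uv.min(axis=0), uv.max(axis=0)


# Hypothetical toy camera: 1.5 m above the ground, 5 m behind the origin,
# looking along the world +Y axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 1.5, 5.0])
P = K @ np.hstack([R, t[:, None]])

# One homography per layer; the heights (in metres) are illustrative only.
layer_homographies = [plane_homography(P, h) for h in (0.0, 0.6, 1.2, 1.8)]

# Image-space box covered by a randomly placed, pedestrian-sized cylinder.
rng = np.random.default_rng(0)
center = rng.uniform(-1.0, 1.0, size=2)
box_min, box_max = cylinder_bbox(P, center, radius=0.25, height=1.8)
```

In an augmentation pipeline, a box like this would be computed for every camera from the same 3D cylinder, so the occluded region stays geometrically consistent across views. How the occluded pixels are filled, and how the per-layer feature maps are fused, are details this sketch does not attempt to reproduce.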

Original language	English
Article number	arXiv:2207.10895
Number of pages	16
Journal	arXiv preprint
DOIs	https://doi.org/10.48550/arXiv.2207.10895
Publication status	Published - 22 Jul 2022

Cite this

@article{85f40313a08a4f68a8ab52dce33ae107,

title = "3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization",

abstract = "Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.",

author = "Rui Qiu and Ming Xu and Yuyao Yan and Smith, {Jeremy S.} and Xi Yang",

year = "2022",

month = jul,

day = "22",

doi = "10.48550/arXiv.2207.10895",

language = "English",

journal = "arXiv preprint",

}

TY - JOUR

T1 - 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

AU - Qiu, Rui

AU - Xu, Ming

AU - Yan, Yuyao

AU - Smith, Jeremy S.

AU - Yang, Xi

PY - 2022/7/22

Y1 - 2022/7/22

N2 - Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.

AB - Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.

UR - https://github.com/xjtlu-cvlab/3DROM

U2 - 10.48550/arXiv.2207.10895

DO - 10.48550/arXiv.2207.10895

M3 - Article

JO - arXiv preprint

JF - arXiv preprint

M1 - arXiv:2207.10895

ER -
