TY - GEN
T1 - 3D Random Occlusion and Multi-layer Projection for Deep Multi-camera Pedestrian Localization
AU - Qiu, Rui
AU - Xu, Ming
AU - Yan, Yuyao
AU - Smith, Jeremy S.
AU - Yang, Xi
N1 - Funding Information:
Acknowledgments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 60975082 and by Xi’an Jiaotong-Liverpool University under Grants RDF-17-01-33, RDF-19-01-21 and FOSA2106045.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022/10
Y1 - 2022/10
AB - Although deep-learning-based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Multi-view information fusion is a potential solution, but its application is limited by the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane; these are of the average size of pedestrians and are projected to multiple views to relieve the impact of overfitting during training. Moreover, the feature map of each view is projected, using homographies, to multiple parallel planes at different heights, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method greatly improves performance in comparison with state-of-the-art deep-learning-based methods for multi-view pedestrian detection. Code is available at https://github.com/xjtlu-cvlab/3DROM.
KW - Data augmentation
KW - Deep learning
KW - Multi-view detection
KW - Perspective transformations
UR - http://www.scopus.com/inward/record.url?scp=85144572364&partnerID=8YFLogxK
UR - https://github.com/xjtlu-cvlab/3DROM
U2 - 10.1007/978-3-031-20080-9_40
DO - 10.1007/978-3-031-20080-9_40
M3 - Conference Proceeding
AN - SCOPUS:85144572364
SN - 9783031200793
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 695
EP - 710
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -