TY - JOUR
T1 - A Deep Top-down Framework towards Generalisable Multi-View Pedestrian Detection
AU - Qiu, Rui
AU - Xu, Ming
AU - Ling, Yuchen
AU - Smith, Jeremy S.
AU - Yan, Yuyao
AU - Wang, Xinheng
PY - 2024/8
Y1 - 2024/8
N2 - Multiple cameras have frequently been used to detect heavily occluded pedestrians. The state-of-the-art methods for deep multi-view pedestrian detection usually project the feature maps extracted from multiple views onto the ground plane through homographies for information fusion. However, this bottom-up approach can easily overfit to the camera locations and orientations in a training dataset, which leads to weak generalisation performance and compromises its real-world applications. To address this problem, a deep top-down framework, TMVD, is proposed, in which the feature maps within rectangular boxes of average pedestrian size, anchored at each cell of the discretized ground plane, are weighted across the multiple views and embedded into a top view. These embedded features are then used to infer the locations of pedestrians with a convolutional neural network. The proposed method significantly improves generalisation performance compared with the benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms other top-down methods.
AB - Multiple cameras have frequently been used to detect heavily occluded pedestrians. The state-of-the-art methods for deep multi-view pedestrian detection usually project the feature maps extracted from multiple views onto the ground plane through homographies for information fusion. However, this bottom-up approach can easily overfit to the camera locations and orientations in a training dataset, which leads to weak generalisation performance and compromises its real-world applications. To address this problem, a deep top-down framework, TMVD, is proposed, in which the feature maps within rectangular boxes of average pedestrian size, anchored at each cell of the discretized ground plane, are weighted across the multiple views and embedded into a top view. These embedded features are then used to infer the locations of pedestrians with a convolutional neural network. The proposed method significantly improves generalisation performance compared with the benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms other top-down methods.
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85202157735
UR - https://github.com/xjtlu-cvlab/TMVD
U2 - 10.1016/j.neucom.2024.128458
DO - 10.1016/j.neucom.2024.128458
M3 - Article
SN - 0925-2312
VL - 607
JO - Neurocomputing
JF - Neurocomputing
M1 - 128458
ER -
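
The abstract describes a top-down fusion scheme: for each cell of a discretized ground plane, a box of average pedestrian size is projected into every camera view, the per-view features inside that box are pooled and weighted, and the fused descriptors form a top-view map from which a convolutional network infers pedestrian locations. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (see https://github.com/xjtlu-cvlab/TMVD); the shapes, the learnable per-view weights, and the assumption that the projected boxes are precomputed from camera calibration are all illustrative.

    # Minimal sketch of the top-down fusion described in the abstract.
    # NOT the authors' code; names, shapes and the weighting are assumptions.
    import torch
    import torch.nn as nn
    from torchvision.ops import roi_align

    class TopDownFusion(nn.Module):
        """Pool per-view features inside an average-pedestrian-sized box for
        each ground-plane cell, weight the views, embed into a top view, and
        infer a pedestrian occupancy map with a small CNN."""
        def __init__(self, num_views, feat_dim, grid_h, grid_w):
            super().__init__()
            self.grid_h, self.grid_w = grid_h, grid_w
            # Learnable per-view weights (the paper's weighting may differ).
            self.view_weights = nn.Parameter(torch.ones(num_views))
            # Small CNN head operating on the fused top-view feature map.
            self.head = nn.Sequential(
                nn.Conv2d(feat_dim, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 1, 3, padding=1),
            )

        def forward(self, view_feats, view_boxes):
            # view_feats: list of [1, C, H, W] feature maps, one per camera view.
            # view_boxes: list of [grid_h*grid_w, 4] boxes (x1, y1, x2, y2), one
            #   per view, giving the average-pedestrian-sized box of each
            #   ground-plane cell projected into that view (assumed to be
            #   precomputed from calibration, in feature-map coordinates).
            cells = []
            for feat, boxes, w in zip(view_feats, view_boxes, self.view_weights):
                # Pool one descriptor per cell from the projected box region.
                pooled = roi_align(feat, [boxes], output_size=1)  # [N_cells, C, 1, 1]
                cells.append(w * pooled.flatten(1))               # [N_cells, C]
            top = torch.stack(cells).sum(0)                       # fuse the views
            # Arrange cell descriptors as a top-view map (row-major cell order).
            top = top.t().reshape(1, -1, self.grid_h, self.grid_w)
            return self.head(top)                                 # [1, 1, grid_h, grid_w]

A hypothetical call would pass one feature map and one [grid_h*grid_w, 4] box tensor per view, e.g. TopDownFusion(num_views=2, feat_dim=64, grid_h=4, grid_w=5)(view_feats, view_boxes); because the pooled boxes are tied to physical ground-plane cells rather than to homography-warped feature maps, the fusion is less dependent on the specific camera placement seen during training.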