An end-to-end tracking framework via multi-view and temporal feature aggregation

Abstract
Multi-view pedestrian tracking has frequently been used to cope with the challenges of occlusion and limited fields of view in single-view tracking. However, there are few end-to-end methods in this field. Many existing algorithms detect pedestrians in individual views, cluster the projected detections in a top view, and then track them; the others track pedestrians in individual views and then associate the projected tracklets in a top view. In this paper, an end-to-end framework is proposed for multi-view tracking, in which both multi-view and temporal aggregation of feature maps are applied. The multi-view aggregation projects the per-view feature maps to a top view, uses a transformer encoder to produce encoded feature maps, and then uses a CNN to compute a pedestrian occupancy map. The temporal aggregation uses another CNN to estimate position offsets from the encoded feature maps of consecutive frames. Our experiments demonstrate that this end-to-end framework outperforms state-of-the-art online algorithms for multi-view pedestrian tracking.
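The projection step of the multi-view aggregation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the nearest-neighbour sampling, and the mean pooling over views are all assumptions for illustration; the per-view homographies (from camera calibration) and the top-view grid size are taken as given inputs.

```python
import numpy as np

def project_to_topview(feat_maps, homographies, grid_hw):
    """Aggregate per-view feature maps onto a shared top-view grid.

    feat_maps:    list of (C, H, W) arrays, one per camera view.
    homographies: list of 3x3 arrays mapping top-view grid coordinates
                  (x, y, 1) to image-plane coordinates (assumed known
                  from calibration).
    grid_hw:      (Hg, Wg) size of the top-view grid.
    """
    Hg, Wg = grid_hw
    C = feat_maps[0].shape[0]
    acc = np.zeros((C, Hg, Wg))   # summed features per grid cell
    cnt = np.zeros((Hg, Wg))      # number of views that saw each cell
    # Homogeneous coordinates of every top-view cell.
    ys, xs = np.mgrid[0:Hg, 0:Wg]
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(Hg * Wg)])  # (3, N)
    for feat, H in zip(feat_maps, homographies):
        _, h, w = feat.shape
        proj = H @ grid
        px = proj[0] / proj[2]
        py = proj[1] / proj[2]
        # Nearest-neighbour sampling; keep only cells landing inside the view.
        ix = np.round(px).astype(int)
        iy = np.round(py).astype(int)
        idx = np.flatnonzero((ix >= 0) & (ix < w) & (iy >= 0) & (iy < h))
        acc[:, ys.ravel()[idx], xs.ravel()[idx]] += feat[:, iy[idx], ix[idx]]
        cnt[ys.ravel()[idx], xs.ravel()[idx]] += 1
    # Mean over the views that observed each cell (avoid division by zero).
    return acc / np.maximum(cnt, 1)
```

In the paper's pipeline, the resulting top-view feature map would then be fed to the transformer encoder and the occupancy-map CNN; here the aggregation is a simple mean, whereas the paper learns the fusion end-to-end.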
| Original language | English |
|---|---|
| Article number | 104203 |
| Number of pages | 10 |
| Journal | Computer Vision and Image Understanding |
| Volume | 249 |
| DOIs | |
| Publication status | Published - 10 Oct 2024 |