Abstract
Maintaining identity consistency and avoiding ID switches during tracking is one of the primary focuses of multiple object tracking (MOT). One-shot MOT methods, which jointly learn the detection and tracking models in a single network (hence the name one-shot), have achieved promising results in tracking accuracy and speed. However, their ability to maintain ID consistency is somewhat weakened. The reason for this weakened ID consistency is two-fold: 1) the ID features learned by one-shot methods are not discriminative enough because of their heatmap-based single-location representation; 2) severe occlusion in MOT scenes leads to feature ambiguity and frequent ID switches. In this paper, we propose a one-shot MOT system with strong ID consistency called PID-MOT (Preserved ID MOT). Specifically, we devise a visibility branch to predict the object occlusion level, and the predicted visibility map is used in both a Feature Refinement Model (FRM) and a visibility-guided two-stage association strategy (VGTAS). FRM is designed to strengthen the location-based features and enrich the identity information. VGTAS is proposed to handle objects with high and low visibility separately. In addition, we initialize the parameters of our model for full training by training from scratch on the recently released, abundant synthetic MOTSynth dataset rather than on the commonly used COCO dataset. Finally, we evaluate our method on commonly used MOT datasets, and the experimental results demonstrate that the proposed PID-MOT achieves especially good performance in ID F1 score (IDF1) and ID-Switch (IDS) compared with other state-of-the-art one-shot trackers, with comparable overall HOTA/MOTA performance. The code is available at https://github.com/Kroery/PIDMOT.
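To make the idea of handling high- and low-visibility objects separately more concrete, the following is a minimal sketch of a generic two-stage association. It is not the authors' VGTAS implementation (see the repository linked above for that). The dictionary keys (`box`, `emb`, `vis`), the thresholds, the greedy matcher, and the choice of cosine distance for the first stage and IoU for the second are all assumptions made for illustration.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity of two embedding vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids, det_ids, cost, max_cost):
    """Pair tracks and detections in order of increasing cost.

    A greedy matcher is used here only to keep the sketch dependency-free;
    trackers commonly use an optimal assignment solver instead.
    """
    pairs = sorted((cost(t, d), t, d) for t in track_ids for d in det_ids)
    used_t, used_d, matches = set(), set(), []
    for c, t, d in pairs:
        if c > max_cost:
            break  # all remaining pairs are too costly
        if t in used_t or d in used_d:
            continue
        used_t.add(t)
        used_d.add(d)
        matches.append((t, d))
    return matches


def two_stage_associate(tracks, dets, vis_thresh=0.5, max_app_cost=0.4, min_iou=0.3):
    """Return (track_index, detection_index) pairs.

    Stage 1 (assumed): high-visibility detections are matched by appearance.
    Stage 2 (assumed): leftover tracks are matched to low-visibility
    detections by box overlap, since occluded appearance is less reliable.
    """
    high = [i for i, d in enumerate(dets) if d["vis"] >= vis_thresh]
    low = [i for i, d in enumerate(dets) if d["vis"] < vis_thresh]
    all_tracks = list(range(len(tracks)))

    stage1 = greedy_match(
        all_tracks, high,
        lambda t, d: cosine_distance(tracks[t]["emb"], dets[d]["emb"]),
        max_app_cost,
    )
    matched = {t for t, _ in stage1}
    remaining = [t for t in all_tracks if t not in matched]

    stage2 = greedy_match(
        remaining, low,
        lambda t, d: 1.0 - iou(tracks[t]["box"], dets[d]["box"]),
        1.0 - min_iou,
    )
    return stage1 + stage2


if __name__ == "__main__":
    tracks = [
        {"box": (0, 0, 10, 10), "emb": (1.0, 0.0)},
        {"box": (20, 20, 30, 30), "emb": (0.0, 1.0)},
    ]
    dets = [
        {"box": (21, 21, 31, 31), "emb": (0.6, 0.8), "vis": 0.2},  # occluded
        {"box": (1, 1, 11, 11), "emb": (0.9, 0.1), "vis": 0.9},    # clearly visible
    ]
    print(two_stage_associate(tracks, dets))
```

In this toy input the clearly visible detection is matched to the first track by appearance, and the occluded detection is then matched to the second track by overlap alone, even though its embedding is ambiguous.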
Original language | English |
---|---|
Pages (from-to) | 4473-4488 |
Number of pages | 16 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 34 |
Issue number | 6 |
Publication status | Published - 1 Jun 2024 |
Externally published | Yes |
Keywords
- feature refinement model
- MOTSynth training
- Multiple object tracking
- one-shot
- visibility-guided two-stage association strategy