Fine-grained visual tracking via distribution-aware mask modeling and temporal propagation

  • Xin Zhou
  • Junjie Zhang*
  • Hongwen Yu
  • Fangyu Wu
  • Xiaoshui Huang
  • Jian Zhang
  • *Corresponding author for this work
  • Shanghai University
  • Shanghai Jiao Tong University
  • University of Technology Sydney

Research output: Contribution to journal, Article, peer-review

Abstract

Visual tracking holds significant importance in enabling diverse practical applications, yet critical challenges persist in two key aspects: target characterization and motion dynamics. Foreground-background discrimination becomes problematic under real-world complexities such as occlusion and scale variation, necessitating highly discriminative feature extraction. Moreover, appearance changes during target motion render static template strategies insufficient, demanding dynamic template updates to ensure continuity and prevent tracker drift. In this paper, we present FGTrack, a novel single-object tracker that addresses these challenges from two perspectives. First, the Distribution-Aware Mask Modeling (DMM) enhances feature discriminability by leveraging the Transformer attention distribution in conjunction with GridShift clustering to generate a nuanced foreground mask. Building upon token candidate elimination from the one-stream training process, this approach employs a simple yet efficient adaptive clustering step to achieve precise foreground localization without the need for manual threshold adjustment. It effectively suppresses background interference by utilizing the token correlation between the template and search regions. Second, the Temporal Feature Propagation (TFP) ensures motion consistency by integrating autoregressive queries with spatio-temporal features. The TFP module maintains a dynamically updated query queue and aggregates historical features through a temporal attention mechanism. The spatio-temporal fusion maintains adaptive template updates and correlates the current frame's spatially encoded features with historical queries, capturing target evolution patterns through multi-head cross-attention. Experiments across five short-term and two long-term benchmarks demonstrate FGTrack's superiority over state-of-the-art trackers, particularly in occlusion and deformation scenarios, validating its balanced approach to spatial discrimination and temporal coherence. The code will be released at https://github.com/BroCome25/FGTrack.
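To make the temporal part of the abstract more concrete, the following is a minimal sketch of a bounded query queue whose entries are aggregated by scaled dot-product attention against the current frame's feature. It is not the authors' implementation: the names `QueryQueue` and `temporal_attention`, the queue length, the single attention head without learned projections, and the averaging rule used to form the next query are all assumptions made for illustration.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(current, history):
    """Aggregate historical query vectors with scaled dot-product attention.

    `current` acts as the attention query; each vector in `history` acts as
    both key and value (an assumption: no learned projections, one head).
    Returns the weighted sum of the history and the attention weights.
    """
    d = len(current)
    scores = [
        sum(c * h for c, h in zip(current, past)) / math.sqrt(d)
        for past in history
    ]
    weights = softmax(scores)
    fused = [sum(w * past[i] for w, past in zip(weights, history)) for i in range(d)]
    return fused, weights


class QueryQueue:
    """Hypothetical bounded queue of per-frame queries, oldest dropped first."""

    def __init__(self, max_len=4):
        self.queue = deque(maxlen=max_len)

    def step(self, current):
        """Fuse the current feature with the stored history, then store the result."""
        if self.queue:
            fused, _ = temporal_attention(current, list(self.queue))
        else:
            # No history yet: the first frame's feature is used unchanged.
            fused = list(current)
        # Assumed update rule: average the current feature with the
        # attention-aggregated history to form the next autoregressive query.
        new_query = [(c + f) / 2 for c, f in zip(current, fused)]
        self.queue.append(new_query)
        return new_query
```

In this sketch, calling `step` once per frame keeps at most `max_len` past queries, so the history that the attention step sees always reflects the most recent frames. A real tracker would operate on token matrices with multiple heads and learned weights.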

Original language: English
Article number: 114208
Journal: Knowledge-Based Systems
Volume: 328
DOIs
Publication status: Published - 25 Oct 2025

Keywords

  • Distribution-aware mask modeling
  • Temporal feature propagation
  • Visual object tracking
