FLAG: A Framework with Explicit Learning Based on Appearance and Gait for Video-Based Clothes-Changing Person Re-Identification

Hengjie Lu, Yilin Gao, Shugong Xu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Person re-identification (ReID) aims to retrieve a target person across non-overlapping surveillance cameras. Video-based clothes-changing person re-identification (VCC-ReID) has become an essential branch of ReID due to the rich spatial and temporal information in videos and its broad application scenarios. Appearance and gait are discriminative features in video-based ReID, but appearance information becomes unreliable when clothes change, which makes VCC-ReID challenging. To address this challenge, we propose a Framework with explicit Learning based on Appearance and Gait (FLAG), which explicitly extracts these two types of information and can be combined with most existing video-based ReID methods. FLAG includes a multi-modal and multi-granularity Architecture (MGA), a large model, and a Cross-Modal Knowledge Distillation Scheme (CMKDS), which yields a small model; the two can be deployed on devices with different computing resources. The MGA simultaneously takes the visible-light and silhouette modalities as input to explicitly learn the appearance and gait features, respectively. The silhouette modality is provided at several levels of granularity to model global and local gait features, and each granularity serves as an independent input to the MGA. An embedding-based parallel fusion module is proposed to efficiently fuse the appearance feature with the multi-granularity gait features. The CMKDS is presented to distill the MGA into a small single-modal model that takes only the visible-light modality as input; embedding-based direct and indirect distillation strategies are designed for the CMKDS. Experimental results demonstrate that FLAG significantly improves the performance of existing video-based ReID methods when combined with them. In addition, when FLAG is combined with the AP3D method, the MGA outperforms the state-of-the-art accuracy by 4.2%.
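The abstract gives no implementation details, so the following is only a minimal PyTorch-style sketch of what an embedding-based parallel fusion of one appearance embedding with several per-granularity gait embeddings, plus an embedding-based direct distillation loss, could look like. All module names, variable names, and the additive fusion choice (ParallelFusion, direct_distillation_loss, etc.) are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelFusion(nn.Module):
    """Hypothetical sketch: fuse an appearance embedding with several
    gait embeddings (one per silhouette granularity) in parallel."""

    def __init__(self, dim: int, num_granularities: int):
        super().__init__()
        # One projection per gait granularity, applied in parallel.
        self.gait_projs = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_granularities)
        )
        self.app_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, app_emb, gait_embs):
        # app_emb: (B, dim) embedding from the visible-light branch.
        # gait_embs: list of (B, dim) embeddings, one per granularity.
        fused = self.app_proj(app_emb)
        for proj, gait_emb in zip(self.gait_projs, gait_embs):
            # Additive parallel fusion: each branch contributes independently.
            fused = fused + proj(gait_emb)
        return self.out_proj(F.relu(fused))


def direct_distillation_loss(student_emb, teacher_emb):
    # One plausible "embedding-based direct distillation": pull the small
    # single-modal student's embedding toward the frozen teacher embedding.
    return F.mse_loss(student_emb, teacher_emb.detach())


# Usage with dummy tensors: one appearance and three gait granularities.
if __name__ == "__main__":
    fusion = ParallelFusion(dim=256, num_granularities=3)
    app = torch.randn(8, 256)
    gaits = [torch.randn(8, 256) for _ in range(3)]
    teacher_emb = fusion(app, gaits)           # teacher (MGA-like) output
    student_emb = torch.randn(8, 256)          # stand-in for the student
    loss = direct_distillation_loss(student_emb, teacher_emb)
    print(loss.item())
```

The indirect distillation strategy mentioned in the abstract is not specified there; a common variant would match pairwise similarity structure between teacher and student embeddings rather than the embeddings themselves.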

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication status: Accepted/In press - 2024
Externally published: Yes

Keywords

  • Clothes-changing person re-identification
  • Knowledge distillation
  • Multi-modal learning
  • Video-based person re-identification
