TY - JOUR
T1 - FLAG
T2 - A Framework with Explicit Learning Based on Appearance and Gait for Video-Based Clothes-Changing Person Re-Identification
AU - Lu, Hengjie
AU - Gao, Yilin
AU - Xu, Shugong
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Person re-identification (ReID) aims to retrieve a target person across non-overlapping surveillance cameras. Video-based clothes-changing person re-identification (VCC-ReID) has become an essential branch of ReID owing to the rich spatial and temporal information in videos and its broad range of application scenarios. Appearance and gait are discriminative cues in video-based ReID, but appearance information becomes unreliable when clothes change, which makes VCC-ReID challenging. To address this challenge, we propose a Framework with explicit Learning based on Appearance and Gait (FLAG), which explicitly extracts these two types of information and can be combined with most existing video-based ReID methods. FLAG comprises a multi-modal and multi-granularity Architecture (MGA), which is a large model, and a Cross-Modal Knowledge Distillation Scheme (CMKDS), which yields a small model, so the framework can be deployed on devices with different computing resources. The MGA simultaneously takes the visible-light and silhouette modalities as input to explicitly learn appearance and gait features, respectively. The silhouette modality is provided at several levels of granularity to model global and local gait features, each serving as an independent input to the MGA. An embedding-based parallel fusion module is proposed to efficiently fuse the appearance and multi-granularity gait features. The CMKDS is presented to distill the MGA into a small single-modal model that uses only the visible-light modality as input, employing embedding-based direct and indirect distillation strategies. Experimental results demonstrate that combining FLAG with existing video-based ReID methods significantly improves their performance. Moreover, when FLAG is combined with the AP3D method, the MGA surpasses the state-of-the-art accuracy by 4.2%.
AB - Person re-identification (ReID) aims to retrieve a target person across non-overlapping surveillance cameras. Video-based clothes-changing person re-identification (VCC-ReID) has become an essential branch of ReID owing to the rich spatial and temporal information in videos and its broad range of application scenarios. Appearance and gait are discriminative cues in video-based ReID, but appearance information becomes unreliable when clothes change, which makes VCC-ReID challenging. To address this challenge, we propose a Framework with explicit Learning based on Appearance and Gait (FLAG), which explicitly extracts these two types of information and can be combined with most existing video-based ReID methods. FLAG comprises a multi-modal and multi-granularity Architecture (MGA), which is a large model, and a Cross-Modal Knowledge Distillation Scheme (CMKDS), which yields a small model, so the framework can be deployed on devices with different computing resources. The MGA simultaneously takes the visible-light and silhouette modalities as input to explicitly learn appearance and gait features, respectively. The silhouette modality is provided at several levels of granularity to model global and local gait features, each serving as an independent input to the MGA. An embedding-based parallel fusion module is proposed to efficiently fuse the appearance and multi-granularity gait features. The CMKDS is presented to distill the MGA into a small single-modal model that uses only the visible-light modality as input, employing embedding-based direct and indirect distillation strategies. Experimental results demonstrate that combining FLAG with existing video-based ReID methods significantly improves their performance. Moreover, when FLAG is combined with the AP3D method, the MGA surpasses the state-of-the-art accuracy by 4.2%.
KW - Clothes-changing person re-identification
KW - Knowledge distillation
KW - Multi-modal learning
KW - Video-based person re-identification
UR - http://www.scopus.com/inward/record.url?scp=85207471869&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3483265
DO - 10.1109/TCSVT.2024.3483265
M3 - Article
AN - SCOPUS:85207471869
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -