TY - GEN
T1 - Synchronize Feature Extracting and Matching
T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
AU - Ma, Teli
AU - Wang, Mengmeng
AU - Xiao, Jimin
AU - Wu, Huifeng
AU - Liu, Yong
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Siamese network has been a de facto benchmark framework for 3D LiDAR object tracking with a shared-parametric encoder extracting features from template and search region, respectively. This paradigm relies heavily on an additional matching network to model the cross-correlation/similarity of the template and search region. In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, synchronizing the feature extracting and matching to avoid forwarding encoder twice for template and search region as well as introducing extra parameters of matching network. The synchronization mechanism is based on the dynamic affinity of the Transformer, and an in-depth analysis of the relevance is provided theoretically. Moreover, based on the synchronization, we introduce a novel Attentive PointsSampling strategy into the Transformer layers (APST), replacing the random/Farthest Points Sampling (FPS) method with sampling under the supervision of attentive relations between the template and search region. It implies connecting point-wise sampling with the feature learning, beneficial to aggregating more distinctive and geometric features for tracking with sparse points. Extensive experiments on two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves state-of-the-art performance in realtime tracking.
AB - Siamese network has been a de facto benchmark framework for 3D LiDAR object tracking with a shared-parametric encoder extracting features from template and search region, respectively. This paradigm relies heavily on an additional matching network to model the cross-correlation/similarity of the template and search region. In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, synchronizing the feature extracting and matching to avoid forwarding encoder twice for template and search region as well as introducing extra parameters of matching network. The synchronization mechanism is based on the dynamic affinity of the Transformer, and an in-depth analysis of the relevance is provided theoretically. Moreover, based on the synchronization, we introduce a novel Attentive PointsSampling strategy into the Transformer layers (APST), replacing the random/Farthest Points Sampling (FPS) method with sampling under the supervision of attentive relations between the template and search region. It implies connecting point-wise sampling with the feature learning, beneficial to aggregating more distinctive and geometric features for tracking with sparse points. Extensive experiments on two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves state-of-the-art performance in realtime tracking.
UR - http://www.scopus.com/inward/record.url?scp=85185866827&partnerID=8YFLogxK
U2 - 10.1109/ICCV51070.2023.00913
DO - 10.1109/ICCV51070.2023.00913
M3 - Conference Proceeding
AN - SCOPUS:85185866827
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 9919
EP - 9929
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 October 2023 through 6 October 2023
ER -