TY - JOUR
T1 - Siamese network ensemble for visual tracking
AU - Jiang, Chenru
AU - Xiao, Jimin
AU - Xie, Yanchun
AU - Tillo, Tammam
AU - Huang, Kaizhu
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2018/1/31
Y1 - 2018/1/31
N2 - Visual object tracking is a challenging task considering illumination variation, occlusion, rotation, deformation and other problems. In this paper, we extend a Siamese INstance search Tracker (SINT) with a model updating mechanism to improve its tracking robustness. SINT uses convolutional neural network (CNN) features and compares the new frame features with the target features in the first frame. The candidate region with the highest similarity score is considered as the tracking result. However, SINT is not robust against large target variation because the matching model is not updated during the whole tracking process. To combat this defect, we propose an Ensemble Siamese Tracker (EST), where the final similarity score is also affected by the similarity with tracking results in recent frames instead of solely considering the first frame. Tracking results in recent frames are used to adjust the model for continuous target change. Meanwhile, we combine a large displacement optical flow method with EST to further improve the performance (called EST+). We test the proposed EST and EST+ on the standard tracking benchmark OTB. The average overlap ratios of EST and EST+ increase by 2.72% and 3.55%, respectively, compared with SINT on OTB 2013, which contains 51 video sequences. On OTB 100, the average overlap ratio gain is 4.2%.
AB - Visual object tracking is a challenging task considering illumination variation, occlusion, rotation, deformation and other problems. In this paper, we extend a Siamese INstance search Tracker (SINT) with a model updating mechanism to improve its tracking robustness. SINT uses convolutional neural network (CNN) features and compares the new frame features with the target features in the first frame. The candidate region with the highest similarity score is considered as the tracking result. However, SINT is not robust against large target variation because the matching model is not updated during the whole tracking process. To combat this defect, we propose an Ensemble Siamese Tracker (EST), where the final similarity score is also affected by the similarity with tracking results in recent frames instead of solely considering the first frame. Tracking results in recent frames are used to adjust the model for continuous target change. Meanwhile, we combine a large displacement optical flow method with EST to further improve the performance (called EST+). We test the proposed EST and EST+ on the standard tracking benchmark OTB. The average overlap ratios of EST and EST+ increase by 2.72% and 3.55%, respectively, compared with SINT on OTB 2013, which contains 51 video sequences. On OTB 100, the average overlap ratio gain is 4.2%.
KW - CNN
KW - Ensemble Siamese Tracker
KW - Model updating
KW - Siamese instance search Tracker
UR - http://www.scopus.com/inward/record.url?scp=85035092411&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2017.10.043
DO - 10.1016/j.neucom.2017.10.043
M3 - Article
AN - SCOPUS:85035092411
SN - 0925-2312
VL - 275
SP - 2892
EP - 2903
JO - Neurocomputing
JF - Neurocomputing
ER -