Siamese network ensemble for visual tracking

Chenru Jiang, Jimin Xiao*, Yanchun Xie, Tammam Tillo, Kaizhu Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)


Visual object tracking is a challenging task because of illumination variation, occlusion, rotation, deformation and other problems. In this paper, we extend the Siamese INstance search Tracker (SINT) with a model updating mechanism to improve its tracking robustness. SINT uses convolutional neural network (CNN) features and compares the features of each new frame with the target features from the first frame. The candidate region with the highest similarity score is taken as the tracking result. However, SINT is not robust against large target variation because the matching model is not updated at any point during tracking. To address this defect, we propose an Ensemble Siamese Tracker (EST), in which the final similarity score also depends on the similarity with tracking results from recent frames instead of solely on the first frame. Tracking results from recent frames are used to adapt the model to continuous target change. In addition, we combine a large displacement optical flow method with EST to further improve performance (called EST+). We test the proposed EST and EST+ on the standard tracking benchmark OTB. Compared with SINT, the average overlap ratio of EST and EST+ increases by 2.72% and 3.55%, respectively, on OTB 2013, which contains 51 video sequences. On OTB 100, the average overlap ratio gain is 4.2%.
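The ensemble scoring idea can be illustrated with a minimal sketch. The abstract does not state how the first-frame and recent-frame similarities are combined, so everything below is an assumption made for illustration: plain cosine similarity stands in for the learned Siamese matching function, the recent-frame similarities are simply averaged, and the function names and the weight `alpha` are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ensemble_score(candidate, first_template, recent_templates, alpha=0.5):
    """Blend first-frame similarity with the mean recent-frame similarity.

    alpha is a hypothetical mixing weight (not from the paper):
    alpha=1.0 uses only the first frame, as the abstract describes for SINT.
    """
    first = cosine_similarity(candidate, first_template)
    if not recent_templates:
        # No tracking results yet: fall back to first-frame matching only.
        return first
    recent = sum(
        cosine_similarity(candidate, t) for t in recent_templates
    ) / len(recent_templates)
    return alpha * first + (1.0 - alpha) * recent


def pick_best(candidates, first_template, recent_templates, alpha=0.5):
    """Return (index of the highest-scoring candidate, list of all scores)."""
    scores = [
        ensemble_score(c, first_template, recent_templates, alpha)
        for c in candidates
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 2-D "features": the target has drifted away from its first-frame look.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
first_template = [1.0, 0.0]
recent_templates = [[1.0, 1.0], [0.0, 1.0]]

best_first_only, _ = pick_best(candidates, first_template, [])
best_ensemble, scores = pick_best(candidates, first_template, recent_templates)
print(best_first_only, best_ensemble)
```

In this toy case, matching against the first frame alone selects candidate 0, while including the recent results shifts the choice to candidate 2, which resembles both the original target and its recent appearance.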

Original language: English
Pages (from-to): 2892-2903
Number of pages: 12
Publication status: Published - 31 Jan 2018


  • CNN
  • Ensemble Siamese Tracker
  • Model updating
  • Siamese instance search Tracker

