Fast Template Matching and Update for Video Object Tracking and Segmentation

Mingjie Sun; Jimin Xiao; Eng Gee Lim; Bingfeng Zhang; Yao Zhao

doi:10.1109/CVPR42600.2020.01080

Fast Template Matching and Update for Video Object Tracking and Segmentation

Mingjie Sun, Jimin Xiao^*, Eng Gee Lim, Bingfeng Zhang, Yao Zhao

^*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

50 Citations (Scopus)

Abstract

In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. Detection-based algorithms are widely adopted to handle this task, and the challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template using the newly predicted result. The existing methods, however, make these selections in a rough and inflexible way, compromising their performance. To overcome this limitation, we propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time. Specifically, the reinforcement learning agent learns to decide whether to update the target template according to the quality of the predicted result. The choice of the matching method will be determined at the same time, based on the action history of the reinforcement learning agent. Experiments show that our method is almost 10 times faster than the previous state-of-the-art method with even higher accuracy (region similarity of 69.1% on DAVIS 2017 dataset).

Original language	English
Article number	9157514
Pages (from-to)	10788-10796
Number of pages	9
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs	https://doi.org/10.1109/CVPR42600.2020.01080
Publication status	Published - 2020
Event	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States Duration: 14 Jun 2020 → 19 Jun 2020

Access to Document

10.1109/CVPR42600.2020.01080

Cite this

@article{f45a8914500949d29b085579c5cea205,

title = "Fast Template Matching and Update for Video Object Tracking and Segmentation",

abstract = "In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. Detection-based algorithms are widely adopted to handle this task, and the challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template using the newly predicted result. The existing methods, however, make these selections in a rough and inflexible way, compromising their performance. To overcome this limitation, we propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time. Specifically, the reinforcement learning agent learns to decide whether to update the target template according to the quality of the predicted result. The choice of the matching method will be determined at the same time, based on the action history of the reinforcement learning agent. Experiments show that our method is almost 10 times faster than the previous state-of-the-art method with even higher accuracy (region similarity of 69.1% on DAVIS 2017 dataset).",

author = "Mingjie Sun and Jimin Xiao and Lim, {Eng Gee} and Bingfeng Zhang and Yao Zhao",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 ; Conference date: 14-06-2020 Through 19-06-2020",

year = "2020",

doi = "10.1109/CVPR42600.2020.01080",

language = "English",

pages = "10788--10796",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

}

TY - JOUR

T1 - Fast Template Matching and Update for Video Object Tracking and Segmentation

AU - Sun, Mingjie

AU - Xiao, Jimin

AU - Lim, Eng Gee

AU - Zhang, Bingfeng

AU - Zhao, Yao

PY - 2020

Y1 - 2020

N2 - In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. Detection-based algorithms are widely adopted to handle this task, and the challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template using the newly predicted result. The existing methods, however, make these selections in a rough and inflexible way, compromising their performance. To overcome this limitation, we propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time. Specifically, the reinforcement learning agent learns to decide whether to update the target template according to the quality of the predicted result. The choice of the matching method will be determined at the same time, based on the action history of the reinforcement learning agent. Experiments show that our method is almost 10 times faster than the previous state-of-the-art method with even higher accuracy (region similarity of 69.1% on DAVIS 2017 dataset).

AB - In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. Detection-based algorithms are widely adopted to handle this task, and the challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template using the newly predicted result. The existing methods, however, make these selections in a rough and inflexible way, compromising their performance. To overcome this limitation, we propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time. Specifically, the reinforcement learning agent learns to decide whether to update the target template according to the quality of the predicted result. The choice of the matching method will be determined at the same time, based on the action history of the reinforcement learning agent. Experiments show that our method is almost 10 times faster than the previous state-of-the-art method with even higher accuracy (region similarity of 69.1% on DAVIS 2017 dataset).

UR - http://www.scopus.com/inward/record.url?scp=85094863793&partnerID=8YFLogxK

U2 - 10.1109/CVPR42600.2020.01080

DO - 10.1109/CVPR42600.2020.01080

M3 - Conference article

AN - SCOPUS:85094863793

SN - 1063-6919

SP - 10788

EP - 10796

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

M1 - 9157514

T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020

Y2 - 14 June 2020 through 19 June 2020

ER -

Fast Template Matching and Update for Video Object Tracking and Segmentation

Abstract

Access to Document

Other files and links

Cite this