Fast pixel-matching for video object segmentation

Siyue Yu; Jimin Xiao; Bingfeng Zhang; Eng Gee Lim; Yao Zhao

doi:10.1016/j.image.2021.116373

Siyue Yu, Jimin Xiao^*, Bingfeng Zhang, Eng Gee Lim, Yao Zhao

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost because they fine-tune the model during inference. Most mask-propagation-based models are faster but perform relatively poorly because they fail to adapt to variation in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique that matches pixels in the reference and target frames. Because we bring in information from both the first and previous frames, our network is robust to large variations in object appearance and adapts better to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance at a fast speed (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under a same-level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.
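The idea of matching target-frame pixels against reference-frame pixels can be illustrated with a minimal, dependency-free sketch. It is not the NPMCA-net implementation: the feature vectors, the `temperature` parameter, the 0.5 threshold, and the choice to pool first-frame and previous-frame pixels into a single reference set are all assumptions made for illustration. Each target pixel attends over every reference pixel (a non-local operation), and the reference masks are averaged with those attention weights to give a foreground probability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def non_local_match(target_feats, ref_feats, ref_mask, temperature=1.0):
    """For each target pixel, attend over all reference pixels and return
    the attention-weighted foreground probability (hypothetical helper)."""
    probs = []
    for t in target_feats:
        weights = softmax([dot(t, r) / temperature for r in ref_feats])
        probs.append(sum(w * m for w, m in zip(weights, ref_mask)))
    return probs


def iou(pred, truth):
    """Intersection over union of two binary masks given as flat lists."""
    inter = sum(1 for p, g in zip(pred, truth) if p and g)
    union = sum(1 for p, g in zip(pred, truth) if p or g)
    return inter / union if union else 1.0


# Toy 2-D features: foreground pixels point along the first axis,
# background pixels along the second. All values are made up.
first_feats, first_mask = [[4.0, 0.0], [0.0, 4.0]], [1, 0]
prev_feats, prev_mask = [[3.0, 1.0], [0.0, 4.0]], [1, 0]

# Assumption: the reference set is simply first-frame plus previous-frame pixels.
ref_feats = first_feats + prev_feats
ref_mask = first_mask + prev_mask

target_feats = [[0.0, 4.0], [3.5, 0.5], [4.0, 0.0], [0.0, 4.0]]
probs = non_local_match(target_feats, ref_feats, ref_mask)
pred = [1 if p > 0.5 else 0 for p in probs]
print(pred, iou(pred, [0, 1, 1, 0]))
```

In this toy setup the second target pixel resembles the previous-frame foreground more than the first-frame one, which hints at why drawing on both frames can help when appearance drifts. A real model would compute the features with a learned encoder and decode the matched result into a full-resolution mask.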

Original language	English
Article number	116373
Journal	Signal Processing: Image Communication
Volume	98
DOIs	https://doi.org/10.1016/j.image.2021.116373
Publication status	Published - Oct 2021

Keywords

Encoder–decoder
Mask-propagation
Non-local pixel matching

Cite this

@article{7db1637060f54222b4bd6462e9103b4b,

title = "Fast pixel-matching for video object segmentation",

keywords = "Encoder–decoder, Mask-propagation, Non-local pixel matching",

author = "Siyue Yu and Jimin Xiao and Bingfeng Zhang and Lim, {Eng Gee} and Yao Zhao",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",

year = "2021",

month = oct,

doi = "10.1016/j.image.2021.116373",

language = "English",

volume = "98",

journal = "Signal Processing: Image Communication",

issn = "0923-5965",

}

TY - JOUR

T1 - Fast pixel-matching for video object segmentation

AU - Yu, Siyue

AU - Xiao, Jimin

AU - Zhang, Bingfeng

AU - Lim, Eng Gee

AU - Zhao, Yao

PY - 2021/10

Y1 - 2021/10

KW - Encoder–decoder

KW - Mask-propagation

KW - Non-local pixel matching

UR - http://www.scopus.com/inward/record.url?scp=85111019759&partnerID=8YFLogxK

U2 - 10.1016/j.image.2021.116373

DO - 10.1016/j.image.2021.116373

M3 - Article

AN - SCOPUS:85111019759

SN - 0923-5965

VL - 98

JO - Signal Processing: Image Communication

JF - Signal Processing: Image Communication

M1 - 116373

ER -
