TY - JOUR
T1 - Fast pixel-matching for video object segmentation
AU - Yu, Siyue
AU - Xiao, Jimin
AU - Zhang, Bingfeng
AU - Lim, Eng Gee
AU - Zhao, Yao
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/10
Y1 - 2021/10
N2 - Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost because they fine-tune the model during inference, while most mask-propagation based models are faster but perform worse because they fail to adapt to variations in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique, matching pixels between reference and target frames. Since we incorporate information from both the first and the previous frame, our network is robust to large variations in object appearance and can better adapt to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance while remaining fast (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under a same-level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.
AB - Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost because they fine-tune the model during inference, while most mask-propagation based models are faster but perform worse because they fail to adapt to variations in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique, matching pixels between reference and target frames. Since we incorporate information from both the first and the previous frame, our network is robust to large variations in object appearance and can better adapt to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance while remaining fast (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under a same-level comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.
KW - Encoder–decoder
KW - Mask-propagation
KW - Non-local pixel matching
UR - http://www.scopus.com/inward/record.url?scp=85111019759&partnerID=8YFLogxK
U2 - 10.1016/j.image.2021.116373
DO - 10.1016/j.image.2021.116373
M3 - Article
AN - SCOPUS:85111019759
SN - 0923-5965
VL - 98
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
M1 - 116373
ER -