TY - JOUR
T1 - Starting Point Selection and Multiple-Standard Matching for Video Object Segmentation With Language Annotation
AU - Sun, Mingjie
AU - Xiao, Jimin
AU - Lim, Eng Gee
AU - Zhao, Yao
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - In this study, we investigate language-level video object segmentation, in which a first-frame language annotation describes the target object. Because a language label is typically compatible with all frames in a video, the proposed method can choose the most suitable starting frame to mitigate initialization failure. In addition to extracting visual features from a static video frame, we propose a motion-language score based on optical flow to describe moving objects more accurately. Scores from multiple standards are then aggregated using an attention-based mechanism to predict the final result. The proposed method is evaluated on four widely used video object segmentation datasets (DAVIS 2017, DAVIS 2016, SegTrack V2, and YouTubeObject), and achieves a mean region similarity of 67.2% on DAVIS 2017 and 83.5% on DAVIS 2016. The code will be published.
AB - In this study, we investigate language-level video object segmentation, in which a first-frame language annotation describes the target object. Because a language label is typically compatible with all frames in a video, the proposed method can choose the most suitable starting frame to mitigate initialization failure. In addition to extracting visual features from a static video frame, we propose a motion-language score based on optical flow to describe moving objects more accurately. Scores from multiple standards are then aggregated using an attention-based mechanism to predict the final result. The proposed method is evaluated on four widely used video object segmentation datasets (DAVIS 2017, DAVIS 2016, SegTrack V2, and YouTubeObject), and achieves a mean region similarity of 67.2% on DAVIS 2017 and 83.5% on DAVIS 2016. The code will be published.
KW - Starting point
KW - language annotation
KW - matching strategy
KW - video object segmentation
UR - http://www.scopus.com/inward/record.url?scp=85126514942&partnerID=8YFLogxK
U2 - 10.1109/TMM.2022.3159403
DO - 10.1109/TMM.2022.3159403
M3 - Article
AN - SCOPUS:85126514942
SN - 1520-9210
VL - 25
SP - 3354
EP - 3363
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -