TY - JOUR
T1 - Cross-frame feature-saliency mutual reinforcing for weakly supervised video salient object detection
AU - Wang, Jian
AU - Yu, Siyue
AU - Zhang, Bingfeng
AU - Zhao, Xinqiao
AU - García-Fernández, Ángel F.
AU - Lim, Eng Gee
AU - Xiao, Jimin
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Scribble annotations have recently become popular in video salient object detection. Previous methods only focus on utilizing shallow feature consistency for more integral predictions. However, there is potential for consistency between cross-frame deep features to be used to help regularize saliency predictions better. Besides, we have observed that leveraging saliency predictions as pseudo-supervision signals yields notable improvements in extracting both intra-frame and cross-frame deep features. This, in turn, leads to more precise and detailed object structural information. Thus, we propose a cross-frame feature-saliency mutual reinforcing training process to assist scribble annotations for integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which leverages intra-frame and cross-frame deep feature consistency to regularize saliency predictions as auxiliary supervision. Then, to help obtain more accurate feature consistency, we design a cross-frame saliency regularization head, where predicted saliency values are used as pseudo-supervision signals to acquire better feature consistency. In this way, our cross-frame feature and saliency regularization heads can benefit from each other to help the network learn more accurately. Extensive experiments show that our method can achieve better performances than the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.
AB - Scribble annotations have recently become popular in video salient object detection. Previous methods only focus on utilizing shallow feature consistency for more integral predictions. However, there is potential for consistency between cross-frame deep features to be used to help regularize saliency predictions better. Besides, we have observed that leveraging saliency predictions as pseudo-supervision signals yields notable improvements in extracting both intra-frame and cross-frame deep features. This, in turn, leads to more precise and detailed object structural information. Thus, we propose a cross-frame feature-saliency mutual reinforcing training process to assist scribble annotations for integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which leverages intra-frame and cross-frame deep feature consistency to regularize saliency predictions as auxiliary supervision. Then, to help obtain more accurate feature consistency, we design a cross-frame saliency regularization head, where predicted saliency values are used as pseudo-supervision signals to acquire better feature consistency. In this way, our cross-frame feature and saliency regularization heads can benefit from each other to help the network learn more accurately. Extensive experiments show that our method can achieve better performances than the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.
KW - Cross-frame feature consistency
KW - Cross-frame saliency consistency
KW - Scribble annotations
KW - Video salient object detection
UR - http://www.scopus.com/inward/record.url?scp=85184756979&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.110302
DO - 10.1016/j.patcog.2024.110302
M3 - Article
AN - SCOPUS:85184756979
SN - 0031-3203
VL - 150
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 110302
ER -