Cross-frame feature-saliency mutual reinforcing for weakly supervised video salient object detection

Jian Wang; Siyue Yu; Bingfeng Zhang; Xinqiao Zhao; Ángel F. García-Fernández; Eng Gee Lim; Jimin Xiao

doi:10.1016/j.patcog.2024.110302

Corresponding author for this work: Eng Gee Lim

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Scribble annotations have recently become popular in video salient object detection. Previous methods focus only on exploiting shallow feature consistency to obtain more integral predictions. However, consistency between cross-frame deep features could also be used to better regularize saliency predictions. Moreover, we observe that using saliency predictions as pseudo-supervision signals notably improves the extraction of both intra-frame and cross-frame deep features, which in turn yields more precise and detailed object structural information. We therefore propose a cross-frame feature-saliency mutual reinforcing training process that complements scribble annotations to produce integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which uses intra-frame and cross-frame deep feature consistency as auxiliary supervision to regularize saliency predictions. To obtain more accurate feature consistency, we further design a cross-frame saliency regularization head, in which predicted saliency values serve as pseudo-supervision signals. In this way, the cross-frame feature and saliency regularization heads benefit from each other and help the network learn more accurately. Extensive experiments show that our method outperforms the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.
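The abstract describes the two regularization heads only at a high level. What follows is a minimal, hypothetical sketch of how two such mutually reinforcing consistency terms could be computed. It assumes per-pixel deep feature vectors and saliency values in [0, 1] for two frames, cosine similarity as the feature-consistency measure, and a 0.5 threshold for turning saliency into pseudo-labels. The function names and loss forms are illustrative assumptions. They are not taken from the paper or the CFMR code.

```python
"""Hypothetical sketch of two cross-frame consistency terms (not the CFMR code)."""
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def feature_to_saliency_loss(feats_a, sal_a, feats_b, sal_b) -> float:
    """Features regularize saliency (assumed form).

    Every pixel pair (one pixel from each frame) is weighted by its
    positive feature affinity, so similar-looking pixels are pushed toward
    similar saliency values. Returns a weighted mean absolute difference.
    """
    num, den = 0.0, 0.0
    for fa, sa in zip(feats_a, sal_a):
        for fb, sb in zip(feats_b, sal_b):
            w = max(cosine(fa, fb), 0.0)  # ignore negative affinities
            num += w * abs(sa - sb)
            den += w
    return num / den if den > 0 else 0.0


def saliency_to_feature_loss(feats_a, sal_a, feats_b, sal_b, thr: float = 0.5) -> float:
    """Saliency regularizes features (assumed form).

    Thresholded saliency predictions act as pseudo-labels: a pair gets
    target 1 if both pixels fall on the same side of `thr`, else 0.
    Feature similarity, mapped from [-1, 1] to [0, 1], is scored against
    that target with binary cross-entropy, averaged over all pairs.
    """
    eps = 1e-7
    total, count = 0.0, 0
    for fa, sa in zip(feats_a, sal_a):
        for fb, sb in zip(feats_b, sal_b):
            target = 1.0 if (sa >= thr) == (sb >= thr) else 0.0
            p = (cosine(fa, fb) + 1.0) / 2.0
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy data: two pixels per frame, 2-D "deep features", saliency in [0, 1].
    feats_t = [[1.0, 0.0], [0.0, 1.0]]
    feats_t1 = [[0.9, 0.1], [0.1, 0.9]]
    sal_t = [0.9, 0.1]
    sal_t1 = [0.8, 0.2]
    print("feature -> saliency:", feature_to_saliency_loss(feats_t, sal_t, feats_t1, sal_t1))
    print("saliency -> feature:", saliency_to_feature_loss(feats_t, sal_t, feats_t1, sal_t1))
```

If you pass the same frame on both sides, you get an intra-frame variant of either term. In a training loop, such terms would plausibly be added to the scribble loss with weighting coefficients. Each direction would stop gradients through the side acting as the "teacher", so that features supervise saliency and saliency supervises features. This usage is likewise an assumption and is not a detail reported in the abstract.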

Original language	English
Article number	110302
Journal	Pattern Recognition
Volume	150
DOIs	https://doi.org/10.1016/j.patcog.2024.110302
Publication status	Published - Jun 2024

Keywords

Cross-frame feature consistency
Cross-frame saliency consistency
Scribble annotations
Video salient object detection

Cite this

@article{22efca2bc3a94226bd31afc2bfa33789,

title = "Cross-frame feature-saliency mutual reinforcing for weakly supervised video salient object detection",

abstract = "Scribble annotations have recently become popular in video salient object detection. Previous methods only focus on utilizing shallow feature consistency for more integral predictions. However, there is potential for consistency between cross-frame deep features to be used to help regularize saliency predictions better. Besides, we have observed that leveraging saliency predictions as pseudo-supervision signals yields notable improvements in extracting both intra-frame and cross-frame deep features. This, in turn, leads to more precise and detailed object structural information. Thus, we propose a cross-frame feature-saliency mutual reinforcing training process to assist scribble annotations for integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which leverages intra-frame and cross-frame deep feature consistency to regularize saliency predictions as auxiliary supervision. Then, to help obtain more accurate feature consistency, we design a cross-frame saliency regularization head, where predicted saliency values are used as pseudo-supervision signals to acquire better feature consistency. In this way, our cross-frame feature and saliency regularization heads can benefit from each other to help the network learn more accurately. Extensive experiments show that our method can achieve better performances than the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.",

keywords = "Cross-frame feature consistency, Cross-frame saliency consistency, Scribble annotations, Video salient object detection",

author = "Jian Wang and Siyue Yu and Bingfeng Zhang and Xinqiao Zhao and Garc{\'i}a-Fern{\'a}ndez, {{\'A}ngel F.} and Lim, {Eng Gee} and Jimin Xiao",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",

year = "2024",

month = jun,

doi = "10.1016/j.patcog.2024.110302",

language = "English",

volume = "150",

pages = "110302",

journal = "Pattern Recognition",

issn = "0031-3203",

}

TY  - JOUR
T1  - Cross-frame feature-saliency mutual reinforcing for weakly supervised video salient object detection
AU  - Wang, Jian
AU  - Yu, Siyue
AU  - Zhang, Bingfeng
AU  - Zhao, Xinqiao
AU  - García-Fernández, Ángel F.
AU  - Lim, Eng Gee
AU  - Xiao, Jimin
PY  - 2024/6
Y1  - 2024/6
N2  - Scribble annotations have recently become popular in video salient object detection. Previous methods only focus on utilizing shallow feature consistency for more integral predictions. However, there is potential for consistency between cross-frame deep features to be used to help regularize saliency predictions better. Besides, we have observed that leveraging saliency predictions as pseudo-supervision signals yields notable improvements in extracting both intra-frame and cross-frame deep features. This, in turn, leads to more precise and detailed object structural information. Thus, we propose a cross-frame feature-saliency mutual reinforcing training process to assist scribble annotations for integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which leverages intra-frame and cross-frame deep feature consistency to regularize saliency predictions as auxiliary supervision. Then, to help obtain more accurate feature consistency, we design a cross-frame saliency regularization head, where predicted saliency values are used as pseudo-supervision signals to acquire better feature consistency. In this way, our cross-frame feature and saliency regularization heads can benefit from each other to help the network learn more accurately. Extensive experiments show that our method can achieve better performances than the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.
AB  - Scribble annotations have recently become popular in video salient object detection. Previous methods only focus on utilizing shallow feature consistency for more integral predictions. However, there is potential for consistency between cross-frame deep features to be used to help regularize saliency predictions better. Besides, we have observed that leveraging saliency predictions as pseudo-supervision signals yields notable improvements in extracting both intra-frame and cross-frame deep features. This, in turn, leads to more precise and detailed object structural information. Thus, we propose a cross-frame feature-saliency mutual reinforcing training process to assist scribble annotations for integral video saliency predictions. Specifically, we design a cross-frame feature regularization head, which leverages intra-frame and cross-frame deep feature consistency to regularize saliency predictions as auxiliary supervision. Then, to help obtain more accurate feature consistency, we design a cross-frame saliency regularization head, where predicted saliency values are used as pseudo-supervision signals to acquire better feature consistency. In this way, our cross-frame feature and saliency regularization heads can benefit from each other to help the network learn more accurately. Extensive experiments show that our method can achieve better performances than the previous best methods. The project is available at https://github.com/muchengxue0911/CFMR.
KW  - Cross-frame feature consistency
KW  - Cross-frame saliency consistency
KW  - Scribble annotations
KW  - Video salient object detection
UR  - http://www.scopus.com/inward/record.url?scp=85184756979&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2024.110302
DO  - 10.1016/j.patcog.2024.110302
M3  - Article
AN  - SCOPUS:85184756979
SN  - 0031-3203
VL  - 150
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 110302
ER  - 