Flexible Video Matting With Temporally Coherent Trimaps Generation

Chenhui Xue, Shugong Xu*, Shiyi Mu, Yilin Gao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

Abstract

Traditional video matting networks depend on user-annotated trimaps to estimate alpha mattes for the foreground in videos. However, creating trimaps is labor-intensive and rigid. Recent advancements in video matting aim to eliminate the need for trimaps, but these methods struggle to estimate alpha mattes for specific individuals in scenes featuring multiple instances. In this study, we propose the Flexible Video Matting (FVM) model, a novel video matting network capable of generating alpha mattes for any specified instance in a video using simple prompts such as text, bounding boxes, and points, without relying on user-annotated trimaps. FVM combines the Segment Anything Model (SAM) and a video object segmentation network to obtain semantic masks for the target instance. Additionally, we have designed a Mask-to-Trimap (MTT) module for FVM, based on a recurrent architecture. This module utilizes semantic masks and temporal information in the video to predict temporally consistent trimaps, which are subsequently fed into the matting module to generate temporally consistent alpha mattes. Experimental results on the video matting benchmark demonstrate that our model achieves state-of-the-art matting quality and exhibits superior temporal coherence compared with methods that directly apply image matting techniques to video matting tasks.
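To make the mask-to-trimap step concrete, the following is a minimal sketch of a classical, per-frame heuristic: the semantic mask is eroded to get confident foreground, dilated to bound the object, and the band in between is marked unknown. This is not the MTT module described above, which is a learned recurrent network that also uses temporal information; the function names `dilate` and `mask_to_trimap`, the square structuring element, the `radius` parameter, and the 0/128/255 encoding are all assumptions made for illustration.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = mask.shape
    # Pad with False so pixels outside the frame never contribute.
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def mask_to_trimap(mask: np.ndarray, radius: int = 2) -> np.ndarray:
    """Turn a binary mask into a trimap: 0 background, 128 unknown, 255 foreground."""
    mask = mask.astype(bool)
    grown = dilate(mask, radius)
    # Erosion expressed as the dual of dilation; with this padding the
    # mask is not eroded where it touches the frame border.
    shrunk = ~dilate(~mask, radius)
    trimap = np.full(mask.shape, 128, dtype=np.uint8)
    trimap[shrunk] = 255   # confident foreground
    trimap[~grown] = 0     # confident background
    return trimap


# Toy example: a 5x5 foreground square inside a 9x9 frame.
demo_mask = np.zeros((9, 9), dtype=bool)
demo_mask[2:7, 2:7] = True
demo_trimap = mask_to_trimap(demo_mask, radius=1)
```

Applied independently to each frame, a fixed-width band like this can flicker when the mask boundary jitters, which is the kind of inconsistency a temporally aware trimap predictor is meant to reduce.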

Original language: English
Title of host publication: Pattern Recognition and Artificial Intelligence - 4th International Conference, ICPRAI 2024, Proceedings
Editors: Christian Wallraven, Cheng-Lin Liu, Arun Ross
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 172-185
Number of pages: 14
ISBN (Print): 9789819787012
Publication status: Published - 2025
Externally published: Yes
Event: 4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024 - Jeju Island, Korea, Republic of
Duration: 3 Jul 2024 to 6 Jul 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14892 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 3/07/24 to 6/07/24

Keywords

  • Mask-to-Trimap
  • Segment Anything
  • Temporal Coherence
  • Video Matting
