End-to-end weakly supervised semantic segmentation with reliable region mining

Bingfeng Zhang, Jimin Xiao*, Yunchao Wei, Kaizhu Huang, Shan Luo, Yao Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision but must produce pixel-level predictions at test time. To address this task, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, these two-step approaches are complex and hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first use an image classification branch to generate class activation maps for the annotated categories, which are then pruned into small but reliable object/background regions. These reliable regions serve directly as ground-truth labels for the segmentation branch, where both global-information and local-information sub-branches are used to generate accurate pixel-level predictions. Furthermore, we propose a new joint loss that considers both shallow and high-level features. Despite its apparent simplicity, our end-to-end solution achieves competitive mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC compared with its two-step counterparts. By extending our one-step method to two steps, we achieve new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM.
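
As a rough illustration of the pruning step described in the abstract, the sketch below turns class activation maps into sparse pseudo labels: confident pixels become object or background labels, and everything else is marked as ignored. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `mine_reliable_regions`, the thresholds `fg_thresh` and `bg_thresh`, the per-class max normalisation, and the `IGNORE` value of 255 are all hypothetical choices made for the example; the released code at the URL above is the reference for the actual procedure.

```python
import numpy as np

IGNORE = 255  # hypothetical label for pixels that receive no supervision


def mine_reliable_regions(cams, image_labels, fg_thresh=0.6, bg_thresh=0.05):
    """Prune class activation maps into sparse pseudo labels (illustrative only).

    cams: float array of shape (C, H, W), one non-negative map per object class.
    image_labels: class indices known to be present from the image-level label.
    Returns an int array of shape (H, W): 0 = background,
    c + 1 = object class c, IGNORE = unreliable pixel.
    """
    cams = np.asarray(cams, dtype=np.float64)
    _, height, width = cams.shape

    # Keep only maps of annotated classes and scale each to [0, 1].
    norm = np.zeros_like(cams)
    for c in sorted(set(image_labels)):
        peak = cams[c].max()
        if peak > 0:
            norm[c] = cams[c] / peak

    best_class = norm.argmax(axis=0)  # most activated class per pixel
    best_score = norm.max(axis=0)     # its normalised activation

    labels = np.full((height, width), IGNORE, dtype=np.int64)
    confident_fg = best_score >= fg_thresh
    confident_bg = best_score <= bg_thresh
    labels[confident_fg] = best_class[confident_fg] + 1  # reliable object pixels
    labels[confident_bg] = 0                             # reliable background pixels
    return labels


# Toy example: class 1 is annotated and fires strongly in the top-left corner.
cams = np.zeros((2, 4, 4))
cams[1, :2, :2] = 1.0
cams[1, 2, 2] = 0.3  # weak response, neither confident object nor background
pseudo = mine_reliable_regions(cams, image_labels=[1])
```

In a setup like this, a segmentation loss would typically skip the `IGNORE` pixels, so only the small reliable regions contribute supervision.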

Original language: English
Article number: 108663
Journal: Pattern Recognition
Volume: 128
DOIs
Publication status: Published - Aug 2022

Keywords

  • Attention
  • End-to-end
  • Semantic segmentation
  • Weakly supervised
