End-to-end weakly supervised semantic segmentation with reliable region mining

Bingfeng Zhang; Jimin Xiao; Yunchao Wei; Kaizhu Huang; Shan Luo; Yao Zhao

doi:10.1016/j.patcog.2022.108663

Bingfeng Zhang, Jimin Xiao^*, Yunchao Wei, Kaizhu Huang, Shan Luo, Yao Zhao

^*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

40 Citations (Scopus)

Abstract

Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision but must produce pixel-level predictions at test time. To address it, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, these two-step approaches suffer from high complexity and are hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are then pruned into tiny reliable object/background regions. These reliable regions serve directly as ground-truth labels for the segmentation branch, where both global-information and local-information sub-branches are used to generate accurate pixel-level predictions. Furthermore, a new joint loss is proposed that considers both shallow and high-level features. Despite its apparent simplicity, our end-to-end solution achieves mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC that are competitive with its two-step counterparts. By extending our one-step method to two steps, we obtain new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM.
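The pruning step described in the abstract can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from the paper: each class activation map is normalized by its own maximum, pixels with a high score are kept as reliable object regions, pixels with a low score are kept as reliable background, and everything in between is left unlabeled. The function name `mine_reliable_regions`, the thresholds `fg_thresh` and `bg_thresh`, and the ignore value 255 are all hypothetical; the authors' actual rule may differ, and their repository is the reference.

```python
import numpy as np

IGNORE = 255  # hypothetical label for pixels that receive no supervision


def mine_reliable_regions(cams, present, fg_thresh=0.7, bg_thresh=0.1):
    """Turn class activation maps into a sparse pseudo-label map.

    cams:    (C, H, W) non-negative activation maps, one per foreground class.
    present: (C,) boolean image-level labels.
    Returns an (H, W) integer map: 0 = background, c + 1 = class c,
    IGNORE = unreliable pixel. Thresholds are illustrative assumptions.
    """
    # Suppress maps of classes that the image-level label says are absent.
    cams = np.where(present[:, None, None], cams, 0.0)

    # Normalize each map to [0, 1] by its own peak activation.
    peak = cams.max(axis=(1, 2), keepdims=True)
    norm = cams / np.maximum(peak, 1e-8)

    best_class = norm.argmax(axis=0)  # most activated class per pixel
    best_score = norm.max(axis=0)     # its normalized activation

    labels = np.full(best_score.shape, IGNORE, dtype=np.int64)
    confident_fg = best_score >= fg_thresh
    labels[confident_fg] = best_class[confident_fg] + 1
    labels[best_score <= bg_thresh] = 0
    return labels
```

A segmentation branch would then be trained only on pixels whose label is not `IGNORE`, for example by passing that value as the ignored index of a cross-entropy loss.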

Original language	English
Article number	108663
Journal	Pattern Recognition
Volume	128
DOIs	https://doi.org/10.1016/j.patcog.2022.108663
Publication status	Published - Aug 2022

Keywords

Attention
End-to-end
Semantic segmentation
Weakly supervised

Cite this

@article{f2d7572efd584479ab805979b2858dcb,

title = "End-to-end weakly supervised semantic segmentation with reliable region mining",

abstract = "Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision but must produce pixel-level predictions at test time. To address it, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, these two-step approaches suffer from high complexity and are hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are then pruned into tiny reliable object/background regions. These reliable regions serve directly as ground-truth labels for the segmentation branch, where both global-information and local-information sub-branches are used to generate accurate pixel-level predictions. Furthermore, a new joint loss is proposed that considers both shallow and high-level features. Despite its apparent simplicity, our end-to-end solution achieves mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC that are competitive with its two-step counterparts. By extending our one-step method to two steps, we obtain new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM.",

keywords = "Attention, End-to-end, Semantic segmentation, Weakly supervised",

author = "Bingfeng Zhang and Jimin Xiao and Yunchao Wei and Kaizhu Huang and Shan Luo and Yao Zhao",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",

year = "2022",

month = aug,

doi = "10.1016/j.patcog.2022.108663",

language = "English",

volume = "128",

journal = "Pattern Recognition",

issn = "0031-3203",

}

TY - JOUR

T1 - End-to-end weakly supervised semantic segmentation with reliable region mining

AU - Zhang, Bingfeng

AU - Xiao, Jimin

AU - Wei, Yunchao

AU - Huang, Kaizhu

AU - Luo, Shan

AU - Zhao, Yao

PY - 2022/8

Y1 - 2022/8

AB - Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision but must produce pixel-level predictions at test time. To address it, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, these two-step approaches suffer from high complexity and are hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are then pruned into tiny reliable object/background regions. These reliable regions serve directly as ground-truth labels for the segmentation branch, where both global-information and local-information sub-branches are used to generate accurate pixel-level predictions. Furthermore, a new joint loss is proposed that considers both shallow and high-level features. Despite its apparent simplicity, our end-to-end solution achieves mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC that are competitive with its two-step counterparts. By extending our one-step method to two steps, we obtain new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM.

KW - Attention

KW - End-to-end

KW - Semantic segmentation

KW - Weakly supervised

UR - http://www.scopus.com/inward/record.url?scp=85127114257&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2022.108663

DO - 10.1016/j.patcog.2022.108663

M3 - Article

AN - SCOPUS:85127114257

SN - 0031-3203

VL - 128

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 108663

ER -
