TY - JOUR
T1 - End-to-end weakly supervised semantic segmentation with reliable region mining
AU - Zhang, Bingfeng
AU - Xiao, Jimin
AU - Wei, Yunchao
AU - Huang, Kaizhu
AU - Luo, Shan
AU - Zhao, Yao
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/8
Y1 - 2022/8
AB - Weakly supervised semantic segmentation is a challenging task that takes only image-level labels as supervision but must produce pixel-level predictions at test time. To address this task, most current approaches first generate pseudo pixel masks, which are then fed into a separate semantic segmentation network. However, these two-step approaches suffer from high complexity and are hard to train as a whole. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into tiny reliable object/background regions. These reliable regions then serve directly as ground-truth labels for the segmentation branch, where both global information and local information sub-branches are used to generate accurate pixel-level predictions. Furthermore, a new joint loss is proposed that considers both shallow and high-level features. Despite its apparent simplicity, our end-to-end solution achieves competitive mIoU scores (val: 65.4%, test: 65.3%) on Pascal VOC compared with the two-step counterparts. By extending our one-step method to two-step, we obtain new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 69.3%, test: 69.2%). Code is available at: https://github.com/zbf1991/RRM.
KW - Attention
KW - End-to-end
KW - Semantic segmentation
KW - Weakly supervised
UR - http://www.scopus.com/inward/record.url?scp=85127114257&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2022.108663
DO - 10.1016/j.patcog.2022.108663
M3 - Article
AN - SCOPUS:85127114257
SN - 0031-3203
VL - 128
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108663
ER -