TY - JOUR
T1 - Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation
AU - Wu, Wangyu
AU - Dai, Tianhong
AU - Chen, Zhenhong
AU - Huang, Xiaowei
AU - Xiao, Jimin
AU - Ma, Fei
AU - Ouyang, Renrong
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2025/1
Y1 - 2025/1
N2 - Weakly Supervised Semantic Segmentation (WSSS), which uses only image-level labels, has garnered significant attention for its cost-effectiveness. The typical framework trains on image-level labels to generate pixel-level pseudo-labels, which are then refined. Recently, methods based on Vision Transformers (ViT) have demonstrated a superior ability to generate reliable pseudo-labels, particularly in recognizing complete object regions. However, current ViT-based approaches use patch embeddings in a way that is prone to domination by a few abnormal patches, and many multi-stage methods require long, time-consuming training, limiting their efficiency. In this paper, we therefore introduce a novel ViT-based WSSS method named Adaptive Patch Contrast (APC) that significantly enhances patch embedding learning to improve segmentation. APC uses an Adaptive-K Pooling (AKP) layer to address the limitations of previous max-pooling selection methods. Additionally, we propose Patch Contrastive Learning (PCL) to strengthen patch embeddings and further improve the final results. We developed an end-to-end single-stage framework without CAM, which improves training efficiency. Experimental results demonstrate that APC performs exceptionally well on public datasets, outperforming other state-of-the-art WSSS methods with a shorter training time.
AB - Weakly Supervised Semantic Segmentation (WSSS), which uses only image-level labels, has garnered significant attention for its cost-effectiveness. The typical framework trains on image-level labels to generate pixel-level pseudo-labels, which are then refined. Recently, methods based on Vision Transformers (ViT) have demonstrated a superior ability to generate reliable pseudo-labels, particularly in recognizing complete object regions. However, current ViT-based approaches use patch embeddings in a way that is prone to domination by a few abnormal patches, and many multi-stage methods require long, time-consuming training, limiting their efficiency. In this paper, we therefore introduce a novel ViT-based WSSS method named Adaptive Patch Contrast (APC) that significantly enhances patch embedding learning to improve segmentation. APC uses an Adaptive-K Pooling (AKP) layer to address the limitations of previous max-pooling selection methods. Additionally, we propose Patch Contrastive Learning (PCL) to strengthen patch embeddings and further improve the final results. We developed an end-to-end single-stage framework without CAM, which improves training efficiency. Experimental results demonstrate that APC performs exceptionally well on public datasets, outperforming other state-of-the-art WSSS methods with a shorter training time.
KW - Contrastive learning
KW - Semantic segmentation
KW - Vision Transformer
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85209241469&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2024.109626
DO - 10.1016/j.engappai.2024.109626
M3 - Article
AN - SCOPUS:85209241469
SN - 0952-1976
VL - 139
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 109626
ER -