TY - GEN
T1 - Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation
AU - Wu, Wangyu
AU - Dai, Tianhong
AU - Huang, Xiaowei
AU - Ma, Fei
AU - Xiao, Jimin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to cost-effectiveness. Recently, Vision Transformer (ViT) based methods without class activation map (CAM) have shown greater capability in generating reliable pseudo labels than previous methods using CAM. However, the current ViT-based methods utilize max pooling to select the patch with the highest prediction score to map the patch-level classification to the image-level one, which may affect the quality of pseudo labels due to the inaccurate classification of the patches. In this paper, we introduce a novel ViT-based WSSS method named top-K pooling with patch contrastive learning (TKP-PCL), which employs a top-K pooling layer to alleviate the limitations of previous max pooling selection. A patch contrastive error (PCE) is also proposed to enhance the patch embeddings to further improve the final results. The experimental results show that our approach is very efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 and MS COCO 2014 dataset.
AB - Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to cost-effectiveness. Recently, Vision Transformer (ViT) based methods without class activation map (CAM) have shown greater capability in generating reliable pseudo labels than previous methods using CAM. However, the current ViT-based methods utilize max pooling to select the patch with the highest prediction score to map the patch-level classification to the image-level one, which may affect the quality of pseudo labels due to the inaccurate classification of the patches. In this paper, we introduce a novel ViT-based WSSS method named top-K pooling with patch contrastive learning (TKP-PCL), which employs a top-K pooling layer to alleviate the limitations of previous max pooling selection. A patch contrastive error (PCE) is also proposed to enhance the patch embeddings to further improve the final results. The experimental results show that our approach is very efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 and MS COCO 2014 dataset.
UR - http://www.scopus.com/inward/record.url?scp=85217851284&partnerID=8YFLogxK
U2 - 10.1109/SMC54092.2024.10831685
DO - 10.1109/SMC54092.2024.10831685
M3 - Conference Proceeding
AN - SCOPUS:85217851284
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 5270
EP - 5275
BT - 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024
Y2 - 6 October 2024 through 10 October 2024
ER -