FTCFormer: Fuzzy Token Clustering Transformer for Image Classification

  • Muyi Bao
  • Changyu Zeng
  • Yifan Wang
  • Zhengni Yang
  • Zimu Wang
  • Guangliang Cheng
  • Jun Qi
  • Wei Wang*

  *Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Transformer-based deep neural networks have achieved remarkable success across various computer vision tasks, largely owing to their long-range self-attention mechanism and scalability. However, most transformer architectures embed images into uniform, grid-based vision tokens, neglecting the underlying semantic meaning of image regions and yielding suboptimal feature representations. To address this issue, we propose the Fuzzy Token Clustering Transformer (FTCFormer), which incorporates a novel clustering-based downsampling module to dynamically generate vision tokens based on semantic meaning instead of spatial position. It allocates fewer tokens to less informative regions and more tokens to semantically important regions, regardless of their spatial adjacency or shape irregularity. To further enhance feature extraction and representation, we propose a Density Peak Clustering-Fuzzy K-Nearest Neighbor (DPC-FKNN) mechanism for clustering center determination, a Spatial Connectivity Score (SCS) for token assignment, and a channel-wise merging (Cmerge) strategy for token merging. Extensive experiments on 32 datasets across diverse domains validate the effectiveness of FTCFormer on image classification, showing consistent improvements over the TCFormer baseline, with gains of 1.43% on five fine-grained datasets, 1.09% on six natural image datasets, 0.97% on three medical datasets, and 0.55% on four remote sensing datasets. The code is available at: https://github.com/BaoBao0926/FTCFormer/tree/main.
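
To make the idea of clustering-based token downsampling concrete, the following is a minimal NumPy sketch of plain density-peak clustering with a k-nearest-neighbor density estimate (DPC-kNN), followed by simple mean merging. It is an illustration under stated assumptions, not the authors' implementation: the fuzzy membership in DPC-FKNN, the Spatial Connectivity Score, and Cmerge are not reproduced because the abstract does not define them. The function name `dpc_knn_merge` and its parameters are hypothetical.

```python
import numpy as np


def dpc_knn_merge(tokens, num_clusters, k=5):
    """Merge N tokens of shape (N, C) into num_clusters tokens.

    Hypothetical helper: plain DPC-kNN center selection, nearest-center
    assignment, and mean merging. Requires k < N.
    """
    # Pairwise Euclidean distances between token features, shape (N, N).
    diff = tokens[:, None, :] - tokens[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density from the k nearest neighbors (column 0 is the token itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-(knn ** 2).mean(axis=1))

    # Distance indicator: distance to the closest token with higher density.
    higher = density[None, :] > density[:, None]
    delta = np.where(higher, dist, np.inf).min(axis=1)
    # Tokens with no denser neighbor get the largest distance in the set.
    delta[np.isinf(delta)] = dist.max()

    # Cluster centers are the tokens with the largest density * delta score.
    centers = np.argsort(-(density * delta))[:num_clusters]

    # Assign every token to its nearest center; centers keep their own label.
    assign = dist[:, centers].argmin(axis=1)
    assign[centers] = np.arange(num_clusters)

    # Merge by averaging the features of each cluster (not the paper's Cmerge).
    merged = np.stack(
        [tokens[assign == c].mean(axis=0) for c in range(num_clusters)]
    )
    return merged, assign
```

Calling `dpc_knn_merge(tokens, num_clusters=M)` on an `(N, C)` array returns an `(M, C)` array of merged tokens and a length-`N` vector of cluster labels. Because tokens are grouped by feature similarity rather than grid position, a cluster may cover an irregular or non-contiguous image region, which is the property the abstract emphasizes.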

Original language: English
Title of host publication: ECAI 2025 - 28th European Conference on Artificial Intelligence, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Proceedings
Editors: Ines Lynce, Nello Murano, Mauro Vallati, Serena Villata, Federico Chesani, Michela Milano, Andrea Omicini, Mehdi Dastani
Publisher: IOS Press BV
Pages: 114-121
Number of pages: 8
ISBN (Electronic): 9781643686318
Publication status: Published - 21 Oct 2025
Event: 28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Bologna, Italy
Duration: 25 Oct 2025 to 30 Oct 2025

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 413
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Country/Territory: Italy
City: Bologna
Period: 25/10/25 to 30/10/25
