Abstract
Vision Transformers (ViTs) excel at semantic segmentation but demand significant computation, posing challenges for deployment on resource-constrained devices. Existing token pruning methods often overlook fundamental characteristics of visual data. This study introduces LVTP, a progressive token pruning framework guided by multi-scale Tsallis entropy and low-level visual features, combined with a two-stage clustering scheme. It integrates high-level semantics and basic visual attributes for precise segmentation. A novel dynamic scoring mechanism based on multi-scale Tsallis entropy weighting overcomes the limitations of traditional single-parameter entropy. The framework also incorporates low-level feature analysis to preserve critical edge information while reducing computational cost. As a plug-and-play module, it requires no architectural changes or additional training. Evaluations across multiple datasets show 20%–45% reductions in computation with negligible performance loss, outperforming existing methods in balancing cost and accuracy, especially in complex edge regions.
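The abstract's core scoring idea, Tsallis entropy evaluated at several entropic indices and combined by weights, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of indices `q`, and the use of a token's attention distribution as the probability vector are all assumptions made for illustration.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q -> 1 this recovers the Shannon entropy -sum_i p_i log p_i,
    which is why a single-parameter entropy is a special case.
    """
    p = p[p > 0]  # ignore zero-probability entries
    if abs(q - 1.0) < 1e-8:
        return float(-np.sum(p * np.log(p)))  # Shannon limit
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def multiscale_token_scores(attn, qs=(0.5, 1.0, 2.0), weights=None):
    """Hypothetical token-importance scores: a weighted sum of Tsallis
    entropies of each token's (normalized) attention row at several
    entropic indices q. Low-scoring (peaked, low-entropy) tokens would
    be candidates for pruning under such a scheme."""
    if weights is None:
        weights = np.full(len(qs), 1.0 / len(qs))
    scores = np.zeros(attn.shape[0])
    for i, row in enumerate(attn):
        p = row / row.sum()  # normalize to a probability distribution
        scores[i] = sum(w * tsallis_entropy(p, q) for w, q in zip(weights, qs))
    return scores
```

Under this sketch, a token attending uniformly scores higher (more entropy, kept longer) than one attending to a single position; the actual LVTP scoring additionally folds in low-level visual features, which this toy version omits.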
| Original language | English |
|---|---|
| Article number | 103579 |
| Journal | Journal of Systems Architecture |
| Volume | 168 |
| DOIs | |
| Publication status | Published - Nov 2025 |
Keywords
- Clustering
- Semantic segmentation
- Token pruning
- Tsallis entropy
- Vision transformers
Fingerprint
Dive into the research topics of 'Back to fundamentals: Low-level visual features guided progressive token pruning'. Together they form a unique fingerprint.