
Adaptive search for broad attention based vision transformers

  • Nannan Li
  • Yaran Chen*
  • Dongbin Zhao
  • *Corresponding author for this work
  • CAS - Institute of Automation
  • University of Chinese Academy of Sciences
  • Tsinghua National Laboratory for Information Science and Technology

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Vision Transformers (ViTs) have recently become prevalent in computer vision tasks owing to their powerful image representation capability. However, manually designing efficient ViT architectures is laborious and often involves repeated trial and error. Furthermore, the exploration of lightweight ViTs remains limited, resulting in inferior performance compared to convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers, called ASB, which automates the design of efficient ViT architectures by combining a broad search space with an adaptive evolutionary algorithm. The broad search space facilitates the exploration of a novel connection paradigm, enabling more comprehensive integration of attention information to improve ViT performance. Additionally, an adaptive evolutionary algorithm is developed to explore architectures efficiently by dynamically learning the probability distribution of candidate operators. Our experimental results demonstrate that the adaptive evolution in ASB efficiently learns excellent lightweight models, achieving a 55% improvement in convergence speed over traditional evolutionary algorithms. Moreover, the effectiveness of ASB is validated across several visual tasks. For instance, on ImageNet classification, the searched model attains a performance of 77.8% with 6.5M parameters and outperforms state-of-the-art models, including EfficientNet and EfficientViT networks. On mobile COCO panoptic segmentation, our approach delivers 43.7% PQ. On mobile ADE20K semantic segmentation, our method attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
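The abstract does not spell out the update rule of the adaptive evolutionary algorithm, so the following is only a minimal illustrative sketch of the general idea it names: an evolutionary search that keeps a per-layer probability distribution over candidate operators and adjusts it as better architectures are found. It is not the ASB implementation. The function `adaptive_evolution`, its parameters (`lr`, `num_parents`, and so on), the frequency-based update, and the toy fitness function are all assumptions made for illustration; a real search would score each candidate by training or evaluating a ViT.

```python
import random


def adaptive_evolution(num_layers, num_ops, fitness, generations=30,
                       population_size=20, num_parents=5, lr=0.3, seed=0):
    """Toy evolutionary search with a learned operator distribution (hypothetical)."""
    rng = random.Random(seed)
    # One categorical distribution over candidate operators per layer,
    # initialised to uniform.
    probs = [[1.0 / num_ops] * num_ops for _ in range(num_layers)]

    def sample_op(layer):
        return rng.choices(range(num_ops), weights=probs[layer])[0]

    population = [[sample_op(l) for l in range(num_layers)]
                  for _ in range(population_size)]

    for _ in range(generations):
        # Keep the fittest architectures as parents (elitist selection).
        parents = sorted(population, key=fitness, reverse=True)[:num_parents]

        # Assumed update rule: move each layer's distribution toward the
        # operator frequencies observed among the parents.
        for l in range(num_layers):
            for op in range(num_ops):
                freq = sum(p[l] == op for p in parents) / num_parents
                probs[l][op] = (1 - lr) * probs[l][op] + lr * freq

        # Mutation: copy a parent and resample one layer's operator from
        # the learned distribution instead of uniformly at random.
        children = []
        while len(children) < population_size - num_parents:
            child = list(rng.choice(parents))
            l = rng.randrange(num_layers)
            child[l] = sample_op(l)
            children.append(child)
        population = parents + children

    return max(population, key=fitness), probs


# Hypothetical stand-in for validation accuracy: the number of layers whose
# operator matches an arbitrary target architecture.
TARGET = [2, 0, 1, 3]


def toy_fitness(arch):
    return sum(a == t for a, t in zip(arch, TARGET))


best_arch, op_probs = adaptive_evolution(num_layers=4, num_ops=4,
                                         fitness=toy_fitness)
print(best_arch)
print([[round(p, 2) for p in row] for row in op_probs])
```

In this sketch, the only difference from a plain evolutionary loop is that mutations are drawn from `probs` rather than uniformly, so operators that keep appearing in strong candidates are proposed more often. Whether ASB uses this rule or a different one is not stated in the abstract.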

Original language: English
Article number: 128696
Journal: Neurocomputing
Volume: 611
DOIs
Publication status: Published - 1 Jan 2025

Keywords

  • Adaptive architecture search
  • Broad learning
  • Broad search space
  • Image classification
  • Vision transformer
