TY - JOUR
T1 - Adaptive search for broad attention based vision transformers
AU - Li, Nannan
AU - Chen, Yaran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© 2024
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Vision Transformer (ViT) has recently prevailed across computer vision tasks owing to its powerful image representation capability. However, manually designing efficient ViT architectures is laborious, often involving repeated trial and error. Furthermore, the exploration of lightweight ViTs remains limited, resulting in inferior performance compared to convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers (ASB), which automates the design of efficient ViT architectures through a broad search space and an adaptive evolutionary algorithm. The broad search space enables the exploration of a novel connection paradigm that integrates attention information more comprehensively, improving ViT performance. In addition, the adaptive evolutionary algorithm explores architectures efficiently by dynamically learning the probability distribution of candidate operators. Experimental results demonstrate that the adaptive evolution in ASB learns excellent lightweight models, converging 55% faster than traditional evolutionary algorithms. The effectiveness of ASB is further validated across several visual tasks. On ImageNet classification, the searched model attains 77.8% accuracy with 6.5M parameters, outperforming state-of-the-art models including EfficientNet and EfficientViT. On mobile COCO panoptic segmentation, our approach delivers 43.7% PQ. On mobile ADE20K semantic segmentation, our method attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
KW - Adaptive architecture search
KW - Broad learning
KW - Broad search space
KW - Image classification
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85205865629&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2024.128696
DO - 10.1016/j.neucom.2024.128696
M3 - Article
AN - SCOPUS:85205865629
SN - 0925-2312
VL - 611
JO - Neurocomputing
JF - Neurocomputing
M1 - 128696
ER -