TY - GEN
T1 - Dense Attention
T2 - 2023 International Joint Conference on Neural Networks, IJCNN 2023
AU - Li, Nannan
AU - Chen, Yaran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, Vision Transformer has demonstrated its impressive capability in image understanding. The multi-head self-attention mechanism is fundamental to its formidable performance. However, self-attention has the drawback of high computational cost, so training the model requires powerful computational resources or more time. This paper designs a novel and efficient attention mechanism, Dense Attention, to overcome the above problem. Dense attention aims to attend to features from multiple views through a dense connection paradigm. Benefiting from attention over comprehensive features, dense attention can i) remarkably strengthen the image representation of the model, and ii) partially replace the multi-head self-attention mechanism to allow model slimming. To verify the effectiveness of dense attention, we implement it in prevalent Vision Transformer models, including the non-pyramid architecture DeiT and the pyramid architecture Swin Transformer. The experimental results on ImageNet classification show that dense attention indeed contributes to performance improvement, +1.8/+1.3% for DeiT-T/S and +0.7/+1.2% for Swin-T/S, respectively. Dense attention also demonstrates its transferability on the CIFAR-10 and CIFAR-100 recognition benchmarks, with classification accuracies of 98.9% and 89.6%, respectively. Furthermore, dense attention can mitigate the performance sacrifice caused by pruning the number of attention heads. Code and pre-trained models will be available at https://github.com/koala719/Dense-ViT.
AB - Recently, Vision Transformer has demonstrated its impressive capability in image understanding. The multi-head self-attention mechanism is fundamental to its formidable performance. However, self-attention has the drawback of high computational cost, so training the model requires powerful computational resources or more time. This paper designs a novel and efficient attention mechanism, Dense Attention, to overcome the above problem. Dense attention aims to attend to features from multiple views through a dense connection paradigm. Benefiting from attention over comprehensive features, dense attention can i) remarkably strengthen the image representation of the model, and ii) partially replace the multi-head self-attention mechanism to allow model slimming. To verify the effectiveness of dense attention, we implement it in prevalent Vision Transformer models, including the non-pyramid architecture DeiT and the pyramid architecture Swin Transformer. The experimental results on ImageNet classification show that dense attention indeed contributes to performance improvement, +1.8/+1.3% for DeiT-T/S and +0.7/+1.2% for Swin-T/S, respectively. Dense attention also demonstrates its transferability on the CIFAR-10 and CIFAR-100 recognition benchmarks, with classification accuracies of 98.9% and 89.6%, respectively. Furthermore, dense attention can mitigate the performance sacrifice caused by pruning the number of attention heads. Code and pre-trained models will be available at https://github.com/koala719/Dense-ViT.
KW - Dense connection
KW - image classification
KW - model slimming
KW - pyramid
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85168929993&partnerID=8YFLogxK
U2 - 10.1109/IJCNN54540.2023.10191462
DO - 10.1109/IJCNN54540.2023.10191462
M3 - Conference Proceeding
AN - SCOPUS:85168929993
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2023 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 June 2023 through 23 June 2023
ER -