TY - GEN
T1 - Dense Attention
T2 - 2023 International Joint Conference on Neural Networks, IJCNN 2023
AU - Li, Nannan
AU - Chen, Yaran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, Vision Transformer has demonstrated its impressive capability in image understanding. The multi-head self-attention mechanism is fundamental to its formidable performance. However, self-attention has the drawback of high computational cost, so training the model requires powerful computational resources or more time. This paper designs a novel and efficient attention mechanism, Dense Attention, to overcome the above problem. Dense attention aims to attend to features from multiple views through a dense connection paradigm. Benefiting from attention over comprehensive features, dense attention can i) remarkably strengthen the image representation of the model, and ii) partially replace the multi-head self-attention mechanism to allow model slimming. To verify the effectiveness of dense attention, we implement it in prevalent Vision Transformer models, including the non-pyramid architecture DeiT and the pyramid architecture Swin Transformer. The experimental results on ImageNet classification show that dense attention indeed contributes to performance improvement, +1.8/+1.3% for DeiT-T/S and +0.7/+1.2% for Swin-T/S, respectively. Dense attention also demonstrates its transferability on the CIFAR-10 and CIFAR-100 recognition benchmarks, with classification accuracies of 98.9% and 89.6%, respectively. Furthermore, dense attention can mitigate the performance sacrifice caused by pruning the number of attention heads. Code and pre-trained models will be available at https://github.com/koala719/Dense-ViT.
AB - Recently, Vision Transformer has demonstrated its impressive capability in image understanding. The multi-head self-attention mechanism is fundamental to its formidable performance. However, self-attention has the drawback of high computational cost, so training the model requires powerful computational resources or more time. This paper designs a novel and efficient attention mechanism, Dense Attention, to overcome the above problem. Dense attention aims to attend to features from multiple views through a dense connection paradigm. Benefiting from attention over comprehensive features, dense attention can i) remarkably strengthen the image representation of the model, and ii) partially replace the multi-head self-attention mechanism to allow model slimming. To verify the effectiveness of dense attention, we implement it in prevalent Vision Transformer models, including the non-pyramid architecture DeiT and the pyramid architecture Swin Transformer. The experimental results on ImageNet classification show that dense attention indeed contributes to performance improvement, +1.8/+1.3% for DeiT-T/S and +0.7/+1.2% for Swin-T/S, respectively. Dense attention also demonstrates its transferability on the CIFAR-10 and CIFAR-100 recognition benchmarks, with classification accuracies of 98.9% and 89.6%, respectively. Furthermore, dense attention can mitigate the performance sacrifice caused by pruning the number of attention heads. Code and pre-trained models will be available at https://github.com/koala719/Dense-ViT.
KW - Dense connection
KW - image classification
KW - model slimming
KW - pyramid
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85168929993&partnerID=8YFLogxK
U2 - 10.1109/IJCNN54540.2023.10191462
DO - 10.1109/IJCNN54540.2023.10191462
M3 - Conference Proceeding
AN - SCOPUS:85168929993
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2023 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 June 2023 through 23 June 2023
ER -