Multi-scale Attention Consistency for Multi-label Image Classification

Haotian Xu, Xiaobo Jin, Qiufeng Wang, Kaizhu Huang*

doi:10.1007/978-3-030-63820-7_93

*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

2 Citations (Scopus)

Abstract

Humans demonstrate clear cognitive consistency under image transformations such as flipping and scaling. Researchers seeking to learn from this consistency of human visual perception have found that a convolutional neural network’s discriminative capacity can be further improved by forcing the network to concentrate on particular regions of an image in accordance with natural human visual perception. Attention heatmaps, a supplementary tool that reveals the regions a network chooses to focus on, have been developed and widely adopted in CNNs. Building on this principle of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original image and its flipped counterpart as inputs and feeds both into a single standard ResNet with additional attention-enhanced modules to generate semantically strong attention heatmaps. We also compute the distance between the multi-scale attention heatmaps of the two images and use it as an additional loss to help the network achieve better performance. Our network demonstrates superior performance on the multi-label classification task and attains compelling results on the WIDER Attribute dataset.
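The flip-consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions: each scale yields one 2-D heatmap, the heatmaps of the flipped image are mirrored back before comparison, the "distance" is a mean squared difference averaged over scales, and the consistency term is added to a multi-label binary cross-entropy loss with a hypothetical weight `lam`. All function names are illustrative, and plain Python lists stand in for the ResNet attention maps; this is not the authors' implementation.

```python
import math
from typing import List

# Hypothetical representation: a heatmap is a 2-D grid of floats, one per scale.
Heatmap = List[List[float]]


def hflip(heatmap: Heatmap) -> Heatmap:
    """Mirror a heatmap left-to-right (horizontal flip)."""
    return [list(reversed(row)) for row in heatmap]


def heatmap_distance(a: Heatmap, b: Heatmap) -> float:
    """Mean squared difference between two same-shaped heatmaps (assumed metric)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("heatmaps must have the same shape")
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def attention_consistency_loss(orig_maps: List[Heatmap], flipped_maps: List[Heatmap]) -> float:
    """Average heatmap distance over scales.

    orig_maps[s] comes from the original image at scale s; flipped_maps[s]
    comes from the flipped image, so it is flipped back before comparison.
    """
    if not orig_maps or len(orig_maps) != len(flipped_maps):
        raise ValueError("need the same non-zero number of scales for both images")
    per_scale = [heatmap_distance(o, hflip(f)) for o, f in zip(orig_maps, flipped_maps)]
    return sum(per_scale) / len(per_scale)


def multilabel_bce(logits: List[float], targets: List[int]) -> float:
    """Binary cross-entropy with logits, averaged over labels (one sigmoid per label).

    Uses the numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def total_loss(logits: List[float], targets: List[int],
               orig_maps: List[Heatmap], flipped_maps: List[Heatmap],
               lam: float = 1.0) -> float:
    """Classification loss on the original image plus a weighted consistency term.

    `lam` is a hypothetical weight; the abstract does not state how the terms are balanced.
    """
    return multilabel_bce(logits, targets) + lam * attention_consistency_loss(orig_maps, flipped_maps)
```

In a real model the heatmaps would come from attention modules at several ResNet stages and the loss would be computed on tensors in a deep-learning framework; the list-based version only shows the arithmetic of comparing a heatmap with the mirrored heatmap of the flipped input.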

Original language	English
Title of host publication	International Conference on Neural Information Processing (ICONIP), 2020
Editors	Haiqin Yang, Kitsuchart Pasupa, Andrew Chi-Sing Leung, James T. Kwok, Jonathan H. Chan, Irwin King
Pages	815-823
Number of pages	9
DOIs	https://doi.org/10.1007/978-3-030-63820-7_93
Publication status	Published - 2020
Event	27th International Conference on Neural Information Processing, ICONIP 2020 - Bangkok, Thailand
Duration	18 Nov 2020 → 22 Nov 2020

Conference

Conference	27th International Conference on Neural Information Processing, ICONIP 2020
Country/Territory	Thailand
City	Bangkok
Period	18/11/20 → 22/11/20

Keywords

Attention
Consistency
Image classification
Multi-label learning

Cite this

@inproceedings{bc2cffa8b96b4d96943797fe26bbc636,

title = "Multi-scale Attention Consistency for Multi-label Image Classification",

abstract = "Humans demonstrate clear cognitive consistency under image transformations such as flipping and scaling. Researchers seeking to learn from this consistency of human visual perception have found that a convolutional neural network{\textquoteright}s discriminative capacity can be further improved by forcing the network to concentrate on particular regions of an image in accordance with natural human visual perception. Attention heatmaps, a supplementary tool that reveals the regions a network chooses to focus on, have been developed and widely adopted in CNNs. Building on this principle of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original image and its flipped counterpart as inputs and feeds both into a single standard ResNet with additional attention-enhanced modules to generate semantically strong attention heatmaps. We also compute the distance between the multi-scale attention heatmaps of the two images and use it as an additional loss to help the network achieve better performance. Our network demonstrates superior performance on the multi-label classification task and attains compelling results on the WIDER Attribute dataset.",

keywords = "Attention, Consistency, Image classification, Multi-label learning",

author = "Haotian Xu and Xiaobo Jin and Qiufeng Wang and Kaizhu Huang",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG.; 27th International Conference on Neural Information Processing, ICONIP 2020 ; Conference date: 18-11-2020 Through 22-11-2020",

year = "2020",

doi = "10.1007/978-3-030-63820-7_93",

language = "English",

isbn = "9783030638191",

pages = "815--823",

editor = "Haiqin Yang and Kitsuchart Pasupa and Leung, {Andrew Chi-Sing} and Kwok, {James T.} and Chan, {Jonathan H.} and Irwin King",

booktitle = "International Conference on Neural Information Processing (ICONIP), 2020",

}

Xu, H, Jin, X, Wang, Q & Huang, K 2020, Multi-scale Attention Consistency for Multi-label Image Classification. in H Yang, K Pasupa, AC-S Leung, JT Kwok, JH Chan & I King (eds), International Conference on Neural Information Processing (ICONIP), 2020. pp. 815-823, 27th International Conference on Neural Information Processing, ICONIP 2020, Bangkok, Thailand, 18/11/20. https://doi.org/10.1007/978-3-030-63820-7_93

Multi-scale Attention Consistency for Multi-label Image Classification. / Xu, Haotian; Jin, Xiaobo; Wang, Qiufeng et al.
International Conference on Neural Information Processing (ICONIP), 2020. ed. / Haiqin Yang; Kitsuchart Pasupa; Andrew Chi-Sing Leung; James T. Kwok; Jonathan H. Chan; Irwin King. 2020. p. 815-823.

TY  - GEN
T1  - Multi-scale Attention Consistency for Multi-label Image Classification
AU  - Xu, Haotian
AU  - Jin, Xiaobo
AU  - Wang, Qiufeng
AU  - Huang, Kaizhu
PY  - 2020
Y1  - 2020
AB  - Humans demonstrate clear cognitive consistency under image transformations such as flipping and scaling. Researchers seeking to learn from this consistency of human visual perception have found that a convolutional neural network’s discriminative capacity can be further improved by forcing the network to concentrate on particular regions of an image in accordance with natural human visual perception. Attention heatmaps, a supplementary tool that reveals the regions a network chooses to focus on, have been developed and widely adopted in CNNs. Building on this principle of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original image and its flipped counterpart as inputs and feeds both into a single standard ResNet with additional attention-enhanced modules to generate semantically strong attention heatmaps. We also compute the distance between the multi-scale attention heatmaps of the two images and use it as an additional loss to help the network achieve better performance. Our network demonstrates superior performance on the multi-label classification task and attains compelling results on the WIDER Attribute dataset.
KW  - Attention
KW  - Consistency
KW  - Image classification
KW  - Multi-label learning
UR  - http://www.scopus.com/inward/record.url?scp=85097259383&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-63820-7_93
DO  - 10.1007/978-3-030-63820-7_93
M3  - Conference Proceeding
AN  - SCOPUS:85097259383
SN  - 9783030638191
SP  - 815
EP  - 823
BT  - International Conference on Neural Information Processing (ICONIP), 2020
A2  - Yang, Haiqin
A2  - Pasupa, Kitsuchart
A2  - Leung, Andrew Chi-Sing
A2  - Kwok, James T.
A2  - Chan, Jonathan H.
A2  - King, Irwin
T2  - 27th International Conference on Neural Information Processing, ICONIP 2020
Y2  - 18 November 2020 through 22 November 2020
ER  -
