MAN and CAT: mix attention to nn and concatenate attention to YOLO

Runwei Guan, Ka Lok Man, Haocheng Zhao, Ruixiao Zhang, Shanliang Yao, Jeremy Smith, Eng Gee Lim, Yutao Yue*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Convolutional neural networks (CNNs) have achieved remarkable results in image classification and object detection over the past few years. Because convolution is a local operation, CNNs can extract rich features of an object itself but can hardly capture the global context of an image. CNN-based networks are therefore poorly suited to detecting objects by using information from nearby objects, especially when the target is partially occluded and hard to detect. Vision transformers (ViTs) obtain rich context through multi-head self-attention and dramatically improve prediction in complex scenes. However, they suffer from long inference times and large parameter counts, which makes ViT-based detection networks hard to deploy in real-time detection systems. In this paper, we first design a novel plug-and-play attention module called mix attention (MA). MA combines channel, spatial and global contextual attention, enhancing both the feature representation of individual objects and the correlation between multiple objects. Second, we propose a backbone network based on mix attention called MANet. MANet-Base achieves state-of-the-art performance on ImageNet and CIFAR. Finally, we propose a lightweight object detection network called CAT-YOLO, which trades off precision against speed. It achieves an AP of 25.7% on COCO 2017 test-dev with only 9.17 million parameters, making it possible to deploy ViT-containing models on hardware while ensuring real-time detection. CAT-YOLO detects occluded objects better than other state-of-the-art lightweight models.
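
The abstract says only that MA combines channel, spatial and global contextual attention; it does not give the layer design. The following is a minimal NumPy sketch of one way three such branches could be mixed, not the authors' implementation. The squeeze-and-excitation style channel gate, the parameter-free spatial gate, the single-head self-attention branch, the residual-sum fusion, and every function and parameter name (`mix_attention`, `w1`, `w2`, `wq`, `wk`, `wv`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(x, w1, w2):
    # x: (C, H, W). Assumed squeeze-and-excitation style gate per channel.
    squeezed = x.mean(axis=(1, 2))           # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # (C // r,), ReLU
    gate = sigmoid(w2 @ hidden)              # (C,)
    return x * gate[:, None, None]


def spatial_attention(x):
    # Assumed parameter-free gate per location from pooled channel statistics.
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))  # (H, W)
    return x * gate[None, :, :]


def global_context_attention(x, wq, wk, wv):
    # Single-head self-attention over all H*W positions (assumed form).
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                   # (N, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv  # each (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, N), rows sum to 1
    return (attn @ v).T.reshape(c, h, w)


def mix_attention(x, p):
    # Hypothetical fusion: residual sum of a local branch and a global branch.
    local = spatial_attention(channel_attention(x, p["w1"], p["w2"]))
    glob = global_context_attention(x, p["wq"], p["wk"], p["wv"])
    return x + local + glob


# Example usage on a random feature map with random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
params = {
    "w1": rng.standard_normal((C // r, C)) * 0.1,
    "w2": rng.standard_normal((C, C // r)) * 0.1,
    "wq": rng.standard_normal((C, C)) * 0.1,
    "wk": rng.standard_normal((C, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
}
y = mix_attention(x, params)
```

Because the output keeps the input shape `(C, H, W)`, a block of this kind can be dropped between existing layers, which is the sense in which the abstract calls MA plug-and-play. The self-attention branch builds an `(H*W, H*W)` matrix, which illustrates why the abstract flags inference cost as the obstacle for ViT-based detectors.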

Original language: English
Pages (from-to): 2108-2136
Number of pages: 29
Journal: Journal of Supercomputing
Volume: 79
Issue number: 2
Publication status: Published - 11 Feb 2023

Keywords

  • Attention mechanism
  • Lightweight NN
  • Object detection
  • Object recognition
  • Plug-and-play NN
