CHAM: Action recognition using convolutional hierarchical attention model

Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

7 Citations (Scopus)

Abstract

Recently, the soft attention mechanism, originally proposed for language processing, has been applied to computer vision tasks such as image captioning. This paper presents improvements to the soft attention model by combining a convolutional Long Short-Term Memory (LSTM) network with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell, together with an attention map generation process, to recognize actions. The hierarchical architecture of this model is able to reason explicitly over multiple granularities of action categories. The proposed architecture achieved improved results on three publicly available datasets: the UCF Sports dataset, the Olympic Sports dataset and the HMDB51 dataset.
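To make the two mechanisms named in the abstract more concrete, here is a minimal NumPy sketch of one time step of a convolutional LSTM cell whose input is first reweighted by a spatial soft-attention map. It is an illustration under stated assumptions, not the CHAM implementation: the class name `AttentiveConvLSTMCell`, scoring attention from the current frame's features and the previous hidden state with a single convolution, the 3 × 3 kernel, the random initialization and the random stand-in "CNN features" are all hypothetical, and the hierarchical (multi-granularity) part of CHAM is not modelled.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded 'same' 2-D convolution (cross-correlation, as in DL libraries).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window around (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_softmax(logits):
    """Softmax over all H*W locations of a (1, H, W) score map."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


class AttentiveConvLSTMCell:
    """Hypothetical cell: spatial soft attention followed by a ConvLSTM update."""

    def __init__(self, c_in, c_hid, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One convolution produces all four gates from [attended input, hidden state].
        self.w_gates = rng.normal(0.0, 0.1, (4 * c_hid, c_in + c_hid, k, k))
        self.b_gates = np.zeros(4 * c_hid)
        # Assumed attention scorer: convolution over [features, hidden state] -> 1 map.
        self.w_att = rng.normal(0.0, 0.1, (1, c_in + c_hid, k, k))
        self.b_att = np.zeros(1)

    def step(self, x, h, c):
        """x: (C_in, H, W) features for one frame; h, c: (C_hid, H, W)."""
        scores = conv2d_same(np.concatenate([x, h]), self.w_att, self.b_att)
        att = spatial_softmax(scores)  # (1, H, W), non-negative, sums to 1
        x_att = x * att  # broadcast over channels; the spatial layout is kept
        gates = conv2d_same(np.concatenate([x_att, h]), self.w_gates, self.b_gates)
        i, f, o, g = np.split(gates, 4)  # input, forget, output gates and candidate
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new, att[0]


# Usage: run the cell over a short clip of random stand-in feature maps.
T, C, H, W, HID = 5, 8, 7, 7, 4
frames = np.random.default_rng(1).normal(size=(T, C, H, W))
cell = AttentiveConvLSTMCell(C, HID)
h = np.zeros((HID, H, W))
c = np.zeros((HID, H, W))
for x in frames:
    h, c, att = cell.step(x, h, c)
print(h.shape, att.shape)
```

Keeping the attended features as a spatial map, instead of pooling them into a vector as a fully connected LSTM would require, is what allows the gates to be computed by convolution, so the hidden and cell states retain their H × W layout from frame to frame.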

Original language: English
Title of host publication: 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
Publisher: IEEE Computer Society
Pages: 3958-3962
Number of pages: 5
ISBN (Electronic): 9781509021758
Publication status: Published - 2 Jul 2017
Event: 24th IEEE International Conference on Image Processing, ICIP 2017 - Beijing, China
Duration: 17 Sept 2017 to 20 Sept 2017

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 2017-September
ISSN (Print): 1522-4880

Conference

Conference: 24th IEEE International Conference on Image Processing, ICIP 2017
Country/Territory: China
City: Beijing
Period: 17/09/17 to 20/09/17

Keywords

  • Action recognition
  • CNN
  • Convolutional LSTM
  • Hierarchical Architecture
  • Soft attention
