CHAM: Action recognition using convolutional hierarchical attention model

Shiyang Yan; Jeremy S. Smith; Wenjin Lu; Bailing Zhang

doi:10.1109/ICIP.2017.8297025

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

7 Citations (Scopus)

Abstract

Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied to computer vision tasks such as image captioning. This paper presents improvements to the soft attention model by combining a convolutional Long Short-Term Memory (LSTM) with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to explicitly reason about action categories at multiple granularities. The proposed architecture achieved improved results on three publicly available datasets: the UCF Sports dataset, the Olympic Sports dataset and the HMDB51 dataset.
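The abstract describes the architecture only at a high level. As an illustration, here is a minimal NumPy sketch of the general idea, written under our own assumptions and not taken from the authors' implementation: a convolutional LSTM cell whose input CNN feature maps are re-weighted by a soft spatial attention map, stacked into two levels. The gate layout, the 1x1 attention convolution, the elementwise re-weighting, the interpretation of "multiple granularities" as a coarser time scale (an upper level that updates every `stride` frames), and all names (`ConvLSTMAttentionCell`, `classify_video`, `conv2d_same`) are hypothetical. Random arrays stand in for per-frame CNN features.

```python
import numpy as np


def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation (stride 1).

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns an array of shape (C_out, H, W).
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted window over the padded input, contracted over input channels
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMAttentionCell:
    """Hypothetical ConvLSTM cell whose input is re-weighted by a soft spatial attention map."""

    def __init__(self, feat_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        # Input-to-state and state-to-state convolutions for the stacked gates (i, f, o, g)
        self.w_x = rng.normal(0.0, 0.1, (4 * hid_ch, feat_ch, k, k))
        self.w_h = rng.normal(0.0, 0.1, (4 * hid_ch, hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))
        # 1x1 convolution mapping the hidden state to one attention score per location
        self.w_att = rng.normal(0.0, 0.1, (1, hid_ch, 1, 1))

    def attend(self, h):
        """Spatial softmax over scores computed from the previous hidden state; sums to 1."""
        scores = conv2d_same(h, self.w_att)[0]
        e = np.exp(scores - scores.max())
        return e / e.sum()

    def step(self, feats, h, c):
        att = self.attend(h)
        x = feats * att  # (C, H, W) * (H, W): same weight for every channel at a location
        z = conv2d_same(x, self.w_x) + conv2d_same(h, self.w_h) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # convolutional cell update
        h = sigmoid(o) * np.tanh(c)
        return h, c, att


def classify_video(frames, lower, upper, w_cls, stride=2):
    """Two-level hierarchy: `upper` updates every `stride` frames on `lower`'s hidden state."""
    _, H, W = frames[0].shape
    h1 = np.zeros((lower.hid_ch, H, W))
    c1 = np.zeros_like(h1)
    h2 = np.zeros((upper.hid_ch, H, W))
    c2 = np.zeros_like(h2)
    maps = []
    for t, feats in enumerate(frames):
        h1, c1, att = lower.step(feats, h1, c1)
        maps.append(att)
        if (t + 1) % stride == 0:
            h2, c2, _ = upper.step(h1, h2, c2)  # coarser time scale
    # Global average pooling of both levels, then a linear classifier
    pooled = np.concatenate([h1.mean(axis=(1, 2)), h2.mean(axis=(1, 2))])
    return w_cls @ pooled, maps


# Usage with random stand-ins for per-frame CNN feature maps (8 channels, 7x7 grid)
rng = np.random.default_rng(42)
frames = [rng.normal(size=(8, 7, 7)) for _ in range(6)]
lower = ConvLSTMAttentionCell(feat_ch=8, hid_ch=4, seed=0)
upper = ConvLSTMAttentionCell(feat_ch=4, hid_ch=4, seed=1)
w_cls = rng.normal(0.0, 0.1, (10, 8))  # 10 hypothetical action classes
logits, maps = classify_video(frames, lower, upper, w_cls, stride=2)
print(logits.shape, len(maps))
```

Multiplying the feature maps by the attention map, instead of collapsing them into a single weighted-sum vector, keeps the spatial layout that the convolutional gates operate on. Whether CHAM weights its inputs this way, and how its hierarchy is actually wired, should be checked against the paper itself.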

Original language	English
Title of host publication	2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
Publisher	IEEE Computer Society
Pages	3958-3962
Number of pages	5
ISBN (Electronic)	9781509021758
DOIs	https://doi.org/10.1109/ICIP.2017.8297025
Publication status	Published - 2 Jul 2017
Event	24th IEEE International Conference on Image Processing, ICIP 2017 - Beijing, China. Duration: 17 Sept 2017 → 20 Sept 2017

Publication series

Name	Proceedings - International Conference on Image Processing, ICIP
Volume	2017-September
ISSN (Print)	1522-4880

Conference

Conference	24th IEEE International Conference on Image Processing, ICIP 2017
Country/Territory	China
City	Beijing
Period	17/09/17 → 20/09/17

Keywords

Action recognition
CNN
Convolutional LSTM
Hierarchical Architecture
Soft attention

Cite this

Yan, S., Smith, J. S., Lu, W., & Zhang, B. (2017). CHAM: Action recognition using convolutional hierarchical attention model. In 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings (pp. 3958-3962). (Proceedings - International Conference on Image Processing, ICIP; Vol. 2017-September). IEEE Computer Society. https://doi.org/10.1109/ICIP.2017.8297025

@inproceedings{14b85f1d89b14c60a8a695d8bb20c601,
  title = "CHAM: Action recognition using convolutional hierarchical attention model",
  abstract = "Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied to computer vision tasks such as image captioning. This paper presents improvements to the soft attention model by combining a convolutional Long Short-Term Memory (LSTM) with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to explicitly reason about action categories at multiple granularities. The proposed architecture achieved improved results on three publicly available datasets: the UCF Sports dataset, the Olympic Sports dataset and the HMDB51 dataset.",
  keywords = "Action recognition, CNN, Convolutional LSTM, Hierarchical Architecture, Soft attention",
  author = "Shiyang Yan and Smith, {Jeremy S.} and Wenjin Lu and Bailing Zhang",
  note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 24th IEEE International Conference on Image Processing, ICIP 2017 ; Conference date: 17-09-2017 Through 20-09-2017",
  year = "2017",
  month = jul,
  day = "2",
  doi = "10.1109/ICIP.2017.8297025",
  language = "English",
  series = "Proceedings - International Conference on Image Processing, ICIP",
  publisher = "IEEE Computer Society",
  pages = "3958--3962",
  booktitle = "2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings",
}

Yan, S, Smith, JS, Lu, W & Zhang, B 2017, CHAM: Action recognition using convolutional hierarchical attention model. in 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings. Proceedings - International Conference on Image Processing, ICIP, vol. 2017-September, IEEE Computer Society, pp. 3958-3962, 24th IEEE International Conference on Image Processing, ICIP 2017, Beijing, China, 17/09/17. https://doi.org/10.1109/ICIP.2017.8297025

CHAM: Action recognition using convolutional hierarchical attention model. / Yan, Shiyang; Smith, Jeremy S.; Lu, Wenjin et al.
2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings. IEEE Computer Society, 2017. p. 3958-3962 (Proceedings - International Conference on Image Processing, ICIP; Vol. 2017-September).

TY  - GEN
T1  - CHAM
T2  - 24th IEEE International Conference on Image Processing, ICIP 2017
AU  - Yan, Shiyang
AU  - Smith, Jeremy S.
AU  - Lu, Wenjin
AU  - Zhang, Bailing
PY  - 2017/7/2
Y1  - 2017/7/2
N2  - Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied to computer vision tasks such as image captioning. This paper presents improvements to the soft attention model by combining a convolutional Long Short-Term Memory (LSTM) with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to explicitly reason about action categories at multiple granularities. The proposed architecture achieved improved results on three publicly available datasets: the UCF Sports dataset, the Olympic Sports dataset and the HMDB51 dataset.
AB  - Recently, the soft attention mechanism, which was originally proposed in language processing, has been applied to computer vision tasks such as image captioning. This paper presents improvements to the soft attention model by combining a convolutional Long Short-Term Memory (LSTM) with a hierarchical system architecture to recognize action categories in videos. We call this model the Convolutional Hierarchical Attention Model (CHAM). The model applies a convolutional operation inside the LSTM cell and an attention map generation process to recognize actions. The hierarchical architecture of this model is able to explicitly reason about action categories at multiple granularities. The proposed architecture achieved improved results on three publicly available datasets: the UCF Sports dataset, the Olympic Sports dataset and the HMDB51 dataset.
KW  - Action recognition
KW  - CNN
KW  - Convolutional LSTM
KW  - Hierarchical Architecture
KW  - Soft attention
UR  - http://www.scopus.com/inward/record.url?scp=85045324617&partnerID=8YFLogxK
U2  - 10.1109/ICIP.2017.8297025
DO  - 10.1109/ICIP.2017.8297025
M3  - Conference Proceeding
AN  - SCOPUS:85045324617
T3  - Proceedings - International Conference on Image Processing, ICIP
SP  - 3958
EP  - 3962
BT  - 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
PB  - IEEE Computer Society
Y2  - 17 September 2017 through 20 September 2017
ER  -