Action recognition in videos with temporal segments fusions

Yuanye Fang, Rui Zhang, Qiu Feng Wang, Kaizhu Huang*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

1 Citation (Scopus)

Abstract

Deep Convolutional Neural Networks (CNNs) have achieved great success in object recognition. However, they struggle to capture long-range temporal information, which plays an important role in action recognition in videos. To overcome this issue, two-stream architectures that combine spatial and temporal segment-based CNNs have recently been widely used. However, the relationship among the segments has not been sufficiently investigated. In this paper, we propose to combine multiple segments through a fully connected layer in a deep CNN model of the whole action video. Moreover, four streams (i.e., RGB, RGB differences, optical flow, and warped optical flow) are integrated by a linear combination whose weights are optimized on the validation datasets. We evaluate the recognition accuracy of the proposed method on two benchmark datasets, UCF101 and HMDB51. Extensive experiments show encouraging results: compared with the baseline, accuracy improves from 94.20% to 97.30% on UCF101 and from 69.40% to 77.99% on HMDB51. Furthermore, the proposed method achieves accuracy competitive with the state-of-the-art 3D convolution method, but with far fewer parameters.
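The linear combination of the four streams described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuse_scores`, `accuracy`, `search_weights`), the grid of candidate weights, and the use of an exhaustive grid search are all assumptions made for illustration, since the abstract states only that the weights are optimized on the validation data.

```python
import itertools

import numpy as np


def fuse_scores(stream_scores, weights):
    """Linearly combine per-stream class scores.

    stream_scores: shape (num_streams, num_videos, num_classes)
    weights:       shape (num_streams,)
    Returns fused scores of shape (num_videos, num_classes).
    """
    stream_scores = np.asarray(stream_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Contract the stream axis: sum_s weights[s] * stream_scores[s].
    return np.tensordot(weights, stream_scores, axes=1)


def accuracy(scores, labels):
    """Top-1 accuracy of class scores against integer labels."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))


def search_weights(val_scores, val_labels, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Pick stream weights by exhaustive search on validation data.

    The grid values are hypothetical; any optimizer could be swapped in.
    """
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=len(val_scores)):
        if not any(w):  # skip the all-zero combination
            continue
        acc = accuracy(fuse_scores(val_scores, w), val_labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return np.array(best_w), best_acc
```

Here `val_scores` would hold the validation-set class scores of the RGB, RGB-difference, optical-flow, and warped-optical-flow streams, stacked along the first axis. The selected weights would then be applied unchanged to the test-set scores through `fuse_scores`.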

Original language: English
Title of host publication: Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings
Editors: Jinchang Ren, Amir Hussain, Huimin Zhao, Jun Cai, Rongjun Chen, Yinyin Xiao, Kaizhu Huang, Jiangbin Zheng
Publisher: Springer
Pages: 244-253
Number of pages: 10
ISBN (Print): 9783030394301
Publication status: Published - 2020
Event: 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 - Guangzhou, China
Duration: 13 Jul 2019 - 14 Jul 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11691 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019
Country/Territory: China
City: Guangzhou
Period: 13/07/19 - 14/07/19

Keywords

  • Action recognition
  • Convolutional Neural Networks
  • Segments fusion
  • Temporal segments models
