Hybrid handcrafted and learned feature framework for human action recognition

Chaolong Zhang, Yuanping Xu*, Zhijie Xu, Jian Huang, Jun Lu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Recognising human actions in video is a challenging task in real-world settings. Dense trajectories (DT) offer an accurate record of motion over time that is rich in dynamic information. However, DT models lack a mechanism for distinguishing dominant motions from secondary ones across separable frequency bands and directions. By contrast, deep learning-based methods show promise on this challenge, but they still suffer from a limited capacity for handling complex temporal information, not to mention the huge datasets needed to guide training. To take advantage of semantically meaningful, “handcrafted” video features obtained through feature engineering, this study integrates the discrete wavelet transform (DWT) into the DT model to derive more descriptive human action features. By exploiting pre-trained dual-stream CNN-RNN models, learned features can be integrated with the handcrafted ones to satisfy stringent analytical requirements in the spatial-temporal domain. This hybrid feature framework generates efficient Fisher Vectors through a novel Bag of Temporal Features scheme and is capable of encoding video events while speeding up action recognition for real-world applications. Evaluation of the design has shown superior recognition performance over existing benchmark systems and demonstrated promising applicability and extensibility for solving challenging real-world human action recognition problems.
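The two handcrafted stages named in the abstract, DWT filtering of dense-trajectory motion signals and Fisher Vector encoding, can be illustrated with a minimal sketch. The wavelet family ("haar"), decomposition level, GMM size, the helper names dwt_bands and fisher_vector, and the random stand-in data below are all illustrative assumptions, not values or code from the paper; the paper's Bag of Temporal Features pooling and dual-stream CNN-RNN features are not reproduced here. The sketch uses the PyWavelets and scikit-learn libraries.

    import numpy as np
    import pywt                                   # PyWavelets
    from sklearn.mixture import GaussianMixture

    def dwt_bands(traj_xy, wavelet="haar", level=2):
        """Split a trajectory's (T, 2) displacement series into DWT sub-bands.

        Approximation coefficients carry the dominant low-frequency motion;
        detail coefficients carry secondary motion at finer scales.
        Wavelet and level are illustrative choices, not the paper's settings.
        """
        bands = []
        for axis in range(traj_xy.shape[1]):
            bands.extend(pywt.wavedec(traj_xy[:, axis], wavelet, level=level))
        return np.concatenate(bands)

    def fisher_vector(X, gmm):
        """Encode descriptors X (N, D) as an improved Fisher Vector under a
        diagonal-covariance GMM (gradients w.r.t. means and sigmas)."""
        N = X.shape[0]
        gamma = gmm.predict_proba(X)                              # (N, K) responsibilities
        sigma = np.sqrt(gmm.covariances_)                         # (K, D)
        diff = (X[:, None, :] - gmm.means_[None, :, :]) / sigma   # (N, K, D)
        w = gmm.weights_
        g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (N * np.sqrt(w)[:, None])
        g_sg = np.einsum("nk,nkd->kd", gamma, diff**2 - 1.0) / (N * np.sqrt(2 * w)[:, None])
        fv = np.concatenate([g_mu.ravel(), g_sg.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation

    # Usage sketch: pool DWT sub-band descriptors from many trajectories
    # (random data stands in for real tracked displacements), fit a small
    # GMM codebook, then encode a clip's descriptors as one Fisher Vector.
    rng = np.random.default_rng(0)
    descriptors = np.vstack([dwt_bands(rng.standard_normal((16, 2)))[None, :]
                             for _ in range(200)])
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(descriptors)
    video_fv = fisher_vector(descriptors[:40], gmm)               # one clip's encoding

The resulting fixed-length vector is the kind of representation a linear classifier can consume; in the paper's framework it would additionally be fused with learned dual-stream features.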

Original language: English
Journal: Applied Intelligence
DOIs
Publication status: Accepted/In press - 2022
Externally published: Yes

Keywords

  • Action recognition
  • Bag-of-temporal features
  • Dense trajectories
  • Motion stream
  • Visual stream
