Hybrid handcrafted and learned feature framework for human action recognition

Chaolong Zhang, Yuanping Xu*, Zhijie Xu, Jian Huang, Jun Lu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Recognising human actions in video is a challenging task in real-world settings. Dense trajectories (DT) offer an accurate record of motion over time that is rich in dynamic information. However, DT models lack a mechanism for distinguishing dominant motions from secondary ones across separable frequency bands and directions. By contrast, deep learning-based methods show promise on this challenge, but they still suffer from a limited capacity for handling complex temporal information, not to mention the huge datasets needed to guide training. To take advantage of semantically meaningful, “handcrafted” video features obtained through feature engineering, this study integrates the discrete wavelet transform (DWT) into the DT model to derive more descriptive human action features. By exploiting pre-trained dual-stream CNN-RNN models, learned features can be integrated with the handcrafted ones to satisfy stringent analytical requirements in the spatial-temporal domain. This hybrid feature framework generates efficient Fisher Vectors through a novel Bag of Temporal Features scheme and is capable of encoding video events while speeding up action recognition for real-world applications. Evaluation of the design has shown superior recognition performance over existing benchmark systems and demonstrated promising applicability and extensibility for solving challenging real-world human action recognition problems.
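The two handcrafted stages named in the abstract, DWT filtering of dense-trajectory motion signals and Fisher Vector encoding, can be illustrated with a minimal sketch. The wavelet family ("haar"), decomposition level, GMM size, the helper names dwt_bands and fisher_vector, and the random stand-in data below are all illustrative assumptions, not values or code from the paper; the paper's Bag of Temporal Features pooling and dual-stream CNN-RNN features are not reproduced here. The sketch uses the PyWavelets and scikit-learn libraries.

    import numpy as np
    import pywt                                   # PyWavelets
    from sklearn.mixture import GaussianMixture

    def dwt_bands(traj_xy, wavelet="haar", level=2):
        """Split a trajectory's (T, 2) displacement series into DWT sub-bands.

        Approximation coefficients carry the dominant low-frequency motion;
        detail coefficients carry secondary motion at finer scales.
        Wavelet and level are illustrative choices, not the paper's settings.
        """
        bands = []
        for axis in range(traj_xy.shape[1]):
            bands.extend(pywt.wavedec(traj_xy[:, axis], wavelet, level=level))
        return np.concatenate(bands)

    def fisher_vector(X, gmm):
        """Encode descriptors X (N, D) as an improved Fisher Vector under a
        diagonal-covariance GMM (gradients w.r.t. means and sigmas)."""
        N = X.shape[0]
        gamma = gmm.predict_proba(X)                              # (N, K) responsibilities
        sigma = np.sqrt(gmm.covariances_)                         # (K, D)
        diff = (X[:, None, :] - gmm.means_[None, :, :]) / sigma   # (N, K, D)
        w = gmm.weights_
        g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (N * np.sqrt(w)[:, None])
        g_sg = np.einsum("nk,nkd->kd", gamma, diff**2 - 1.0) / (N * np.sqrt(2 * w)[:, None])
        fv = np.concatenate([g_mu.ravel(), g_sg.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation

    # Usage sketch: pool DWT sub-band descriptors from many trajectories
    # (random data stands in for real tracked displacements), fit a small
    # GMM codebook, then encode a clip's descriptors as one Fisher Vector.
    rng = np.random.default_rng(0)
    descriptors = np.vstack([dwt_bands(rng.standard_normal((16, 2)))[None, :]
                             for _ in range(200)])
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(descriptors)
    video_fv = fisher_vector(descriptors[:40], gmm)               # one clip's encoding

The resulting fixed-length vector is the kind of representation a linear classifier can consume; in the paper's framework it would additionally be fused with learned dual-stream features.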

Original language: English
Journal: Applied Intelligence
DOIs
Publication status: Accepted/In press - 2022
Externally published: Yes

Keywords

  • Action recognition
  • Bag-of-temporal features
  • Dense trajectories
  • Motion stream
  • Visual stream
