TY - JOUR
T1 - HAREDNet
T2 - A deep learning based architecture for autonomous video surveillance by recognizing human actions
AU - Nasir, Inzamam Mashood
AU - Raza, Mudassar
AU - Shah, Jamal Hussain
AU - Wang, Shui Hua
AU - Tariq, Usman
AU - Khan, Muhammad Attique
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/4
Y1 - 2022/4
N2 - Human Action Recognition (HAR) is still considered a significant research area due to its emerging real-time applications, such as video surveillance, automated surveillance, real-time tracking, and rescue missions. The HAR domain still has gaps to cover, e.g., random changes in human variations, clothes, illumination, and backgrounds. Different camera settings, viewpoints, and inter-class similarities have increased the complexity of this domain. The above-mentioned challenges in uncontrolled environments have ultimately reduced the performance of many well-designed models. The primary objective of this research is to propose and design an automated recognition system that overcomes the aforementioned issues. Redundant features and excessive computational time for the training and prediction processes have also been a noteworthy problem. In this article, a hybrid recognition technique called HAREDNet is proposed, which has a) an Encoder-Decoder Network (EDNet) to extract deep features; b) improved Scale-Invariant Feature Transform (iSIFT), improved Gabor (iGabor), and Local Maximal Occurrence (LOMO) techniques to extract local features; c) a Cross-view Quadratic Discriminant Analysis (CvQDA) algorithm to reduce feature redundancy; and d) a weighted fusion strategy to merge properties of different essential features. The proposed technique is evaluated on three publicly available datasets, NTU RGB+D, HMDB51, and UCF-101, achieving average recognition accuracies of 97.45%, 80.58%, and 97.48%, respectively, which are better than previously proposed methods.
AB - Human Action Recognition (HAR) is still considered a significant research area due to its emerging real-time applications, such as video surveillance, automated surveillance, real-time tracking, and rescue missions. The HAR domain still has gaps to cover, e.g., random changes in human variations, clothes, illumination, and backgrounds. Different camera settings, viewpoints, and inter-class similarities have increased the complexity of this domain. The above-mentioned challenges in uncontrolled environments have ultimately reduced the performance of many well-designed models. The primary objective of this research is to propose and design an automated recognition system that overcomes the aforementioned issues. Redundant features and excessive computational time for the training and prediction processes have also been a noteworthy problem. In this article, a hybrid recognition technique called HAREDNet is proposed, which has a) an Encoder-Decoder Network (EDNet) to extract deep features; b) improved Scale-Invariant Feature Transform (iSIFT), improved Gabor (iGabor), and Local Maximal Occurrence (LOMO) techniques to extract local features; c) a Cross-view Quadratic Discriminant Analysis (CvQDA) algorithm to reduce feature redundancy; and d) a weighted fusion strategy to merge properties of different essential features. The proposed technique is evaluated on three publicly available datasets, NTU RGB+D, HMDB51, and UCF-101, achieving average recognition accuracies of 97.45%, 80.58%, and 97.48%, respectively, which are better than previously proposed methods.
KW - CvQDA
KW - Deep Convolutional Neural Network
KW - Encoder-Decoder CNN architecture
KW - Human Action Recognition
KW - Weighted fusion
UR - http://www.scopus.com/inward/record.url?scp=85124593088&partnerID=8YFLogxK
U2 - 10.1016/j.compeleceng.2022.107805
DO - 10.1016/j.compeleceng.2022.107805
M3 - Article
AN - SCOPUS:85124593088
SN - 0045-7906
VL - 99
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
M1 - 107805
ER -