TY - JOUR
T1 - HAREDNet
T2 - A deep learning based architecture for autonomous video surveillance by recognizing human actions
AU - Nasir, Inzamam Mashood
AU - Raza, Mudassar
AU - Shah, Jamal Hussain
AU - Wang, Shui Hua
AU - Tariq, Usman
AU - Khan, Muhammad Attique
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/4
Y1 - 2022/4
N2 - Human Action Recognition (HAR) is still considered a significant research area due to its emerging real-time applications, such as video surveillance, automated surveillance, real-time tracking, and rescue missions. The HAR domain still has gaps to cover, e.g., random changes in human variations, clothes, illumination, and backgrounds. Different camera settings, viewpoints, and inter-class similarities have increased the complexity of this domain. The above-mentioned challenges in uncontrolled environments have ultimately reduced the performance of many well-designed models. The primary objective of this research is to propose and design an automated recognition system that overcomes the aforementioned issues. Redundant features and excessive computational time for the training and prediction processes have also been a noteworthy problem. In this article, a hybrid recognition technique called HAREDNet is proposed, which has a) an Encoder-Decoder Network (EDNet) to extract deep features; b) improved Scale-Invariant Feature Transform (iSIFT), improved Gabor (iGabor), and Local Maximal Occurrence (LOMO) techniques to extract local features; c) a Cross-view Quadratic Discriminant Analysis (CvQDA) algorithm to reduce feature redundancy; and d) a weighted fusion strategy to merge properties of different essential features. The proposed technique is evaluated on three publicly available datasets, NTU RGB+D, HMDB51, and UCF-101, achieving average recognition accuracies of 97.45%, 80.58%, and 97.48%, respectively, which are better than previously proposed methods.
AB - Human Action Recognition (HAR) is still considered a significant research area due to its emerging real-time applications, such as video surveillance, automated surveillance, real-time tracking, and rescue missions. The HAR domain still has gaps to cover, e.g., random changes in human variations, clothes, illumination, and backgrounds. Different camera settings, viewpoints, and inter-class similarities have increased the complexity of this domain. The above-mentioned challenges in uncontrolled environments have ultimately reduced the performance of many well-designed models. The primary objective of this research is to propose and design an automated recognition system that overcomes the aforementioned issues. Redundant features and excessive computational time for the training and prediction processes have also been a noteworthy problem. In this article, a hybrid recognition technique called HAREDNet is proposed, which has a) an Encoder-Decoder Network (EDNet) to extract deep features; b) improved Scale-Invariant Feature Transform (iSIFT), improved Gabor (iGabor), and Local Maximal Occurrence (LOMO) techniques to extract local features; c) a Cross-view Quadratic Discriminant Analysis (CvQDA) algorithm to reduce feature redundancy; and d) a weighted fusion strategy to merge properties of different essential features. The proposed technique is evaluated on three publicly available datasets, NTU RGB+D, HMDB51, and UCF-101, achieving average recognition accuracies of 97.45%, 80.58%, and 97.48%, respectively, which are better than previously proposed methods.
KW - CvQDA
KW - Deep Convolutional Neural Network
KW - Encoder-Decoder CNN architecture
KW - Human Action Recognition
KW - Weighted fusion
UR - http://www.scopus.com/inward/record.url?scp=85124593088&partnerID=8YFLogxK
U2 - 10.1016/j.compeleceng.2022.107805
DO - 10.1016/j.compeleceng.2022.107805
M3 - Article
AN - SCOPUS:85124593088
SN - 0045-7906
VL - 99
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
M1 - 107805
ER -