Abstract
Individual deep learning models achieve impressive performance; however, a single model may not capture the full spectrum of intricate patterns present in the input data, which can hamper both accuracy and broader generalization. In light of this, this paper presents an ensemble method that leverages the strengths of multiple Convolutional Neural Networks (CNNs) and Transformer models to improve gait recognition performance. Additionally, a novel gait representation named the windowed Gait Energy Image (GEI) is introduced, obtained by averaging gait frames irrespective of gait cycles. First, the windowed GEI is fed to the CNN and Transformer models to learn salient gait features. Each model is followed by a Multilayer Perceptron (MLP) that encodes the relationship between the extracted features and the corresponding class labels. The extracted gait features from each model are then flattened and concatenated into a single feature representation before being passed through another MLP for subject classification. The proposed method was evaluated on three datasets: OU-ISIR dataset D, CASIA-B, and the OU-LP dataset. Experimental results demonstrated notable improvements over existing methods on all three datasets: the proposed Ensemble CNN-ViT model with feature-level fusion achieved accuracy rates of 100% on OU-ISIR D, 99.93% on CASIA-B, and 99.94% on OU-LP, outperforming state-of-the-art methods.
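The sketch below illustrates the pipeline described in the abstract: constructing windowed GEIs by averaging fixed-length windows of silhouette frames (ignoring gait-cycle boundaries), extracting features with a CNN branch and a ViT-style branch each topped by an MLP, and concatenating the two feature vectors (feature-level fusion) before a final MLP classifier. This is a minimal, hedged reconstruction in PyTorch; all module sizes, layer counts, and names (`windowed_gei`, `CNNBranch`, `ViTBranch`, `EnsembleCNNViT`) are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of windowed GEI + CNN/ViT feature-level fusion (assumed sizes).
import torch
import torch.nn as nn


def windowed_gei(silhouettes: torch.Tensor, window: int = 30) -> torch.Tensor:
    """Average fixed-length windows of silhouette frames (T, H, W),
    irrespective of gait cycles, producing windowed GEIs of shape (N, 1, H, W)."""
    t = (silhouettes.shape[0] // window) * window
    chunks = silhouettes[:t].reshape(-1, window, *silhouettes.shape[1:])
    return chunks.mean(dim=1).unsqueeze(1)


class CNNBranch(nn.Module):
    """Small CNN feature extractor followed by an MLP (hypothetical sizes)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(32, feat_dim), nn.ReLU())

    def forward(self, x):
        return self.mlp(self.backbone(x))


class ViTBranch(nn.Module):
    """ViT-style branch: patch embedding + Transformer encoder + MLP."""
    def __init__(self, img_size: int = 64, patch: int = 8, dim: int = 64,
                 feat_dim: int = 128):
        super().__init__()
        self.patchify = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlp = nn.Sequential(nn.Linear(dim, feat_dim), nn.ReLU())

    def forward(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2) + self.pos
        return self.mlp(self.encoder(tokens).mean(dim=1))


class EnsembleCNNViT(nn.Module):
    """Concatenate per-branch features (feature-level fusion), then classify."""
    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.cnn = CNNBranch(feat_dim)
        self.vit = ViTBranch(feat_dim=feat_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, gei):
        fused = torch.cat([self.cnn(gei), self.vit(gei)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    frames = torch.rand(90, 64, 64)            # 90 silhouette frames (toy data)
    geis = windowed_gei(frames, window=30)     # -> (3, 1, 64, 64)
    logits = EnsembleCNNViT(num_classes=10)(geis)
    print(logits.shape)                        # torch.Size([3, 10])
```

In this sketch the fusion happens at the feature level (concatenation of the two branch embeddings) rather than by averaging per-model predictions, matching the feature-level fusion strategy named in the abstract.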
| Original language | English |
| --- | --- |
| Pages (from-to) | 108573-108583 |
| Number of pages | 11 |
| Journal | IEEE Access |
| Volume | 12 |
| DOIs | |
| Publication status | Published - 2024 |
| Externally published | Yes |
Keywords
- Deep learning
- ensemble
- feature-fusion
- fusion
- gait
- gait recognition