TY - JOUR
T1 - Micro-Expression Recognition Using Convolutional Variational Attention Transformer (ConVAT) with Multihead Attention Mechanism
AU - Khizer Bin Talib, Hafiz
AU - Xu, Kaiwei
AU - Cao, Yanlong
AU - Ping Xu, Yuan
AU - Xu, Zhijie
AU - Zaman, Muhammad
AU - Akhunzada, Adnan
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Micro-Expression Recognition is crucial in various fields such as behavioral analysis, security, and psychological studies, offering valuable insights into subtle and often concealed emotional states. Despite significant advancements in deep learning models, challenges persist in accurately handling the nuanced and fleeting nature of micro-expressions, particularly when applied across diverse datasets with varied expressions. Existing models often struggle with precision and adaptability, leading to inconsistent recognition performance. To address these limitations, we propose the Convolutional Variational Attention Transformer (ConVAT), a novel model that leverages a multi-head attention mechanism integrated with convolutional networks, optimized specifically for detailed micro-expression analysis. Our methodology employs the Leave-One-Subject-Out (LOSO) cross-validation technique across three widely used datasets: SAMM, CASME II, and SMIC. The results demonstrate the effectiveness of ConVAT, achieving impressive performance with 98.73% accuracy on the SAMM dataset, 97.95% on the SMIC dataset, and 97.65% on CASME II. These outcomes not only surpass current state-of-the-art benchmarks but also highlight ConVAT's robustness and reliability in capturing micro-expressions, marking a significant advancement toward developing sophisticated automated systems for real-world applications in micro-expression recognition.
KW - ConVAT
KW - convolutional neural networks
KW - LOSO cross-validation
KW - micro-expression recognition
KW - multi-head attention
UR - http://www.scopus.com/inward/record.url?scp=85215962767&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3530114
DO - 10.1109/ACCESS.2025.3530114
M3 - Article
AN - SCOPUS:85215962767
SN - 2169-3536
VL - 13
SP - 20054
EP - 20070
JO - IEEE Access
JF - IEEE Access
ER -