TY - JOUR
T1 - Phy-FusionNet: A Memory-Augmented Transformer for Multimodal Emotion Recognition With Periodicity and Contextual Attention
T2 - IEEE Transactions on Affective Computing
AU - Wu, Tianyi
AU - Purwanto, Erick
AU - Huang, Yongrun
AU - Yang, Su
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Accurate emotion recognition from physiological signals is critical for applications in healthcare, autonomous systems, and human-computer interaction. However, prevailing methods often fail to model long-term dependencies and overlook periodic patterns inherent in physiological data. To address these challenges, we propose Phy-FusionNet, a novel memory-augmented transformer architecture for multimodal emotion recognition. Phy-FusionNet introduces a Memory Stream Module with FIFO-queue and decay-based updates to preserve long-term contextual information. It further integrates Fourier-based positional encoding and frequency-aware attention, enabling robust detection of periodic emotional cues. An Adaptive Temporal Attention Module enhances computational efficiency and enables dynamic relevance in temporal feature extraction. For cross-modal fusion, we employ a transformer-based Multimodal Binding Learning framework that balances modality-specific and shared features. Extensive experiments on five public datasets—WESAD, CL-Drive, PPB-Emo, PhyMER, and EEG-VUI—demonstrate that Phy-FusionNet outperforms state-of-the-art models, achieving up to 16.3% improvement in accuracy and superior robustness across diverse emotional states and noisy environments. Notably, the model maintains low performance variance across emotion classes, with F1-Score differences under 2.5%, indicating stable recognition even for subtle or overlapping emotions. Our results underscore the importance of integrating memory, frequency, and adaptive attention for effective affective computing. The code will be publicly available on GitHub.
AB - Accurate emotion recognition from physiological signals is critical for applications in healthcare, autonomous systems, and human-computer interaction. However, prevailing methods often fail to model long-term dependencies and overlook periodic patterns inherent in physiological data. To address these challenges, we propose Phy-FusionNet, a novel memory-augmented transformer architecture for multimodal emotion recognition. Phy-FusionNet introduces a Memory Stream Module with FIFO-queue and decay-based updates to preserve long-term contextual information. It further integrates Fourier-based positional encoding and frequency-aware attention, enabling robust detection of periodic emotional cues. An Adaptive Temporal Attention Module enhances computational efficiency and enables dynamic relevance in temporal feature extraction. For cross-modal fusion, we employ a transformer-based Multimodal Binding Learning framework that balances modality-specific and shared features. Extensive experiments on five public datasets—WESAD, CL-Drive, PPB-Emo, PhyMER, and EEG-VUI—demonstrate that Phy-FusionNet outperforms state-of-the-art models, achieving up to 16.3% improvement in accuracy and superior robustness across diverse emotional states and noisy environments. Notably, the model maintains low performance variance across emotion classes, with F1-Score differences under 2.5%, indicating stable recognition even for subtle or overlapping emotions. Our results underscore the importance of integrating memory, frequency, and adaptive attention for effective affective computing. The code will be publicly available on GitHub.
KW - affective computing
KW - attention mechanisms
KW - emotion recognition
KW - memory-augmented neural networks
KW - multimodal fusion
KW - physiological signals
KW - transformer networks
UR - https://www.scopus.com/pages/publications/105017088299
U2 - 10.1109/TAFFC.2025.3609046
DO - 10.1109/TAFFC.2025.3609046
M3 - Article
AN - SCOPUS:105017088299
SN - 1949-3045
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
ER -