Phy-FusionNet: A Memory-Augmented Transformer for Multimodal Emotion Recognition With Periodicity and Contextual Attention

Tianyi Wu, Erick Purwanto*, Yongrun Huang, Su Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate emotion recognition from physiological signals is critical for applications in healthcare, autonomous systems, and human-computer interaction. However, prevailing methods often fail to model long-term dependencies and overlook periodic patterns inherent in physiological data. To address these challenges, we propose Phy-FusionNet, a novel memory-augmented transformer architecture for multimodal emotion recognition. Phy-FusionNet introduces a Memory Stream Module with FIFO-queue and decay-based updates to preserve long-term contextual information. It further integrates Fourier-based positional encoding and frequency-aware attention, enabling robust detection of periodic emotional cues. An Adaptive Temporal Attention Module improves computational efficiency and dynamically weights the most relevant time steps during temporal feature extraction. For cross-modal fusion, we employ a transformer-based Multimodal Binding Learning framework that balances modality-specific and shared features. Extensive experiments on five public datasets—WESAD, CL-Drive, PPB-Emo, PhyMER, and EEG-VUI—demonstrate that Phy-FusionNet outperforms state-of-the-art models, achieving up to a 16.3% improvement in accuracy and superior robustness across diverse emotional states and noisy environments. Notably, the model maintains low performance variance across emotion classes, with F1-Score differences under 2.5%, indicating stable recognition even for subtle or overlapping emotions. Our results underscore the importance of integrating memory, frequency, and adaptive attention for effective affective computing. The code will be publicly available on GitHub.
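The abstract describes a Memory Stream Module that combines a FIFO queue with decay-based updates to retain long-term context. A minimal sketch of that idea is shown below; the class name, the exponential-decay read-out rule, and all parameters are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a FIFO memory stream with decay-based read-out.
# Older entries are evicted first (FIFO) and contribute less to the
# aggregated memory vector (exponential decay) -- an assumed decay rule,
# not necessarily the one used in Phy-FusionNet.
from collections import deque


class MemoryStream:
    def __init__(self, capacity, decay=0.9):
        # deque with maxlen gives FIFO eviction: appending past capacity
        # silently drops the oldest entry.
        self.queue = deque(maxlen=capacity)
        self.decay = decay

    def update(self, feature):
        """Push a new feature vector (list of floats) into the memory."""
        self.queue.append(feature)

    def read(self):
        """Return a decay-weighted average of the stored feature vectors."""
        n = len(self.queue)
        # Age 0 = newest entry (weight 1.0); older entries decay geometrically.
        weights = [self.decay ** age for age in range(n - 1, -1, -1)]
        total = sum(weights)
        dim = len(self.queue[0])
        return [
            sum(w * f[i] for w, f in zip(weights, self.queue)) / total
            for i in range(dim)
        ]
```

With `capacity=3` and `decay=0.5`, after pushing `[1.0]`, `[2.0]`, `[4.0]` the read-out is `(0.25*1 + 0.5*2 + 1.0*4) / 1.75 = 3.0`; a fourth update evicts the oldest vector, matching the FIFO behavior the abstract describes.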

Original language: English
Journal: IEEE Transactions on Affective Computing
DOIs
Publication status: Accepted/In press - 2025

Keywords

  • affective computing
  • attention mechanisms
  • emotion recognition
  • memory-augmented neural networks
  • multimodal fusion
  • physiological signals
  • transformer networks

