Abstract
Long-sequence causal reasoning seeks to uncover causal relationships within extended time series data but is hindered by complex dependencies and the challenges of validating causal links. To address the limitations of large-scale language models (e.g., GPT-4) in capturing intricate emotional causality within extended dialogues, we propose CauseMotion, an innovative framework combining emotional causal dynamic mapping with multimodal feature fusion. CauseMotion implements dynamic mapping through a sliding window mechanism and fusion strategies, while integrating audio features (vocal emotion, intensity, and speech rate) to enrich semantic representations. This design enables efficient retrieval of contextually relevant information and precise inference of emotional causal chains spanning multiple conversational turns. We constructed the first benchmark dataset for long-sequence emotional causal reasoning, featuring dialogues with over 70 turns. Experimental results show that CauseMotion significantly enhances emotional understanding and causal inference capabilities in large language models. GLM-4 integrated with CauseMotion achieves an 8.7% improvement in causal accuracy over the original model and surpasses GPT-4o by 1.2%. On the DiaASQ dataset, CauseMotion-GLM-4 achieves state-of-the-art results in accuracy, F1 score, and causal reasoning accuracy.
| Original language | English |
|---|---|
| Journal | Proceedings - IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS |
| Issue number | 2025 |
| Publication status | Published - 2025 |
| Event | 2025 IEEE International Conference on Advanced Visual and Signal-Based Systems, AVSS 2025 - Tainan, Taiwan, Province of China; Duration: 11 Aug 2025 → 13 Aug 2025 |