TY - GEN
T1 - Genesis
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - Li, Yulong
AU - Zhang, Yuxuan
AU - Chen, Rui
AU - Tang, Feilong
AU - Lu, Zhixiang
AU - Hu, Ming
AU - Wu, Jianghao
AU - Xue, Haochen
AU - Zhou, Mian
AU - Li, Chong
AU - Su, Jionglong
AU - Razzak, Imran
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Artificial intelligence will not achieve genuine empathy until models can reason about the causes of human emotions rather than only label them. Current datasets fail to support this objective: existing emotional causality datasets focus primarily on text, lack non-verbal information such as speech and facial expressions, and feature relatively short dialogues, which limits research on long-term emotional evolution. Existing annotations concentrate on stimulus-response patterns and lack cross-temporal emotional causal chain annotations, so they fail to reveal how early events accumulate and ultimately trigger emotional changes. In this work, we introduce Genesis, the first multimodal dialogue dataset supporting long-term emotional causality analysis. Genesis contains 1,000 dialogues averaging 208 turns each, spanning debate, family, educational, and social scenarios. Through a two-layer annotation system comprising proximal cause identification and long-term causal chain tracking, Genesis labels complex emotional phenomena, including cross-modal inconsistencies and long-distance causal dependencies. Our evaluation of 20 mainstream multimodal models reveals limitations in current approaches to long-term emotional causality. We propose Empathica as an evaluation baseline; it employs a Recognition-Memory-Attribution architecture that integrates dynamic sliding windows and event aggregation mechanisms to address the challenges of multimodal emotional causality modeling. Empathica outperforms the text-based model GPT-o1 and the multimodal models Gemini 1.5 Pro and GPT-4o across all evaluation metrics.
AB - Artificial intelligence will not achieve genuine empathy until models can reason about the causes of human emotions rather than only label them. Current datasets fail to support this objective: existing emotional causality datasets focus primarily on text, lack non-verbal information such as speech and facial expressions, and feature relatively short dialogues, which limits research on long-term emotional evolution. Existing annotations concentrate on stimulus-response patterns and lack cross-temporal emotional causal chain annotations, so they fail to reveal how early events accumulate and ultimately trigger emotional changes. In this work, we introduce Genesis, the first multimodal dialogue dataset supporting long-term emotional causality analysis. Genesis contains 1,000 dialogues averaging 208 turns each, spanning debate, family, educational, and social scenarios. Through a two-layer annotation system comprising proximal cause identification and long-term causal chain tracking, Genesis labels complex emotional phenomena, including cross-modal inconsistencies and long-distance causal dependencies. Our evaluation of 20 mainstream multimodal models reveals limitations in current approaches to long-term emotional causality. We propose Empathica as an evaluation baseline; it employs a Recognition-Memory-Attribution architecture that integrates dynamic sliding windows and event aggregation mechanisms to address the challenges of multimodal emotional causality modeling. Empathica outperforms the text-based model GPT-o1 and the multimodal models Gemini 1.5 Pro and GPT-4o across all evaluation metrics.
KW - causal chain annotation
KW - emotional causality dataset
KW - long-term causal modeling
KW - long-term multimodal conversation analysis
UR - https://www.scopus.com/pages/publications/105024067293
U2 - 10.1145/3746027.3758202
DO - 10.1145/3746027.3758202
M3 - Conference Proceeding
AN - SCOPUS:105024067293
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 12651
EP - 12658
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -