
Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality Analysis

  • Yulong Li
  • Yuxuan Zhang
  • Rui Chen
  • Feilong Tang
  • Zhixiang Lu
  • Ming Hu
  • Jianghao Wu
  • Haochen Xue
  • Mian Zhou
  • Chong Li
  • Jionglong Su*
  • Imran Razzak*
  • *Corresponding author for this work
  • Mohamed Bin Zayed University of Artificial Intelligence
  • Xi'an Jiaotong-Liverpool University
  • Monash University

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

2 Citations (Scopus)

Abstract

Artificial intelligence will not achieve genuine empathy until models can reason about the causes of human emotions rather than only label them. Current datasets fail to support this objective: existing emotional causality datasets focus primarily on text, lack non-verbal information such as speech and facial expressions, and feature relatively short dialogues, which limits research on long-term emotional evolution. Existing annotations concentrate on stimulus-response patterns and lack cross-temporal emotional causal chain annotations, so they fail to reveal how early events accumulate and ultimately trigger emotional changes. In this work, we introduce Genesis, the first multimodal dialogue dataset supporting long-term emotional causality analysis. Genesis contains 1,000 dialogues averaging 208 turns each, spanning debate, family, educational, and social scenarios. Through a two-layer annotation system (proximal cause identification and long-term causal chain tracking), Genesis labels complex emotional phenomena, including cross-modal inconsistencies and long-distance causal dependencies. Our evaluation of 20 mainstream multimodal models reveals limitations in current approaches to long-term emotional causality. We propose Empathica as an evaluation baseline; it employs a Recognition-Memory-Attribution architecture that integrates dynamic sliding windows and event aggregation mechanisms to address the challenges of multimodal emotional causality modeling. Empathica outperforms the text-based model GPT-o1 and the multimodal models Gemini 1.5 Pro and GPT-4o across all evaluation metrics.
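
As an illustration only, the two-layer annotation idea in the abstract (a proximal cause per emotional turn, plus links to earlier turns that form a long-term causal chain) could be represented roughly as follows. This sketch is not taken from the Genesis release: the class `TurnAnnotation`, its field names, the helper `trace_chain`, and the sample turn numbers and emotion labels are all hypothetical and chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TurnAnnotation:
    """Hypothetical annotation for one dialogue turn (not the Genesis schema)."""
    turn: int
    emotion: str
    # Layer 1 (assumed): index of the turn that directly triggered this emotion.
    proximal_cause: Optional[int] = None
    # Layer 2 (assumed): earlier turns that contributed over the long term.
    chain_causes: List[int] = field(default_factory=list)


def trace_chain(annotations: Dict[int, TurnAnnotation], turn: int) -> List[int]:
    """Return every earlier turn reachable through either layer, oldest first."""
    seen: Set[int] = set()
    stack = [turn]
    while stack:
        current = stack.pop()
        ann = annotations.get(current)
        if ann is None:
            # Turn has no annotation of its own; it is a leaf of the chain.
            continue
        parents = list(ann.chain_causes)
        if ann.proximal_cause is not None:
            parents.append(ann.proximal_cause)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


# Made-up example: anger at turn 120 traces back to an event at turn 3.
annotations = {
    3: TurnAnnotation(3, "neutral"),
    40: TurnAnnotation(40, "irritation", proximal_cause=39, chain_causes=[3]),
    120: TurnAnnotation(120, "anger", proximal_cause=119, chain_causes=[40]),
}
print(trace_chain(annotations, 120))
```

Following both layers from turn 120 reaches turns 119 and 40 directly and turns 39 and 3 through turn 40, which is the kind of long-distance dependency that a stimulus-response annotation alone would miss.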

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 12651-12658
Number of pages: 8
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • causal chain annotation
  • emotional causality dataset
  • long-term causal modeling
  • long-term multimodal conversation analysis
