MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu Lu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang (张冲), Chun-Mei Feng, Yutong Xie, Imran Razzak*, Zongyuan Ge*, Jionglong Su*, Junjun He*, Yu Qiao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

Abstract

Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their ability to memorize, recall, and reason over sustained interactions in real-world scenarios remains underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations of 20 MLLMs on MMRC indicate an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, error propagation from accumulated assumptions, and reluctance to “say no.” To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which records key information from the conversation and reminds the model of it when responding, enhancing conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
Original language: English
Title of host publication: The Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Place of Publication: Vienna, Austria
Publisher: Association for Computational Linguistics (ACL)
Chapter: 1
Pages: 22477-22503
Number of pages: 27
Volume: 1
Edition: 1
ISBN (Electronic): 979-8-89176-251-0
ISBN (Print): 979-8-89176-251-0
DOIs
Publication status: Published - 24 Jul 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics: ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025
https://2025.aclweb.org/

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25
Internet address
