Early Exploration into AI-Assisted Visual Analytics for Dynamic Videos

  • Qi Guo
  • Junyi Li
  • Jiayi Hong
  • Lijie Yao*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

We present a preliminary investigation into the capabilities of current large language models (LLMs), i.e., ChatGPT and Gemini, in supporting visual analytics tasks for videos containing dynamically changing information. Videos are inherently multimodal, combining visual frames, audio narration, and sometimes text, often with inconsistencies or redundancies across channels, which poses challenges for reliable data extraction. While recent advances in video understanding have improved general-purpose AI performance, relatively little work has explored how generative AI can extract, prepare, and visualize data from videos through prompts, particularly where multimodal conflicts, dynamic updates, and moving entities are involved. To explore this space, we first categorize information-bearing videos along four dimensions: data type, data dynamics, visualization presence, and audio-visual alignment. We then apply LLMs to extract and structure information from representative video samples to support downstream visualization. We conclude with reflections and outline a research agenda for AI-assisted video-based visual analytics. Our OSF repository is at https://osf.io/ygn4c/.
Original language: English
Title of host publication: IEEE VIS workshop on GenAI, Agents, and the Future of VIS
Pages: 1-5
Number of pages: 5
Publication status: Published - 3 Nov 2025

Keywords

  • Visualization
  • Video analysis
  • Large language model
