Abstract
We present a preliminary investigation into the capabilities of current large language models (LLMs), specifically ChatGPT and Gemini, in supporting visual analytics tasks for videos containing dynamically changing information. Videos are inherently multimodal, combining visual frames, audio narration, and sometimes text, often with inconsistencies or redundancies across channels, which poses challenges for reliable data extraction. While recent advances in video understanding have improved general-purpose AI performance, relatively little work has explored how generative AI can extract, prepare, and visualize data from videos through prompts, particularly where multimodal conflicts, dynamic updates, and moving entities are involved. To explore this space, we first categorize information-bearing videos along four dimensions: data type, data dynamics, visualization presence, and audio-visual alignment. We then apply LLMs to extract and structure information from representative video samples to support downstream visualization. We conclude with reflections and outline a research agenda for AI-assisted video-based visual analytics. Our OSF repository is at https://osf.io/ygn4c/.
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | IEEE VIS workshop on GenAI, Agents, and the Future of VIS |
| Pages | 1-5 |
| Number of pages | 5 |
| Publication status | Published - 3 Nov 2025 |
Keywords
- Visualization
- Video analysis
- Large language model
Projects
A First Design Space of Visualization in Motion
Yao, L. (PI)
1/01/25 → 31/12/27
Project: Internal Research Project