Visually Impaired Assistance with Large Models

Rong Xiang*, Yi Zhao*, Yilin Zhang, Jing Li, Ming Liao, Yushi Li

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

Abstract

Visually Impaired Assistance (VIA) aims to provide automated support that helps individuals with visual impairments carry out daily activities independently. Progress in VIA technology is closely tied to advances in Computer Vision (CV) and Natural Language Processing (NLP) driven by Large Models (LMs). To investigate the potential of current state-of-the-art (SOTA) LMs in VIA scenarios, we conduct a comprehensive study titled Visually Impaired Assistance with Large Models (VIALM). The study focuses on a task in which an image of the user's surroundings and a natural-language request from a visually impaired user are given as input, and the model must generate detailed, context-specific guidance that helps the user fulfil the request. Through benchmarking experiments and a review of recent LM advancements, our research indicates that while LMs have significant potential for enhancing VIA functionalities, they exhibit limitations such as insufficient environmental grounding (e.g., a 25.7% failure rate in GPT-4) and inadequate detail in instructions (approximately a 32.1% deficiency rate in GPT-4).

Original language: English
Title of host publication: Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1645-1650
Number of pages: 6
ISBN (Electronic): 9798331520861
Publication status: Published - 2024
Event: 10th IEEE Smart World Congress, SWC 2024 - Nadi, Fiji
Duration: 2 Dec 2024 to 7 Dec 2024

Publication series

Name: Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications

Conference

Conference: 10th IEEE Smart World Congress, SWC 2024
Country/Territory: Fiji
City: Nadi
Period: 2/12/24 to 7/12/24

Keywords

  • artificial intelligence
  • big data
  • natural language processing
