TY - GEN
T1 - Visually Impaired Assistance with Large Models
AU - Xiang, Rong
AU - Zhao, Yi
AU - Zhang, Yilin
AU - Li, Jing
AU - Liao, Ming
AU - Li, Yushi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Visually Impaired Assistance (VIA) is designed to facilitate the autonomous execution of daily activities for individuals with visual impairments through automated support. Progress in VIA technology is intrinsically linked to advancements in Computer Vision (CV) and Natural Language Processing (NLP) via Large Models (LMs). To investigate the potential of current state-of-the-art (SOTA) LMs in VIA scenarios, we conduct a comprehensive study titled Visually Impaired Assistance with Large Models (VIALM). The study focuses on a task in which a photo depicting the environment and a linguistic request from a visually impaired user are provided, and the goal is to generate detailed, context-specific guidance that assists the user in fulfilling the request. Through benchmarking experiments and a review of recent LM advancements, our research indicates that while LMs have significant potential for enhancing VIA functionalities, they exhibit limitations such as insufficient environmental grounding (e.g., a 25.7% failure rate in GPT-4) and inadequate detail in instructions (approximately a 32.1% deficiency rate in GPT-4).
AB - Visually Impaired Assistance (VIA) is designed to facilitate the autonomous execution of daily activities for individuals with visual impairments through automated support. Progress in VIA technology is intrinsically linked to advancements in Computer Vision (CV) and Natural Language Processing (NLP) via Large Models (LMs). To investigate the potential of current state-of-the-art (SOTA) LMs in VIA scenarios, we conduct a comprehensive study titled Visually Impaired Assistance with Large Models (VIALM). The study focuses on a task in which a photo depicting the environment and a linguistic request from a visually impaired user are provided, and the goal is to generate detailed, context-specific guidance that assists the user in fulfilling the request. Through benchmarking experiments and a review of recent LM advancements, our research indicates that while LMs have significant potential for enhancing VIA functionalities, they exhibit limitations such as insufficient environmental grounding (e.g., a 25.7% failure rate in GPT-4) and inadequate detail in instructions (approximately a 32.1% deficiency rate in GPT-4).
KW - artificial intelligence
KW - big data
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=105002255982&partnerID=8YFLogxK
U2 - 10.1109/SWC62898.2024.00253
DO - 10.1109/SWC62898.2024.00253
M3 - Conference Proceeding
AN - SCOPUS:105002255982
T3 - Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications
SP - 1645
EP - 1650
BT - Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE Smart World Congress, SWC 2024
Y2 - 2 December 2024 through 7 December 2024
ER -