TY - GEN
T1 - Visually Impaired Assistance with Large Models
AU - Xiang, Rong
AU - Zhao, Yi
AU - Zhang, Yilin
AU - Li, Jing
AU - Liao, Ming
AU - Li, Yushi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Visually Impaired Assistance (VIA) is designed to facilitate the autonomous execution of daily activities for individuals with visual impairments through automated support. Progress in VIA technology is intrinsically linked to advancements in Computer Vision (CV) and Natural Language Processing (NLP) via Large Models (LMs). To investigate the potential of current state-of-the-art (SOTA) LMs in VIA scenarios, we conduct a comprehensive study titled Visually Impaired Assistance with Large Models (VIALM). The study focuses on a task in which a photo depicting the environment and a linguistic request from a visually impaired user are provided, and the goal is to generate detailed, context-specific guidance that assists the user in fulfilling the request. Through benchmarking experiments and a review of recent LM advancements, our research indicates that while LMs have significant potential for enhancing VIA functionalities, they exhibit limitations such as insufficient environmental grounding (e.g., a 25.7% failure rate in GPT-4) and inadequate detail in instructions (approximately a 32.1% deficiency rate in GPT-4).
AB - Visually Impaired Assistance (VIA) is designed to facilitate the autonomous execution of daily activities for individuals with visual impairments through automated support. Progress in VIA technology is intrinsically linked to advancements in Computer Vision (CV) and Natural Language Processing (NLP) via Large Models (LMs). To investigate the potential of current state-of-the-art (SOTA) LMs in VIA scenarios, we conduct a comprehensive study titled Visually Impaired Assistance with Large Models (VIALM). The study focuses on a task in which a photo depicting the environment and a linguistic request from a visually impaired user are provided, and the goal is to generate detailed, context-specific guidance that assists the user in fulfilling the request. Through benchmarking experiments and a review of recent LM advancements, our research indicates that while LMs have significant potential for enhancing VIA functionalities, they exhibit limitations such as insufficient environmental grounding (e.g., a 25.7% failure rate in GPT-4) and inadequate detail in instructions (approximately a 32.1% deficiency rate in GPT-4).
KW - artificial intelligence
KW - big data
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=105002255982&partnerID=8YFLogxK
U2 - 10.1109/SWC62898.2024.00253
DO - 10.1109/SWC62898.2024.00253
M3 - Conference Proceeding
AN - SCOPUS:105002255982
T3 - Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications
SP - 1645
EP - 1650
BT - Proceedings - 2024 IEEE Smart World Congress, SWC 2024 - 2024 IEEE Ubiquitous Intelligence and Computing, Autonomous and Trusted Computing, Digital Twin, Metaverse, Privacy Computing and Data Security, Scalable Computing and Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE Smart World Congress, SWC 2024
Y2 - 2 December 2024 through 7 December 2024
ER -