TY - GEN
T1 - Do High Metrics Equal Enhanced Cognitive Performance? Exploring Objective and Subjective Assessments in Digital Human Quality
AU - Wang, Jie
AU - Zhang, Di
AU - Wu, Qi
AU - Huang, Cai Wei
AU - Tan, Hong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Recent advancements in computer vision have driven the development of speech-driven 2D digital human lip-sync animation technology, which is now widely applied in fields such as animation, virtual idols, and online education. However, despite significant investments in improving relevant technical metrics, one crucial question remains underexplored: whether these optimizations truly enhance user cognition, as humans are the ultimate audience. To address this, we conducted a series of experiments by presenting digital human videos with varying technical quality levels and recording participants’ performance in information acquisition and cognitive tasks. Our goal was to assess whether higher technical metrics lead to better user outcomes. The results indicate that while higher technical metrics do improve information retention and visual clarity, they do not necessarily enhance subjective experience or learning outcomes, exhibiting a “plateau effect.” This finding highlights the need to integrate both objective and subjective evaluation criteria when optimizing speech-driven lip synchronization technology, particularly for educational contexts, and offers valuable guidance for the future development and optimization of this technology.
KW - 2D Digital Humans
KW - PSNR
KW - Speech-driven Animation
KW - SSIM
KW - User Perception
KW - Video Quality Assessment
UR - http://www.scopus.com/inward/record.url?scp=105003249089&partnerID=8YFLogxK
DO - 10.1007/978-981-96-4279-3_23
M3 - Conference Proceeding
AN - SCOPUS:105003249089
SN - 9789819642786
T3 - Communications in Computer and Information Science
SP - 320
EP - 333
BT - Digital Multimedia Communications - 21st International Forum on Digital TV and Wireless Multimedia Communications, IFTC 2024, 2024, Revised Selected Papers
A2 - Zhai, Guangtao
A2 - Zhou, Jun
A2 - Ye, Long
A2 - Yang, Hua
A2 - An, Ping
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st International Forum on Digital TV and Wireless Multimedia Communications, IFTC 2024
Y2 - 28 November 2024 through 29 November 2024
ER -