TY - JOUR
T1 - Dietary Assessment With Multimodal ChatGPT
T2 - A Systematic Analysis
AU - Lo, Frank P.W.
AU - Qiu, Jianing
AU - Wang, Zeyu
AU - Chen, Junhong
AU - Xiao, Bo
AU - Yuan, Wu
AU - Giannarou, Stamatia
AU - Frost, Gary
AU - Lo, Benny
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Conventional approaches to dietary assessment are primarily grounded in self-reporting methods or structured interviews conducted under the supervision of dietitians. These methods, however, are often subjective, inaccurate, and time-intensive. Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, prior AI methodologies tackle dietary assessment in a fragmented manner (e.g., merely recognizing food types or estimating portion size) and encounter challenges in generalizing across a diverse range of food categories, dietary behaviors, and cultural contexts. Recently, the emergence of multimodal foundation models, such as GPT-4V, has exhibited transformative potential across a wide range of tasks in various research domains. These models have demonstrated remarkable generalist intelligence and accuracy, owing to their large-scale pre-training on broad datasets and substantially scaled model size. In this study, we explore the application of GPT-4V, the model powering multimodal ChatGPT, to dietary assessment, along with prompt engineering and passive monitoring techniques. We evaluated the proposed pipeline using a self-collected, semi-free-living dietary intake dataset captured with wearable cameras. Our findings reveal that GPT-4V excels in food detection under challenging conditions without any fine-tuning or adaptation using food-specific datasets. When guided with specific language prompts (e.g., African cuisine), the model shifts from recognizing common staples like rice and bread to accurately identifying regional dishes like banku and ugali. Another standout feature of GPT-4V is its contextual awareness: the model can leverage surrounding objects as scale references to deduce the portion sizes of food items, further facilitating the process of dietary assessment.
AB - Conventional approaches to dietary assessment are primarily grounded in self-reporting methods or structured interviews conducted under the supervision of dietitians. These methods, however, are often subjective, inaccurate, and time-intensive. Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, prior AI methodologies tackle dietary assessment in a fragmented manner (e.g., merely recognizing food types or estimating portion size) and encounter challenges in generalizing across a diverse range of food categories, dietary behaviors, and cultural contexts. Recently, the emergence of multimodal foundation models, such as GPT-4V, has exhibited transformative potential across a wide range of tasks in various research domains. These models have demonstrated remarkable generalist intelligence and accuracy, owing to their large-scale pre-training on broad datasets and substantially scaled model size. In this study, we explore the application of GPT-4V, the model powering multimodal ChatGPT, to dietary assessment, along with prompt engineering and passive monitoring techniques. We evaluated the proposed pipeline using a self-collected, semi-free-living dietary intake dataset captured with wearable cameras. Our findings reveal that GPT-4V excels in food detection under challenging conditions without any fine-tuning or adaptation using food-specific datasets. When guided with specific language prompts (e.g., African cuisine), the model shifts from recognizing common staples like rice and bread to accurately identifying regional dishes like banku and ugali. Another standout feature of GPT-4V is its contextual awareness: the model can leverage surrounding objects as scale references to deduce the portion sizes of food items, further facilitating the process of dietary assessment.
KW - ChatGPT
KW - deep learning
KW - dietary assessment
KW - food recognition
KW - foundation model
KW - GPT-4V
KW - passive monitoring
UR - http://www.scopus.com/inward/record.url?scp=85196728157&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2024.3417280
DO - 10.1109/JBHI.2024.3417280
M3 - Article
C2 - 38900623
AN - SCOPUS:85196728157
SN - 2168-2194
VL - 28
SP - 7577
EP - 7587
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 12
ER -