TY - JOUR
T1 - Artificial intelligence-enabled prenatal ultrasound for the detection of fetal cardiac abnormalities
T2 - a systematic review and meta-analysis
AU - D'Alberti, Elena
AU - Patey, Olga
AU - Smith, Carolyn
AU - Šalović, Bojana
AU - Hernandez-Cruz, Netzahualcoyotl
AU - Noble, J. Alison
AU - Papageorghiou, Aris T.
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/6
Y1 - 2025/6
N2 - Background: Advances in artificial intelligence (AI) have triggered interest in using intelligent systems to improve prenatal detection of fetal congenital heart defects (CHDs). Our aim is to systematically examine the current literature on diagnostic performance of AI-enabled prenatal cardiac ultrasound. Methods: This systematic review and meta-analysis was registered with PROSPERO (CRD42024549601). Embase, Medline, Cochrane Central Database of Controlled Trials, and CINAHL were searched from inception until February 2025. Studies evaluating AI performance in prenatal detection of fetal CHDs were eligible for inclusion, and studies focusing on the application of AI before 16 weeks of gestation, or using three- or four-dimensional ultrasound, were excluded. Pooled sensitivity and specificity were obtained using a random-effects model, and pooled proportions using the Freeman-Tukey arcsine square root transformation. Heterogeneity was assessed with the I² statistic. Risk of bias and adherence to reporting standards were assessed using QUADAS-2 and TRIPOD+AI, respectively. Risk of publication bias was assessed with Deeks' test and certainty of evidence for outcomes with the GRADE approach. Findings: Fifteen studies were included, of which fourteen developed and evaluated a model and one externally evaluated a previously trained model. Images and videos obtained during cardiac screening or fetal echocardiography of 30,121 fetuses were used for training, validation and testing. For the binary task of classifying the heart as normal or abnormal, AI models achieved a pooled sensitivity of 0.89 (95% CI 0.83–0.93, I² = 77.92%) and specificity of 0.91 (95% CI 0.84–0.95, I² = 77.92%). The subgroup analysis showed that models tested on various CHDs exhibited lower sensitivity compared to those tested for a specific cardiac abnormality (0.85; 95% CI 0.75–0.91 vs 0.92; 95% CI 0.87–0.96), while specificity remained comparable (0.90; 95% CI 0.79–0.96 vs 0.91; 95% CI 0.81–0.97).
Overall, AI models performed better than operators with lower expertise and were nearly comparable to experts; however, the human comparator group (median six clinicians, IQR 3–10) was usually small and non-blinded. Relevant sources of heterogeneity were the types of cardiac views collected, the prevalence of CHDs across different datasets, and the types of CHDs examined. The risk of bias was moderate to high and adherence to reporting standards was low (>70% in 18/51 TRIPOD+AI items). The risk of publication bias was not statistically significant (Deeks' test p = 0.474). Interpretation: These findings suggest that AI models perform better than clinicians with lower expertise, but this must be interpreted with caution due to the high risk of bias and sources of heterogeneity. Funding: This study was partly supported by the InnoHK-funded Hong Kong Centre for Cerebro-cardiovascular Health Engineering (COCHE) Project 2.1 (Cardiovascular risks in early life and fetal echocardiography). ATP and JAN are supported by the National Institute for Health and Care Research (NIHR) Oxford Biomedical Research Centre (BRC).
AB - Background: Advances in artificial intelligence (AI) have triggered interest in using intelligent systems to improve prenatal detection of fetal congenital heart defects (CHDs). Our aim is to systematically examine the current literature on diagnostic performance of AI-enabled prenatal cardiac ultrasound. Methods: This systematic review and meta-analysis was registered with PROSPERO (CRD42024549601). Embase, Medline, Cochrane Central Database of Controlled Trials, and CINAHL were searched from inception until February 2025. Studies evaluating AI performance in prenatal detection of fetal CHDs were eligible for inclusion, and studies focusing on the application of AI before 16 weeks of gestation, or using three- or four-dimensional ultrasound, were excluded. Pooled sensitivity and specificity were obtained using a random-effects model, and pooled proportions using the Freeman-Tukey arcsine square root transformation. Heterogeneity was assessed with the I² statistic. Risk of bias and adherence to reporting standards were assessed using QUADAS-2 and TRIPOD+AI, respectively. Risk of publication bias was assessed with Deeks' test and certainty of evidence for outcomes with the GRADE approach. Findings: Fifteen studies were included, of which fourteen developed and evaluated a model and one externally evaluated a previously trained model. Images and videos obtained during cardiac screening or fetal echocardiography of 30,121 fetuses were used for training, validation and testing. For the binary task of classifying the heart as normal or abnormal, AI models achieved a pooled sensitivity of 0.89 (95% CI 0.83–0.93, I² = 77.92%) and specificity of 0.91 (95% CI 0.84–0.95, I² = 77.92%). The subgroup analysis showed that models tested on various CHDs exhibited lower sensitivity compared to those tested for a specific cardiac abnormality (0.85; 95% CI 0.75–0.91 vs 0.92; 95% CI 0.87–0.96), while specificity remained comparable (0.90; 95% CI 0.79–0.96 vs 0.91; 95% CI 0.81–0.97).
Overall, AI models performed better than operators with lower expertise and were nearly comparable to experts; however, the human comparator group (median six clinicians, IQR 3–10) was usually small and non-blinded. Relevant sources of heterogeneity were the types of cardiac views collected, the prevalence of CHDs across different datasets, and the types of CHDs examined. The risk of bias was moderate to high and adherence to reporting standards was low (>70% in 18/51 TRIPOD+AI items). The risk of publication bias was not statistically significant (Deeks' test p = 0.474). Interpretation: These findings suggest that AI models perform better than clinicians with lower expertise, but this must be interpreted with caution due to the high risk of bias and sources of heterogeneity. Funding: This study was partly supported by the InnoHK-funded Hong Kong Centre for Cerebro-cardiovascular Health Engineering (COCHE) Project 2.1 (Cardiovascular risks in early life and fetal echocardiography). ATP and JAN are supported by the National Institute for Health and Care Research (NIHR) Oxford Biomedical Research Centre (BRC).
KW - Artificial intelligence
KW - Congenital heart defect
KW - Diagnostic accuracy
KW - Echocardiography
KW - Fetal ultrasound
UR - https://www.scopus.com/pages/publications/105006945311
U2 - 10.1016/j.eclinm.2025.103250
DO - 10.1016/j.eclinm.2025.103250
M3 - Article
AN - SCOPUS:105006945311
SN - 2589-5370
VL - 84
JO - eClinicalMedicine
JF - eClinicalMedicine
M1 - 103250
ER -