TY - JOUR
T1 - Deep Phenotyping and Prediction of Long-term Cardiovascular Disease
T2 - Optimized by Machine Learning
AU - Zhuang, Xiao dong
AU - Tian, Ting
AU - Liao, Li zhen
AU - Dong, Yue hua
AU - Zhou, Hao jin
AU - Zhang, Shao zhao
AU - Chen, Wen yi
AU - Du, Zhi min
AU - Wang, Xue qin
AU - Liao, Xin xue
N1 - Publisher Copyright: © 2022 Canadian Cardiovascular Society
PY - 2022/6
Y1 - 2022/6
N2 - Background: Prediction of cardiovascular disease (CVD) is important in clinical practice. Machine learning (ML) may offer an improved alternative to current CVD risk stratification in individual patients. We aim to identify important predictors and compare ML models with traditional models according to their prediction performance in a large long-term follow-up cohort. Methods: The Atherosclerosis Risk in Communities (ARIC) study was designed to study the progression of subclinical disease to cardiovascular events over a 25-year follow-up period. All phenotypic variables at visit 1 were obtained. All-cause death, CVD, and coronary heart disease were the outcomes for analysis. The ML framework involved variable selection using the random survival forest (RSF) method, model building, and 5-fold cross-validation. Model performance was evaluated by discrimination using the Harrell concordance index (C-index), accuracy using the Brier score (BS), and interpretability using the number of variables in the model. Results: Of the 14,842 participants in ARIC, the average age was 54.2 years, with 45.2% male and 26.2% Black participants. Thirty-eight unique variables were selected in the RSF top 20 importance ranking of all 6 outcomes. Aging, hypertension, glucose metabolism, renal function, coagulation, adiposity, and sodium retention dominated the predictions of all outcomes. The ML models outperformed the regression models and established risk scores with a higher C-index, lower BS, and varied interpretability. Conclusions: The ML framework is useful for identifying important predictors of CVD and for developing models with robust performance compared with existing risk models.
UR - http://www.scopus.com/inward/record.url?scp=85129240318&partnerID=8YFLogxK
U2 - 10.1016/j.cjca.2022.02.008
DO - 10.1016/j.cjca.2022.02.008
M3 - Article
C2 - 35157988
AN - SCOPUS:85129240318
SN - 0828-282X
VL - 38
SP - 774
EP - 782
JO - Canadian Journal of Cardiology
JF - Canadian Journal of Cardiology
IS - 6
ER -