Deep Phenotyping and Prediction of Long-term Cardiovascular Disease: Optimized by Machine Learning

Xiao dong Zhuang, Ting Tian, Li zhen Liao, Yue hua Dong, Hao jin Zhou, Shao zhao Zhang, Wen yi Chen, Zhi min Du, Xue qin Wang*, Xin xue Liao*

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review

6 Citations (Scopus)


Background: Prediction of cardiovascular disease (CVD) is important in clinical practice. Machine learning (ML) may offer an improved alternative to current CVD risk stratification in individual patients. We aimed to identify important predictors and to compare ML models with traditional models in terms of prediction performance in a large cohort with long-term follow-up.

Methods: The Atherosclerosis Risk in Communities (ARIC) study was designed to study the progression of subclinical disease to cardiovascular events over a 25-year follow-up period. All phenotypic variables collected at visit 1 were obtained. All-cause death, CVD, and coronary heart disease were the outcomes analyzed. The ML framework involved variable selection using the random survival forest (RSF) method, model building, and 5-fold cross-validation. Model performance was evaluated in terms of discrimination (Harrell concordance index, C-index), accuracy (Brier score, BS), and interpretability (number of variables in the model).

Results: The 14,842 ARIC participants had an average age of 54.2 years; 45.2% were male and 26.2% were Black. Thirty-eight unique variables were selected in the RSF top-20 importance rankings across all 6 outcomes. Aging, hypertension, glucose metabolism, renal function, coagulation, adiposity, and sodium retention dominated the predictions of all outcomes. The ML models outperformed the regression models and established risk scores, with a higher C-index and a lower BS; their interpretability varied.

Conclusions: The ML framework is useful for identifying important predictors of CVD and for developing models with robust performance compared with existing risk models.
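The two performance metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`harrell_c_index`, `censoring_survival`, `brier_score`) are hypothetical, and the abstract does not state how censoring was handled in the Brier score, so the inverse-probability-of-censoring weighting (IPCW) used here, with a Kaplan-Meier estimate of the censoring distribution at a single time horizon `t`, is an assumption.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when time[i] < time[j] and subject i had the
    event; it is concordant when the earlier failure has the higher risk score.
    Tied risk scores count as half-concordant.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        later = time > time[i]  # subjects still event-free at time[i]
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    return concordant / comparable


def censoring_survival(time, event, s, strict=False):
    """Kaplan-Meier estimate of the censoring survival G(s), or G(s-) if strict."""
    g = 1.0
    for u in np.unique(time):  # sorted distinct observed times
        if u > s or (strict and u == s):
            break
        at_risk = np.sum(time >= u)
        censored = np.sum((time == u) & (event == 0))
        g *= 1.0 - censored / at_risk
    return g


def brier_score(time, event, surv_prob, t):
    """IPCW Brier score at horizon t (assumed weighting scheme).

    surv_prob[i] is the model's predicted probability that subject i is
    still event-free at time t.
    """
    time, event, surv_prob = map(np.asarray, (time, event, surv_prob))
    g_t = censoring_survival(time, event, t)
    total = 0.0
    for ti, di, p in zip(time, event, surv_prob):
        if ti <= t and di == 1:  # failed by t: observed status is 0
            total += p ** 2 / censoring_survival(time, event, ti, strict=True)
        elif ti > t:  # event-free at t: observed status is 1
            total += (1.0 - p) ** 2 / g_t
        # subjects censored before t contribute 0; the weights compensate
    return total / len(time)
```

For example, `harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])` returns 1.0 because every earlier failure carries a higher risk score, and `brier_score([1, 2, 3, 4], [1, 0, 1, 1], [0.5] * 4, 2.5)` gives about 0.25. A higher C-index indicates better discrimination and a lower BS indicates better accuracy, matching the direction of the comparison reported in the Results.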

Original language: English
Pages (from-to): 774-782
Number of pages: 9
Journal: Canadian Journal of Cardiology
Issue number: 6
Publication status: Published - Jun 2022
Externally published: Yes

