
From Data to Decision: Explainable Risk Prediction for Cardiovascular Diseases Using Multicenter Patient Records

Jiaming Ou

2025 · DOI: 10.61173/c0yt3b70

TLDR

This study develops and compares three supervised learning models for assessing individual cardiovascular disease risk: Lasso-regularized logistic regression, random forest, and a Stacking ensemble. All three perform well, with reported accuracies ranging from 87.78% to 90.00%.

Abstract

Cardiovascular disease (CVD) remains one of the leading causes of mortality worldwide, which makes early risk prediction critical for reducing fatality rates. This study uses a publicly available CVD dataset to develop and compare three supervised learning models for assessing individual disease risk: Lasso-regularized logistic regression, random forest, and a Stacking ensemble. Preprocessing included interaction terms and dummy variable encoding to improve feature representation and model expressiveness. All three models performed well, with the Stacking ensemble achieving the highest accuracy (90.00%), ahead of random forest (89.44%) and logistic regression (87.78%). Feature importance analysis identifies ST depression induced by exercise (Oldpeak), slope of the peak exercise ST segment (ST_slope), and maximum heart rate achieved during exercise (MaxHR) as the most influential predictors. These findings support the effectiveness of machine learning in CVD risk assessment and underline the value of feature engineering and model ensembling in improving predictive accuracy. The study provides a reliable framework for clinical decision support, potentially enabling earlier interventions and improved patient outcomes.
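
The workflow summarized in the abstract (dummy encoding, an interaction term, then three classifiers compared on accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the data are synthetic, the label rule is invented so the models have something to learn, the single interaction term `Oldpeak_x_MaxHR` is a hypothetical choice, and all hyperparameters are scikit-learn defaults or arbitrary picks. Only the three column names come from the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data (assumption); column names follow the abstract.
df = pd.DataFrame({
    "Oldpeak": rng.normal(1.0, 1.0, n).clip(0, None),
    "MaxHR": rng.normal(140, 25, n),
    "ST_slope": rng.choice(["Up", "Flat", "Down"], n),
})

# Hypothetical label rule, used only so the models have signal to fit.
signal = (
    1.2 * df["Oldpeak"]
    - 0.03 * (df["MaxHR"] - 140)
    + 1.0 * (df["ST_slope"] == "Flat")
)
y = (signal + rng.normal(0, 1, n) > 1.0).astype(int)

# Dummy-variable encoding of the categorical slope, plus one interaction term.
X = pd.get_dummies(df, columns=["ST_slope"], drop_first=True, dtype=float)
X["Oldpeak_x_MaxHR"] = X["Oldpeak"] * X["MaxHR"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# L1-penalized (Lasso-style) logistic regression; features are scaled first
# so the penalty treats them comparably.
lasso_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Stacking: a logistic-regression meta-learner combines the cross-validated
# predictions of the two base models.
stack = StackingClassifier(
    estimators=[("lasso_lr", lasso_lr), ("rf", forest)],
    final_estimator=LogisticRegression(),
    cv=5,
)

models = {"lasso_lr": lasso_lr, "rf": forest, "stacking": stack}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Because the data here are synthetic, the printed accuracies say nothing about the figures reported in the abstract; the sketch only shows how the three models can be trained and compared on the same split.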
