Performance Evaluation and Analysis of Machine Learning Algorithms for Predicting Periodontal Disease in Korean Adults
Eun-Seo Jung, Ki-Bong Choi, Hae-Young Kim
Abstract
Objectives: This study aimed to improve the accuracy of periodontal disease prediction using machine learning algorithms and to identify the key risk factors needed to develop personalized prevention and management strategies.
Methods: Data from 11,781 adults aged 19 years or older were obtained from the 7th Korea National Health and Nutrition Examination Survey (KNHANES, 2016–2018). Five machine learning algorithms were applied: logistic regression, decision tree, random forest, extreme gradient boosting (XGBoost), and CatBoost. Models were trained and evaluated with 10-fold cross-validation, accounting for the survey's complex sampling design.
Results: The prevalence of periodontal disease was 27.8%. The CatBoost model showed the highest predictive performance (area under the receiver operating characteristic curve, AUC = 0.760). Age, sex, and education level were identified as key predictors that significantly influenced model accuracy.
Conclusions: This study highlights the potential of machine learning-based prediction models for the early detection of periodontal disease and the development of personalized prevention strategies.
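The Methods combine two ideas, survey weighting and 10-fold cross-validation scored by AUC. The sketch below shows one way they can fit together. It relies on several assumptions that the abstract does not state. The complex sampling design enters only as a per-respondent sampling weight, and stratum and cluster variables are ignored. Folds are stratified on the outcome. A small weighted logistic regression stands in for all five algorithms. All function names are hypothetical, the data are synthetic rather than KNHANES records, and the code uses only the Python standard library. It is an illustration, not the authors' pipeline.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers for illustration only; not taken from the study.
# Assumption: the complex sampling design is reduced to one weight per respondent.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_auc(y: Sequence[int], scores: Sequence[float], w: Sequence[float]) -> float:
    """Weighted probability that a positive case outscores a negative case (ties count 1/2)."""
    num = den = 0.0
    for yp, sp, wp in zip(y, scores, w):
        if yp != 1:
            continue
        for yn, sn, wn in zip(y, scores, w):
            if yn != 0:
                continue
            pair_w = wp * wn  # each positive/negative pair counts by the product of weights
            den += pair_w
            if sp > sn:
                num += pair_w
            elif sp == sn:
                num += 0.5 * pair_w
    if den == 0:
        raise ValueError("need at least one positive and one negative case")
    return num / den


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, dealing each class round-robin so class ratios match."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_weighted_logistic(X: List[List[float]], y: Sequence[int], w: Sequence[float],
                          lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Batch gradient descent on the weighted log-loss; beta[0] is the intercept."""
    beta = [0.0] * (len(X[0]) + 1)
    total_w = sum(w)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = wi * (p - yi)  # survey weight scales each respondent's gradient
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / total_w for b, g in zip(beta, grad)]
    return beta


def predict(beta: List[float], X: List[List[float]]) -> List[float]:
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]


def cross_validated_auc(X: List[List[float]], y: Sequence[int], w: Sequence[float],
                        k: int = 10, seed: int = 0) -> float:
    """Mean weighted AUC over k stratified folds (train on k-1 folds, score the held-out fold)."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        beta = fit_weighted_logistic([X[i] for i in train_idx],
                                     [y[i] for i in train_idx],
                                     [w[i] for i in train_idx])
        scores = predict(beta, [X[i] for i in test_idx])
        aucs.append(weighted_auc([y[i] for i in test_idx], scores, [w[i] for i in test_idx]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Synthetic, hypothetical data: one standardized predictor and random survey weights.
    rng = random.Random(42)
    X = [[rng.gauss(0.0, 1.0)] for _ in range(400)]
    y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0) else 0 for x in X]
    w = [rng.uniform(0.5, 2.0) for _ in range(400)]
    print(f"10-fold weighted AUC: {cross_validated_auc(X, y, w):.3f}")
```

A tree-based learner that accepts sample weights could replace `fit_weighted_logistic` inside the same loop. The pairwise `weighted_auc` runs in time proportional to positives × negatives per fold, so it only suits small held-out folds like these. A sorted, rank-based version would scale better.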
