Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study

Cited by: 2
Authors
Shojaee-Mend, Hassan [1]
Velayati, Farnia [2]
Tayefi, Batool [3]
Babaee, Ebrahim [3, 4, 5]
Affiliations
[1] Gonabad Univ Med Sci, Infect Dis Res Ctr, Gonabad, Iran
[2] Shahid Beheshti Univ Med Sci, Natl Res Inst TB & Lung Dis NRITLD, Telemed Res Ctr, Tehran, Iran
[3] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Med & Publ Hlth Res Ctr, Sch Med,Dept Community & Family Med, Tehran, Iran
[4] Iran Univ Med Sci, Vaccine Res Ctr, Tehran, Iran
[5] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Publ Hlth Res Ctr, POB 14665-354, Tehran 1449614535, Iran
Keywords
Diabetes Mellitus; Machine Learning; Data Mining; Decision Trees; Risk Factors
DOI
10.4258/hir.2024.30.1.73
Chinese Library Classification
R-058
Abstract
Objectives: This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life. Methods: This cross-sectional study analyzed data from 3376 adults over 30 years old at 16 comprehensive health service centers in Tehran, Iran, who participated in a diabetes screening program. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE). The dataset was split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five different machine learning algorithms, including CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network, were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve (AUC) were used to evaluate the models. Results: Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Although the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an AUC of 0.737. Conclusions: A gradient boosted decision tree model accurately identified the most important risk factors related to diabetes. Age, waist-to-hip ratio, body mass index, and systolic blood pressure, in that order, were the most important risk factors for diabetes. This model can support planning for diabetes management and prevention.
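The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and only the Python standard library is used.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Sensitivity: share of true positives that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of true negatives that the model detects.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # F1: harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the sample size, which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The AUC here is computed by pairwise comparison rather than by integrating a ROC curve; the two definitions agree, and the pairwise form makes the "ranking" interpretation of a value such as 0.737 explicit.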
Pages: 73-82
Number of pages: 10