Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study

Cited by: 2
Authors
Shojaee-Mend, Hassan [1]
Velayati, Farnia [2]
Tayefi, Batool [3]
Babaee, Ebrahim [3, 4, 5]
Affiliations
[1] Gonabad Univ Med Sci, Infect Dis Res Ctr, Gonabad, Iran
[2] Shahid Beheshti Univ Med Sci, Natl Res Inst TB & Lung Dis NRITLD, Telemed Res Ctr, Tehran, Iran
[3] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Med & Publ Hlth Res Ctr, Sch Med, Dept Community & Family Med, Tehran, Iran
[4] Iran Univ Med Sci, Vaccine Res Ctr, Tehran, Iran
[5] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Publ Hlth Res Ctr, POB 14665-354, Tehran 1449614535, Iran
Keywords
Diabetes Mellitus; Machine Learning; Data Mining; Decision Trees; Risk Factors
DOI
10.4258/hir.2024.30.1.73
Chinese Library Classification (CLC) number
R-058
Discipline classification code
Abstract
Objectives: This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life. Methods: This cross-sectional study analyzed data from 3376 adults over 30 years old who participated in a diabetes screening program at 16 comprehensive health service centers in Tehran, Iran. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE). The dataset was split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five machine learning algorithms (CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network) were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve (AUC) were used to evaluate the models. Results: Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Although the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an AUC of 0.737. Conclusions: A gradient boosted decision tree model accurately identified the most important risk factors related to diabetes: age, waist-to-hip ratio, body mass index, and systolic blood pressure, in that order. This model can support planning for diabetes management and prevention.
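As a rough illustration of the evaluation metrics named in the abstract, the Python sketch below computes accuracy, sensitivity, specificity, the F1-score, and the AUC for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only.

```python
from typing import Dict, List


def classification_metrics(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics for binary labels (1 = positive class).

    The 0.5 default threshold is an assumption, not a value from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
print(classification_metrics(labels, scores))
print(roc_auc(labels, scores))
```

The pairwise AUC definition used here is quadratic in the number of cases, which is fine for a sketch; a library routine would normally be used on a dataset of a few thousand records.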
Pages: 73-82
Page count: 10
Related papers
50 in total
  • [31] A Study of Disease Prediction on Weighted Symptom Data Using Deep Learning and Machine Learning Algorithms
    Colak, Melike
    Sivri, Talya Tumer
    Akman, Nergis Pervan
    Berkol, Ali
    Ekici, Yahya
    2022 INTERNATIONAL CONFERENCE ON THEORETICAL AND APPLIED COMPUTER SCIENCE AND ENGINEERING (ICTASCE), 2022, : 116 - 119
  • [32] AI Machine Learning-Based Diabetes Prediction in Older Adults in South Korea: Cross-Sectional Analysis
    Lee, Hocheol
    Park, Myung-Bae
    Won, Young-Joo
    JMIR FORMATIVE RESEARCH, 2025, 9
  • [33] A comparison of machine learning algorithms for diabetes prediction
    Khanam, Jobeda Jamal
    Foo, Simon Y.
    ICT EXPRESS, 2021, 7 (04): : 432 - 439
  • [34] Comparison of Machine Learning Algorithms for Prediction of Diabetes
    Costea, Naomi Estera
    Moisi, Elisa Valentina
    Popescu, Daniela Elena
    2021 16TH INTERNATIONAL CONFERENCE ON ENGINEERING OF MODERN ELECTRIC SYSTEMS (EMES), 2021, : 56 - 59
  • [35] Diagnosing growing pains in children by using machine learning: a cross-sectional multicenter study
    Akal, Fuat
    Batu, Ezgi D.
    Sonmez, Hafize Emine
    Karadag, Serife G.
    Demir, Ferhat
    Ayaz, Nuray Aktay
    Sozeri, Betul
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2022, 60 (12) : 3601 - 3614
  • [36] Prediction of Gestational Diabetes by Machine Learning Algorithms
    Gnanadass I.
    IEEE Potentials, 2020, 39 (06): : 32 - 37
  • [37] Diagnosing growing pains in children by using machine learning: a cross-sectional multicenter study
    Fuat Akal
    Ezgi D. Batu
    Hafize Emine Sonmez
    Şerife G. Karadağ
    Ferhat Demir
    Nuray Aktay Ayaz
    Betül Sözeri
    Medical & Biological Engineering & Computing, 2022, 60 : 3601 - 3614
  • [38] CLASSIFICATION OF FACIAL EXPRESSIONS USING DATA MINING AND MACHINE LEARNING ALGORITHMS
    Faria, Brigida Monica
    Lau, Nuno
    Reis, Luis Paulo
    SISTEMAS E TECHNOLOGIAS DE INFORMACAO: ACTAS DA 4A CONFERENCIA IBERICA DE SISTEMAS E TECNOLOGIAS DE LA INFORMACAO, 2009, : 197 - +
  • [39] A Comparative Study of Machine Learning Algorithms for Financial Data Prediction
    Omar, Bencharef
    Zineb, Bousbaa
    Jofre Aida, Cortes
    Cortes Daniel, Gonzalez
    2018 INTERNATIONAL SYMPOSIUM ON ADVANCED ELECTRICAL AND COMMUNICATION TECHNOLOGIES (ISAECT), 2018,
  • [40] A Cross-Sectional Machine Learning Approach for Hedge Fund Return Prediction and Selection
    Wu, Wenbo
    Chen, Jiaqi
    Yang, Zhibin
    Tindall, Michael L.
    MANAGEMENT SCIENCE, 2021, 67 (07) : 4577 - 4601