A practical framework for early detection of diabetes using ensemble machine learning models

Cited by: 6
Authors
Saihood, Qusay [1]
Sonuc, Emrullah [1]
Affiliations
[1] Karabuk Univ, Dept Comp Engn, Karabuk, Turkiye
Keywords
Machine learning; ensemble learning; diabetes diagnosis; classification; ARTIFICIAL-INTELLIGENCE; PREDICTION;
DOI
10.55730/1300-0632.4013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advances in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. This study therefore proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, covers handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques (bagging, boosting, and stacking) to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods are superior to individual ML models in diagnosing diabetes. Stacking achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), and among the boosting methods, the eXtreme Gradient Boosting (XGB) model yielded the highest accuracy (97.10%). A comparison with other ML models confirms that the proposed framework outperforms them.
The study has demonstrated that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models.
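The three stages described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. This is not the authors' implementation. The data is a synthetic stand-in with the same shape as the Pima Indians Diabetes Database (768 rows, 8 features), only the normalization step of the preprocessing stage is shown, and the hyperparameter grids and the logistic-regression meta-learner are hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Pima data (assumption: NOT the real dataset).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stage 1 (partial): normalization, applied inside the SVM pipeline.
# Missing values, outliers, balancing, and feature selection are omitted.
svm_pipeline = make_pipeline(MinMaxScaler(), SVC())

# Stage 2: grid search over small, hypothetical hyperparameter grids.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [4, None]},
    cv=3,
)
svm_search = GridSearchCV(svm_pipeline, {"svc__C": [0.1, 1, 10]}, cv=3)
rf_search.fit(X_train, y_train)
svm_search.fit(X_train, y_train)

# Stage 3: stack the tuned RF and SVM models.
# The meta-learner (logistic regression) is an assumption.
stack = StackingClassifier(
    estimators=[
        ("rf", rf_search.best_estimator_),
        ("svm", svm_search.best_estimator_),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacked RF + SVM accuracy on synthetic data: {accuracy:.3f}")
```

Because the data here is synthetic, the printed accuracy says nothing about the 97.50% reported in the abstract; the sketch only shows how tuning and stacking fit together.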
Pages: 722-738
Page count: 18