An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values

Cited by: 10
Authors
Roy, Kumarmangal [1]
Ahmad, Muneer [1]
Waqar, Kinza [1]
Priyaah, Kirthanaah [1]
Nebhen, Jamel [2]
Alshamrani, Sultan S. [3]
Raza, Muhammad Ahsan [4]
Ali, Ihsan [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, POB 151, Alkharj 11942, Saudi Arabia
[3] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia
[4] Bahauddin Zakariya Univ, Dept Informat Technol, Multan 60000, Pakistan
Keywords
NEURAL-NETWORKS
DOI
10.1155/2021/9953314
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interrelated factors, so decision support systems are needed to assist medical practitioners in the diagnostic process. This research develops a predictive model that achieves high classification accuracy for type 2 diabetes. The study consisted of two parts. First, it investigated the handling of missing data with three imputation methods: median value imputation, K-nearest neighbor (KNN) imputation, and iterative imputation. It then evaluated how each imputation method affected classification accuracy using a range of linear, tree-based, and ensemble classification algorithms. Second, an artificial neural network (ANN) was trained on the best-performing imputed data after that data had been balanced with SMOTETomek so that each class was fairly represented. This approach achieved the best accuracy, 98% on the test data, outperforming the accuracies reported in prior studies that used the same dataset. The dataset used in this study is limited in terms of gender and population; for future work, the study recommends a larger population sample without geographic boundaries. Additionally, because the ANN model did not undergo any specific hyperparameter tuning, tuning it on normalized data could further improve accuracy.
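The imputation step described in the abstract can be illustrated with a short plain-Python sketch. This is not the authors' code: it covers only two of the three methods (median and K-nearest neighbor imputation), uses a small hypothetical feature matrix in which `None` marks a missing value, and assumes every column has at least one observed value. The helper names (`median_impute`, `knn_impute`, `nan_euclidean`) are hypothetical. The distance follows the idea behind scikit-learn's NaN-aware Euclidean distance: coordinates missing in either row are ignored, and the squared distance is rescaled by the fraction of coordinates actually compared.

```python
import math
import statistics
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def median_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry with the median of the observed values in its column."""
    n_cols = len(rows[0])
    medians = [statistics.median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates observed in both rows, scaled by
    (total coordinates / coordinates used) before taking the square root."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as infinitely far apart
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * squared)


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing entry with the mean of that column over the k nearest
    other rows that have the column observed (uniform weights)."""
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if not nearest:
                raise ValueError(f"column {j} has no observed values")
            filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result


# Hypothetical toy data (e.g. glucose, BMI, age); not taken from the study.
data: List[Row] = [
    [148.0, 33.6, 50.0],
    [85.0, None, 31.0],
    [183.0, 23.3, None],
    [None, 28.1, 21.0],
    [137.0, 43.1, 33.0],
]

median_filled = median_impute(data)
knn_filled = knn_impute(data, k=2)
print(median_filled)
print(knn_filled)
```

For real experiments, scikit-learn already provides `SimpleImputer(strategy="median")`, `KNNImputer`, and `IterativeImputer` (the last one requires importing `sklearn.experimental.enable_iterative_imputer` first), and the imbalanced-learn package provides `SMOTETomek` with a `fit_resample` method for the class-balancing step. The abstract does not state which libraries or parameter settings the study actually used.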
Pages: 21