An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values

Cited by: 10
Authors
Roy, Kumarmangal [1]
Ahmad, Muneer [1]
Waqar, Kinza [1]
Priyaah, Kirthanaah [1]
Nebhen, Jamel [2]
Alshamrani, Sultan S. [3]
Raza, Muhammad Ahsan [4]
Ali, Ihsan [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, POB 151, Alkharj 11942, Saudi Arabia
[3] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia
[4] Bahauddin Zakariya Univ, Dept Informat Technol, Multan 60000, Pakistan
Keywords
NEURAL-NETWORKS;
DOI
10.1155/2021/9953314
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interrelated factors, so there is a need for critical decision support systems that assist medical practitioners in the diagnosis process. This research proposes a predictive model that achieves high classification accuracy for type 2 diabetes. The study consisted of two parts. First, it investigated three data imputation methods for handling missing data: median value imputation, K-nearest neighbor imputation, and iterative imputation. It then evaluated these imputations with various classification algorithms (linear, tree-based, and ensemble) to see how each method affected classification accuracy. Second, an Artificial Neural Network was used to model the best-performing imputed data, balanced with SMOTETomek so that each class is represented fairly. This approach provided the best accuracy of 98% on the test data, outperforming accuracies achieved in prior studies using the same dataset. The dataset used in this study is restricted in terms of gender and population. As future work, the study recommends adopting a larger population sample without geographic boundaries. Additionally, because the developed Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be interesting to explore tuning on top of normalized data to optimize accuracy further.
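To make two of the imputation strategies named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `median_impute` and `knn_impute`, the distance definition, and the toy values are all assumptions made for illustration, and missing entries are represented as `None`.

```python
import math
from statistics import median


def median_impute(rows):
    """Fill each missing value (None) with the median of its column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the k nearest
    rows in which the feature is observed.

    Assumed distance: square root of the mean squared difference over the
    features observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has feature j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            if not donors:
                continue  # nothing to borrow from; leave the gap as-is
            donors.sort(key=lambda d: d[0])
            nearest = [val for _, val in donors[:k]]
            new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Toy data (hypothetical values): two features, one missing entry in row 1.
data = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [10.0, 100.0],
]

median_filled = median_impute(data)
knn_filled = knn_impute(data, k=2)
print(median_filled[1], knn_filled[1])
```

On this toy input the two strategies fill the same gap differently: the column median is pulled toward the middle observed value, while the neighbor-based estimate follows the rows closest to the incomplete one. That difference is the kind of effect the study measures when it compares classifiers trained on each imputed dataset.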
Pages: 21