Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 33
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliations
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept,Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression;
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is expected to rise, and early identification of individuals at high risk remains a significant challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes based on 18 risk factors. The two algorithms were compared on precision, recall, true positive rate, false positive rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set is 64.47%. Males represent 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, true positive rate, false positive rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while for the Logistic Regression model they were 0.692, 0.703, 0.703, 0.454, and 0.675, respectively. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. Across these metrics, the RF algorithm outperformed Logistic Regression in predicting diabetes.
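The evaluation criteria named in the abstract can all be derived from the four cells of a confusion matrix. The sketch below is only an illustration of those definitions, not the authors' evaluation code: the `ConfusionCounts` container, the `binary_metrics` function, and the counts in the usage example are hypothetical and are not taken from the MNGHA data. It also shows why the abstract reports the same value for recall and true positive rate: the two are the same quantity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix cells for one class treated as 'positive' (hypothetical helper)."""
    tp: int  # actual positives predicted positive
    fp: int  # actual negatives predicted positive
    fn: int  # actual positives predicted negative
    tn: int  # actual negatives predicted negative


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute precision, recall (= true positive rate), false positive rate, F-measure."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # identical to the true positive rate
    fpr = _ratio(c.fp, c.fp + c.tn)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "true_positive_rate": recall,
        "false_positive_rate": fpr,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to keep the arithmetic easy to follow.
    example = ConfusionCounts(tp=90, fp=30, fn=10, tn=70)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

AUC is not covered here because it depends on the ranking of predicted scores rather than on a single confusion matrix.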
Pages: 78-83
Page count: 6
Related papers
50 in total
  • [31] Comparison between Machine Learning Algorithms in the Predicting the Onset of Diabetes
    Abed, Mahmood
    Ibrikci, Turgay
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [32] Comparison of machine learning and conventional logistic regression-based prediction models for gestational diabetes in an ethnically diverse population; the Monash GDM Machine learning model
    Belsti, Yitayeh
    Moran, Lisa
    Du, Lan
    Mousa, Aya
    De Silva, Kushan
    Enticott, Joanne
    Teede, Helena
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2023, 179
  • [33] Genetic biomarkers and machine learning techniques for predicting diabetes: systematic review
    Khan, Sulaiman
    Mohsen, Farida
    Shah, Zubair
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 58 (02)
  • [34] Predicting Suicidal Behaviors in Individuals With Diabetes Using Machine Learning Techniques
    Mamun, Mohammed A.
    Al-Mamun, Firoj
    Hasan, Md Emran
    Roy, Nitai
    ALmerab, Moneerah Mohammad
    Muhit, Mohammad
    Moonajilin, Mst. Sabrina
    PERSPECTIVES IN PSYCHIATRIC CARE, 2024, 2024
  • [35] Predicting Daily Mean Solar Power Using Machine Learning Regression Techniques
    Jawaid, Faizan
    NazirJunejo, Khurum
    2016 SIXTH INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING TECHNOLOGY (INTECH), 2016, : 355 - 360
  • [36] A Performance Comparison of Statistical and Machine Learning Techniques in Learning Time Series Data
    Haviluddin
    Alfred, Rayner
    Obit, Joe Henry
    Hijazi, Mohd Hanafi Ahmad
    Ibrahim, Ag Asri Ag
    ADVANCED SCIENCE LETTERS, 2015, 21 (10) : 3037 - 3041
  • [37] Comparison Of Statistical Tests In Logistic Regression: The Case Of Hypernatreamia
    Katsaragakis, Stylianos
    Koukouvinos, Christos
    Stylianou, Stella
    Theodoraki, Eleni-Maria
    JOURNAL OF MODERN APPLIED STATISTICAL METHODS, 2005, 4 (02) : 514 - 521
  • [38] A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population
    Chowdhury, Mohammad Ziaul Islam
    Leung, Alexander A.
    Walker, Robin L.
    Sikdar, Khokan C.
    O'Beirne, Maeve
    Quan, Hude
    Turin, Tanvir C.
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [39] A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population
    Mohammad Ziaul Islam Chowdhury
    Alexander A. Leung
    Robin L. Walker
    Khokan C. Sikdar
    Maeve O’Beirne
    Hude Quan
    Tanvir C. Turin
    Scientific Reports, 13
  • [40] Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches
    Stylianou, Neophytos
    Akbarov, Artur
    Kontopantelis, Evangelos
    Buchan, Iain
    Dunn, Ken W.
    BURNS, 2015, 41 (05) : 925 - 934