Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 33
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliation
[1] King Saud bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdullah International Medical Research Center (KAIMRC), College of Public Health and Health Informatics, Health Informatics Department, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes is a global concern in the healthcare domain and one of the leading health challenges in Saudi Arabia. Its prevalence is anticipated to rise, and early prediction of individuals at high risk of diabetes remains a significant challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Hospital Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes from 18 risk factors. The two algorithms were compared on precision, recall, True Positive Rate, False Positive Rate, F-measure and Area Under the ROC Curve (AUC). The overall prevalence of diabetes in the data set is 64.47%. Males represent 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, True Positive Rate, False Positive Rate and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188 and 0.876, respectively, while the corresponding values for the Logistic Regression model were only 0.692, 0.703, 0.703, 0.454 and 0.675. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. The RF algorithm outperformed Logistic Regression in predicting diabetes across all of these metrics.
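The evaluation metrics named in the abstract have standard definitions for a binary classifier. The Python sketch below shows one way to compute them. It is illustrative only: the function names are hypothetical, the data are a tiny made-up example rather than MNGHA records, and it assumes label 1 marks a diabetic record. It reports positive-class values, whereas the study's figures may be averaged across both classes, so the numbers are not directly comparable. AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: binary labels where 1 = diabetic and 0 = non-diabetic.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for the positive (diabetic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall (identical to TPR), FPR and F-measure from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "tpr": recall,
            "fpr": fpr, "f_measure": f_measure}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank statistic; tied scores get average ranks."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative record")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores that starts at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not from the study).
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.65]
    for name, value in threshold_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {roc_auc(y_true, scores):.3f}")
```

On the toy data this prints a precision, recall and F-measure of 0.800, an FPR of 0.333 and an AUC of 0.933. The rank-based AUC runs in O(n log n), which stays practical for tens of thousands of records, unlike comparing every positive with every negative.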
Pages: 78-83
Number of pages: 6