Predictive models for diabetes mellitus using machine learning techniques

Cited by: 108
Authors
Lai, Hang [1, 2]
Huang, Huaxiong [1, 2]
Keshavjee, Karim [2, 3]
Guergachi, Aziz [1, 2, 4]
Gao, Xin [1, 2]
Affiliations
[1] York Univ, Dept Math & Stat, 4700 Keele St, Toronto, ON M3J 1P3, Canada
[2] Ctr Quantitat Anal & Modelling CQAM Lab, Fields Inst Res Math Sci, 222 Coll St, Toronto, ON M5T 3J1, Canada
[3] Univ Toronto, Inst Hlth Policy Management & Evaluat, 155 Coll St,Suite 425, Toronto, ON M5T 3M6, Canada
[4] Ryerson Univ, Ted Rogers Sch Management Informat Technol Manage, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Diabetes mellitus; Machine learning; Gradient boosting machine; Predictive models; Misclassification cost; RISK; PERFORMANCE; ADULTS; SCORE;
DOI
10.1186/s12902-019-0436-6
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification codes
1002; 100201
Abstract
Background: Diabetes Mellitus is an increasingly prevalent chronic disease characterized by the body's inability to metabolize glucose. The objective of this study was to build an effective predictive model with high sensitivity and selectivity to better identify Canadian patients at risk of having Diabetes Mellitus, based on patient demographic data and the laboratory results recorded during their visits to medical facilities.
Methods: Using the most recent records of 13,309 Canadian patients aged between 18 and 90 years, along with their laboratory information (age, sex, fasting blood glucose, body mass index, high-density lipoprotein, triglycerides, blood pressure, and low-density lipoprotein), we built predictive models using Logistic Regression and Gradient Boosting Machine (GBM) techniques. The area under the receiver operating characteristic curve (AROC) was used to evaluate the discriminatory capability of these models. We used the adjusted threshold method and the class weight method to improve sensitivity, that is, the proportion of Diabetes Mellitus patients correctly predicted by the model. We also compared these models to other machine learning techniques such as Decision Tree and Random Forest.
Results: The AROC for the proposed GBM model is 84.7% with a sensitivity of 71.6%, and the AROC for the proposed Logistic Regression model is 84.0% with a sensitivity of 73.4%. The GBM and Logistic Regression models perform better than the Random Forest and Decision Tree models.
Conclusions: Our models predict patients with Diabetes well from a small set of commonly used lab results, with satisfactory sensitivity. These models can be built into an online computer program to help physicians predict which patients are likely to develop diabetes and provide the necessary preventive interventions. The model was developed and validated on the Canadian population, which makes it more specific and powerful for Canadian patients than existing models developed from US or other populations. Fasting blood glucose, body mass index, high-density lipoprotein, and triglycerides were the most important predictors in these models.
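The evaluation measures named in the abstract (AROC, sensitivity, and an adjusted decision threshold) can be illustrated with a minimal, standard-library-only sketch. The labels and scores below are hypothetical toy values, not data from the study, and the function names are assumptions made for this illustration; the abstract does not state how the authors chose their threshold, and the class weight method is not shown.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Proportion of true positives whose score is at or above the threshold."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    if not pos:
        raise ValueError("need at least one positive case")
    return sum(s >= threshold for s in pos) / len(pos)


def specificity(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Proportion of true negatives whose score is below the threshold."""
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one negative case")
    return sum(s < threshold for s in neg) / len(neg)


# Hypothetical toy data: 1 = diabetes, 0 = no diabetes; scores stand in for
# predicted probabilities from any classifier (e.g. logistic regression or GBM).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.25, 0.40, 0.55, 0.80]

if __name__ == "__main__":
    print("AROC:", round(auroc(y_true, scores), 3))
    # Lowering the threshold trades specificity for sensitivity.
    for t in (0.5, 0.25):
        print(
            "threshold", t,
            "sensitivity", sensitivity(y_true, scores, t),
            "specificity", specificity(y_true, scores, t),
        )
```

Note that the AROC does not depend on the threshold, whereas sensitivity and specificity do, which is why a model's sensitivity can be raised after training by moving the cut-off alone.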
Pages: 9
Related papers
50 in total
  • [21] The early prediction of gestational diabetes mellitus by machine learning models
    Kaya, Yeliz
    Butun, Zafer
    Celik, Ozer
    Salik, Ece Akca
    Tahta, Tugba
    Yavuz, Arzu Altun
    BMC PREGNANCY AND CHILDBIRTH, 2024, 24 (01)
  • [22] Predicting Diabetes Using Machine Learning Techniques
    Kirgil, Elif Nur Haner
    Erkal, Begum
    Ayyildiz, Tulin Ercelebi
    2022 INTERNATIONAL CONFERENCE ON THEORETICAL AND APPLIED COMPUTER SCIENCE AND ENGINEERING (ICTASCE), 2022, : 137 - 141
  • [23] Diabetes Classification Using Machine Learning Techniques
    Phongying, Methaporn
    Hiriote, Sasiprapa
    COMPUTATION, 2023, 11 (05)
  • [24] Diabetes Prediction using Machine Learning Techniques
    Obulesu, O.
    Suresh, K.
    Ramudu, B. Venkata
    HELIX, 2020, 10 (02): : 136 - 142
  • [25] Predicting Diabetes Mellitus With Machine Learning Techniques Using Multi-Criteria Decision Making
    Juneja, Abhinav
    Juneja, Sapna
    Kaur, Sehajpreet
    Kumar, Vivek
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2021, 11 (02) : 38 - 52
  • [26] Towards a Stacking Ensemble Model for Predicting Diabetes Mellitus using Combination of Machine Learning Techniques
    Alzubaidi, Abdulaziz A.
    Halawani, Sami M.
    Jarrah, Mutasem
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (12) : 348 - 358
  • [27] Prediction of gestational diabetes mellitus in the first 19 weeks of pregnancy using machine learning techniques
    Xiong, Yan
    Lin, Lu
    Chen, Yu
    Salerno, Stephen
    Li, Yi
    Zeng, Xiaoxi
    Li, Huafeng
    JOURNAL OF MATERNAL-FETAL & NEONATAL MEDICINE, 2022, 35 (13): : 2457 - 2463
  • [28] Prediction of Type-2 Diabetes Mellitus Disease Using Machine Learning Classifiers and Techniques
    Ahamed, B. Shamreen
    Arya, Meenakshi Sumeet
    Nancy, V. Auxilia Osvin
    FRONTIERS IN COMPUTER SCIENCE, 2022, 4
  • [29] Diagnosis of Diabetes Mellitus Using Extreme Learning Machine
    Pangaribuan, Jefri Junifer
    Suharjito
    2014 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY SYSTEMS AND INNOVATION (ICITSI), 2014, : 33 - 38
  • [30] Preemptive Diagnosis of Diabetes Mellitus Using Machine Learning
    Alassaf, Reem A.
    Alsulaim, Khawla A.
    Alroomi, Noura Y.
    Alsharif, Nouf S.
    Aljubeir, Mishael F.
    Olatunji, Sunday O.
    Alahmadi, Alaa Y.
    Imran, Mohammed
    Alzahrani, Rahma A.
    Alturayeif, Nora S.
    2018 21ST SAUDI COMPUTER SOCIETY NATIONAL COMPUTER CONFERENCE (NCC), 2018,