Machine learning based study for the classification of Type 2 diabetes mellitus subtypes

Cited by: 2
Authors
Ordonez-Guillen, Nelson E. [1]
Gonzalez-Compean, Jose Luis [1]
Lopez-Arevalo, Ivan [1]
Contreras-Murillo, Miguel [1]
Aldana-Bobadilla, Edwin [2]
Affiliations
[1] Cinvestav Tamaulipas, Carretera Victoria Soto Marina km 5-5, Victoria 87130, Tamaulipas, Mexico
[2] CONAHCYT Ctr Invest & Estudios Avanzados IPN, Unidad Tamaulipas, Carretera Victoria Soto Marina km 5-5, Victoria 87130, Tamaulipas, Mexico
Keywords
Diabetes; Diabetes subtypes; Data-driven; Classification; Homeostasis model assessment; Validation; Subgroups; Prediction; Algorithm; Selection; Glucose
DOI
10.1186/s13040-023-00340-2
CLC number
Q [Biological sciences]
Subject classification code
07; 0710; 09
Abstract
Purpose: Data-driven diabetes research has shown growing interest in exploring the heterogeneity of the disease, aiming to support the development of more specific prognoses and treatments within so-called precision medicine. Recently, one such study found five diabetes subgroups with differing risks of complications and treatment responses. Here, we develop and assess different machine learning models for classifying Type 2 Diabetes (T2DM) subtypes, with the aim of providing a performance comparison and new insights on the matter. Methods: We developed a three-stage methodology. In the first stage, we preprocessed the public databases NHANES (USA) and ENSANUT (Mexico) to construct a dataset of N = 10,077 adult diabetes patient records; N = 2,768 records were used for training/validation of models, and the remaining N = 7,309 were held out for testing. In the second stage, groups of observations, each representing a T2DM subtype, were identified. We tested different clustering techniques and strategies and validated them using internal and external clustering indices, obtaining two annotated datasets, Dset A and Dset B. In the third stage, we developed classification models by assaying four algorithms, seven input-data schemes, and two validation settings on each annotated dataset. We also tested the resulting models with a majority-vote approach for classifying unseen patient records in the hold-out dataset. Results: In the independently performed bootstrap validation for Dset A and Dset B, mean accuracies across all seven data schemes were 85.3% (±9.2%) and 97.1% (±3.4%), respectively. The best accuracies were 98.8% and 98.9%. Results from both validation settings were consistent. For the hold-out dataset, the results were consistent with most of those reported in the literature in terms of class proportions.
Conclusion: The development of machine learning systems for the classification of diabetes subtypes is an important task for supporting physicians in fast, timely decision-making. We expect to deploy this methodology in a data analysis platform for identifying T2DM subtypes in hospital patient records.
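As a rough illustration of the majority-vote step mentioned in the Methods, the sketch below combines subtype predictions from several trained classifiers into one label per hold-out record and then computes the resulting class proportions. This is not the authors' implementation: the function names `majority_vote` and `class_proportions`, the use of integer subtype labels, the assumption of one prediction list per trained model, and the smallest-label tie-break rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model predictions into one label per record.

    predictions_per_model: list of lists, one inner list per trained model
    (hypothetical layout), each holding one predicted subtype label per record.
    Ties are broken by choosing the smallest label (an arbitrary assumption).
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_records = len(predictions_per_model[0])
    if any(len(preds) != n_records for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of records")

    combined = []
    for i in range(n_records):
        # Count how many models voted for each label on record i.
        votes = Counter(preds[i] for preds in predictions_per_model)
        top = max(votes.values())
        # Among the most-voted labels, keep the smallest one.
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


def class_proportions(labels):
    """Return the fraction of records assigned to each label, sorted by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical predictions from three models on four hold-out records.
    preds = [[0, 2, 1, 3], [0, 1, 1, 3], [2, 1, 4, 3]]
    voted = majority_vote(preds)
    print(voted)                     # [0, 1, 1, 3]
    print(class_proportions(voted))  # {0: 0.25, 1: 0.5, 3: 0.25}
```

Comparing the output of `class_proportions` against subtype proportions reported in other studies is one simple way to perform the kind of hold-out comparison described in the Results; the exact comparison the authors used is not specified here.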
Pages: 37