Stratification of diabetes in the context of comorbidities, using representation learning and topological data analysis

Cited by: 0
Authors
Malgorzata Wamil
Abdelaali Hassaine
Shishir Rao
Yikuan Li
Mohammad Mamouei
Dexter Canoy
Milad Nazarzadeh
Zeinab Bidel
Emma Copland
Kazem Rahimi
Gholamreza Salimi-Khorshidi
Affiliations
[1] University of Oxford, Deep Medicine, Oxford Martin School
[2] Mayo Clinic Healthcare, Nuffield Department of Women’s and Reproductive Health, Medical Science Division
[3] University of Oxford
Source
Keywords
DOI
Not available
CLC number
Subject classification number
Abstract
Diabetes is a heterogeneous, multimorbid disorder with large variation in manifestations, trajectories, and outcomes. The aim of this study is to validate a novel machine learning method for phenotyping diabetes in the context of comorbidities. Data from 9967 multimorbid patients with a new diagnosis of diabetes were extracted from the Clinical Practice Research Datalink. First, using BEHRT (a transformer-based deep learning architecture), the embeddings corresponding to diabetes were learned. Next, topological data analysis (TDA) was carried out to test how different areas of the high-dimensional manifold correspond to different risk profiles. The following endpoints were considered when profiling risk trajectories: major adverse cardiovascular events (MACE), coronary artery disease (CAD), stroke (CVA), heart failure (HF), renal failure (RF), diabetic neuropathy, peripheral arterial disease, reduced visual acuity, and all-cause mortality. Kaplan-Meier curves were plotted for each derived phenotype. Finally, we tested the performance of an established risk prediction model (QRISK) after adding TDA-derived features. We identified four subgroups of patients with diabetes and divergent comorbidity patterns, differing in their risk of future cardiovascular, renal, and other microvascular outcomes. Phenotype 1 (young with chronic inflammatory conditions) and phenotype 2 (young with CAD) included relatively younger patients with diabetes compared to phenotypes 3 (older with hypertension and renal disease) and 4 (older with previous CVA), and those subgroups had a higher frequency of pre-existing cardio-renal diseases. Within ten years of follow-up, 2592 patients (26%) experienced MACE, 2515 patients (25%) died, and 2020 patients (20%) suffered RF. Adding the specific TDA-derived phenotype and the distances to both extremities of the TDA graph raised the AUC of the QRISK3 model from 67.26% (CI 67.25–67.28%) to 67.67% (CI 67.66–67.69%), improving its prediction of cardiovascular outcomes. We confirmed the importance of accounting for multimorbidity when risk-stratifying a heterogeneous cohort of patients with a new diagnosis of diabetes. Our unsupervised machine learning method improved the prediction of clinical outcomes.
Related papers
50 in total
  • [21] Extraction of a Topological Representation based on Raw Data using Voronoi Diagram
    Galli, Marina
    Barber, Ramon
    Garrido, Santiago
    Moreno, Luis
    2018 INTERNATIONAL CONFERENCE ON CONTROL, ARTIFICIAL INTELLIGENCE, ROBOTICS & OPTIMIZATION (ICCAIRO), 2018, : 165 - 170
  • [22] Data induced masking representation learning for face data analysis
    Guo, Tan
    Zhang, Lei
    Tan, Xiaoheng
    Yang, Liu
    Liang, Zhifang
    KNOWLEDGE-BASED SYSTEMS, 2019, 177 : 82 - 93
  • [23] Preface of Special Issue on Data Representation and Representation Learning for Video Analysis
    Schwartz, William Robson
    Davis, Larry S.
    PATTERN RECOGNITION LETTERS, 2018, 114 : 1 - 1
  • [24] Topological data analysis for genetic-driven stratification of patients with major depressive disorder
    Maggioni, E.
    Fabbri, C.
    Vai, B.
    Turtulici, N.
    Kasper, S.
    Zohar, J.
    Souery, D.
    Montgomery, S.
    Albani, D.
    Forloni, G.
    Ferentinos, P.
    Rujescu, D.
    Mendlewicz, J.
    Benedetti, F.
    Serretti, A.
    Brambilla, P.
    EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2020, 40 : S86 - S87
  • [25] Thermochemical Data Fusion Using Graph Representation Learning
    Bhattacharjee, Himaghna
    Vlachos, Dionisios G.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2020, 60 (10) : 4673 - 4683
  • [26] A new representation learning approach for credit data analysis
    Li, Tie
    Kou, Gang
    Peng, Yi
    INFORMATION SCIENCES, 2023, 627 : 115 - 131
  • [27] Learning context-free grammar using improved tabular representation
    Unold, Olgierd
    Jaworski, Marcin
    APPLIED SOFT COMPUTING, 2010, 10 (01) : 44 - 52
  • [28] Mining Social Media Data Using Topological Data Analysis
    Almgren, Khaled
    Kim, Minkyu
    Lee, Jeongkyu
    2017 IEEE 18TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI 2017), 2017, : 144 - 153
  • [29] Topological data analysis assisted machine learning for polar topological structures in oxide superlattices
    Du, Guanshihan
    Zhou, Linming
    Huang, Yuhui
    Wu, Yongjun
    Hong, Zijian
    ACTA MATERIALIA, 2025, 282
  • [30] A systematic analysis of learning analytics using multi-source data in the context of Spain
    Munoz-Merino, Pedro J.
    Moreno-Marcos, Pedro Manuel
    Rubio-Fernandez, Aaron
    Tsai, Yi-Shan
    Gasevic, Dragan
    Kloos, Carlos Delgado
    BEHAVIOUR & INFORMATION TECHNOLOGY, 2023, 42 (05) : 643 - 657