Stratification of diabetes in the context of comorbidities, using representation learning and topological data analysis

Authors
Malgorzata Wamil
Abdelaali Hassaine
Shishir Rao
Yikuan Li
Mohammad Mamouei
Dexter Canoy
Milad Nazarzadeh
Zeinab Bidel
Emma Copland
Kazem Rahimi
Gholamreza Salimi-Khorshidi
Affiliations
[1] University of Oxford, Deep Medicine, Oxford Martin School
[2] Mayo Clinic Healthcare, Nuffield Department of Women’s and Reproductive Health, Medical Science Division
[3] University of Oxford
Abstract
Diabetes is a heterogeneous, multimorbid disorder with large variation in manifestations, trajectories, and outcomes. The aim of this study is to validate a novel machine learning method for the phenotyping of diabetes in the context of comorbidities. Data from 9967 multimorbid patients with a new diagnosis of diabetes were extracted from the Clinical Practice Research Datalink. First, using BEHRT (a transformer-based deep learning architecture), the embeddings corresponding to diabetes were learned. Next, topological data analysis (TDA) was carried out to test how different areas of the high-dimensional manifold correspond to different risk profiles. The following endpoints were considered when profiling risk trajectories: major adverse cardiovascular events (MACE), coronary artery disease (CAD), stroke (CVA), heart failure (HF), renal failure (RF), diabetic neuropathy, peripheral arterial disease, reduced visual acuity, and all-cause mortality. Kaplan-Meier curves were plotted for each derived phenotype. Finally, we tested the performance of an established risk prediction model (QRISK) after adding TDA-derived features. We identified four subgroups of patients with diabetes and divergent comorbidity patterns that differed in their risk of future cardiovascular, renal, and other microvascular outcomes. Phenotype 1 (young with chronic inflammatory conditions) and phenotype 2 (young with CAD) included relatively younger patients with diabetes compared to phenotypes 3 (older with hypertension and renal disease) and 4 (older with previous CVA), and those subgroups had a higher frequency of pre-existing cardio-renal diseases. Within ten years of follow-up, 2592 patients (26%) experienced MACE, 2515 patients (25%) died, and 2020 patients (20%) suffered RF. Adding the specific TDA-derived phenotype and the distances to both extremities of the TDA graph increased the AUC of the QRISK3 model from 67.26% (CI 67.25–67.28%) to 67.67% (CI 67.66–67.69%), improving its prediction of cardiovascular outcomes. We confirmed the importance of accounting for multimorbidity when risk-stratifying a heterogeneous cohort of patients with a new diagnosis of diabetes. Our unsupervised machine learning method improved the prediction of clinical outcomes.
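The abstract mentions Kaplan-Meier curves plotted per derived phenotype. As a rough illustration of that step only, the sketch below computes the standard product-limit (Kaplan-Meier) estimate separately for each group. It is not the authors' code: the function names `kaplan_meier` and `curves_by_group` are hypothetical, and any phenotype labels, follow-up times, and event flags passed in are assumed to come from an upstream analysis that is not shown here.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Product-limit estimate: [(time, survival)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit).
    events:    1/True if the endpoint occurred at that time, 0/False if censored.
    """
    deaths = defaultdict(int)   # events observed at each time
    leaving = defaultdict(int)  # patients leaving the risk set at each time
    for t, e in zip(durations, events):
        leaving[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t
        at_risk -= leaving[t]
    return curve


def curves_by_group(groups, durations, events):
    """One Kaplan-Meier curve per group label (e.g. per phenotype)."""
    by_group = defaultdict(lambda: ([], []))
    for g, t, e in zip(groups, durations, events):
        by_group[g][0].append(t)
        by_group[g][1].append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in by_group.items()}
```

Calling `curves_by_group(labels, times, flags)` returns a dictionary mapping each group label to its list of `(time, survival)` steps, which can then be drawn as a step plot with any plotting library.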