Deep representation learning for clustering longitudinal survival data from electronic health records

Cited by: 0
Authors
Qiu, Jiajun [1]
Hu, Yao [1]
Li, Li [1]
Erzurumluoglu, Abdullah Mesut [1]
Braenne, Ingrid [1]
Whitehurst, Charles [2]
Schmitz, Jochen [2]
Arora, Jatin [1]
Bartholdy, Boris Alexander [1]
Gandhi, Shrey [1]
Khoueiry, Pierre [1]
Mueller, Stefanie [1]
Noyvert, Boris [1]
Ding, Zhihao [1]
Jensen, Jan Nygaard [1]
de Jong, Johann [1]
Affiliations
[1] Boehringer Ingelheim Pharm GmbH Co KG, Global Computat Biol & Digital Sci, Biberach, Germany
[2] Boehringer Ingelheim GmbH & Co KG, Immunol & Resp Dis, Ridgefield, CT USA
Keywords
GENOME-WIDE ASSOCIATION; LIKELIHOOD; VARIANTS; MEDICINE;
DOI
10.1038/s41467-025-56625-z
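Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract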
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn's disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn's disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
Pages: 14
Related papers
50 in total
  • [21] Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records
    Miotto, Riccardo
    Li, Li
    Kidd, Brian A.
    Dudley, Joel T.
    SCIENTIFIC REPORTS, 2016, 6
  • [23] Scalable and accurate deep learning with electronic health records
    Rajkomar, Alvin
    Oren, Eyal
    Chen, Kai
    Dai, Andrew M.
    Hajaj, Nissan
    Hardt, Michaela
    Liu, Peter J.
    Liu, Xiaobing
    Marcus, Jake
    Sun, Mimi
    Sundberg, Patrik
    Yee, Hector
    Zhang, Kun
    Zhang, Yi
    Flores, Gerardo
    Duggan, Gavin E.
    Irvine, Jamie
    Le, Quoc
    Litsch, Kurt
    Mossin, Alexander
    Tansuwan, Justin
    Wang, De
    Wexler, James
    Wilson, Jimbo
    Ludwig, Dana
    Volchenboum, Samuel L.
    Chou, Katherine
    Pearson, Michael
    Madabushi, Srinivasan
    Shah, Nigam H.
    Butte, Atul J.
    Howell, Michael D.
    Cui, Claire
    Corrado, Greg S.
    Dean, Jeffrey
    NPJ DIGITAL MEDICINE, 2018, 1
  • [24] Research progress on electronic health records multimodal data fusion based on deep learning
    Fan, Yong
    Zhang, Zhengbo
    Wang, Jing
    Shengwu Yixue Gongchengxue Zazhi/Journal of Biomedical Engineering, 2024, 41 (05): 1062-1071
  • [25] A Deep Learning Approach to Predict Neonatal Encephalopathy from Electronic Health Records
    Gao, Cheng
    Yan, Chao
    Osmundson, Sarah
    Malin, Bradley A.
    Chen, You
    2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019: 170-176
  • [26] Deep learning predicts extreme preterm birth from electronic health records
    Gao, Cheng
    Osmundson, Sarah
    Edwards, Digna R. Velez
    Jackson, Gretchen Purcell
    Malin, Bradley A.
    Chen, You
    JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 100
  • [27] Analysis and Representation of Illocutions from Electronic Health Records
    dos Reis, Julio Cesar
    Bonacin, Rodrigo
    Perciani, Edemar Mendes
    Calani Baranauskas, Maria Cecilia
    SOCIALLY AWARE ORGANISATIONS AND TECHNOLOGIES: IMPACT AND CHALLENGES, 2016, 477: 209-218
  • [28] Multiple Imputation of Missing Data in Longitudinal Electronic Health Records
    Petersen, Irene
    Welch, Catherine
    Bartlett, Jonathan
    Morris, Richard
    Walters, Kate
    Nazareth, Irwin
    Marston, Louise
    White, Ian
    Carpenter, James
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2013, 22: 302
  • [29] LAVA: Longitudinal Adversarial Attack on Electronic Health Records Data
    An, Sungtae
    Xiao, Cao
    Stewart, Walter F.
    Sun, Jimeng
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019: 2558-2564
  • [30] Longitudinal deep learning clustering of Type 2 Diabetes Mellitus trajectories using routinely collected health records
    Manzini, Enrico
    Vlacho, Bogdan
    Franch-Nadal, Josep
    Escudero, Joan
    Genova, Ana
    Reixach, Elisenda
    Andres, Erik
    Pizarro, Israel
    Portero, Jose-Luis
    Mauricio, Didac
    Perera-Lluna, Alexandre
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 135