Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Cited by: 15
Authors
Li, Cheng [1]
Rana, Santu [1]
Phung, Dinh [1]
Venkatesh, Svetha [1]
Affiliations
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
Keywords
Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction; TEXT CATEGORIZATION MODELS; READMISSION; RISK
DOI
10.1016/j.knosys.2016.02.005
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets, one for PolyVascular disease and one for Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 168-182
Page count: 15
Related papers
50 in total
  • [41] Converting data into information and knowledge: The promise and the reality of electronic medical records
    Poterack, Karl A.
    Ramakrishna, Harish
    ANNALS OF CARDIAC ANAESTHESIA, 2015, 18 (03) : 290 - 292
  • [42] Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction
    Gazzotti, Raphael
    Faron-Zucker, Catherine
    Gandon, Fabien
    Lacroix-Hugues, Virginie
    Darmon, David
    SEMANTIC WEB, ESWC 2019, 2019, 11503 : 116 - 130
  • [43] Electronic medical records, genetics, and childhood obesity: A new direction for scientific discovery?
    Faith, Myles S.
    JOURNAL OF PEDIATRIC GENETICS, 2012, 1 (02) : 69 - 70
  • [44] A new frontier in drug discovery for skin cancer through electronic medical records
    Dousset, Lea
    Khosrotehrani, Kiarash
    BRITISH JOURNAL OF DERMATOLOGY, 2025, 192 (04) : 570 - 571
  • [45] Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics
    Kidd, Brian
    Wang, Kunbo
    Xu, Yanxun
    Ni, Yang
    BIOCOMPUTING 2023, PSB 2023, 2023, : 484 - 495
  • [46] Towards a Multi-agent System for Medical Records Processing and Knowledge Discovery
    Ivascu, Todor
    Dinis, Adriana
    Negru, Viorel
    PROCEEDINGS OF 2016 18TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC), 2016, : 395 - 399
  • [47] Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records
    Goodwin, Travis
    Harabagiu, Sanda M.
    2013 IEEE SEVENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2013), 2013, : 363 - 370
  • [48] Rough-Granular Computing knowledge discovery models for medical classification
    Eissa, Mohammed M.
    Elmogy, Mohammed
    Hashem, Mohammed
    EGYPTIAN INFORMATICS JOURNAL, 2016, 17 (03) : 265 - 272
  • [49] Validating pathophysiological models of aging using clinical electronic medical records
    Chen, David P.
    Morgan, Alexander A.
    Butte, Atul J.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (03) : 358 - 364
  • [50] Exploratory Analysis of HIV Status Knowledge and Associated Factors Using Data from Electronic Medical Records
    Burdisso, Natividad
    Esteban, Santiago
    Kopitowski, Karin S.
    Terrasa, Sergio A.
    DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 838 - 842