Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Cited by: 15
Authors
Li, Cheng [1]
Rana, Santu [1]
Dinh Phung [1]
Venkatesh, Svetha [1]
Affiliations
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
Keywords
Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction; Text categorization models; Readmission; Risk
DOI
10.1016/j.knosys.2016.02.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Electronic Medical Records (EMRs) have established themselves as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference for the wddCRF is derived using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
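The abstract mentions word-to-word distances derived from the ICD-10 tree and a decay function applied to them. The following is a minimal illustrative sketch of that idea only, not the authors' implementation. It assumes a simplified prefix-based hierarchy (first letter, three-character category, then longer prefixes), a path-length distance through the lowest common ancestor, and an exponential decay. The function names `icd10_path`, `tree_distance` and `exp_decay`, and the `scale` parameter, are hypothetical; the abstract does not state which distance or decay function the paper actually uses.

```python
import math


def icd10_path(code):
    """Return the ancestor path of a code under a simplified prefix hierarchy.

    Assumption: levels are the first letter, the 3-character category,
    and then every longer prefix. The real ICD-10 tree also has chapters
    and blocks, which are not modeled here.
    """
    code = code.replace(".", "").upper()
    path = [code[:1]]
    for end in range(3, len(code) + 1):
        path.append(code[:end])
    return path


def tree_distance(code_a, code_b):
    """Number of edges between two codes via their lowest common ancestor."""
    path_a, path_b = icd10_path(code_a), icd10_path(code_b)
    common = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        common += 1
    # Codes sharing no prefix are joined through an implicit root node.
    return len(path_a) + len(path_b) - 2 * common


def exp_decay(distance, scale=1.0):
    """Exponential decay: nearby codes get weights close to 1 (assumed form)."""
    return math.exp(-distance / scale)


if __name__ == "__main__":
    pairs = [("I21.0", "I21.4"), ("I21.0", "I25.1"), ("I21.0", "E11.9")]
    for a, b in pairs:
        d = tree_distance(a, b)
        print(a, b, "distance:", d, "weight:", round(exp_decay(d, scale=2.0), 3))
```

Under these assumptions, two codes in the same three-character category receive a larger weight than codes from different chapters, which is the kind of signal a distance-dependent prior could use to favor grouping related diagnosis codes into the same topic.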
Pages: 168-182
Number of pages: 15