Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Cited by: 15
Authors
Li, Cheng [1 ]
Rana, Santu [1 ]
Dinh Phung [1 ]
Venkatesh, Svetha [1 ]
Affiliations
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
Keywords
Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction; TEXT CATEGORIZATION MODELS; READMISSION; RISK;
DOI
10.1016/j.knosys.2016.02.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference for the wddCRF is derived using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
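The role of the decay function mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' wddCRF implementation: it only shows a distance-dependent Chinese-restaurant-style prior in which a diagnosis code links to another code with weight f(d), where d is a tree distance and f is an exponential decay. The function names, the character-prefix distance used as a stand-in for the ICD-10 tree distance, and the parameters `alpha` and `scale` are all assumptions made for illustration.

```python
import math


def prefix_tree_distance(code_a: str, code_b: str) -> int:
    """Path length between two codes in a character-prefix tree.

    Assumption: this is only a stand-in for the distance in the real
    ICD-10 hierarchy, which the abstract does not specify.
    """
    shared = 0
    for ch_a, ch_b in zip(code_a, code_b):
        if ch_a != ch_b:
            break
        shared += 1
    # Steps up from code_a to the common ancestor, then down to code_b.
    return (len(code_a) - shared) + (len(code_b) - shared)


def exponential_decay(distance: float, scale: float = 1.0) -> float:
    """Hypothetical decay function f(d) = exp(-d / scale)."""
    return math.exp(-distance / scale)


def link_prior(codes, i, alpha=1.0, scale=1.0):
    """Distance-dependent prior over which code the i-th code links to.

    Position i (a self-link) gets mass proportional to alpha; every other
    position j gets mass proportional to the decayed distance to code j.
    """
    weights = []
    for j, other in enumerate(codes):
        if j == i:
            weights.append(alpha)
        else:
            d = prefix_tree_distance(codes[i], other)
            weights.append(exponential_decay(d, scale))
    total = sum(weights)
    return [w / total for w in weights]


# Example codes written without the dot; the third shares no prefix with the first.
codes = ["I210", "I211", "E119"]
probs = link_prior(codes, 0)
```

Under these assumptions, the first code is far more likely to link to the sibling code `I211` than to the unrelated `E119`, which is the intuition behind using word distances to favour semantically coherent topics.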
Pages: 168-182
Number of pages: 15
Related papers
50 in total
  • [31] iGAS: A framework for using electronic intraoperative medical records for genomic discovery
    Levin, Matthew A.
    Joseph, Thomas T.
    Jeff, Janina M.
    Nadukuru, Rajiv
    Ellis, Stephen B.
    Bottinger, Erwin P.
    Kenny, Eimear E.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 67 : 80 - 89
  • [32] Phenotyping Down syndrome: discovery and predictive modelling with electronic medical records
    Nguyen, T. Q.
    Kerley, C. I.
    Key, A. P.
    Maxwell-Horn, A. C.
    Wells, Q. S.
    Neul, J. L.
    Cutting, L. E.
    Landman, B. A.
    JOURNAL OF INTELLECTUAL DISABILITY RESEARCH, 2024, 68 (05) : 491 - 511
  • [33] Electronic medical records - where to from here?
    Pearce, Christopher
    AUSTRALIAN FAMILY PHYSICIAN, 2009, 38 (07) : 537 - 540
  • [34] Evaluation of Five Sentence Similarity Models on Electronic Medical Records
    Chen, Qingyu
    Du, Jingcheng
    Kim, Sun
    Wilbur, W. John
    Lu, Zhiyong
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 533 - 533
  • [35] From Documents on Paper to Electronic Medical Records
    Carrajo, Lino
    Penas, Angel
    Melcon, Ruben
    Javier Gonzalez, Fco
    Couto, Eduardo
    EHEALTH BEYOND THE HORIZON - GET IT THERE, 2008, 136 : 395 - 400
  • [36] Nonparametric learning from Bayesian models with randomized objective functions
    Lyddon, Simon
    Walker, Stephen
    Holmes, Chris
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [37] A data-driven medical knowledge discovery framework to predict the length of ICU stay for patients undergoing craniotomy based on electronic medical records
    Wang, Shaobo
    Li, Jun
    Wang, Qiqi
    Jiao, Zengtao
    Yan, Jun
    Liu, Youjun
    Yu, Rongguo
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (01) : 837 - 858
  • [38] Hierarchical viewpoint discovery from tweets using Bayesian modelling
    Zhu, Lixing
    He, Yulan
    Zhou, Deyu
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 116 : 430 - 438
  • [39] Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models
    Li, Cheng
    Rana, Santu
    Phung, Dinh
    Venkatesh, Svetha
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 1307 - 1312
  • [40] Unsupervised Grouped Axial Data Modeling via Hierarchical Bayesian Nonparametric Models With Watson Distributions
    Fan, Wentao
    Yang, Lin
    Bouguila, Nizar
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 9654 - 9668