Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Cited by: 15
Authors
Li, Cheng [1]
Rana, Santu [1]
Dinh Phung [1]
Venkatesh, Svetha [1]
Affiliations
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
Keywords
Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction; TEXT CATEGORIZATION MODELS; READMISSION; RISK;
DOI
10.1016/j.knosys.2016.02.005
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-day readmission prediction. In addition, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
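The abstract describes feeding distances between diagnosis codes in the ICD-10 tree through a decay function. The following is a minimal Python sketch of that idea only, not the authors' implementation. It assumes a simplified three-level hierarchy (chapter letter, three-character category, full code) and an exponential decay, which is a common choice in distance dependent Chinese restaurant process models; the abstract does not state which decay function the paper uses. The names `icd10_path`, `tree_distance`, `exponential_decay`, and the `scale` parameter are hypothetical.

```python
import math


def icd10_path(code):
    """Return a simplified root-to-node path for an ICD-10 code.

    Assumption: the hierarchy is approximated by three levels
    (first letter, three-character category, full code). Real ICD-10
    chapters and blocks are code ranges, which this sketch ignores.
    """
    code = code.strip().upper()
    category = code[:3]
    path = [code[0], category]
    if code != category:
        path.append(code)
    return path


def tree_distance(code_a, code_b):
    """Count the edges between two codes via their lowest common ancestor."""
    path_a, path_b = icd10_path(code_a), icd10_path(code_b)
    common = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def exponential_decay(distance, scale=1.0):
    """Map a distance to an affinity in (0, 1]; closer codes score higher.

    Assumption: exponential form with a hypothetical `scale` parameter.
    """
    return math.exp(-distance / scale)


if __name__ == "__main__":
    pairs = [("I21.0", "I21.1"), ("I21.0", "I25.1"), ("I21.0", "E11.9")]
    for a, b in pairs:
        d = tree_distance(a, b)
        print(a, b, d, round(exponential_decay(d, scale=2.0), 3))
```

Under these assumptions, two codes in the same three-character category receive a higher affinity than codes that share only a chapter letter, which is the kind of prior preference for grouping related codes that the abstract motivates.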
Pages: 168-182
Number of pages: 15
Related papers
50 in total
  • [21] Electronic farming records - A framework for normalising agronomic knowledge discovery
    Ngo, Vuong M.
    Kechadi, M-Tahar
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2021, 184
  • [22] Knowledge Management for the Protection of Information in Electronic Medical Records
    Lea, Nathan
    Hailes, Stephen
    Austin, Tony
    Kalra, Dipak
    EHEALTH BEYOND THE HORIZON - GET IT THERE, 2008, 136 : 685 - +
  • [23] CATVI: Conditional and Adaptively Truncated Variational Inference for Hierarchical Bayesian Nonparametric Models
    Liu, Yirui
    Qiao, Xinghao
    Lam, Jessica
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [24] Trajectory Analysis and Semantic Region Modeling Using Nonparametric Hierarchical Bayesian Models
    Wang, Xiaogang
    Ma, Keng Teck
    Ng, Gee-Wah
    Grimson, W. Eric L.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2011, 95 (03) : 287 - 312
  • [25] Trajectory Analysis and Semantic Region Modeling Using Nonparametric Hierarchical Bayesian Models
    Xiaogang Wang
    Keng Teck Ma
    Gee-Wah Ng
    W. Eric L. Grimson
    International Journal of Computer Vision, 2011, 95 : 287 - 312
  • [26] Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction
    Liu, Feifei
    Liu, Mingtong
    Li, Meiting
    Xin, Yuwei
    Gao, Dongping
    Wu, Jun
    Zhu, Jiaan
    QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2023, 13 (06) : 3873 - +
  • [27] Knowledge Retrieval from PubMed Abstracts and Electronic Medical Records with the Multiple Sclerosis Ontology
    Malhotra, Ashutosh
    Guendel, Michaela
    Rajput, Abdul Mateen
    Mevissen, Heinz-Theodor
    Saiz, Albert
    Pastor, Xavier
    Lozano-Rubi, Raimundo
    Martinez-Lapsicina, Elena H.
    Zubizarreta, Irati
    Mueller, Bernd
    Kotelnikova, Ekaterina
    Toldo, Luca
    Hofmann-Apitius, Martin
    Villoslada, Pablo
    PLOS ONE, 2015, 10 (02):
  • [28] Medical Knowledge Extraction from Graph-Based Modeling of Electronic Health Records
    Kallipolitis, Athanasios
    Gallos, Parisis
    Menychtas, Andreas
    Tsanakas, Panayiotis
    Maglogiannis, Ilias
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2023, PT I, 2023, 675 : 279 - 290
  • [29] Risk Prediction on Electronic Health Records with Prior Medical Knowledge
    Ma, Fenglong
    Gao, Jing
    Suo, Qiuling
    You, Quanzeng
    Zhou, Jing
    Zhang, Aidong
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1910 - 1919
  • [30] Demographic Aware Probabilistic Medical Knowledge Graph Embeddings of Electronic Medical Records
    Guluzade, Aynur
    Kacupaj, Endri
    Maleshkova, Maria
    ARTIFICIAL INTELLIGENCE IN MEDICINE (AIME 2021), 2021, : 408 - 417