Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

Times cited: 91
Authors
Beaulieu-Jones, Brett K. [1,2]
Lavage, Daniel R. [3]
Snyder, John W. [3]
Moore, Jason H. [2]
Pendergrass, Sarah A. [3]
Bauer, Christopher R. [3]
Affiliations
[1] Univ Penn, Perelman Sch Med, Genom & Comp Biol Grad Grp, Philadelphia, PA 19104 USA
[2] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[3] Geisinger, Biomed & Translat Informat Inst, 100 N Acad Ave, Danville, PA 17822 USA
Funding
National Institutes of Health (NIH)
Keywords
imputation; missing data; clinical laboratory test results; electronic health records; multiple imputation; sensitivity analysis
DOI
10.2196/medinform.8960
Chinese Library Classification
R-058
Abstract
Background: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results.

Objective: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered.

Methods: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling).

Results: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation.

Conclusions: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.
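The evaluation design summarized in the abstract (start from complete cases, hide values under a chosen missingness mechanism, impute, then score the imputations against the hidden truth) can be illustrated with a small, stdlib-only Python sketch. It is not the authors' code and does not implement MICE or softImpute. The two-column toy data, the crude "top-k" masking rules for MAR and MNAR, the mean and linear-regression imputers, and every function name (`make_complete_cases`, `mask_values`, `mean_x_by_missingness`, `impute_mean`, `impute_regression`, `rmse_on_masked`) are hypothetical stand-ins chosen for illustration.

```python
import math
import random


def make_complete_cases(n, seed=0):
    """Hypothetical stand-in for a complete-case lab matrix: two correlated,
    standardized lab values per patient (column 1 depends on column 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 0.6)
        rows.append([x, y])
    return rows


def mask_values(rows, mechanism, rate, seed=1):
    """Copy rows and set column 1 to None in int(rate * n) rows.

    Deliberately crude rules: MCAR picks rows at random, MAR hides the rows
    with the highest observed column 0, MNAR hides the rows with the highest
    column 1 (i.e. depends on the very value being hidden)."""
    rng = random.Random(seed)
    masked = [row[:] for row in rows]  # leave the complete cases untouched
    n = len(rows)
    k = int(rate * n)
    if mechanism == "MCAR":
        idx = rng.sample(range(n), k)
    elif mechanism == "MAR":
        idx = sorted(range(n), key=lambda i: rows[i][0], reverse=True)[:k]
    elif mechanism == "MNAR":
        idx = sorted(range(n), key=lambda i: rows[i][1], reverse=True)[:k]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    for i in idx:
        masked[i][1] = None
    return masked, idx


def mean_x_by_missingness(masked):
    """Mean of the always-observed column 0 where column 1 is missing vs.
    observed. A large gap argues against MCAR; no check on observed data
    alone can rule out MNAR."""
    miss = [r[0] for r in masked if r[1] is None]
    obs = [r[0] for r in masked if r[1] is not None]
    return sum(miss) / len(miss), sum(obs) / len(obs)


def impute_mean(masked):
    """Fill missing column-1 values with the observed column-1 mean."""
    observed = [r[1] for r in masked if r[1] is not None]
    mu = sum(observed) / len(observed)
    return [mu if r[1] is None else r[1] for r in masked]


def impute_regression(masked):
    """Fill missing column-1 values from an OLS fit of column 1 on column 0,
    using only rows where column 1 is observed."""
    pairs = [(r[0], r[1]) for r in masked if r[1] is not None]
    mx = sum(p[0] for p in pairs) / len(pairs)
    my = sum(p[1] for p in pairs) / len(pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * r[0] if r[1] is None else r[1] for r in masked]


def rmse_on_masked(truth, imputed, idx):
    """Root-mean-square error over the deliberately hidden entries only."""
    return math.sqrt(sum((truth[i][1] - imputed[i]) ** 2 for i in idx) / len(idx))


if __name__ == "__main__":
    rows = make_complete_cases(2000)
    for mech in ("MCAR", "MAR", "MNAR"):
        masked, idx = mask_values(rows, mech, rate=0.2)
        gap = mean_x_by_missingness(masked)
        print(mech,
              "x means (missing, observed):", [round(v, 2) for v in gap],
              "RMSE mean:", round(rmse_on_masked(rows, impute_mean(masked), idx), 3),
              "RMSE regression:", round(rmse_on_masked(rows, impute_regression(masked), idx), 3))
```

In this toy setup, mean imputation is badly biased under MAR, while regression imputation still tracks the hidden values because the relationship between the two columns is unchanged among the observed rows; under MNAR the observed rows are no longer representative of the hidden ones, so both imputers are typically biased. Regression imputation is the per-variable step that MICE cycles over; with a single incomplete column there is nothing to chain, and because no random residuals are drawn the sketch yields one deterministic imputation rather than multiple imputations.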
Pages: 12
Related Papers
  • [1] Challenges and opportunities beyond structured data in analysis of electronic health records
    Tayefi, Maryam
    Ngo, Phuong
    Chomutare, Taridzo
    Dalianis, Hercules
    Salvi, Elisa
    Budrionis, Andrius
    Godtliebsen, Fred
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2021, 13 (06)
  • [2] Family Relatives as Structured Data in Electronic Health Records
    Zhou, L.
    Lu, Y.
    Vitale, C. J.
    Mar, P. L.
    Chang, F.
    Dhopeshwarkar, N.
    Rocha, R. A.
    APPLIED CLINICAL INFORMATICS, 2014, 5 (02): 349-367
  • [3] Multiple Imputation of Missing Data in Longitudinal Electronic Health Records
    Petersen, Irene
    Welch, Catherine
    Bartlett, Jonathan
    Morris, Richard
    Walters, Kate
    Nazareth, Irwin
    Marston, Louise
    White, Ian
    Carpenter, James
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2013, 22: 302
  • [4] A multi-step approach to managing missing data in time and patient variant electronic health records
    Cesare, Nina
    Were, Lawrence P. O.
    BMC RESEARCH NOTES, 2022, 15 (01)
  • [5] How to deal with missing data in interrupted time series analysis with electronic health records
    Bazo-Alvarez, Juan Carlos
    Morris, Tim P.
    Pham, Tra My
    Carpenter, James R.
    Petersen, Irene
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2020, 29: 410
  • [6] Mining for equitable health: Assessing the impact of missing data in electronic health records
    Getzen, Emily
    Ungar, Lyle
    Mowery, Danielle
    Jiang, Xiaoqian
    Long, Qi
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 139
  • [7] A novel method for handling Missing Not at Random Data in the electronic health records
    Shen, Xinpeng
    Ma, Sisi
    Caraballo, Pedro J.
    Vemuri, Prashanthi
    Simon, Gyorgy J.
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022: 21-26
  • [8] Imputation of Missing Data in Electronic Health Records Based on Patients' Similarities
    Jazayeri, Ali
    Liang, Ou Stella
    Yang, Christopher C.
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2020, 4 (03): 295-307