Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

Cited by: 91
Authors
Beaulieu-Jones, Brett K. [1 ,2 ]
Lavage, Daniel R. [3 ]
Snyder, John W. [3 ]
Moore, Jason H. [2 ]
Pendergrass, Sarah A. [3 ]
Bauer, Christopher R. [3 ]
Affiliations
[1] Univ Penn, Perelman Sch Med, Genom & Comp Biol Grad Grp, Philadelphia, PA 19104 USA
[2] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[3] Geisinger, Biomed & Translat Informat Inst, 100 N Acad Ave, Danville, PA 17822 USA
Funding
US National Institutes of Health
Keywords
imputation; missing data; clinical laboratory test results; electronic health records; multiple imputation; sensitivity analysis
DOI
10.2196/medinform.8960
Chinese Library Classification (CLC) number
R-058
Discipline classification code
Abstract
Background: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results.
Objective: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered.
Methods: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling).
Results: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation.
Conclusions: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.
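The evaluation design summarized in the Methods (mask values in a set of complete cases under a chosen missingness mechanism, impute, then score the imputed values against the known truth) can be illustrated with a minimal sketch. This is not the authors' published code: the data are synthetic, column-mean imputation stands in for the 12 methods the study compared, and every function name, the 20% masking rate, and the threshold of 115 are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_mcar(rows, frac, rng):
    """Mask each value independently with probability `frac` (MCAR)."""
    return [[None if rng.random() < frac else v for v in row] for row in rows]


def simulate_mnar(rows, col, threshold):
    """Mask values in column `col` above `threshold` (one simple MNAR rule)."""
    return [
        [None if (j == col and v > threshold) else v for j, v in enumerate(row)]
        for row in rows
    ]


def mean_impute(masked):
    """Replace each missing value with the observed mean of its column."""
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def rmse_on_masked(truth, masked, imputed):
    """Root mean squared error over only the entries that were masked."""
    errs = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(errs) / len(errs))


# Synthetic "complete cases": two hypothetical lab measures per patient.
rng = random.Random(0)
complete = [[rng.gauss(100, 15), rng.gauss(5, 1)] for _ in range(500)]

mcar = simulate_mcar(complete, 0.2, rng)   # assumed 20% masking rate
mnar = simulate_mnar(complete, 0, 115)     # assumed threshold on column 0

rmse_mcar = rmse_on_masked(complete, mcar, mean_impute(mcar))
rmse_mnar = rmse_on_masked(complete, mnar, mean_impute(mnar))
print(f"MCAR RMSE: {rmse_mcar:.2f}  MNAR RMSE: {rmse_mnar:.2f}")
```

Because the true values are known for every masked entry, the same harness could score any imputation method. In this toy setup the MNAR error comes out larger than the MCAR error, since mean imputation cannot recover values that were hidden precisely because they were high.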
Pages: 12