3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data

Cited by: 73
Authors
Luo, Yuan [1]
Szolovits, Peter [2]
Dighe, Anand S. [3, 4]
Baron, Jason M. [3, 4]
Affiliations
[1] Northwestern Univ, Dept Prevent Med, Chicago, IL 60611 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] Massachusetts Gen Hosp, Dept Pathol, Boston, MA 02114 USA
[4] Harvard Med Sch, Boston, MA USA
Keywords
machine learning; imputation; missing data; electronic health record; EHR; multiple imputation with chained equations; Gaussian process; computational pathology; data mining; MISSING VALUE IMPUTATION; MULTIPLE IMPUTATION; SURVIVAL ANALYSIS; VALUES
DOI
10.1093/jamia/ocx133
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objective: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data.
Methods: We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points.
Results: 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone.
Conclusions: 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
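The mask-and-compare evaluation described in the Methods can be sketched as follows. This is an illustration only, not the authors' code: the function names (`mask_values`, `mean_impute`, `normalized_rmse`) are hypothetical, the imputer is a plain mean fill standing in for MICE, GP, or 3D-MICE, the sample values are made up, and normalizing the RMSE by the standard deviation of the measured values is an assumption, since the abstract does not say how the error was normalized.

```python
import math
import random


def mask_values(series, fraction, rng):
    """Hide a random fraction of the observed (non-None) entries.

    Returns the masked copy and the sorted indices that were hidden.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    n_mask = max(1, int(len(observed) * fraction))
    hidden = sorted(rng.sample(observed, n_mask))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mean_impute(series):
    """Stand-in imputer (NOT 3D-MICE): fill each gap with the observed mean."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def normalized_rmse(truth, predicted):
    """RMSE divided by the standard deviation of the true values.

    The choice of normalizer is an assumption made for this sketch.
    """
    n = len(truth)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / n)
    mean = sum(truth) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in truth) / n)
    return rmse / sd


rng = random.Random(0)
# Made-up results for one analyte; None marks results missing from the original data.
values = [4.1, None, 4.4, 3.9, 4.0, None, 4.6, 4.2, 3.8, 4.3]

masked, hidden = mask_values(values, 0.5, rng)
imputed = mean_impute(masked)  # fills masked and originally missing results together

# Score only the masked points, because only those have a measured value to compare against.
score = normalized_rmse([values[i] for i in hidden], [imputed[i] for i in hidden])
print(f"normalized RMSE on masked points: {score:.3f}")
```

Swapping `mean_impute` for another imputer while keeping the same masked indices would let several methods be compared on identical held-out points, which is the kind of comparison the Results report.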
Pages: 645-653
Page count: 9