EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records

Times cited: 0
Authors
Luo, Jiawei [1 ,2 ,3 ]
Huang, Shixin [4 ,5 ]
Lan, Lan [6 ]
Yang, Shu [7 ]
Cao, Tingqian [8 ]
Yin, Jin [1 ,2 ,3 ]
Qiu, Jiajun [1 ,2 ,3 ]
Yang, Xiaoyan [1 ,2 ,3 ]
Guo, Yingqiang [1 ]
Zhou, Xiaobo [9 ]
Affiliations
[1] Sichuan Univ, West China Hosp, West China Sch Med, Dept Cardiovasc Surg, Chengdu 610041, Sichuan, Peoples R China
[2] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, West China Sch Med, Chengdu 610041, Sichuan, Peoples R China
[3] Sichuan Univ, Medx Ctr Informat, Chengdu 610041, Peoples R China
[4] Peoples Hosp Yubei Dist Chongqing, Dept Sci Res, Chongqing 401120, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China
[6] Capital Med Univ, Beijing Tiantan Hosp, IT Ctr, Beijing 100070, Peoples R China
[7] Chengdu Univ Tradit Chinese Med, Coll Med Informat Engn, Chengdu 610075, Peoples R China
[8] Sichuan Univ, West China Hosp, Integrated Care Management Ctr, Chengdu 610041, Peoples R China
[9] Univ Texas, Ctr Computat Syst Med, McWilliams Sch Biomed Informat, Hlth Sci Ctr Houston, Houston, TX 77030 USA
Funding
National Natural Science Foundation of China;
Keywords
Electronic medical records; Longitudinal data; Irregular data; Preprocessing pipeline; Deep learning; PREDICTION; SEPSIS; MODEL;
DOI
10.1016/j.cmpb.2024.108521
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Objective: Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability. Materials and Methods: EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP were then used to test several renowned deep learning models on a range of commonly used benchmark tasks. Results: In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups. Conclusion: EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.
Pages: 21