A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets

Cited by: 8
Authors
Bernardini, Michele [1]
Doinychko, Anastasiia [4]
Romeo, Luca [2]
Frontoni, Emanuele [3]
Amini, Massih-Reza [4]
Affiliations
[1] Univ Politecn Marche, Dept Informat Engn DII, Ancona, Italy
[2] Univ Macerata, Dept Econ & Law, Macerata, Italy
[3] Univ Macerata, Dept Polit Sci Commun & Int Relat, Macerata, Italy
[4] Univ Grenoble Alpes, Grenoble Informat Lab, St Martin Dheres, France
Keywords
Data imputation; Generative Adversarial Network; Electronic Health Record; Machine Learning; Predictive medicine; TIME-SERIES; MODEL
DOI
10.1016/j.compbiomed.2023.107188
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The missing data mechanism is a relevant problem in the Machine Learning (ML) and biomedical informatics communities. Real-world Electronic Health Record (EHR) datasets contain many missing values, resulting in a high level of spatiotemporal sparsity in the predictor matrix. Several state-of-the-art approaches have tried to address this problem by proposing data imputation strategies that (i) are often unrelated to the ML model, (ii) are not conceived for EHR data, where laboratory exams are not prescribed uniformly over time and the percentage of missing values is high, and (iii) exploit only univariate and linear information on the observed features. Our paper proposes a data imputation strategy based on a clinical conditional Generative Adversarial Network (ccGAN) capable of imputing missing values by exploiting non-linear and multivariate information across patients. Unlike other GAN-based data imputation approaches, our method deals explicitly with the high level of missingness of routine EHR data by conditioning the imputation strategy on the observable values and on those that are fully annotated. We demonstrated statistically significant gains of the ccGAN over other state-of-the-art approaches in terms of imputation (around 19.79% gain over the best competitor) and predictive performance (up to 1.60% gain over the best competitor) on a real multi-center diabetic dataset. We also demonstrated its robustness across different missingness rates (up to 1.61% gain over the best competitor under the highest missingness rate) on an additional benchmark EHR dataset.
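As a rough illustration of what "conditioning on the observable values" and "imputation gain" can mean in practice, the following Python sketch shows the mask-based combination step commonly used by GAN-style imputers, plus an RMSE computed only on the missing entries. It is not the authors' ccGAN: the function names are hypothetical, and `column_mean_generator` is a placeholder standing in for a trained generator network.

```python
import numpy as np

def combine_observed_and_generated(x, mask, generated):
    """Keep observed entries (mask == 1) and fill the rest from `generated`."""
    x_filled = np.where(mask == 1, x, 0.0)  # neutralize NaNs at missing positions
    return mask * x_filled + (1 - mask) * generated

def column_mean_generator(x, mask):
    """Placeholder for a trained generator (assumption): per-column mean of observed entries."""
    x_filled = np.where(mask == 1, x, 0.0)
    counts = np.maximum(mask.sum(axis=0), 1)  # avoid division by zero for empty columns
    means = x_filled.sum(axis=0) / counts
    return np.broadcast_to(means, x.shape)

def masked_rmse(x_true, x_imputed, mask):
    """RMSE computed only over entries that were missing (mask == 0)."""
    missing = mask == 0
    if not missing.any():
        return 0.0
    return float(np.sqrt(np.mean((x_true[missing] - x_imputed[missing]) ** 2)))

# Toy example: 3 patients x 2 features, with two entries hidden by the mask.
x_true = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
mask = np.array([[1, 1], [0, 1], [1, 0]])
x_obs = np.where(mask == 1, x_true, np.nan)

generated = column_mean_generator(x_obs, mask)
x_imp = combine_observed_and_generated(x_obs, mask, generated)
rmse = masked_rmse(x_true, x_imp, mask)
```

A learned generator would replace the column-mean placeholder; the combination step and the masked error metric stay the same, which is what makes imputers comparable across missingness rates.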
Pages: 10
Related papers
50 in total
  • [1] Imputation of missing data with class imbalance using conditional generative adversarial networks
    Awan, Saqib Ejaz
    Bennamoun, Mohammed
    Sohel, Ferdous
    Sanfilippo, Frank
    Dwivedi, Girish
    NEUROCOMPUTING, 2021, 453 : 164 - 171
  • [2] Improved generative adversarial imputation networks for missing data
    Qin, Xiwen
    Shi, Hongyu
    Dong, Xiaogang
    Zhang, Siqi
    Yuan, Liping
    APPLIED INTELLIGENCE, 2024, 54 (21) : 11068 - 11082
  • [3] Spatiotemporal Generative Adversarial Imputation Networks: An Approach to Address Missing Data for Wind Turbines
    Hu, Xuguang
    Zhan, Zhaokang
    Ma, Dazhong
    Zhang, Siqi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [4] Generative adversarial learning for missing data imputation
    Wang, Xinyang
    Chen, Hongyu
    Zhang, Jiayu
    Fan, Jicong
    Neural Computing and Applications, 2025, 37 (3) : 1403 - 1416
  • [5] Validation of Conditional and Superresolution Generative Adversarial Networks for Imputation of Missing Brain MRI Sequences
    Conte, Gian Marco
    Tobin, W. Oliver
    Moassefi, Mana
    Faghani, Shahriar
    Decker, Paul
    Kosel, Matthew
    Nikanpour, Yalda
    Zhang, Kuan
    Lachance, Daniel Honore
    Jenkins, Robert
    Erickson, Bradley
    Eckel-Passow, Jeanette
    NEURO-ONCOLOGY, 2022, 24 : 181 - 181
  • [6] Conditional Generative Adversarial Network for Early Classification of Longitudinal Datasets Using an Imputation Approach
    Pingi, Sharon Torao
    Nayak, Richi
    Bashar, Md Abul
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (05)
  • [7] Federated conditional generative adversarial nets imputation method for air quality missing data
    Zhou, Xu
    Liu, Xiaofeng
    Lan, Gongjin
    Wu, Jian
    KNOWLEDGE-BASED SYSTEMS, 2021, 228
  • [8] VIGAN: Missing View Imputation with Generative Adversarial Networks
    Shang, Chao
    Palmer, Aaron
    Sun, Jiangwen
    Chen, Ko-Shin
    Lu, Jin
    Bi, Jinbo
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 766 - 775
  • [9] Multiple imputation method of missing credit risk assessment data based on generative adversarial networks
    Zhao, Feng
    Lu, Yan
    Li, Xinning
    Wang, Lina
    Song, Yingjie
    Fan, Deming
    Zhang, Caiming
    Chen, Xiaobo
    APPLIED SOFT COMPUTING, 2022, 126
  • [10] Generative Adversarial Networks Assist Missing Data Imputation: A Comprehensive Survey and Evaluation
    Shahbazian, Reza
    Greco, Sergio
    IEEE ACCESS, 2023, 11 : 88908 - 88928