A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets

Cited by: 8
Authors
Bernardini, Michele [1]
Doinychko, Anastasiia [4]
Romeo, Luca [2]
Frontoni, Emanuele [3]
Amini, Massih-Reza [4]
Affiliations
[1] Univ Politecn Marche, Dept Informat Engn DII, Ancona, Italy
[2] Univ Macerata, Dept Econ & Law, Macerata, Italy
[3] Univ Macerata, Dept Polit Sci Commun & Int Relat, Macerata, Italy
[4] Univ Grenoble Alpes, Grenoble Informat Lab, St Martin Dheres, France
Keywords
Data imputation; Generative Adversarial Network; Electronic Health Record; Machine Learning; Predictive medicine; TIME-SERIES; MODEL
DOI
10.1016/j.compbiomed.2023.107188
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Missing data is a relevant problem in the Machine Learning (ML) and biomedical informatics communities. Real-world Electronic Health Record (EHR) datasets contain many missing values, which produces a high level of spatiotemporal sparsity in the predictor matrix. Several state-of-the-art approaches have tried to address this problem by proposing data imputation strategies that (i) are often unrelated to the ML model, (ii) are not conceived for EHR data, where laboratory exams are not prescribed uniformly over time and the percentage of missing values is high, and (iii) exploit only univariate and linear information on the observed features. Our paper proposes a data imputation strategy based on a clinical conditional Generative Adversarial Network (ccGAN) capable of imputing missing values by exploiting non-linear and multivariate information across patients. Unlike other GAN-based data imputation approaches, our method deals explicitly with the high level of missingness of routine EHR data by conditioning the imputation strategy on the observable values and on those that are fully annotated. We demonstrated statistically significant improvements of the ccGAN over other state-of-the-art approaches in terms of imputation (around 19.79% gain over the best competitor) and predictive performance (up to 1.60% gain over the best competitor) on a real multi-center diabetes dataset. We also demonstrated its robustness across different missingness rates (up to 1.61% gain over the best competitor at the highest missingness rates) on an additional benchmark EHR dataset.
Pages: 10