MISSING DATA IMPUTATION FOR HEALTH CARE BIG DATA USING DENOISING AUTOENCODER WITH GENERATIVE ADVERSARIAL NETWORK

Times cited: 0
Authors
Zhang, Yinbing [1]
Affiliations
[1] Hubu Univ, Coll Chem & Chem Engn, Wuhan 430062, Hubei, Peoples R China
Source
Keywords
Data imputation; Missing data; Autoencoders; GAN; Deep learning
DOI
10.12694/scpe.v25i5.3023
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Missing data imputation is a key topic in healthcare that covers the issues and strategies involved in dealing with partial data in medical records, clinical trials, and health surveys. Data in healthcare might be missing for a variety of reasons, including non-response in surveys, data entry problems, or unrecorded information during therapeutic appointments. This paper introduces a novel approach to impute missing data utilizing a hybrid model that integrates denoising autoencoders with generative adversarial networks (GANs). We begin by highlighting the prevalence of missing data in health care datasets and the potential impact on analytical outcomes. The proposed methodology leverages the denoising autoencoder's ability to reconstruct data from noisy inputs, coupled with the GAN's proficiency in generating synthetic data that is indistinguishable from real data. By combining these two neural network architectures, our model demonstrates an enhanced capability to predict and fill in missing data points effectively. To validate our approach, we conducted experiments on several large-scale health care datasets with varying degrees of artificially introduced missingness. The performance of our model was benchmarked against traditional imputation methods such as mean imputation and k-nearest neighbors, as well as against standalone denoising autoencoders and GANs. Our results indicate a significant improvement in imputation accuracy, as measured by root mean square error (RMSE) and mean absolute error (MAE), confirming the efficacy of the hybrid model in handling missing data in a robust manner.
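The evaluation protocol described in the abstract (hide known values artificially, impute them, then score with RMSE and MAE) can be illustrated with a minimal sketch. It uses only the mean-imputation baseline named in the abstract, not the hybrid model. The function names, the 20% missing rate, the completely-at-random masking, and the synthetic Gaussian data are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def mask_completely_at_random(x, missing_rate, rng):
    """Return a copy of x with a random fraction of entries set to NaN, plus the mask."""
    mask = rng.random(x.shape) < missing_rate  # True where a value is hidden
    x_missing = x.copy()
    x_missing[mask] = np.nan
    return x_missing, mask

def mean_impute(x_missing):
    """Fill each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)

def rmse_mae(x_true, x_imputed, mask):
    """Score only the entries that were artificially hidden."""
    err = x_imputed[mask] - x_true[mask]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    return rmse, mae

# Synthetic stand-in data (assumption): 200 records, 5 numeric features.
rng = np.random.default_rng(0)
x_true = rng.normal(size=(200, 5))

x_missing, mask = mask_completely_at_random(x_true, missing_rate=0.2, rng=rng)
x_imputed = mean_impute(x_missing)
rmse, mae = rmse_mae(x_true, x_imputed, mask)
print(f"mean imputation: RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Any other imputer, including a learned one, could be swapped in for `mean_impute` and scored with the same `rmse_mae` call, which is what makes the comparison across methods like-for-like.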
Pages: 3850 - 3857
Number of pages: 8
Related papers
50 in total
  • [31] Ensemble Generative Adversarial Imputation Network with Selective Multi-Generator (ESM-GAIN) for Missing Data Imputation
    Li, Yuxuan
    Dogan, Ayse
    Liu, Chenang
    2022 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATION SCIENCE AND ENGINEERING (CASE), 2022, : 807 - 812
  • [32] A Missing Traffic Data Imputation Method Based on a Diffusion Convolutional Neural Network-Generative Adversarial Network
    Zhang, Chenchen
    Zhou, Lei
    Xiao, Xuemei
    Xu, Dongwei
    SENSORS, 2023, 23 (23)
  • [33] Generative Adversarial Networks Assist Missing Data Imputation: A Comprehensive Survey and Evaluation
    Shahbazian, Reza
    Greco, Sergio
    IEEE ACCESS, 2023, 11 : 88908 - 88928
  • [34] Imputation of Missing Values in Training Data using Variational Autoencoder
    Hong, Xuerui
    Hao, Shuang
    2023 IEEE 39TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS, ICDEW, 2023, : 49 - 54
  • [35] Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems
    Abiri, Najmeh
    Linse, Bjorn
    Eden, Patrik
    Ohlsson, Mattias
    NEUROCOMPUTING, 2019, 365 : 137 - 146
  • [36] Missing data imputation framework for bridge structural health monitoring based on slim generative adversarial networks
    Gao, Shuai
    Zhao, Wenlong
    Wan, Chunfeng
    Jiang, Huachen
    Ding, Youliang
    Xue, Songtao
    MEASUREMENT, 2022, 204
  • [37] Multi-Modal Stacked Denoising Autoencoder for Handling Missing Data in Healthcare Big Data
    Kim, Joo-Chang
    Chung, Kyungyong
    IEEE ACCESS, 2020, 8 : 104933 - 104943
  • [39] Multistate time series imputation using generative adversarial network with applications to traffic data
    Haitao Li
    Qian Cao
    Qiaowen Bai
    Zhihui Li
    Hongyu Hu
    Neural Computing and Applications, 2023, 35 : 6545 - 6567
  • [40] QAR Data Imputation Using Generative Adversarial Network with Self-Attention Mechanism
    Zhao, Jingqi
    Rong, Chuitian
    Dang, Xin
    Sun, Huabo
    BIG DATA MINING AND ANALYTICS, 2024, 7 (01): 12 - 28