Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks

Cited by: 4
Authors
Nik, Alireza Hossein Zadeh [1, 2]
Riegler, Michael A. [1, 3]
Halvorsen, Pal [1, 4]
Storas, Andrea M. [1, 4]
Affiliations
[1] SimulaMet, Oslo, Norway
[2] Univ Stavanger, Stavanger, Norway
[3] Univ Tromso, Tromso, Norway
[4] OsloMet, Oslo, Norway
Source
Keywords
Synthetic data generation; Deep learning; Medical data
DOI
10.1007/978-3-031-27077-2_34
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-quality tabular data is a crucial requirement for developing data-driven applications, especially healthcare-related ones, because most of the data collected in this context today is in tabular form. However, strict data protection laws complicate access to medical datasets. Thus, synthetic data has become an attractive alternative for data scientists and healthcare professionals to circumvent such hurdles. Although many healthcare institutions still use classical de-identification and anonymization techniques for generating synthetic data, deep learning-based generative models such as generative adversarial networks (GANs) have shown remarkable performance in generating tabular datasets with complex structures. This paper examines the potential and applicability of GANs within the healthcare industry, which often faces serious challenges with insufficient training data and the sensitivity of patient records. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. Healthcare datasets with different sizes, numbers of variables, column data types, feature distributions, and inter-variable correlations are examined. Moreover, a comprehensive evaluation framework is defined to evaluate the quality of the synthetic records and the viability of each model in preserving patients' privacy. The results indicate that the investigated models can generate synthetic datasets that maintain the statistical characteristics, model compatibility, and privacy of the original data. Moreover, synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for improvement in designing an architecture for generating synthetic tabular data.
Pages: 434-446
Number of pages: 13