Tabular data synthesis with generative adversarial networks: design space and optimizations

Cited by: 6
Authors
Liu, Tongyu [1]
Fan, Ju [1]
Li, Guoliang [2]
Tang, Nan [3]
Du, Xiaoyong [1]
Affiliations
[1] Renmin University of China, Beijing 100872, People's Republic of China
[2] Tsinghua University, Beijing 100084, People's Republic of China
[3] HKUST (GZ), Guangzhou 511455, People's Republic of China
Source
VLDB JOURNAL | 2024 / Volume 33 / Issue 02
Keywords
Tabular data synthesis; Generative adversarial networks; GAN optimizations; Data privacy; PRIVACY;
DOI
10.1007/s00778-023-00807-y
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand have limitations on effectively balancing the trade-off between privacy and utility of the released data. To address this problem, the database community and machine learning community have recently studied a new problem of tabular data synthesis using generative adversarial networks (GANs) and proposed various algorithms. However, a comprehensive comparison between GAN-based methods and conventional approaches is still lacking, making it unclear why and how GANs can outperform conventional approaches in synthesizing tabular data. Moreover, it is difficult for practitioners to understand which components are necessary when building a GAN model for tabular data synthesis. To bridge this gap, we conduct a comprehensive experimental study that investigates applying GAN to tabular data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We provide optimization techniques to handle difficulties in training GAN in practice. We conduct extensive experiments to explore the design space, comparing with traditional data synthesis approaches. Through extensive experiments, we find that GAN is very promising for tabular data synthesis and provide guidance for selecting appropriate design choices. We also point out limitations of GAN and identify future research directions. We make all code and datasets public for future research.
Pages: 255-280
Page count: 26