Deep Generative Models for Synthetic Data: A Survey

Cited by: 22
Authors
Eigenschink, Peter [1]
Reutterer, Thomas [1]
Vamosi, Stefan [1]
Vamosi, Ralf [1, 2]
Sun, Chang [3]
Kalcher, Klaudius [4]
Affiliations
[1] Vienna Univ Econ & Business, Dept Mkt, A-1020 Vienna, Austria
[2] Vienna Univ Technol, High Performance Comp, A-1040 Vienna, Austria
[3] Maastricht Univ, Inst Data Sci, NL-6200 MD Maastricht, Netherlands
[4] Mostly AI GmbH, A-1030 Vienna, Austria
Keywords
Data models; Synthetic data; Measurement; Biological system modeling; Analytical models; Training data; Medical services; Artificial intelligence; big data; deep learning; generative models; neural networks; synthetic data; privacy; NATURAL-LANGUAGE GENERATION; PREDICTION
DOI
10.1109/ACCESS.2023.3275134
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A growing interest in synthetic data has stimulated the development and advancement of a large variety of deep generative models for a wide range of applications. However, as this research has progressed, its streams have become more specialized and disconnected from one another. This is why models for synthesizing text data for natural language processing can no longer readily be compared to models for synthesizing health records. To mitigate this isolation, we propose a data-driven evaluation framework for generative models for synthetic sequential data, an important and challenging sub-category of synthetic data, based on five high-level criteria: representativeness, novelty, realism, diversity and coherence of a synthetic data-set relative to the original data-set, regardless of the models' internal structures. The criteria reflect requirements different domains impose on synthetic data and allow model users to assess the quality of synthetic data across models. In a critical review of generative models for sequential data, we examine and compare the importance of each performance criterion in numerous domains. We find that realism and coherence are more important for synthetic data in natural language, speech and audio processing tasks. At the same time, novelty and representativeness are more important for healthcare and mobility data. We also find that measurement of representativeness is often accomplished using statistical metrics, realism by using human judgement, and novelty using privacy tests.
Pages: 47304-47320
Page count: 17
Related papers
50 items in total
  • [41] Deep generative models in DataSHIELD
    Lenz, Stefan
    Hess, Moritz
    Binder, Harald
    BMC MEDICAL RESEARCH METHODOLOGY, 2021, 21 (01)
  • [42] Denoising Deep Generative Models
    Loaiza-Ganem, Gabriel
    Ross, Brendan Leigh
    Wu, Luhuan
    Cunningham, John P.
    Cresswell, Jesse C.
    Caterini, Anthony L.
    PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER! - UNDERSTANDING DEEP LEARNING THROUGH EMPIRICAL FALSIFICATION, VOL 187, 2022, 187 : 41 - 50
  • [43] An Overview of Deep Generative Models
    Xu, Jungang
    Li, Hui
    Zhou, Shilong
    IETE TECHNICAL REVIEW, 2015, 32 (02) : 131 - 139
  • [44] A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
    Suh, Namjoon
    Cheng, Guang
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, 2025, 12 : 177 - 207
  • [45] A Survey of Unsupervised Generative Models for Exploratory Data Analysis and Representation Learning
    Abukmeil, Mohanad
    Ferrari, Stefano
    Genovese, Angelo
    Piuri, Vincenzo
    Scotti, Fabio
    ACM COMPUTING SURVEYS, 2021, 54 (05)
  • [46] AUGMENTING MOLECULAR DEEP GENERATIVE MODELS WITH TOPOLOGICAL DATA ANALYSIS REPRESENTATIONS
    Schiff, Yair
    Chenthamarakshan, Vijil
    Hoffman, Samuel C.
    Ramamurthy, Karthikeyan Natesan
    Das, Payel
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3783 - 3787
  • [47] Data Augmentation for the Femoral Head Using Generative Deep Learning Models
    Won, Joon Hee
    Goh, Tae Sik
    Lee, Jung Sub
    Lim, Hee Chang
    TRANSACTIONS OF THE KOREAN SOCIETY OF MECHANICAL ENGINEERS B, 2025, 49 (02) : 109 - 119
  • [48] Counterfactual image generation by disentangling data attributes with deep generative models
    Lim, Jieon
    Joo, Weonyoung
    COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2023, 30 (06) : 589 - 603
  • [49] Conditional Data Synthesis with Deep Generative Models for Imbalanced Dataset Oversampling
    Akritidis, Leonidas
    Fevgas, Athanasios
    Alamaniotis, Miltiadis
    Bozanis, Panayiotis
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 444 - 451
  • [50] Temporal Anomaly Detection by Deep Generative Models with Applications to Biological Data
    Ueda, Takaya
    Tohsato, Yukako
    Nishikawa, Ikuko
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 553 - 565