Deep Generative Models for Synthetic Data: A Survey

Cited by: 22
Authors
Eigenschink, Peter [1]
Reutterer, Thomas [1]
Vamosi, Stefan [1]
Vamosi, Ralf [1, 2]
Sun, Chang [3]
Kalcher, Klaudius [4]
Affiliations
[1] Vienna Univ Econ & Business, Dept Mkt, A-1020 Vienna, Austria
[2] Vienna Univ Technol, High Performance Comp, A-1040 Vienna, Austria
[3] Maastricht Univ, Inst Data Sci, NL-6200 MD Maastricht, Netherlands
[4] Mostly AI GmbH, A-1030 Vienna, Austria
Keywords
Data models; Synthetic data; Measurement; Biological system modeling; Analytical models; Training data; Medical services; Artificial intelligence; big data; deep learning; generative models; neural networks; synthetic data; privacy; NATURAL-LANGUAGE GENERATION; PREDICTION
DOI
10.1109/ACCESS.2023.3275134
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A growing interest in synthetic data has stimulated the development and advancement of a large variety of deep generative models for a wide range of applications. However, as this research has progressed, its streams have become more specialized and disconnected from one another. As a result, models for synthesizing text data for natural language processing can no longer be readily compared to models for synthesizing health records. To mitigate this isolation, we propose a data-driven evaluation framework for generative models for synthetic sequential data, an important and challenging sub-category of synthetic data. The framework is based on five high-level criteria: representativeness, novelty, realism, diversity, and coherence of a synthetic dataset relative to the original dataset, regardless of the models' internal structures. The criteria reflect requirements that different domains impose on synthetic data and allow model users to assess the quality of synthetic data across models. In a critical review of generative models for sequential data, we examine and compare the importance of each performance criterion in numerous domains. We find that realism and coherence are more important for synthetic data in natural language, speech, and audio processing tasks, whereas novelty and representativeness are more important for healthcare and mobility data. We also find that representativeness is often measured using statistical metrics, realism using human judgement, and novelty using privacy tests.
Pages: 47304-47320
Page count: 17