Deep Generative Models for Synthetic Data: A Survey

Cited by: 22
Authors
Eigenschink, Peter [1]
Reutterer, Thomas [1]
Vamosi, Stefan [1]
Vamosi, Ralf [1, 2]
Sun, Chang [3]
Kalcher, Klaudius [4]
Affiliations
[1] Vienna Univ Econ & Business, Dept Mkt, A-1020 Vienna, Austria
[2] Vienna Univ Technol, High Performance Comp, A-1040 Vienna, Austria
[3] Maastricht Univ, Inst Data Sci, NL-6200 MD Maastricht, Netherlands
[4] Mostly AI GmbH, A-1030 Vienna, Austria
Keywords
Data models; Synthetic data; Measurement; Biological system modeling; Analytical models; Training data; Medical services; Artificial intelligence; big data; deep learning; generative models; neural networks; synthetic data; privacy; NATURAL-LANGUAGE GENERATION; PREDICTION
DOI
10.1109/ACCESS.2023.3275134
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
A growing interest in synthetic data has stimulated the development and advancement of a large variety of deep generative models for a wide range of applications. However, as this research has progressed, its streams have become more specialized and disconnected from one another. As a result, models for synthesizing text data for natural language processing can no longer be readily compared with models for synthesizing health records. To mitigate this isolation, we propose a data-driven evaluation framework for generative models of synthetic sequential data, an important and challenging sub-category of synthetic data. The framework is based on five high-level criteria, each assessed for a synthetic dataset relative to the original dataset regardless of the models' internal structures: representativeness, novelty, realism, diversity, and coherence. The criteria reflect the requirements that different domains impose on synthetic data and allow model users to assess the quality of synthetic data across models. In a critical review of generative models for sequential data, we examine and compare the importance of each performance criterion across numerous domains. We find that realism and coherence are more important for synthetic data in natural language, speech, and audio processing tasks, whereas novelty and representativeness are more important for healthcare and mobility data. We also find that representativeness is often measured using statistical metrics, realism using human judgement, and novelty using privacy tests.
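The abstract notes that representativeness is often measured with statistical metrics and novelty with privacy tests. The following is a minimal Python sketch of one possible instance of each, assuming fixed-length numeric sequences. It uses a per-time-step two-sample Kolmogorov-Smirnov (KS) statistic as a representativeness proxy and a distance-to-closest-record (DCR) check as a novelty proxy. The choice of these two metrics and all function names (`ks_statistic`, `mean_stepwise_ks`, `closest_record_distances`, `copy_rate`) are illustrative assumptions, not the evaluation framework defined in the paper.

```python
import math
from typing import List, Sequence

# Illustrative metrics only (assumption): the two-sample KS statistic and the
# distance-to-closest-record (DCR) check are common choices, not the paper's own definitions.


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0.0 = identical, 1.0 = no overlap)."""
    a, b = sorted(real), sorted(synthetic)
    i = j = 0
    gap = 0.0
    # Step through the merged sorted values; after passing each distinct value,
    # i/len(a) and j/len(b) are the two empirical CDFs evaluated at that value.
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return gap


def mean_stepwise_ks(real_seqs: Sequence[Sequence[float]],
                     synth_seqs: Sequence[Sequence[float]]) -> float:
    """Representativeness proxy: KS statistic at each time step, averaged.
    Assumes every sequence has the same length."""
    length = len(real_seqs[0])
    scores = [
        ks_statistic([r[t] for r in real_seqs], [s[t] for s in synth_seqs])
        for t in range(length)
    ]
    return sum(scores) / length


def closest_record_distances(real_seqs, synth_seqs) -> List[float]:
    """Novelty proxy: Euclidean distance from each synthetic sequence to its
    nearest real sequence. A distance of 0 means a verbatim copy."""
    return [min(math.dist(s, r) for r in real_seqs) for s in synth_seqs]


def copy_rate(real_seqs, synth_seqs, tol: float = 1e-9) -> float:
    """Share of synthetic sequences that (nearly) duplicate a real one."""
    dists = closest_record_distances(real_seqs, synth_seqs)
    return sum(d <= tol for d in dists) / len(dists)


if __name__ == "__main__":
    real = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
    synthetic = [[0.0, 1.0, 2.0], [1.5, 2.5, 3.5]]
    print(mean_stepwise_ks(real, synthetic))  # ~0.333
    print(copy_rate(real, synthetic))         # 0.5: the first record is a copy
```

A lower mean KS value suggests that the per-step marginals of the synthetic data track the original data more closely. A nonzero copy rate flags memorized records, which is a privacy concern and the opposite of novelty. Per-step marginals ignore temporal dependence, so in this sketch they say nothing about coherence, which would need a separate check.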
Pages: 47304-47320
Page count: 17
Related papers
  • [1] Deep Generative Models: Survey
    Oussidi, Achraf
    Elhassouny, Azeddine
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND COMPUTER VISION (ISCV2018), 2018,
  • [2] Adversarial Attacks Against Deep Generative Models on Data: A Survey
    Sun, Hui
    Zhu, Tianqing
    Zhang, Zhiqiu
    Jin, Dawei
    Xiong, Ping
    Zhou, Wanlei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3367 - 3388
  • [3] Unsupervised Hybrid Deep Generative Models for Photovoltaic Synthetic Data Generation
    de Jesus, Dan A. Rosa
    Mandal, Paras
    Senjyu, Tomonobu
    Kamalasadan, Sukumar
    2021 IEEE POWER & ENERGY SOCIETY GENERAL MEETING (PESGM), 2021,
  • [4] A survey of multimodal deep generative models
    Suzuki, Masahiro
    Matsuo, Yutaka
    ADVANCED ROBOTICS, 2022, 36 (5-6) : 261 - 278
  • [6] Synthetic data generation with deep generative models to enhance predictive tasks in trading strategies
    Carvajal-Patino, Daniel
    Ramos-Pollan, Raul
    RESEARCH IN INTERNATIONAL BUSINESS AND FINANCE, 2022, 62
  • [7] Deep Generative Models in the Industrial Internet of Things: A Survey
    De, Suparna
    Bermudez-Edo, Maria
    Xu, Honghui
    Cai, Zhipeng
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (09) : 5728 - 5737
  • [8] A Systematic Survey on Deep Generative Models for Graph Generation
    Guo, Xiaojie
    Zhao, Liang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 5370 - 5390
  • [9] Deep learning with the generative models for recommender systems: A survey
    Nahta, Ravi
    Chauhan, Ganpat Singh
    Meena, Yogesh Kumar
    Gopalani, Dinesh
    COMPUTER SCIENCE REVIEW, 2024, 53
  • [10] Implications of data topology for deep generative models
    Jin, Yinzhu
    Mcdaniel, Rory
    Tatro, N. Joseph
    Catanzaro, Michael J.
    Smith, Abraham D.
    Bendich, Paul
    Dwyer, Matthew B.
    Fletcher, P. Thomas
    FRONTIERS IN COMPUTER SCIENCE, 2024, 6