Deep Generative Models for Synthetic Data: A Survey

Cited by: 22
Authors
Eigenschink, Peter [1]
Reutterer, Thomas [1]
Vamosi, Stefan [1]
Vamosi, Ralf [1, 2]
Sun, Chang [3]
Kalcher, Klaudius [4]
Affiliations
[1] Vienna Univ Econ & Business, Dept Mkt, A-1020 Vienna, Austria
[2] Vienna Univ Technol, High Performance Comp, A-1040 Vienna, Austria
[3] Maastricht Univ, Inst Data Sci, NL-6200 MD Maastricht, Netherlands
[4] Mostly AI GmbH, A-1030 Vienna, Austria
Keywords
Data models; Synthetic data; Measurement; Biological system modeling; Analytical models; Training data; Medical services; Artificial intelligence; big data; deep learning; generative models; neural networks; synthetic data; privacy; NATURAL-LANGUAGE GENERATION; PREDICTION
DOI
10.1109/ACCESS.2023.3275134
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A growing interest in synthetic data has stimulated the development and advancement of a large variety of deep generative models for a wide range of applications. However, as this research has progressed, its streams have become more specialized and disconnected from one another. As a result, models for synthesizing text data for natural language processing can no longer be readily compared to models for synthesizing health records. To mitigate this isolation, we propose a data-driven evaluation framework for generative models for synthetic sequential data, an important and challenging sub-category of synthetic data, based on five high-level criteria: representativeness, novelty, realism, diversity, and coherence of a synthetic data set relative to the original data set, regardless of the models' internal structures. The criteria reflect requirements that different domains impose on synthetic data and allow model users to assess the quality of synthetic data across models. In a critical review of generative models for sequential data, we examine and compare the importance of each performance criterion in numerous domains. We find that realism and coherence are more important for synthetic data in natural language, speech, and audio processing tasks, whereas novelty and representativeness are more important for healthcare and mobility data. We also find that representativeness is often measured using statistical metrics, realism using human judgement, and novelty using privacy tests.
Pages: 47304-47320
Page count: 17
Related papers
50 in total
  • [11] Synthetic observations from deep generative models and binary omics data with limited sample size
    Nussberger, Jens
    Boesel, Frederic
    Lenz, Stefan
    Binder, Harald
    Hess, Moritz
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)
  • [12] Synthetic Aperture Radar Image Generation With Deep Generative Models
    Wang, Ke
    Zhang, Gong
    Leng, Yang
    Leung, Henry
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2019, 16 (06) : 912 - 916
  • [13] Generative models for synthetic data generation: application to pharmacokinetic/pharmacodynamic data
    Jiang, Yulun
    Garcia-Duran, Alberto
    Losada, Idris Bachali
    Girard, Pascal
    Terranova, Nadia
    JOURNAL OF PHARMACOKINETICS AND PHARMACODYNAMICS, 2024, 51 (06) : 877 - 885
  • [14] Deep Generative Models for Relational Data with Side Information
    Hu, Changwei
    Rai, Piyush
    Carin, Lawrence
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [15] On oversampling imbalanced data with deep conditional generative models
    Fajardo, Val Andrei
    Findlay, David
    Jaiswal, Charu
    Yin, Xinshang
    Houmanfar, Roshanak
    Xie, Honglei
    Liang, Jiaxi
    She, Xichen
    Emerson, D. B.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169 (169)
  • [16] A Review of Generative Models in Generating Synthetic Attack Data for Cybersecurity
    Agrawal, Garima
    Kaur, Amardeep
    Myneni, Sowmya
    ELECTRONICS, 2024, 13 (02)
  • [17] Synthetic single cell RNA sequencing data from small pilot studies using deep generative models
    Treppner, Martin
    Salas-Bastos, Adrian
    Hess, Moritz
    Lenz, Stefan
    Vogel, Tanja
    Binder, Harald
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [18] Synthetic single cell RNA sequencing data from small pilot studies using deep generative models
    Martin Treppner
    Adrián Salas-Bastos
    Moritz Hess
    Stefan Lenz
    Tanja Vogel
    Harald Binder
    Scientific Reports, 11
  • [19] Synthetic data generation in motion analysis: A generative deep learning framework
    Perrone, Mattia
    Mell, Steven P.
    Martin, John T.
    Nho, Shane J.
    Simmons, Scott
    Malloy, Philip
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART H-JOURNAL OF ENGINEERING IN MEDICINE, 2025, 239 (02) : 202 - 211
  • [20] Intelligent layout generation based on deep generative models: A comprehensive survey
    Shi, Yong
    Shang, Mengyu
    Qi, Zhiquan
    INFORMATION FUSION, 2023, 100