An Evaluation Framework for Synthetic Data Generation Models

Authors
Livieris, I. E. [1 ,2 ]
Alimpertis, N. [1 ]
Domalis, G. [1 ]
Tsakalidis, D. [1 ]
Affiliations
[1] Novelcore, Athens 10436, Greece
[2] Univ Piraeus, Dept Stat & Insurance Sci, Piraeus, Greece
Keywords
Synthetic data generator; evaluation framework; tabular data; statistical analysis;
DOI
10.1007/978-3-031-63219-8_24
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Synthetic data has gained popularity as a cost-efficient strategy for data augmentation that improves the performance of machine learning models, as well as for addressing privacy concerns related to sensitive data. Ensuring the quality of generated synthetic data, in terms of how accurately it represents the real data, is therefore of primary importance. In this work, we present a new framework for evaluating the ability of synthetic data generation models to produce high-quality synthetic data. The proposed approach provides strong statistical and theoretical information about the evaluation framework and the ranking of the compared models. Two use-case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
Pages: 320-335
Number of pages: 16