An Evaluation Framework for Synthetic Data Generation Models

Authors
Livieris, I. E. [1,2]
Alimpertis, N. [1]
Domalis, G. [1]
Tsakalidis, D. [1]
Affiliations
[1] Novelcore, Athens 10436, Greece
[2] University of Piraeus, Department of Statistics and Insurance Science, Piraeus, Greece
Keywords
Synthetic data generator; evaluation framework; tabular data; statistical analysis
DOI
10.1007/978-3-031-63219-8_24
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Synthetic data has gained popularity as a cost-efficient strategy for data augmentation, both to improve the performance of machine learning models and to address privacy concerns related to sensitive data. Ensuring the quality of the generated synthetic data, in the sense that it accurately represents the real data, is therefore of primary importance. In this work, we present a new framework for evaluating the ability of synthetic data generation models to produce high-quality synthetic data. The proposed approach provides strong statistical and theoretical information about the evaluation and about the ranking of the compared models. Two use-case scenarios demonstrate the applicability of the proposed framework for evaluating how well synthetic data generation models generate high-quality data.
Pages: 320-335
Number of pages: 16