Utility Meets Privacy: A Critical Evaluation of Tabular Data Synthesizers

Cited by: 0
Authors
Hoellig, Julian [1]
Geierhos, Michaela [1]
Affiliations
[1] Univ Bundeswehr Munich, Res Inst CODE, D-85577 Neubiberg, Germany
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Synthesizers; Synthetic data; Data privacy; Machine learning; Data models; Privacy; Deep learning; Accuracy; Protection; Predictive models; Membership inference analysis; privacy evaluation; tabular data synthesizer; utility-privacy trade-off
DOI
10.1109/ACCESS.2025.3549680
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Evaluating synthetic data requires careful consideration of both utility and privacy. This study analyzes 12 synthesizers across 17 tabular health datasets, providing large-scale, comparable evaluation results. A novel utility-privacy score integrates a privacy measure into the evaluation, quantifying the trade-off between the two. Membership inference analysis is extended for robust privacy assessment, with reusable code provided for further research. Key findings include: 1) despite its simplicity, SMOTE achieves the best results of all synthesizers, but no single synthesizer consistently outperforms all others across all datasets; 2) utility and privacy are inherently correlated, with improvements in one compromising the other; 3) some synthesizers exhibit greater robustness to variations in data properties, such as sample and feature size, and some properties have a stronger impact than others. These findings underscore how closely the utility of a synthesizer is tied to individual datasets and privacy considerations, and highlight the importance of incorporating these aspects into future research and adopting broader, more diverse evaluation frameworks.
Pages: 44497 - 44509
Page count: 13
Related papers
50 in total
  • [1] SynthEval: a framework for detailed utility and privacy evaluation of tabular synthetic data
    Lautrup, Anton D.
    Hyrup, Tobias
    Zimek, Arthur
    Schneider-Kamp, Peter
    DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 39 (01) : 1 - 25
  • [2] Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions
    Hernandez, Mikel
    Epelde, Gorka
    Alberdi, Ane
    Cilla, Rodrigo
    Rankin, Debbie
    METHODS OF INFORMATION IN MEDICINE, 2023, 62 : e19 - e38
  • [3] GANs for Tabular Healthcare Data Generation: A Review on Utility and Privacy
    Coutinho-Almeida, Joao
    Rodrigues, Pedro Pereira
    Cruz-Correia, Ricardo Joao
    DISCOVERY SCIENCE (DS 2021), 2021, 12986 : 282 - 291
  • [4] Further Insights: Balancing Privacy, Explainability, and Utility in Machine Learning-based Tabular Data Analysis
    Abbasi, Wisam
    Mori, Paolo
    Saracino, Andrea
    19TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY, AND SECURITY, ARES 2024, 2024,
  • [5] Privacy-preserving tabular data publishing: A comprehensive evaluation from web to cloud
    Abdelhameed, Saad A.
    Moussa, Sherin M.
    Khalifa, Mohamed E.
    COMPUTERS & SECURITY, 2018, 72 : 74 - 95
  • [6] The Explainability-Privacy-Utility Trade-Off for Machine Learning-Based Tabular Data Analysis
    Abbasi, Wisam
    Mori, Paolo
    Saracino, Andrea
    PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY, SECRYPT 2023, 2023, : 511 - 519
  • [7] Sharing is CAIRing: Characterizing principles and assessing properties of universal privacy evaluation for synthetic tabular data
    Hyrup, Tobias
    Lautrup, Anton Danholt
    Zimek, Arthur
    Schneider-Kamp, Peter
    MACHINE LEARNING WITH APPLICATIONS, 2024, 18
  • [8] Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning Analytics
    Liu, Qinyi
    Khalil, Mohammad
    Shakya, Ronas
    Jovanovic, Jelena
    FOURTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2024, 2024, : 620 - 631
  • [9] Statistical Data Privacy: A Song of Privacy and Utility
    Slavkovic, Aleksandra
    Seeman, Jeremy
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, 2023, 10 : 189 - 218
  • [10] Synthesizing High-Utility Tabular Data with Enhanced Privacy via Split-and-Discard Pre-training
    Luo, Liwei
    Huang, Heyuan
    Zhang, Bingbing
    Xie, Yankai
    Zhang, Chi
    Wei, Lingbo
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 6012 - 6017