Utility Meets Privacy: A Critical Evaluation of Tabular Data Synthesizers

Cited by: 0
Authors
Hoellig, Julian [1]
Geierhos, Michaela [1]
Affiliations
[1] Univ Bundeswehr Munich, Res Inst CODE, D-85577 Neubiberg, Germany
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Synthesizers; Synthetic data; Data privacy; Machine learning; Data models; Privacy; Deep learning; Accuracy; Protection; Predictive models; Membership inference analysis; privacy evaluation; tabular data synthesizer; utility-privacy trade-off
DOI
10.1109/ACCESS.2025.3549680
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Evaluating synthetic data requires careful consideration of both utility and privacy. This study analyzes 12 synthesizers across 17 tabular health datasets, providing large-scale, comparable evaluation results. A novel utility-privacy score integrates a privacy measure into the evaluation, quantifying the trade-off between the two. Membership inference analysis is extended for robust privacy assessment, with reusable code provided for further research. Key findings include: 1) despite its simplicity, SMOTE achieves the best results of all synthesizers, but no single synthesizer consistently outperforms all others across all datasets; 2) utility and privacy are inherently correlated, with improvements in one compromising the other; 3) some synthesizers exhibit greater robustness to variations in data properties, such as sample and feature size, and some properties have a stronger impact than others. These findings underscore how closely the utility of a synthesizer is tied to individual datasets and privacy considerations, and highlight the importance of incorporating these aspects into future research and adopting broader, more diverse evaluation frameworks.
Pages: 44497 - 44509
Page count: 13
Related papers
50 in total
  • [21] Semantic Disclosure Control: semantics meets data privacy
    Batet, Montserrat
    Sanchez, David
    ONLINE INFORMATION REVIEW, 2018, 42 (03) : 290 - 303
  • [22] A systematic review of privacy-preserving techniques for synthetic tabular health data
    Tobias Hyrup
    Anton D. Lautrup
    Arthur Zimek
    Peter Schneider-Kamp
    Discover Data, 3 (1):
  • [23] Optimizing Privacy and Data Utility: Metrics and Strategies
    Mauger, Clemence
    Le Mahec, Gael
    Dequen, Gilles
    TRANSACTIONS ON DATA PRIVACY, 2023, 16 (03) : 153 - 189
  • [24] Utility of Privacy Preservation for Health Data Publishing
    Wu, Lengdong
    He, Hua
    Zaiane, Osmar R.
    2013 IEEE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2013, : 510 - 511
  • [25] On the Tradeoff Between Privacy and Utility in Data Publishing
    Li, Tiancheng
    Li, Ninghui
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 517 - 525
  • [26] Enhancing Utility and Privacy of Data for Software Testing
    Li, Boyang
    2014 SEVENTH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION WORKSHOPS (ICSTW 2014), 2014, : 233 - 234
  • [27] Utility and Privacy Assessment of Synthetic Microbiome Data
    Hittmeir, Markus
    Mayer, Rudolf
    Ekelhart, Andreas
    DATA AND APPLICATIONS SECURITY AND PRIVACY XXXVI, DBSEC 2022, 2022, 13383 : 15 - 27
  • [28] Empirical privacy and empirical utility of anonymized data
    Cormode, Graham
    Procopiuc, Cecilia M.
    Shen, Entong
    Srivastava, Divesh
    Yu, Ting
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2013, : 77 - 82
  • [29] Privacy preservation of the user data and properly balancing between privacy and utility
    Yuvaraj N.
    Praghash K.
    Karthikeyan T.
    International Journal of Business Intelligence and Data Mining, 2022, 20 (04) : 394 - 411
  • [30] Privacy Utility Tradeoff Between PETs: Differential Privacy and Synthetic Data
    Razi, Qaiser
    Datta, Sujoya
    Hassija, Vikas
    Chalapathi, G. S. S.
    Sikdar, Biplab
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 473 - 484