A data-driven approach to choosing privacy parameters for clinical trial data sharing under differential privacy

Authors
Chen, Henian [1, 5]
Pang, Jinyong [1 ]
Zhao, Yayi [1 ]
Giddens, Spencer [2 ]
Ficek, Joseph [3 ]
Valente, Matthew J. [1 ]
Cao, Biwei [1 ]
Daley, Ellen [4 ]
Affiliations
[1] Univ S Florida, Coll Publ Hlth, Study Design & Data Anal, Tampa, FL 33612 USA
[2] Univ Notre Dame, Dept Appl & Computat Math & Stat, Notre Dame, IN 46556 USA
[3] GlaxoSmithKline, Oncol Stat, Collegeville, PA 19426 USA
[4] Univ S Florida, Coll Publ Hlth, Lawton & Rhea Chiles Ctr Children & Families, Tampa, FL USA
[5] Univ S Florida, Coll Publ Hlth, Study Design & Data Anal, 13201 Bruce B Downs Blvd, MDC 56, Tampa, FL 33612 USA
Keywords
clinical trial; differential privacy; accuracy; data sharing; privacy parameter; RELATION EXTRACTION;
DOI
10.1093/jamia/ocae038
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objectives: Clinical trial data sharing is crucial for promoting transparency and collaborative efforts in medical research. Differential privacy (DP) is a formal statistical technique for anonymizing shared data that balances privacy of individual records and accuracy of replicated results through a "privacy budget" parameter, epsilon. DP is considered the state of the art in privacy-protected data publication and is underutilized in clinical trial data sharing. This study is focused on identifying epsilon values for the sharing of clinical trial data.
Materials and Methods: We analyzed 2 clinical trial datasets with privacy budget epsilon ranging from 0.01 to 10. Smaller values of epsilon entail adding greater amounts of random noise, with better privacy as a result. Comparison of rates, odds ratios, means, and mean differences between the original clinical trial datasets and the empirical distribution of the DP estimator was performed.
Results: The DP rate closely approximated the original rate of 6.5% when epsilon > 1. The DP odds ratio closely aligned with the original odds ratio of 0.689 when epsilon >= 3. The DP mean closely approximated the original mean of 164.64 when epsilon >= 1. As epsilon increased to 5, both the minimum and maximum DP means converged toward the original mean.
Discussion: There is no consensus on how to choose the privacy budget epsilon. The definition of DP does not specify the required level of privacy, and there is no established formula for determining epsilon.
Conclusion: Our findings suggest that the application of DP holds promise in the context of sharing clinical trial data.
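The abstract's statement that smaller epsilon means more noise can be illustrated with a minimal sketch. This record does not say which DP mechanism the authors used, so the sketch assumes the standard Laplace mechanism applied to an event count with sensitivity 1 and a publicly known sample size. The function names and the 65-of-1000 counts are hypothetical, chosen only to give a 6.5% rate like the one quoted in the results.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_rate(events, n, epsilon, rng):
    """Release an event rate under epsilon-DP (illustrative, not the paper's code).

    Assumption: one participant changes the event count by at most 1
    (sensitivity 1), and the sample size n is treated as public.
    """
    noisy_count = events + laplace_noise(1.0 / epsilon, rng)
    # Clamping is post-processing, so it does not weaken the DP guarantee.
    return min(max(noisy_count / n, 0.0), 1.0)


def spread_of_dp_rate(events, n, epsilon, draws=2000, seed=0):
    """Return the mean and standard deviation of repeated DP releases."""
    rng = random.Random(seed)
    samples = [dp_rate(events, n, epsilon, rng) for _ in range(draws)]
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    # Hypothetical data: 65 events among 1000 participants (6.5%).
    for eps in (0.01, 0.1, 1, 3, 10):
        mean, sd = spread_of_dp_rate(65, 1000, eps)
        print(f"epsilon={eps:<5} mean={mean:.4f} sd={sd:.4f}")
```

Under these assumptions the spread of the released rate shrinks as epsilon grows, which is the privacy-accuracy trade-off the study examines empirically.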
Pages: 1135-1143
Number of pages: 9