Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Cited by: 119
Authors
Chen, Rui [1, 3]
Xiao, Qian [2]
Zhang, Yu [3]
Xu, Jianliang [3]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Hong Kong Baptist Univ, Hong Kong, Peoples R China
Keywords
Differential privacy; high-dimensional data; joint distribution; dependency graph; junction tree algorithm; queries
DOI
10.1145/2783258.2783379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
Pages: 129-138
Number of pages: 10