Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Cited by: 119
Authors
Chen, Rui [1 ,3 ]
Xiao, Qian [2 ]
Zhang, Yu [3 ]
Xu, Jianliang [3 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Hong Kong Baptist Univ, Hong Kong, Peoples R China
Keywords
Differential privacy; high-dimensional data; joint distribution; dependency graph; junction tree algorithm; QUERIES;
DOI
10.1145/2783258.2783379
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
Pages: 129-138
Number of pages: 10
Related papers
50 in total
  • [31] Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning
    Plaku, Erion
    Kavraki, Lydia E.
    ALGORITHMIC FOUNDATION OF ROBOTICS VII, 2008, 47 : 3 - 18
  • [32] PrivTDSI: A Local Differentially Private Approach for Truth Discovery via Sampling and Inference
    Zhang, Pengfei
    Cheng, Xiang
    Su, Sen
    Zhu, Binyuan
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (02) : 471 - 484
  • [33] Differentially private geospatial data publication based on grid clustering
    Yang, Dongni
    Li, Songyan
    Liu, Zhaobin
    Ye, Xinfeng
    INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2019, 11 (05) : 613 - 623
  • [34] Locally Differentially Private Frequent Pattern Mining for High-Dimensional Data in Mobile Smart Services
    Li, Qi
    Peng, Shunshun
    Wu, Haonan
    Ran, Ruisheng
    Li, Yong
    Zhou, Mingliang
    Guo, Taolin
    Mao, Qin
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (15)
  • [35] Near-Optimal Thompson Sampling-based Algorithms for Differentially Private Stochastic Bandits
    Hu, Bingshan
    Hegde, Nidhi
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 844 - +
  • [36] Cross-Dimensional Inference of Dependent High-Dimensional Data
    Desai, Keyur H.
    Storey, John D.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2012, 107 (497) : 135 - 151
  • [37] Superpopulation model inference for non probability samples under informative sampling with high-dimensional data
    Liu, Zhan
    Wang, Dianni
    Pan, Yingli
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2025, 54 (05) : 1370 - 1390
  • [38] High-dimensional rank-based inference
    Kong, Xiaoli
    Harrar, Solomon W.
    JOURNAL OF NONPARAMETRIC STATISTICS, 2020, 32 (02) : 294 - 322
  • [39] PROGRAM EVALUATION AND CAUSAL INFERENCE WITH HIGH-DIMENSIONAL DATA
    Belloni, A.
    Chernozhukov, V.
    Fernandez-Val, I.
    Hansen, C.
    ECONOMETRICA, 2017, 85 (01) : 233 - 298
  • [40] A Sampling-Based Method for Highly Efficient Privacy-Preserving Data Publication
    Lu, Guoming
    Zheng, Xu
    Duan, Jingyuan
    Tian, Ling
    Wang, Xia
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021