Key-value data collection and statistical analysis with local differential privacy

Cited by: 1
Authors
Zhu, Hui [1]
Tang, Xiaohu [1]
Yang, Laurence Tianruo [2,3,4]
Fu, Chao [5]
Peng, Shuangrong [1]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
[2] Hainan Univ, Sch Comp Sci & Technol, Haikou, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS, Canada
[4] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[5] Southwest Jiaotong Univ, Sch Math, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Key-value data; Local differential privacy; Mean estimation; Frequency estimation; Range queries
DOI
10.1016/j.ins.2023.119058
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The collection and statistical analysis of simple data types (e.g., categorical, numerical, and multi-dimensional data) under local differential privacy (LDP) have been widely studied. Recently, researchers have turned to the collection of key-value data, one of the main data types in the NoSQL data model. Collecting and analyzing key-value data under LDP requires estimating the frequency and the mean of each key simultaneously. However, a good utility-privacy tradeoff is difficult to achieve, because key-value data are inherently correlated and different users may hold different numbers of key-value pairs. In this paper, we propose an efficient sampling-based scheme for collecting and analyzing key-value data. Under the same perturbation level and perturbation algorithm, the more valid data are collected, the more accurate the resulting statistics. We therefore exploit probability sampling and the inherent correlation of key-value data to raise the probability that users submit valid key-value data. Moreover, we optimize the privacy budget allocation for key-value data so that the overall variance of frequency and mean estimation is close to optimal. Detailed theoretical analysis and experimental results show that the proposed scheme outperforms existing schemes in accuracy.
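The abstract does not spell out the mechanism, so the following is only a minimal Python sketch of the general idea it builds on: raising the chance that a user reports a key-value pair they actually hold by sampling from a short, padded key set (rather than uniformly from the whole key domain), perturbing the sampled pair under LDP, and debiasing on the server. Everything here is an assumption for illustration and is not the authors' scheme: values are assumed normalized to [-1, 1]; the padding length `ell`, the dummy keys `d..d+ell-1`, and the helpers `perturb_pair`, `estimate`, and `simulate` are hypothetical; and the whole (key, sign) pair is perturbed with a single generalized randomized response (GRR), so there is no optimized split of the privacy budget between keys and values.

```python
import math
import random
from collections import Counter


def perturb_pair(kv, d, ell, eps, rng):
    """User side (hypothetical): pad, sample one pair, discretize, then apply GRR."""
    keys = list(kv)
    # Pad with distinct dummy keys d..d+ell-1 (value 0) if the user holds fewer than ell keys.
    if len(keys) < ell:
        keys += rng.sample(range(d, d + ell), ell - len(keys))
    k = rng.choice(keys)
    v = kv.get(k, 0.0)  # dummy keys carry value 0
    # Discretize v in [-1, 1] to a sign so that E[sign] = v.
    sign = 1 if rng.random() < (1 + v) / 2 else -1
    # GRR over all 2 * (d + ell) possible (key, sign) pairs.
    m = 2 * (d + ell)
    true_idx = 2 * k + (0 if sign == 1 else 1)
    p = math.exp(eps) / (math.exp(eps) + m - 1)
    if rng.random() < p:
        return true_idx
    other = rng.randrange(m - 1)  # uniform over the m - 1 other pairs
    return other if other < true_idx else other + 1


def estimate(reports, n, d, ell, eps):
    """Server side (hypothetical): debias GRR counts, then scale by ell for frequency."""
    m = 2 * (d + ell)
    p = math.exp(eps) / (math.exp(eps) + m - 1)
    q = 1.0 / (math.exp(eps) + m - 1)
    counts = Counter(reports)
    freq, mean = [], []
    for k in range(d):
        # Unbiased estimates of how many users truly sampled (k, +1) and (k, -1).
        pos = (counts[2 * k] - n * q) / (p - q)
        neg = (counts[2 * k + 1] - n * q) / (p - q)
        # A holder of k with at most ell keys samples k with probability 1/ell.
        freq.append(ell * (pos + neg) / n)
        total = pos + neg
        mu = (pos - neg) / total if total > 0 else 0.0
        mean.append(max(-1.0, min(1.0, mu)))  # clip to the value range
    return freq, mean


def simulate(n=100_000, eps=4.0, seed=0):
    """Hypothetical population: everyone holds key 0 (value 0.5); half also hold key 1 (value -0.5)."""
    rng = random.Random(seed)
    d, ell = 5, 2
    users = []
    for i in range(n):
        kv = {0: 0.5}
        if i % 2 == 0:
            kv[1] = -0.5
        users.append(kv)
    reports = [perturb_pair(kv, d, ell, eps, rng) for kv in users]
    return estimate(reports, n, d, ell, eps)


if __name__ == "__main__":
    freq, mean = simulate()
    print("frequency:", [round(f, 3) for f in freq])
    print("mean:     ", [round(x, 3) for x in mean])
```

Because padding and sampling depend only on the user's own data and local randomness, and the GRR step is ε-LDP for every possible input pair, each report satisfies ε-LDP. In this sketch the frequency estimate is unbiased only for users who hold at most `ell` keys; users with more keys sample each of their keys with probability below 1/`ell`, so those keys are under-counted, which is one reason the choice of padding length matters.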
Pages: 18