Differentially Private k-Nearest Neighbor Missing Data Imputation

Cited by: 4
Authors
Clifton, Chris [1]
Hanson, Eric J. [2]
Merrill, Keith [3]
Merrill, Shawn [1]
Affiliations
[1] Purdue Univ, 305 N Univ St, W Lafayette, IN 47906 USA
[2] Univ Quebec Montreal, Lab Combinatoire & Informat Math, Montreal, PQ H3C 3P8, Canada
[3] Brandeis Univ, 415 South St, Waltham, MA 02453 USA
Keywords
Differential privacy; statistical disclosure limitation; private data cleaning; smooth sensitivity
DOI
10.1145/3507952
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Using smooth sensitivity, we develop a method for k-nearest neighbor missing data imputation with differential privacy. This requires bounding the number of data-incomplete tuples whose data-complete "donor" can change when a single record is added to or deleted from the dataset. Because a single individual can affect many imputed tuples, our mechanisms necessarily add more noise than mechanisms that ignore missing data, but we show empirically that this cost is significantly outweighed by the bias reduction from imputing missing data.
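The quantity the abstract refers to, the number of incomplete tuples whose donor changes after one record is added, can be illustrated with a minimal sketch. This is not the paper's mechanism: it fixes k = 1, adds no noise, and computes no smooth sensitivity bound. The function names (`nearest_donor`, `impute`, `donors_changed`) and the toy data are assumptions made purely for illustration.

```python
from math import dist

def nearest_donor(observed, donors):
    """Index of the complete record closest to `observed` (Euclidean distance)."""
    return min(range(len(donors)), key=lambda i: dist(observed, donors[i][0]))

def impute(incompletes, donors):
    """Fill each incomplete tuple's missing value from its nearest donor (k = 1)."""
    return [donors[nearest_donor(x, donors)][1] for x in incompletes]

def donors_changed(incompletes, donors, new_donor):
    """Count incomplete tuples whose donor changes when `new_donor` is added."""
    before = [nearest_donor(x, donors) for x in incompletes]
    after = [nearest_donor(x, donors + [new_donor]) for x in incompletes]
    return sum(b != a for b, a in zip(before, after))

# Hypothetical toy data: each donor is (observed attributes, value to donate).
donors = [((0.0,), 10.0), ((10.0,), 20.0)]
# Incomplete tuples carry only their observed attributes.
incompletes = [(4.0,), (5.5,), (6.0,), (9.5,)]

imputed = impute(incompletes, donors)
changed = donors_changed(incompletes, donors, ((5.0,), 15.0))
```

In this toy dataset, adding one complete record at 5.0 changes the donor for three of the four incomplete tuples. A single individual can therefore shift many imputed values at once, which is why a mechanism that imputes before releasing statistics has to add more noise than one that ignores missing data.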
Pages: 23