Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Cited by: 2
Authors
Chu, Zhiguang [1 ,2 ]
He, Jingsha [1 ]
Zhang, Xiaolei [2 ]
Zhang, Xing [2 ]
Zhu, Nafei [1 ]
Affiliations
[1] Beijing Univ Technol, Sch Software Engn, Beijing 100124, Peoples R China
[2] Key Lab Secur Network & Data Ind Internet Liaoning, Jinzhou 121000, Peoples R China
Keywords
high-dimensional data; feature selection; random forest; clustering; differential privacy; PREDICTION; ALGORITHM;
DOI
10.3390/electronics12091959
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
For high-dimensional data, which are a product of social information, privacy and usability are the core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction technique for high-dimensional data. Some feature selection methods process only the features chosen by the algorithm and ignore the information associated with those features, which lowers the usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published. The proposed method has three stages: (1) screening features; (2) clustering the features; and (3) adding adaptive noise. The experiments use the Wisconsin Breast Cancer Diagnostic (WDBC) dataset from the UCI Machine Learning Repository. With classification accuracy as the performance measure, the experiments show that the proposed algorithm protects the sensitive information in the original data while retaining the data's contribution to the diagnostic results.
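The third stage, adaptive noise, can be illustrated with a minimal sketch. The abstract does not state how noise is adapted, so everything below is an assumption made for illustration: the Laplace mechanism as the noise source, a privacy budget split across feature groups in proportion to their importance, and the hypothetical names `laplace_noise`, `allocate_budget`, and `perturb`. It is not the authors' algorithm.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5           # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)         # avoid log(0) at the left endpoint
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def allocate_budget(importances, total_epsilon):
    """Split total_epsilon across feature groups in proportion to importance.

    Assumed rule: a more important group receives a larger share of the
    budget and therefore less noise. The shares sum to total_epsilon,
    which matches sequential composition over attributes of the same records.
    """
    total = sum(importances)
    return [total_epsilon * w / total for w in importances]


def perturb(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each value."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical example: three feature groups with made-up importance scores.
rng = random.Random(0)
budgets = allocate_budget([0.5, 0.3, 0.2], total_epsilon=1.0)
noisy_column = perturb([0.12, 0.40, 0.33], sensitivity=1.0,
                       epsilon=budgets[0], rng=rng)
```

Here `sensitivity=1.0` assumes each feature has been scaled to the range [0, 1]; with unscaled features the sensitivity would have to be set to the width of the feature's range.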
Pages: 16
Related papers
50 in total
  • [41] On the scalability of feature selection methods on high-dimensional data
    Bolon-Canedo, V.
    Rego-Fernandez, D.
    Peteiro-Barral, D.
    Alonso-Betanzos, A.
    Guijarro-Berdinas, B.
    Sanchez-Marono, N.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 56 (02) : 395 - 442
  • [42] A hybrid feature selection scheme for high-dimensional data
    Ganjei, Mohammad Ahmadi
    Boostani, Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 113
  • [43] Evaluating Feature Selection Robustness on High-Dimensional Data
    Pes, Barbara
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018), 2018, 10870 : 235 - 247
  • [44] Feature selection for classifying high-dimensional numerical data
    Wu, YM
    Zhang, AD
    PROCEEDINGS OF THE 2004 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 2, 2004 : 251 - 258
  • [45] Dynamic Edge-Based High-Dimensional Data Aggregation with Differential Privacy
    Chen, Qian
    Ni, Zhiwei
    Zhu, Xuhui
    Lyu, Moli
    Liu, Wentao
    Xia, Pingfan
    ELECTRONICS, 2024, 13 (16)
  • [46] Global Combination and Clustering Based Differential Privacy Mixed Data Publishing
    Chen, Lanxiang
    Zeng, Lingfang
    Mu, Yi
    Chen, Leilei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 11437 - 11448
  • [47] Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
    Fanyu Bu
    Zhikui Chen
    Qingchen Zhang
    Laurence T. Yang
    The Journal of Supercomputing, 2016, 72 : 2977 - 2990
  • [48] A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering
    Boutemedjet, Sabri
    Bouguila, Nizar
    Ziou, Djemel
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (08) : 1429 - 1443
  • [49] A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data
    Nuha Zamzami
    Nizar Bouguila
    Pattern Analysis and Applications, 2023, 26 : 91 - 106
  • [50] A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications
    Jiang H.
    Wang G.
    Gao J.
    Gao Z.
    Gao R.
    Guo Q.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (12) : 49 - 55, 90