Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Cited by: 2
Authors
Chu, Zhiguang [1, 2]
He, Jingsha [1 ]
Zhang, Xiaolei [2 ]
Zhang, Xing [2 ]
Zhu, Nafei [1 ]
Affiliations
[1] Beijing Univ Technol, Sch Software Engn, Beijing 100124, Peoples R China
[2] Key Lab Secur Network & Data Ind Internet Liaoning, Jinzhou 121000, Peoples R China
Keywords
high-dimensional data; feature selection; random forest; clustering; differential privacy; PREDICTION; ALGORITHM
DOI
10.3390/electronics12091959
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-dimensional data are a social information product, and their privacy and usability are core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction technique for high-dimensional data. Some feature selection methods process only the features selected by the algorithm and ignore the information associated with those features, which lowers the usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published. The proposed method has three stages: (1) screening features; (2) clustering the features; and (3) adding adaptive noise. The experiments use the Wisconsin Breast Cancer Diagnostic (WDBC) database from the UCI Machine Learning Repository. With classification accuracy as the performance measure, the experiments show that the proposed algorithm protects the sensitive information in the original data while retaining the data's contribution to the diagnostic results.
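The abstract does not say how the noise in stage (3) adapts, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that each feature already has a positive importance score (for example from a random forest, as the keywords suggest), that the total privacy budget is split across features in proportion to those scores, and that Laplace noise with scale sensitivity / epsilon is added to each column. The function names, the proportional split, and the default sensitivity of 1.0 (values scaled to [0, 1]) are all hypothetical.

```python
import random


def split_budget(importances, total_epsilon):
    """Split total_epsilon across features in proportion to importance.

    Assumption: proportional allocation; the paper may use another rule.
    """
    if total_epsilon <= 0 or min(importances) <= 0:
        raise ValueError("total_epsilon and all importances must be positive")
    total = sum(importances)
    return [total_epsilon * w / total for w in importances]


def laplace_sample(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_columns(columns, importances, total_epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise to each column using its share of the budget.

    columns: list of feature columns, each a list of floats.
    Returns the noisy columns and the per-feature budgets.
    """
    rng = random.Random(seed)
    budgets = split_budget(importances, total_epsilon)
    noisy = []
    for values, eps in zip(columns, budgets):
        scale = sensitivity / eps  # larger budget share means less noise
        noisy.append([v + laplace_sample(scale, rng) for v in values])
    return noisy, budgets


# Toy data: two features, three records, values assumed scaled to [0, 1].
columns = [[0.2, 0.8, 0.5], [0.1, 0.9, 0.4]]
importances = [0.75, 0.25]
noisy, budgets = perturb_columns(columns, importances, total_epsilon=1.0)
print(budgets)
```

In this sketch the more important feature receives the larger share of the budget and therefore the smaller noise scale, which is one way to preserve its contribution to classification accuracy; the per-feature budgets sum to the total, in line with sequential composition.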
Pages: 16