Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Cited by: 2
Authors
Chu, Zhiguang [1 ,2 ]
He, Jingsha [1 ]
Zhang, Xiaolei [2 ]
Zhang, Xing [2 ]
Zhu, Nafei [1 ]
Affiliations
[1] Beijing Univ Technol, Sch Software Engn, Beijing 100124, Peoples R China
[2] Key Lab Secur Network & Data Ind Internet Liaoning, Jinzhou 121000, Peoples R China
Keywords
high-dimensional data; feature selection; random forest; clustering; differential privacy; prediction; algorithm
DOI
10.3390/electronics12091959
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As a product of social information, high-dimensional data raise privacy and usability concerns that are core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction technique for high-dimensional data. However, some feature selection methods process only the features chosen by the algorithm and ignore the information associated with those features, which lowers the usability of the final results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published in practice. The proposed method has three stages: (1) feature screening; (2) cluster analysis of the features; and (3) adaptive noise addition. The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI Machine Learning Repository. With classification accuracy as the evaluation metric, the experiments show that data processed by the proposed algorithm protect sensitive information while retaining the contribution of the data to the diagnostic results.
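The three stages named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the algorithmic details, so everything below is an assumption made for illustration and not the authors' method: importance scores are taken as given (the keywords suggest a random forest could supply them), unselected features are attached to the selected feature they correlate with most strongly, the privacy budget is split in proportion to the importance of each cluster's selected feature, and noise is drawn from a Laplace distribution with scale sensitivity / epsilon. All function names are hypothetical, and the sketch is not a verified privacy guarantee.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_features(importances, k):
    """Stage 1 (assumed): keep the k features with the highest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def cluster_features(columns, selected):
    """Stage 2 (assumed): attach each unselected feature to the selected
    feature with which it has the largest absolute correlation."""
    clusters = {name: [name] for name in selected}
    for name, values in columns.items():
        if name in clusters:
            continue
        nearest = max(selected, key=lambda s: abs(pearson(values, columns[s])))
        clusters[nearest].append(name)
    return clusters


def allocate_budget(clusters, importances, total_epsilon):
    """Stage 3a (assumed): split total_epsilon over all features. Each feature
    is weighted by the importance of its cluster's selected feature, so the
    per-feature budgets sum to total_epsilon."""
    weights = {f: importances[head]
               for head, members in clusters.items() for f in members}
    total = sum(weights.values())
    return {f: total_epsilon * w / total for f, w in weights.items()}


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish(columns, budgets, sensitivity, rng):
    """Stage 3b (assumed): perturb every value with Laplace noise of scale
    sensitivity / epsilon, using that feature's share of the budget."""
    return {
        name: [v + laplace_noise(sensitivity / budgets[name], rng) for v in values]
        for name, values in columns.items()
    }


# Toy data, invented for illustration; values are assumed scaled to [0, 1].
columns = {
    "radius": [0.1, 0.4, 0.6, 0.9],
    "perimeter": [0.12, 0.41, 0.58, 0.93],
    "texture": [0.8, 0.3, 0.5, 0.2],
    "smoothness": [0.7, 0.35, 0.45, 0.1],
}
importances = {"radius": 0.4, "texture": 0.3, "perimeter": 0.2, "smoothness": 0.1}

selected = select_features(importances, k=2)
clusters = cluster_features(columns, selected)
budgets = allocate_budget(clusters, importances, total_epsilon=1.0)
noisy = publish(columns, budgets, sensitivity=1.0, rng=random.Random(0))
```

In this sketch a feature grouped with a more important selected feature receives a larger share of the budget and therefore less noise, which is one plausible reading of "adaptive noise"; other allocation rules would fit the abstract equally well.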
Pages: 16