Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering

Times cited: 2

Authors:

Chu, Zhiguang [1, 2]

He, Jingsha [1]

Zhang, Xiaolei [2]

Zhang, Xing [2]

Zhu, Nafei [1]

Affiliations:

[1] Beijing Univ Technol, Sch Software Engn, Beijing 100124, Peoples R China

[2] Key Lab Secur Network & Data Ind Internet Liaoning, Jinzhou 121000, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 09

Keywords:

high-dimensional data; feature selection; random forest; clustering; differential privacy; PREDICTION; ALGORITHM

DOI:

10.3390/electronics12091959

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

As a social information product, high-dimensional data raise privacy and usability concerns that are core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction technique for high-dimensional data. Some feature selection methods process only the features selected by the algorithm and do not take into account the information associated with those features, which limits the usability of the final experimental results. This paper proposes a hybrid method based on feature selection and cluster analysis to address the data utility and privacy problems that arise when high-dimensional data are published. The proposed method is divided into three stages: (1) screening features; (2) analyzing the clustering of features; and (3) adding adaptive noise. The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the UCI Machine Learning Repository. With classification accuracy as the performance measure, the experiments show that data processed by the proposed algorithm protect sensitive information while retaining the contribution of the data to the diagnostic results.
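The third stage, adaptive noise, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a Laplace mechanism and a privacy budget split across feature clusters in proportion to an importance weight, and every name, weight, and value range below is hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def allocate_budget(total_epsilon, cluster_weights):
    """Split total_epsilon across clusters in proportion to their weights.

    Assumption: a more important cluster receives a larger share of the
    budget and therefore less noise. The shares sum to total_epsilon.
    """
    weight_sum = sum(cluster_weights.values())
    return {c: total_epsilon * w / weight_sum for c, w in cluster_weights.items()}


def perturb(values, lower, upper, epsilon, rng):
    """Clip each value to [lower, upper], then add Laplace noise.

    The noise scale (upper - lower) / epsilon treats the full value
    range as the sensitivity, which is a conservative assumption.
    """
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical importance weights for two feature clusters.
budgets = allocate_budget(1.0, {"high_importance": 3.0, "low_importance": 1.0})

# Perturb one hypothetical feature column scaled to [0, 1].
noisy = perturb([0.2, 0.9, 1.4], 0.0, 1.0, budgets["high_importance"], rng)
```

Splitting the budget by weight is only one possible reading of "adaptive"; in practice the weights might come from a random forest's feature importances, as the keywords suggest, but the abstract does not specify this.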


Pages: 16


[21] Feature selection for high-dimensional imbalanced data
Yin, Liuzhi
Ge, Yong
Xiao, Keli
Wang, Xuehua
Quan, Xiaojun
NEUROCOMPUTING, 2013, 105 : 3 - 11
[22] A filter feature selection for high-dimensional data
Janane, Fatima Zahra
Ouaderhman, Tayeb
Chamlal, Hasna
JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
[23] Feature selection for high-dimensional temporal data
Michail Tsagris
Vincenzo Lagani
Ioannis Tsamardinos
BMC Bioinformatics, 19
[24] Feature Selection with High-Dimensional Imbalanced Data
Van Hulse, Jason
Khoshgoftaar, Taghi M.
Napolitano, Amri
Wald, Randall
2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
[25] Feature selection for high-dimensional temporal data
Tsagris, Michail
Lagani, Vincenzo
Tsamardinos, Ioannis
BMC BIOINFORMATICS, 2018, 19
[26] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
Verleysen, Michel
ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
[27] Feature Selection for Clustering on High Dimensional Data
Zeng, Hong
Cheung, Yiu-ming
PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 913 - 922
[28] Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data
Binh Tran
Xue, Bing
Zhang, Mengjie
GENETIC PROGRAMMING, EUROGP 2017, 2017, 10196 : 210 - 226
[29] High-dimensional data clustering using k-means subspace feature selection
Wang, Xiao-Dong
Chen, Rung-Ching
Yan, Fei
Journal of Network Intelligence, 2019, 4 (03): : 80 - 87
[30] PU_Bpub: High-Dimensional Data Release Mechanism Based on Spectral Clustering with Local Differential Privacy
Lin, Aixin
Ma, Xuebin
WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2022), PT II, 2022, 13472 : 572 - 581
