Privacy Protection Practice for Data Mining with Multiple Data Sources: An Example with Data Clustering

Cited by: 2
Authors
O'Shaughnessy, Pauline [1 ]
Lin, Yan-Xia [1 ]
Affiliations
[1] Univ Wollongong, Sch Math & Appl Stat, Wollongong, NSW 2522, Australia
Keywords
data masking; multiplicative noise; data mining; sample size calculation
DOI
10.3390/math10244744
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
In the age of data, data mining provides feasible tools with which to handle large datasets consisting of data from multiple sources. However, there is limited research on retrieving statistical information from data when data are confidential and cannot be shared directly. In this paper, we address this problem and propose a framework for performing data analysis using data from multiple sources without revealing true values for privacy purposes. The proposed framework includes three steps. First, data custodians individually mask data before publishing; then, the masked data collection is used to reconstruct the density function of the original dataset, from which resampled values are generated; last, existing data mining techniques are applied directly to the resampled data. This framework utilises the technique of reconstructing an original density function from noise-masked data using the moment-based density estimation method, which plays an essential role. Simulation studies show that the proposed framework performs well; analysis results from the resampled data are comparable to those of the original data when the density of the original data is estimated well. The proposed framework is demonstrated in data clustering analysis using the example of a real-life Australian soybean dataset. Results from the k-means algorithms with two and three fitted clusters are presented to show that cluster analysis using resampled data can well replicate that of the original data.
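One ingredient behind reconstructing a density from multiplicatively masked data can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' moment-based density estimator: the gamma-distributed confidential values, the Uniform(0.5, 1.5) noise, and the helper `noise_moment` are all hypothetical choices made for the example. It shows only that, when the noise is independent of the data and its distribution is public, the raw moments of the original data can be recovered from the masked values alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidential values held by one data custodian (positive).
x = rng.gamma(shape=4.0, scale=2.0, size=50_000)

# Masking step: multiplicative noise, y = x * c, with c independent of x.
# Assumption: the noise distribution, Uniform(0.5, 1.5), is public.
c = rng.uniform(0.5, 1.5, size=x.size)
y = x * c  # only y would be published


def noise_moment(k, a=0.5, b=1.5):
    """k-th raw moment of Uniform(a, b), in closed form."""
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))


# Independence gives E[Y^k] = E[X^k] * E[C^k], so the moments of x can be
# estimated from the masked data and the known noise moments.
orders = (1, 2, 3)
est_moments = [np.mean(y ** k) / noise_moment(k) for k in orders]
true_moments = [np.mean(x ** k) for k in orders]

for k, est, true in zip(orders, est_moments, true_moments):
    print(f"order {k}: recovered {est:.2f}, from original data {true:.2f}")
```

In the framework described above, recovered moments of this kind would feed a density estimate, from which resampled values are drawn and then passed to a standard method such as k-means; those later steps are not reproduced here.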
Pages: 13
Related papers
  • [12] Data Protection and Privacy: Data Protection and Democracy
    Bougiakiotis, Emmanouil
    MODERN LAW REVIEW, 2022, 85 (02): 566-570
  • [13] Data Protection and Privacy: Data Protection and Democracy
    Bougiakiotis, Emmanouil
    MODERN LAW REVIEW, 2021
  • [14] Blockchain Data Privacy Protection Mechanism for Enterprise Finance and Data Mining Algorithms
    Ma, Xuejun
    Zhang, Yongshan
    Engineering Intelligent Systems, 32 (05): 435-443
  • [15] Data privacy protection in multi-party clustering
    Yang, Weijia
    Huang, Shangteng
    DATA & KNOWLEDGE ENGINEERING, 2008, 67 (01): 185-199
  • [16] Differential Privacy Data Protection Method Based on Clustering
    Li Li-xin
    Ding Yong-shan
    Wang Jia-yan
    2017 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC), 2017: 11-16
  • [17] Using cryptography for privacy protection in data mining systems
    Zhan, Justin
    WEB INTELLIGENCE MEETS BRAIN INFORMATICS, 2007, 4845: 494-+
  • [18] Designing of Privacy Protection Platform Based on Data Mining
    Zhou Bing
    Zeng Zhihua
    PROCEEDINGS OF THE 2015 INTERNATIONAL INDUSTRIAL INFORMATICS AND COMPUTER ENGINEERING CONFERENCE, 2015: 1251-1254
  • [19] A privacy data protection algorithm for mining association rules
    Zhu, Yuquan
    Sun, Chao
    Chen, Geng
    Journal of Computational Information Systems, 2010, 6 (10): 3345-3352
  • [20] Privacy in Data Mining
    Josep Domingo-Ferrer
    Vicenç Torra
    Data Mining and Knowledge Discovery, 2005, 11: 117-119