Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 35
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
Pages: 1158-1188
Number of pages: 31
Related papers
50 items in total
  • [41] P-IRON for Privacy Preservation in Data Mining
    Arumugam, G.
    Sulekha, V. Jane Varamani
    KNOWLEDGE MANAGEMENT IN ORGANIZATIONS (KMO 2017), 2017, 731 : 410 - 423
  • [42] An Intensified Approach for Privacy Preservation in Incremental Data Mining
    Rajalakshmi, V.
    Mala, G. S. Anandha
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 3, 2013, 178 : 347 - +
  • [43] A Comprehensive Survey on Privacy Preservation Algorithms in Data Mining
    Kiran, Ajmeera
    Vasumathi, D.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017, : 1060 - 1066
  • [44] Privacy Risks and Countermeasures In Publishing and Mining Social Network Data
    Watanabe, Chiemi
    Amagasa, Toshiyuki
    Liu, Ling
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING (COLLABORATECOM), 2011, : 55 - 66
  • [45] Optimized key generation-based privacy preserving data mining model for secure data publishing
    Kulkarni, Yogesh R.
    Jagdale, Balaso
    Sugave, Shounak R.
    ADVANCES IN ENGINEERING SOFTWARE, 2023, 175
  • [46] Privacy preservation for attribute order sensitive workload in medical data publishing
    Gao, Ai-Qiang
    Diao, Lu-Hong
    Ruan Jian Xue Bao/Journal of Software, 2009, 20 (SUPPL. 1): : 314 - 320
  • [47] Clustering-anonymity method for data-publishing privacy preservation
    Jiang Huowen
    PROCEEDINGS OF THE 2015 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER ENGINEERING AND ELECTRONICS (ICECEE 2015), 2015, 24 : 34 - 37
  • [48] Privacy Preservation for Attribute Order Sensitive Workload in Medical Data Publishing
    Gao Ai-qiang
    Diao Lu-hong
    2009 IEEE INTERNATIONAL SYMPOSIUM ON IT IN MEDICINE & EDUCATION, VOLS 1 AND 2, PROCEEDINGS, 2009, : 1140 - +
  • [49] Generalization-Based Acquisition of Training Data for Motor Primitive Learning by Neural Networks
    Loncarevic, Zvezdan
    Pahic, Rok
    Ude, Ales
    Gams, Andrej
    APPLIED SCIENCES-BASEL, 2021, 11 (03): : 1 - 17
  • [50] Mining association rules from distorted data for privacy preservation
    Zhang, P
    Tong, YH
    Tang, SW
    Yang, DQ
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2005, 3683 : 1345 - 1351