Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 35
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
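The generalization idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy records, the choice of age and zip code as quasi-identifiers, and the helper names `generalize` and `is_k_anonymous` are all assumptions made for illustration. It covers only the privacy half (k-anonymity) and does not check for discrimination.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, sex, decision).
RECORDS = [
    (23, "43001", "F", "deny"),
    (27, "43002", "F", "grant"),
    (25, "43005", "M", "grant"),
    (41, "43007", "M", "grant"),
    (45, "43003", "F", "deny"),
    (48, "43009", "M", "grant"),
]


def generalize(record, age_width, zip_digits):
    """Coarsen the assumed quasi-identifiers (age, zip_code); keep other fields."""
    age, zip_code, sex, decision = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, zip_prefix, sex, decision)


def is_k_anonymous(records, k):
    """True if every (age, zip) combination is shared by at least k records."""
    groups = Counter((r[0], r[1]) for r in records)
    return all(count >= k for count in groups.values())


raw_ok = is_k_anonymous(RECORDS, 3)
generalized = [generalize(r, age_width=20, zip_digits=3) for r in RECORDS]
gen_ok = is_k_anonymous(generalized, 3)
print(raw_ok, gen_ok)
```

In this toy data every raw record is unique on age and zip code, while the generalized records fall into two groups of three. Coarser generalization gives stronger anonymity at the cost of detail, which is the utility trade-off the abstract refers to.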
Pages: 1158-1188
Page count: 31
Related papers
50 in total
  • [1] Generalization-based privacy preservation and discrimination prevention in data publishing and mining
    Sara Hajian
    Josep Domingo-Ferrer
    Oriol Farràs
    Data Mining and Knowledge Discovery, 2014, 28 : 1158 - 1188
  • [2] A Generalization-Based Approach for Personalized Privacy Preservation in Trajectory Data Publishing
    Komishani, Elahe Ghasemi
    Abadi, Mahdi
    2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2012, : 1129 - 1135
  • [3] Discrimination Prevention with Classification and Privacy Preservation in Data mining
    Kumar Tripathi, Krishna
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING AND VIRTUALIZATION (ICCCV) 2016, 2016, 79 : 244 - 253
  • [4] Generalization-based privacy-preserving data collection
    Zhang, Lijie
    Zhang, Weining
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2008, 5182 : 115 - 124
  • [5] A Data Publishing System Based on Privacy Preservation
    Wang, Zhihui
    Zhu, Yun
    Zhou, Xuchen
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2019, 11448 : 553 - 556
  • [6] Evaluation of Geocoding Algorithms for Generalization-based Location Privacy
    Wightman, Pedro
    Sanmartin-Mendoza, Paul
    Salazar, Augusto
    2024 IEEE COLOMBIAN CONFERENCE ON COMMUNICATIONS AND COMPUTING, COLCOM 2024, 2024,
  • [7] Privacy Preservation for Trajectory Data Publishing by Look-Up Table Generalization
    Harnsamut, Nattapon
    Natwichai, Juggapong
    Riyana, Surapon
    DATABASES THEORY AND APPLICATIONS, ADC 2018, 2018, 10837 : 15 - 27
  • [8] Fuzzy Set based Data Publishing For Privacy Preservation
    Xie, Meng-bo
    Qian, Quan
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 569 - 574
  • [9] A Generalization-Based POI Query Privacy Preserving Scheme
    Feng, Yunxia
    Li, Xu
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON MECHANICAL, ELECTRONIC, CONTROL AND AUTOMATION ENGINEERING (MECAE 2017), 2017, 61 : 33 - 38
  • [10] Child Health Dataset Publishing and Mining Based on Differential Privacy Preservation
    Li, Wenyu
    Wang, Siqi
    Wang, Hongwei
    Lu, Yunlong
    MATHEMATICS, 2024, 12 (16)