Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 35
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
Pages: 1158-1188
Number of pages: 31