Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 35
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
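As a rough illustration of the generalization idea the abstract refers to, the sketch below coarsens a quasi-identifier until every combination of quasi-identifier values is shared by at least k records (k-anonymity). It is not the paper's algorithm and does not address discrimination prevention; the function names, the attribute names (`age`, `zip`, `decision`), the 10-year bucket width, and the toy records are all assumptions made for illustration.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a coarser range, e.g. 23 -> '20-29'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


# Hypothetical toy data: 'age' and 'zip' act as quasi-identifiers.
raw = [
    {"age": 23, "zip": "43001", "decision": "deny"},
    {"age": 27, "zip": "43001", "decision": "grant"},
    {"age": 34, "zip": "43007", "decision": "grant"},
    {"age": 38, "zip": "43007", "decision": "deny"},
]

# Generalize the age attribute; all other attributes are left unchanged.
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]

print(is_k_anonymous(raw, ["age", "zip"], k=2))
print(is_k_anonymous(generalized, ["age", "zip"], k=2))
```

In this toy data each raw record is unique on (`age`, `zip`), whereas after generalization the records fall into two groups of two. A real pipeline would search over a generalization hierarchy to limit information loss, which this sketch does not attempt.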
Pages: 1158-1188
Page count: 31