Efficient multivariate data-oriented microaggregation

Authors
Josep Domingo-Ferrer
Antoni Martínez-Ballesté
Josep Maria Mateo-Sanz
Francesc Sebé
Affiliations
[1] Rovira i Virgili University of Tarragona, Department of Computer Engineering & Maths
[2] Rovira i Virgili University of Tarragona, Statistics Group
Source
The VLDB Journal | 2006 / Vol. 15
Keywords
Statistical databases; Privacy; Anonymity; Statistical disclosure control; Microaggregation; Microdata protection;
DOI
Not available
Abstract
Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released while preserving the privacy of the underlying individuals. The principle of microaggregation is to aggregate original database records into small groups prior to publication. Each group should contain at least k records to prevent disclosure of individual information, where k is a constant value preset by the data protector. Recently, microaggregation has been shown to be useful for achieving k-anonymity, in addition to being a good masking method. Optimal microaggregation (with minimum within-groups variability loss) can be computed in polynomial time for univariate data. Unfortunately, for multivariate data it is an NP-hard problem. Several heuristic approaches to microaggregation have been proposed in the literature. Heuristics yielding groups with fixed size k tend to be more efficient, whereas data-oriented heuristics yielding variable group size tend to result in lower information loss. This paper presents new data-oriented heuristics which improve the trade-off between computational complexity and information loss and are thus usable for large datasets.
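The grouping principle described in the abstract can be illustrated with a minimal Python sketch. This is not one of the data-oriented heuristics the paper proposes; it is a generic greedy fixed-size grouping, chosen only because it is short. All function names (`centroid`, `microaggregate_fixed_k`, `mask`, `sse`) are hypothetical, and the sketch assumes numerical records, Euclidean distance, replacement of each record by its group centroid, and the within-group sum of squared errors as the information-loss measure.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(records):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(records)
    return tuple(sum(col) / n for col in zip(*records))


def microaggregate_fixed_k(records, k):
    """Greedy fixed-size grouping (illustrative, not the paper's heuristic).

    Returns a list of groups, each a list of record indices.
    Every group has at least k members; the last has between k and 2k - 1.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        # Seed: the remaining record farthest from the current centroid.
        seed = max(remaining, key=lambda i: dist(records[i], c))
        # Group: the k remaining records closest to the seed (seed included).
        group = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        groups.append(group)
        taken = set(group)
        remaining = [i for i in remaining if i not in taken]
    groups.append(remaining)  # leftover records form the final group
    return groups


def mask(records, groups):
    """Replace every record by the centroid of its group."""
    masked = [None] * len(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = c
    return masked


def sse(records, masked):
    """Within-group sum of squared errors (assumed information-loss measure)."""
    return sum(dist(r, m) ** 2 for r, m in zip(records, masked))


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    groups = microaggregate_fixed_k(data, k=3)
    print(groups)
    print(sse(data, mask(data, groups)))
```

In the masked output every record is indistinguishable from at least k - 1 others on the aggregated attributes, which is the link to k-anonymity mentioned in the abstract. A data-oriented heuristic would instead let group sizes vary between k and 2k - 1 according to how the data cluster, aiming for a lower sum of squared errors.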
Pages: 355-369
Number of pages: 14