Tailoring Data Source Distributions for Fairness-aware Data Integration

Cited by: 16
Authors
Nargesian, Fatemeh [1]
Asudeh, Abolfazl [2]
Jagadish, H. V. [3]
Affiliations
[1] Univ Rochester, Rochester, NY USA
[2] Univ Illinois, Chicago, IL USA
[3] Univ Michigan, Ann Arbor, MI 48109 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2021 / Vol. 14 / No. 11
Funding
National Science Foundation (USA);
DOI
10.14778/3476249.3476299
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost-effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation-based strategy with a reward function that captures the cost and approximations of group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.
Pages: 2519 - 2532
Number of pages: 14
Related papers
50 in total
  • [1] Fairness-aware Data Integration
    Mazilu, Lacramioara
    Paton, Norman W.
    Konstantinou, Nikolaos
    Fernandes, Alvaro A. A.
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2022, 14 (04)
  • [2] Considerations on Fairness-aware Data Mining
    Kamishima, Toshihiro
    Akaho, Shotaro
    Asoh, Hideki
    Sakuma, Jun
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012 : 378 - 385
  • [3] Empirical analysis of fairness-aware data segmentation
    Okura, Seiji
    Mohri, Takao
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022 : 155 - 162
  • [4] Fairness-Aware PAC Learning from Corrupted Data
    Konstantinov, Nikola
    Lampert, Christoph H.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [5] Fairness-Aware Range Queries for Selecting Unbiased Data
    Shetiya, Suraj
    Swift, Ian P.
    Asudeh, Abolfazl
    Das, Gautam
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022 : 1423 - 1436
  • [6] On the Impossibility of Fairness-Aware Learning from Corrupted Data
    Konstantinov, Nikola
    Lampert, Christoph H.
    ALGORITHMIC FAIRNESS THROUGH THE LENS OF CAUSALITY AND ROBUSTNESS WORKSHOP, VOL 171, 2021, 171 : 59 - 72
  • [7] Fairness-Aware PAC Learning from Corrupted Data
    Konstantinov, Nikola
    Lampert, Christoph H.
    Journal of Machine Learning Research, 2022, 23 : 1 - 60
  • [8] TREATS: Fairness-aware entity resolution over streaming data
    Araujo, Tiago Brasileiro
    Efthymiou, Vasilis
    Christophides, Vassilis
    Pitoura, Evaggelia
    Stefanidis, Kostas
    INFORMATION SYSTEMS, 2025, 129
  • [9] Collaboration- and Fairness-Aware Big Data Management in Distributed Clouds
    Xia, Qiufen
    Xu, Zichuan
    Liang, Weifa
    Zomaya, Albert Y.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (07) : 1941 - 1953
  • [10] Fairness-aware data offloading of IoT applications enabled by heterogeneous UAVs
    Yan, Hui
    Bao, Weidong
    Zhu, Xiaomin
    Wang, Ji
    Wu, Guanlin
    Cao, Jiang
    INTERNET OF THINGS, 2023, 22