Dealing with Data Bias in Classification: Can Generated Data Ensure Representation and Fairness?

Cited by: 0
Authors
Duong, Manh Khoi [1]
Conrad, Stefan [1]
Affiliations
[1] Heinrich Heine Univ, Univ Str 1, D-40225 Düsseldorf, Germany
Keywords
fairness; bias; synthetic data; fairness-agnostic; machine learning; optimization
DOI
10.1007/978-3-031-39831-5_17
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Fairness is a critical consideration in data analytics and knowledge discovery because biased data can perpetuate inequalities through downstream pipelines. In this paper, we propose a novel pre-processing method that addresses fairness issues in classification tasks by adding synthetic data points to make the data more representative. Our approach utilizes a statistical model to generate new data points, which are evaluated for fairness using discrimination measures. These measures quantify the disparities between demographic groups that may be induced by bias in the data. Our experimental results demonstrate that the proposed method effectively reduces bias for several machine learning classifiers without compromising prediction performance. Moreover, our method outperforms existing pre-processing methods on multiple datasets by Pareto-dominating them in terms of performance and fairness. Our findings suggest that our method can be a valuable tool for data analysts and knowledge discovery practitioners who seek fair, diverse, and representative data.
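The discrimination measures mentioned in the abstract quantify disparities between demographic groups; a widely used instance of such a measure is the statistical parity difference, the gap in positive-outcome rates between two groups. The sketch below is illustrative only (the function name and toy data are assumptions, not taken from the paper) and shows how such a measure could be computed for a binary protected attribute:

```python
import numpy as np

def statistical_parity_difference(y, group):
    """Absolute gap in positive-outcome rates between two demographic groups.

    y     : array of binary outcomes (0/1), e.g. labels or predictions
    group : array of binary protected-attribute values (0/1)
    """
    g = np.asarray(group, dtype=bool)
    y = np.asarray(y, dtype=float)
    # Positive rate within each group; a value of 0 means parity.
    return abs(y[g].mean() - y[~g].mean())

# Toy data: group 1 receives positive outcomes at rate 3/4,
# group 0 at rate 1/4, so the disparity is 0.5.
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])
group  = np.array([1, 1, 1, 1, 0, 0, 0, 0])
spd = statistical_parity_difference(y_pred, group)
```

In a pre-processing scheme like the one described, a generator of synthetic points could be steered toward candidates that drive such a disparity measure closer to zero while preserving the overall data distribution.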
Pages: 176-190
Page count: 15
Related papers
(50 in total)
  • [1] Dealing with Bias and Fairness in Data Science Systems: A Practical Hands-on Tutorial
    Saleiro, Pedro
    Rodolfa, Kit T.
    Ghani, Rayid
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 3513 - 3514
  • [2] Dealing with Data Imbalance in Text Classification
    Padurariu, Cristian
    Breaban, Mihaela Elena
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2019), 2019, 159 : 736 - 745
  • [3] Dealing with the Avalanche of Data Generated in High Data Rate Macromolecular Crystallography
    Jakoncic, Jean
    Bernstein, Herbert J.
    Soares, Alexei
    Shi, Wuxian
    Fuchs, Martin
    Petkus, Robert
    Sweet, Robert M.
    McSweeney, Sean
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2017, 73 : A209 - A209
  • [4] A Methodology for Controlling Bias and Fairness in Synthetic Data Generation
    Barbierato, Enrico
    Della Vedova, Marco L.
    Tessera, Daniele
    Toti, Daniele
    Vanoli, Nicola
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [5] Dealing with death data: individual hazards, mortality and bias
    Zens, MS
    Peart, DR
    TRENDS IN ECOLOGY & EVOLUTION, 2003, 18 (07) : 366 - 373
  • [6] On the Impact of Data Quality on Image Classification Fairness
    Barry, Aki
    Han, Lei
    Demartini, Gianluca
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2225 - 2229
  • [7] A perspective on gender bias in generated text data
    Hupperich, Thomas
    FRONTIERS IN HUMAN DYNAMICS, 2024, 6
  • [8] How Can LIMS Help Ensure Data Integrity?
    McDowall, R. D.
    LC GC EUROPE, 2016, 29 (06) : 310 - 316
  • [9] Towards Fairness in AI: Addressing Bias in Data Using GANs
    Rajabi, Amirarsalan
    Garibay, Ozlem O.
    HCI INTERNATIONAL 2021 - LATE BREAKING PAPERS: MULTIMODALITY, EXTENDED REALITY, AND ARTIFICIAL INTELLIGENCE, 2021, 13095 : 509 - 518
  • [10] Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias
    Wyllie, Sierra
    Shumailov, Ilia
    Papernot, Nicolas
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 2113 - 2147