Assessing Fairness in the Presence of Missing Data

Authors
Zhang, Yiliang [1]
Long, Qi [1]
Affiliation
[1] Univ Penn, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
PROPENSITY SCORE ESTIMATION; IMPUTATION
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data are prevalent and present daunting challenges in real data analysis. While there is a growing body of literature on fairness in the analysis of fully observed data, there has been little theoretical work investigating fairness in the analysis of incomplete data. In practice, a popular approach for dealing with missing data is to train a prediction algorithm using only the set of complete cases, i.e., observations with all features fully observed. However, depending on the missing data mechanism, the distribution of complete cases and the distribution of the complete data may be substantially different. When the goal is to develop a fair algorithm in the complete data domain, where there are no missing values, an algorithm that is fair in the complete case domain may show disproportionate bias towards some marginalized groups in the complete data domain. To fill this significant gap, we study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases. We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantees in the analysis of incomplete data.
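The gap between the complete case domain and the complete data domain can be illustrated with a small sketch. It is not taken from the paper: the toy data, the helper names (group_accuracy, accuracy_gap), and the choice of an accuracy-parity gap as the fairness measure are all assumptions made for illustration, and the paper's own fairness definitions and bounds may differ. The sketch assumes that the two rows with missing features are both misclassified members of group 1, so dropping them hides the disparity.

```python
def group_accuracy(y_true, y_pred, group, mask, g):
    """Accuracy within group g, restricted to rows where mask is True."""
    idx = [i for i in range(len(y_true)) if mask[i] and group[i] == g]
    if not idx:
        raise ValueError(f"no rows selected for group {g}")
    return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)


def accuracy_gap(y_true, y_pred, group, mask):
    """Absolute difference in accuracy between group 0 and group 1."""
    acc0 = group_accuracy(y_true, y_pred, group, mask, 0)
    acc1 = group_accuracy(y_true, y_pred, group, mask, 1)
    return abs(acc0 - acc1)


# Hypothetical toy data: 4 rows in group 0, 6 rows in group 1.
group  = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]

# Assumed missingness pattern: the last two rows of group 1 (both
# misclassified) have missing features and are not complete cases.
observed = [True, True, True, True, True, True, True, True, False, False]
all_rows = [True] * len(group)

gap_complete_data = accuracy_gap(y_true, y_pred, group, all_rows)
gap_complete_cases = accuracy_gap(y_true, y_pred, group, observed)

print("gap in complete data :", gap_complete_data)
print("gap in complete cases:", gap_complete_cases)
```

In this toy setting the model looks perfectly fair on the complete cases (gap 0.0) while the gap in the complete data is 0.25, which is the kind of discrepancy the abstract warns about. In real data the complete data gap is not directly observable, which is why bounds on the estimation error are of interest.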
Pages: 13