Assessing Fairness in the Presence of Missing Data

Authors
Zhang, Yiliang [1]
Long, Qi [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
PROPENSITY SCORE ESTIMATION; IMPUTATION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data are prevalent and present daunting challenges in real data analysis. While there is a growing body of literature on fairness in the analysis of fully observed data, there has been little theoretical work investigating fairness in the analysis of incomplete data. In practice, a popular approach for dealing with missing data is to use only the set of complete cases, that is, observations with all features fully observed, to train a prediction algorithm. However, depending on the missing data mechanism, the distribution of complete cases and the distribution of the complete data may be substantially different. When the goal is to develop a fair algorithm in the complete data domain, where there are no missing values, an algorithm that is fair in the complete case domain may show disproportionate bias towards some marginalized groups in the complete data domain. To fill this significant gap, we study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases. We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantees in the analysis of incomplete data.
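The gap between the two domains can be illustrated with a small exact calculation. The sketch below is not taken from the paper: the group names, the positive-prediction rates, the observation probabilities, and the choice of demographic parity as the fairness measure are all hypothetical, picked only to show how a classifier can look fair on complete cases while being unfair on the complete data.

```python
from fractions import Fraction as F

# Hypothetical complete-data behaviour of a fixed classifier:
# P(yhat = 1 | group). Group "b" receives fewer positive predictions.
positive_rate = {"a": F(1, 2), "b": F(1, 4)}

# Hypothetical missingness mechanism: probability that a record is fully
# observed, given (group, predicted label). In group "b", records with a
# negative prediction are much more likely to have missing features.
p_observed = {
    ("a", 1): F(9, 10), ("a", 0): F(9, 10),
    ("b", 1): F(9, 10), ("b", 0): F(3, 10),
}

def complete_case_positive_rate(group):
    """P(yhat = 1 | group, fully observed), obtained by Bayes' rule."""
    p1 = positive_rate[group]
    num = p1 * p_observed[(group, 1)]
    den = num + (1 - p1) * p_observed[(group, 0)]
    return num / den

def parity_gap(rate):
    """Demographic parity gap: |P(yhat=1 | a) - P(yhat=1 | b)|."""
    return abs(rate("a") - rate("b"))

# Fairness measured on the complete data (no missing values).
complete_data_gap = parity_gap(lambda g: positive_rate[g])
# Fairness measured using complete cases only.
complete_case_gap = parity_gap(complete_case_positive_rate)

print(complete_data_gap)  # 1/4
print(complete_case_gap)  # 0
```

Under these assumed numbers the complete-case estimate reports no disparity, while the true complete-data disparity is 1/4. The fairness estimation error in this toy setting is therefore 1/4, which is the kind of quantity the abstract says the bounds address.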
Pages: 13