Identifying Selection Bias from Observational Data

Cited by: 0
Authors
Kaltenpoth, David [1]
Vreeken, Jilles [1]
Affiliation
[1] CISPA Helmholtz Ctr Informat Secur, Saarbrucken, Germany
Keywords
CAUSAL;
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Access to a representative sample from the population is an assumption that underpins all of machine learning. Unfortunately, selection effects can cause observations to come from a subpopulation instead, which may bias our inferences. It is therefore essential to know whether a sample is affected by selection effects. We study the conditions under which selection bias can be identified and give results for both parametric and non-parametric families of distributions. Based on these results, we develop two practical methods to determine whether an observed sample comes from a distribution subject to selection bias. Through extensive evaluation on synthetic and real-world data, we verify that our methods outperform the state of the art in both detecting and characterizing selection bias.
Pages: 8177-8185
Page count: 9