Learning Models from Data with Measurement Error: Tackling Underreporting

Cited by: 0
Authors
Adams, Roy [1]
Ji, Yuelong [2]
Wang, Xiaobin [2]
Saria, Suchi [1, 3, 4]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Ctr Life Origins Dis, Dept Populat Family & Reproduct Hlth, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
[4] Bayesian Hlth, New York, NY USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97 | 2019 / Vol. 97
Funding
National Institutes of Health (USA);
Keywords
SELF-REPORTED SMOKING; MATERNAL SMOKING; CHILDHOOD; OBESITY; ASSOCIATION; PREGNANCY; ABUNDANCE; DISEASE; RISK;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Measurement error in observational datasets can lead to systematic bias in inferences based on these datasets. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper, we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. Our method is based on a missing data view of the measurement error problem, where the true exposure is treated as a latent variable that is marginalized out of a joint model. We prove three different conditions under which the outcome distribution can still be identified from data containing only error-prone observations of the exposure. We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and heroin use during pregnancy on childhood obesity, two important problems from public health. Using the proposed method, we estimate these effects from only subject-reported drug use data and refine the range of estimates generated by a sensitivity analysis-based approach. Further, the estimates produced by our method are consistent with existing literature on both the effects of maternal smoking and the rate at which subjects underreport smoking.
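To make the underreporting problem in the abstract concrete, the sketch below works through a small, purely illustrative calculation. It is not the authors' model: the function name, the parameter names, and all numbers are hypothetical, and it assumes (a) no false positives, so a reported exposure implies a true exposure, and (b) the report is independent of the outcome given the true exposure. Under those assumptions, the group reporting no exposure is a mixture of truly unexposed subjects and exposed subjects who underreported, and marginalizing over the latent true exposure shows how a naive comparison is biased.

```python
def observed_outcome_rates(p_exposed, p_underreport, p_y_exposed, p_y_unexposed):
    """Return (P(Y=1 | A=1), P(Y=1 | A=0)) for the reported exposure A.

    Illustrative assumptions (not taken from the paper):
      * no false positives: A=1 implies true exposure Z=1;
      * the report A is independent of outcome Y given true exposure Z.
    """
    # Joint probabilities of (true exposure Z, report A) for the A=0 group.
    p_z1_a0 = p_exposed * p_underreport   # exposed but reports no exposure
    p_z0_a0 = 1.0 - p_exposed             # unexposed, reports no exposure

    # The reported-exposed group contains only truly exposed subjects.
    p_y_given_a1 = p_y_exposed
    # The reported-unexposed group is a mixture; marginalize over latent Z.
    p_y_given_a0 = (
        p_z0_a0 * p_y_unexposed + p_z1_a0 * p_y_exposed
    ) / (p_z0_a0 + p_z1_a0)
    return p_y_given_a1, p_y_given_a0


# Hypothetical numbers, chosen only for illustration.
p_a1, p_a0 = observed_outcome_rates(
    p_exposed=0.3, p_underreport=0.4, p_y_exposed=0.5, p_y_unexposed=0.2
)
true_effect = 0.5 - 0.2          # risk difference under the true exposure
naive_effect = p_a1 - p_a0       # risk difference under the reported exposure
print(f"true risk difference:  {true_effect:.3f}")
print(f"naive risk difference: {naive_effect:.3f}")
```

With these made-up inputs, the naive risk difference comes out at roughly 0.256 against a true value of 0.300, because hidden exposed subjects raise the outcome rate in the reported-unexposed group.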
Pages: 10
Related papers
50 in total
  • [21] Spatial models for non-Gaussian data with covariate measurement error
    Tadayon, Vahid
    Torabi, Mahmoud
    ENVIRONMETRICS, 2019, 30 (03)
  • [22] Covariate Measurement Error Adjustment for Multilevel Models With Application to Educational Data
    Battauz, Michela
    Bellio, Ruggero
    Gori, Enrico
    JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2011, 36 (03) : 283 - 306
  • [23] Estimation of spatial autoregressive models with measurement error for large data sets
    Suesse, Thomas
    COMPUTATIONAL STATISTICS, 2018, 33 (04) : 1627 - 1648
  • [24] Multiscale measurement error models for aggregated small area health data
    Aregay, Mehreteab
    Lawson, Andrew B.
    Faes, Christel
    Kirby, Russell S.
    Carroll, Rachel
    Watjou, Kevin
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2016, 25 (04) : 1201 - 1223
  • [25] Linear transformation models for failure time data with covariate measurement error
    Cheng, SC
    Wang, NY
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (454) : 706 - 716
  • [26] A new class of measurement-error models, with applications to dietary data
    Carroll, RJ
    Freedman, LS
    Kipnis, V
    Li, L
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 1998, 26 (03) : 467 - 477
  • [27] Multicollinearity in measurement error models
    Gokmen, Sahika
    Dagalp, Rukiye
    Kilickaplan, Serdar
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2022, 51 (02) : 474 - 485
  • [28] Handbook of Measurement Error Models
    Kuchenhoff, Helmut
    BIOMETRICAL JOURNAL, 2022, 64 (08) : 1498 - 1499
  • [29] ERROR MODELS FOR SYSTEMS MEASUREMENT
    FITZPATRICK, J
    MICROWAVE JOURNAL, 1978, 21 (05) : 63 - 66
  • [30] Measurement error models with interactions
    Midthune, Douglas
    Carroll, Raymond J.
    Freedman, Laurence S.
    Kipnis, Victor
    BIOSTATISTICS, 2016, 17 (02) : 277 - 290