Learning Models from Data with Measurement Error: Tackling Underreporting

Cited by: 0
Authors
Adams, Roy [1]
Ji, Yuelong [2]
Wang, Xiaobin [2]
Saria, Suchi [1, 3, 4]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Ctr Life Origins Dis, Dept Populat Family & Reprod Hlth, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
[4] Bayesian Hlth, New York, NY USA
Funding
National Institutes of Health (NIH)
Keywords
SELF-REPORTED SMOKING; MATERNAL SMOKING; CHILDHOOD; OBESITY; ASSOCIATION; PREGNANCY; ABUNDANCE; DISEASE; RISK
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Measurement error in observational datasets can lead to systematic bias in inferences drawn from them. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper, we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. Our method is based on a missing-data view of the measurement error problem, in which the true exposure is treated as a latent variable that is marginalized out of a joint model. We prove three different conditions under which the outcome distribution can still be identified from data containing only error-prone observations of the exposure. We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and heroin use during pregnancy on childhood obesity, two important problems in public health. Using the proposed method, we estimate these effects from subject-reported drug use data alone and refine the range of estimates generated by a sensitivity-analysis-based approach. Further, the estimates produced by our method are consistent with the existing literature on both the effects of maternal smoking and the rate at which subjects underreport smoking.
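To make the missing-data view concrete, here is a minimal sketch of marginalizing a latent binary exposure out of a joint model and fitting it by EM. Every modeling choice below is an assumption for illustration, not the paper's model: a Bernoulli(pi) prior on the true exposure A, pure underreporting (exposed subjects report with probability beta, unexposed subjects never report), and a Gaussian outcome with means mu0, mu1 and a shared sigma. The observed-data likelihood is then p(reported, y) = sum over a of p(a) p(reported | a) p(y | a). In this setup, the reported group pins down mu1 and sigma, and the unreported group is a two-component mixture that identifies mu0 and the hidden exposure rate. This is one illustrative identifiable case and is not claimed to be one of the paper's three conditions. The names `em_underreporting` and `normal_pdf` are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Density of Normal(mean, sd**2) evaluated elementwise at y."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))


def em_underreporting(y, reported, n_iter=500, tol=1e-8):
    """Hypothetical EM fit of a latent-exposure model (illustrative assumptions):

    A ~ Bernoulli(pi)                  true exposure, never observed
    reported | A=1 ~ Bernoulli(beta)   exposed subjects report with prob. beta
    reported | A=0 = 0                 no false-positive reports
    y | A=a ~ Normal(mu_a, sigma**2)   outcome, shared variance
    """
    y = np.asarray(y, dtype=float)
    reported = np.asarray(reported, dtype=bool)
    # Initialise from the naive grouping; a report implies true exposure.
    pi, beta = 0.5, 0.5
    mu0, mu1 = y[~reported].mean(), y[reported].mean()
    sigma = y.std()
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: r = P(A=1 | reported, y); equals 1 for anyone who reported.
        w1 = pi * (1 - beta) * normal_pdf(y, mu1, sigma)
        w0 = (1 - pi) * normal_pdf(y, mu0, sigma)
        r = np.where(reported, 1.0, w1 / (w1 + w0))
        # Observed-data log-likelihood with the latent A marginalised out.
        lik = np.where(reported, pi * beta * normal_pdf(y, mu1, sigma), w1 + w0)
        ll = np.log(lik).sum()
        # M-step: closed-form weighted maximum-likelihood updates.
        pi = r.mean()
        beta = reported.sum() / r.sum()
        mu1 = (r * y).sum() / r.sum()
        mu0 = ((1 - r) * y).sum() / (1 - r).sum()
        sigma = np.sqrt((r * (y - mu1) ** 2 + (1 - r) * (y - mu0) ** 2).mean())
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return {"pi": pi, "beta": beta, "mu0": mu0, "mu1": mu1, "sigma": sigma}


# Usage on synthetic data: true pi = 0.3, beta = 0.6, effect mu1 - mu0 = 2.
rng = np.random.default_rng(0)
n = 50_000
a = rng.random(n) < 0.3                  # true (latent) exposure
reported = a & (rng.random(n) < 0.6)     # underreported exposure
y = 2.0 * a + rng.normal(0.0, 1.0, n)    # outcome depends on the true exposure

naive = y[reported].mean() - y[~reported].mean()
est = em_underreporting(y, reported)
effect = est["mu1"] - est["mu0"]
print(f"naive: {naive:.3f}  latent-variable: {effect:.3f}")
```

In this simulation, the naive contrast of reported versus unreported subjects is attenuated because some unreported subjects are in fact exposed, while the latent-variable fit should land near the true difference of 2. If mu0 and mu1 were close, the mixture in the unreported group would carry little information about the hidden exposure rate, which is the kind of near violation of identifiability whose effects the abstract says the authors analyze.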
Pages: 10