Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Cited by: 0
Authors
Graf-Vlach, Lorenz [1, 2]
Wagner, Stefan [3]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; SAMPLE SELECTION BIAS; ECONOMIC-GROWTH; REGRESSION; RELEVANCE; VALIDITY; TESTS; SIZE;
DOI
10.1145/3674730
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
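The two-stage idea summarized in the abstract can be sketched on simulated data. The example below is not the authors' Stata or R code; it is a minimal standard-library Python illustration, and the data-generating process, the coefficient values, and the function names (`ols`, `two_stage_least_squares`, `simulate`) are all assumptions made for this sketch. It uses one endogenous regressor `x`, one instrument `z`, and one unobserved confounder `u`.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Two-stage estimate of the effect of x on y, using z as instrument."""
    # Stage 1: regress the endogenous regressor on the instrument.
    a, b = ols(z, x)
    x_hat = [a + b * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols(x_hat, y)


def simulate(n=20000, true_effect=2.0, seed=1):
    """Hypothetical data: u confounds x and y; z affects y only through x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                    # unobserved confounder
        zi = rng.gauss(0, 1)                   # instrument, independent of u
        xi = 0.8 * zi + u + rng.gauss(0, 1)    # endogenous regressor
        yi = true_effect * xi + 3.0 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    _, naive_slope = ols(x, y)
    _, iv_slope = two_stage_least_squares(z, x, y)
    print(f"naive OLS slope: {naive_slope:.3f}")
    print(f"2SLS slope:      {iv_slope:.3f}")
```

In this setup the naive regression absorbs part of the confounder's effect and overstates the slope, while the two-stage estimate should land close to the assumed true effect of 2.0. Running the second stage by hand like this yields a usable point estimate but not valid standard errors, so a dedicated IV routine is preferable for inference.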
Pages: 31
Related papers
50 in total
  • [21] Mitigating Selection Bias: A Bayesian Approach to Two-stage Causal Modeling With Instrumental Variables for Nonnormal Missing Data
    Shi, Dingjing
    Tong, Xin
    SOCIOLOGICAL METHODS & RESEARCH, 2022, 51 (03) : 1052 - 1099
  • [22] Estimation in two-stage models with heteroscedasticity
    Buonaccorsi, John
    INTERNATIONAL STATISTICAL REVIEW, 2006, 74 (03) : 403 - 418
  • [23] Two-stage prediction in linear models
    Jeske, Daniel R.
    Kurum, Esra
    Yao, Weixin
    Rizzo, Shemra
    SEQUENTIAL ANALYSIS-DESIGN METHODS AND APPLICATIONS, 2018, 37 (03) : 311 - 321
  • [24] Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology
    Oh, Eun Jeong
    Parikh, Ravi B.
    Chivers, Corey
    Chen, Jinbo
    JCO CLINICAL CANCER INFORMATICS, 2021, 5 : 1015 - 1023
  • [25] Eliminating Survivor Bias in Two-stage Instrumental Variable Estimators
    Vansteelandt, Stijn
    Walter, Stefan
    Tchetgen, Eric Tchetgen
    EPIDEMIOLOGY, 2018, 29 (04) : 536 - 541
  • [26] Two-Stage Variables Acceptance Sampling Plans Using Process Loss Functions
    Aslam, Muhammad
    Yen, Ching-Ho
    Chang, Chia-Hao
    Jun, Chi-Hyuck
    Ahmad, Munir
    Rasool, Mujahid
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2012, 41 (20) : 3633 - 3647
  • [27] TWO-STAGE CLEANING TECHNOLOGY WATER OUT FROM OIL
    Pashkevich, M. A.
    Golubev, I. A.
    JOURNAL OF MINING INSTITUTE, 2013, 203 : 83 - 85
  • [28] Appropriate modeling of endogeneity in cross-lagged models: Efficacy of auxiliary and model-implied instrumental variables
    Junyan Fang
    Zhonglin Wen
    Kit-Tai Hau
    Xitong Huang
    Behavior Research Methods, 57 (4)
  • [29] Adjusting for unmeasured confounding using validation data: Simplified two-stage calibration for survival and dichotomous outcomes
    Hjellvik, Vidar
    De Bruin, Marie L.
    Samuelsen, Sven O.
    Karlstad, Oystein
    Andersen, Morten
    Haukka, Jari
    Vestergaard, Peter
    de Vries, Frank
    Furu, Kari
    STATISTICS IN MEDICINE, 2019, 38 (15) : 2719 - 2734
  • [30] Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data
    Strand, Matthew
    Sillau, Stefan
    Grunwald, Gary K.
    Rabinovitch, Nathan
    STATISTICS IN MEDICINE, 2014, 33 (03) : 470 - 487