Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Authors
Graf-Vlach, Lorenz [1, 2]
Wagner, Stefan [3 ]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; SAMPLE SELECTION BIAS; ECONOMIC-GROWTH; REGRESSION; RELEVANCE; VALIDITY; TESTS; SIZE
DOI
10.1145/3674730
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
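To make the idea behind two-stage least squares concrete, the following minimal Python sketch simulates data with an unobserved confounder and compares a naive regression with a manual two-stage estimate. It is not the authors' Stata or R code. The variable names (`u`, `z`, `x`, `y`), the coefficients, the sample size, and the helper functions are all assumptions chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate(n=20_000, true_effect=2.0, seed=0):
    """Hypothetical data-generating process with an omitted confounder u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # unobserved confounder (drives x and y)
    z = rng.normal(size=n)                      # instrument: moves x, independent of u
    x = 0.8 * z + 1.0 * u + rng.normal(size=n)  # endogenous regressor
    y = true_effect * x + 1.5 * u + rng.normal(size=n)
    return z, x, y

def two_stage_least_squares(z, x, y):
    """Manual 2SLS with one instrument and one endogenous regressor."""
    ones = np.ones_like(z)
    Z = np.column_stack([ones, z])
    x_hat = Z @ ols(Z, x)                       # stage 1: predict x from the instrument
    X_hat = np.column_stack([ones, x_hat])
    return ols(X_hat, y)[1]                     # stage 2: slope of y on predicted x

z, x, y = simulate()
naive = ols(np.column_stack([np.ones_like(x), x]), y)[1]
iv = two_stage_least_squares(z, x, y)
print(f"naive OLS slope: {naive:.3f}, 2SLS slope: {iv:.3f}, true effect: 2.0")
```

Under these assumed settings, the naive slope is biased upward because `u` is omitted, while the two-stage estimate should land close to the simulated effect. Running the second stage by hand like this yields a usable point estimate but incorrect standard errors, so a dedicated IV routine is preferable for inference.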
Pages: 31