Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Cited by: 0
Authors
Graf-Vlachy, Lorenz [1, 2]
Wagner, Stefan [3 ]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; sample selection bias; economic growth; relevance; validity; tests; size
DOI
10.1145/3674730
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
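The two-stage idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Stata or R code. It is a minimal Python illustration under assumed values: an unobserved confounder `u`, an instrument `z` that affects the outcome only through `x`, and a true causal effect of 1.0. All variable names, coefficients, and the sample size are hypothetical choices made for this example.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an ordinary least squares regression of y on x (with intercept)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def two_stage_least_squares(y, x, z):
    """2SLS point estimate for one endogenous regressor x and one instrument z."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: regress the endogenous regressor on the instrument
    # and keep only the part of x that the instrument explains.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the fitted values from stage 1.
    # Note: standard errors from this manual second stage would be wrong;
    # only the point estimate is used here.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]


# Simulated data (all values below are assumptions for illustration).
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)                    # unobserved confounder
z = rng.normal(size=n)                    # instrument, independent of u
x = 0.8 * z + u + rng.normal(size=n)      # regressor, endogenous through u
y = 1.0 * x + 2.0 * u + rng.normal(size=n)  # true causal effect of x is 1.0

ols = ols_slope(y, x)
iv = two_stage_least_squares(y, x, z)
print(f"OLS estimate:  {ols:.3f}  (biased by the omitted confounder)")
print(f"2SLS estimate: {iv:.3f}  (should be close to the true effect 1.0)")
```

Because `u` drives both `x` and `y`, the naive regression overstates the effect, while the two-stage estimate uses only the variation in `x` induced by `z`. In applied work a dedicated IV routine would typically be preferred, since it also reports correct standard errors and instrument diagnostics.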
Pages: 31