ACCURACY IN THE APPLICATION OF STATISTICAL MATCHING METHODS FOR CONTINUOUS VARIABLES USING AUXILIARY DATA

Cited by: 2
Authors
Van Delden, Arnout [1]
Du Chatinier, Bart J. [1]
Scholtus, Sander [1]
Affiliations
[1] Stat Netherlands CBS, The Hague, Netherlands
Keywords
Administrative data; Data integration; EM algorithm; Integration of surveys; Official statistics; Uncertainty
DOI
10.1093/jssam/smz032
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Statistical matching is a technique to combine variables in two or more nonoverlapping samples that are drawn from the same population. In the current study, the unobserved joint distribution between two target variables in nonoverlapping samples is estimated using a parametric model. A classical assumption to estimate this joint distribution is that the target variables are independent given the background variables observed in both samples. A problem with the use of this conditional independence assumption is that the estimated joint distribution may be severely biased when the assumption does not hold, which in general will be unacceptable for official statistics. Here, we explored to what extent the accuracy can be improved by the use of two types of auxiliary information: the use of a common administrative variable and the use of a small additional sample from a similar population. This additional sample is included by using the partial correlation of the target variables given the background variables or by using an EM algorithm. In total, four different approaches were compared to estimate the joint distribution of the target variables. Starting with empirical data, we show how the accuracy of the joint distribution is affected by the use of administrative data and by the size of the additional sample included via a partial correlation and through an EM algorithm. The study further shows how this accuracy depends on the strength of the relations among the target and auxiliary variables. We found that including a common administrative variable does not always improve the accuracy of the results. We further found that the EM algorithm nearly always yielded the most accurate results; this effect is largest when the explained variance of the separate target variables by the common background variables is not large.
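To make the role of the conditional independence assumption concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear model with a single scalar background variable X, so that cov(Y, Z) = beta_Y * beta_Z * var(X) + rho_{YZ|X} * sd(Y|X) * sd(Z|X), where rho_{YZ|X} is the partial correlation. Setting the partial correlation to zero gives the conditional independence estimate; supplying a value (for example, one estimated from a small additional sample) corrects it. The function names `simulate` and `matched_cov_yz` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n, partial_corr, seed):
    """Draw (X, Y, Z) from a linear model with a known partial correlation.

    Illustrative assumption: Y = 1.0*X + e_y, Z = 0.5*X + e_z, where the
    residuals have unit variance and correlation `partial_corr`.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    resid_cov = np.array([[1.0, partial_corr], [partial_corr, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), resid_cov, size=n)
    return x, 1.0 * x + e[:, 0], 0.5 * x + e[:, 1]


def _slope_and_resid_sd(x, t):
    """Slope and residual standard deviation of a simple regression of t on x."""
    slope = np.cov(x, t, ddof=1)[0, 1] / np.var(x, ddof=1)
    resid = (t - t.mean()) - slope * (x - x.mean())
    return slope, np.sqrt(resid @ resid / (len(t) - 2))


def matched_cov_yz(x_a, y_a, x_b, z_b, partial_corr=0.0):
    """Estimate cov(Y, Z) from sample A (X, Y) and sample B (X, Z).

    partial_corr=0.0 corresponds to the conditional independence assumption.
    """
    var_x = np.var(np.concatenate([x_a, x_b]), ddof=1)
    beta_y, sd_y = _slope_and_resid_sd(x_a, y_a)
    beta_z, sd_z = _slope_and_resid_sd(x_b, z_b)
    return beta_y * beta_z * var_x + partial_corr * sd_y * sd_z


# Two nonoverlapping samples: A observes (X, Y), B observes (X, Z).
x_a, y_a, _ = simulate(20_000, partial_corr=0.6, seed=1)
x_b, _, z_b = simulate(20_000, partial_corr=0.6, seed=2)

cov_cia = matched_cov_yz(x_a, y_a, x_b, z_b)                    # assumes rho = 0
cov_aux = matched_cov_yz(x_a, y_a, x_b, z_b, partial_corr=0.6)  # rho supplied
print(f"CIA estimate: {cov_cia:.3f}, with partial correlation: {cov_aux:.3f}")
```

In this simulated setup the true covariance is 0.5 + 0.6 = 1.1, so the conditional independence estimate (close to 0.5) is biased whenever the true partial correlation is nonzero, which is the problem the abstract describes.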
Pages: 990-1017
Page count: 28