ACCURACY IN THE APPLICATION OF STATISTICAL MATCHING METHODS FOR CONTINUOUS VARIABLES USING AUXILIARY DATA

Cited by: 2

Authors:

Van Delden, Arnout [1]

Du Chatinier, Bart J. [1]

Scholtus, Sander [1]

Affiliations:

[1] Stat Netherlands CBS, The Hague, Netherlands

Source:

JOURNAL OF SURVEY STATISTICS AND METHODOLOGY | 2020 / Vol. 8 / Issue 05

Keywords:

Administrative data; Data integration; EM algorithm; Integration of surveys; Official statistics; UNCERTAINTY;

DOI:

10.1093/jssam/smz032

Chinese Library Classification (CLC) number:

O1 [Mathematics]; C [Social Sciences, General];

Discipline classification code:

03 ; 0303 ; 0701 ; 070101 ;

Abstract:

Statistical matching is a technique to combine variables in two or more nonoverlapping samples that are drawn from the same population. In the current study, the unobserved joint distribution between two target variables in nonoverlapping samples is estimated using a parametric model. A classical assumption to estimate this joint distribution is that the target variables are independent given the background variables observed in both samples. A problem with the use of this conditional independence assumption is that the estimated joint distribution may be severely biased when the assumption does not hold, which in general will be unacceptable for official statistics. Here, we explored to what extent the accuracy can be improved by the use of two types of auxiliary information: the use of a common administrative variable and the use of a small additional sample from a similar population. This additional sample is included by using the partial correlation of the target variables given the background variables or by using an EM algorithm. In total, four different approaches were compared to estimate the joint distribution of the target variables. Starting with empirical data, we show how the accuracy of the joint distribution is affected by the use of administrative data and by the size of the additional sample included via a partial correlation and through an EM algorithm. The study further shows how this accuracy depends on the strength of the relations among the target and auxiliary variables. We found that including a common administrative variable does not always improve the accuracy of the results. We further found that the EM algorithm nearly always yielded the most accurate results; this effect is largest when the explained variance of the separate target variables by the common background variables is not large.
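As a rough illustration of the conditional independence assumption and the partial-correlation adjustment mentioned in the abstract, the sketch below estimates the covariance of two target variables Y and Z that are never observed together. It is not the authors' implementation: it assumes a single background variable X, a linear-normal model, and simulated data, and the function name `matched_cov_yz` and all parameter values are hypothetical.

```python
import numpy as np


def matched_cov_yz(x_a, y_a, x_b, z_b, partial_corr=0.0):
    """Estimate cov(Y, Z) from two nonoverlapping samples sharing X.

    Sample A observes (X, Y); sample B observes (X, Z).
    partial_corr=0.0 corresponds to conditional independence of Y and Z
    given X. A nonzero value plugs in an externally supplied partial
    correlation (for example, one estimated from a small extra sample).
    """
    # Variance of the common variable, pooled over both samples
    var_x = np.concatenate([x_a, x_b]).var(ddof=1)

    # Regression slopes of Y on X (sample A) and Z on X (sample B)
    beta_yx = np.cov(x_a, y_a, ddof=1)[0, 1] / x_a.var(ddof=1)
    beta_zx = np.cov(x_b, z_b, ddof=1)[0, 1] / x_b.var(ddof=1)

    # Residual variances of Y given X and of Z given X
    res_y = (y_a - y_a.mean()) - beta_yx * (x_a - x_a.mean())
    res_z = (z_b - z_b.mean()) - beta_zx * (x_b - x_b.mean())
    var_y_given_x = res_y.var(ddof=2)
    var_z_given_x = res_z.var(ddof=2)

    # Part of cov(Y, Z) explained by X, plus the assumed conditional part
    cov_explained = beta_yx * beta_zx * var_x
    cov_conditional = partial_corr * np.sqrt(var_y_given_x * var_z_given_x)
    return cov_explained + cov_conditional


# Simulated population (an assumption for illustration):
# Y = X + e1, Z = X + e2, corr(e1, e2) = 0.5, so the true cov(Y, Z) is 1.5.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=2 * n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=2 * n)
y = x + errors[:, 0]
z = x + errors[:, 1]

# Sample A keeps (X, Y) only; sample B keeps (X, Z) only
x_a, y_a = x[:n], y[:n]
x_b, z_b = x[n:], z[n:]

cov_cia = matched_cov_yz(x_a, y_a, x_b, z_b)                      # conditional independence
cov_aux = matched_cov_yz(x_a, y_a, x_b, z_b, partial_corr=0.5)    # with auxiliary information
print(f"CIA estimate: {cov_cia:.3f}, with partial correlation: {cov_aux:.3f}")
```

In this simulated setup the estimate under conditional independence misses the part of the covariance that X does not explain, while supplying the partial correlation recovers it. How well such a value can be estimated from a small additional sample is the question the study addresses.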

Pages: 990-1017

Page count: 28
