ACCURACY IN THE APPLICATION OF STATISTICAL MATCHING METHODS FOR CONTINUOUS VARIABLES USING AUXILIARY DATA

Cited by: 2
Authors
Van Delden, Arnout [1 ]
Du Chatinier, Bart J. [1 ]
Scholtus, Sander [1 ]
Affiliations
[1] Statistics Netherlands (CBS), The Hague, Netherlands
Keywords
Administrative data; Data integration; EM algorithm; Integration of surveys; Official statistics; Uncertainty
DOI
10.1093/jssam/smz032
Chinese Library Classification number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03 ; 0303 ; 0701 ; 070101 ;
Abstract
Statistical matching is a technique to combine variables in two or more nonoverlapping samples that are drawn from the same population. In the current study, the unobserved joint distribution between two target variables in nonoverlapping samples is estimated using a parametric model. A classical assumption to estimate this joint distribution is that the target variables are independent given the background variables observed in both samples. A problem with the use of this conditional independence assumption is that the estimated joint distribution may be severely biased when the assumption does not hold, which in general will be unacceptable for official statistics. Here, we explored to what extent the accuracy can be improved by the use of two types of auxiliary information: the use of a common administrative variable and the use of a small additional sample from a similar population. This additional sample is included by using the partial correlation of the target variables given the background variables or by using an EM algorithm. In total, four different approaches were compared to estimate the joint distribution of the target variables. Starting with empirical data, we show how the accuracy of the joint distribution is affected by the use of administrative data and by the size of the additional sample included via a partial correlation and through an EM algorithm. The study further shows how this accuracy depends on the strength of the relations among the target and auxiliary variables. We found that including a common administrative variable does not always improve the accuracy of the results. We further found that the EM algorithm nearly always yielded the most accurate results; this effect is largest when the explained variance of the separate target variables by the common background variables is not large.
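To make the role of the conditional independence assumption concrete, the following is a minimal sketch, assuming a multivariate normal model for the background variables X and the target variables Y and Z (the abstract says only that a parametric model is used, so the normal model is an assumption here). The function name `matched_cov_yz` and its parameters are hypothetical and are not taken from the paper. Under this model, the unobserved covariance of Y and Z is the part explained by X plus the partial correlation times the geometric mean of the two residual variances; setting the partial correlation to zero gives the conditional independence estimate.

```python
import numpy as np


def matched_cov_yz(s_xx, s_xy, s_xz, var_y, var_z, partial_corr=0.0):
    """Cov(Y, Z) implied by an assumed multivariate normal model.

    s_xx     : covariance (matrix) of the common background variables X
    s_xy     : covariances of X with Y (observable in the first sample)
    s_xz     : covariances of X with Z (observable in the second sample)
    var_y    : variance of Y
    var_z    : variance of Z
    partial_corr : correlation of Y and Z given X; 0.0 corresponds to the
                   conditional independence assumption, a nonzero value
                   would come from auxiliary information.
    """
    s_xx = np.atleast_2d(np.asarray(s_xx, dtype=float))
    s_xy = np.atleast_1d(np.asarray(s_xy, dtype=float))
    s_xz = np.atleast_1d(np.asarray(s_xz, dtype=float))

    # Part of Cov(Y, Z) that is explained by the common variables X.
    explained = s_xy @ np.linalg.solve(s_xx, s_xz)

    # Residual variances of Y and Z after linear regression on X.
    res_var_y = var_y - s_xy @ np.linalg.solve(s_xx, s_xy)
    res_var_z = var_z - s_xz @ np.linalg.solve(s_xx, s_xz)

    # Add the part that X cannot explain, scaled by the partial correlation.
    return float(explained + partial_corr * np.sqrt(res_var_y * res_var_z))


# Illustrative numbers only (not from the paper): one standardized X.
cov_cia = matched_cov_yz(1.0, 0.5, 0.5, 1.0, 1.0)
cov_aux = matched_cov_yz(1.0, 0.5, 0.5, 1.0, 1.0, partial_corr=0.4)
print(cov_cia, cov_aux)
```

In this toy setting, the gap between the two results shows how much a nonzero partial correlation can shift the estimate when X explains only a modest share of the variance of Y and Z.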
Pages: 990 - 1017
Number of pages: 28
Related papers
50 in total
  • [31] Understanding Common Statistical Methods, Part I: Descriptive Methods, Probability, and Continuous Data
    Skinner, Carl G.
    Patel, Manish M.
    Thomas, Jerry D.
    Miller, Michael A.
    MILITARY MEDICINE, 2011, 176 (01) : 99 - 102
  • [32] Reliability analysis using experimental statistical methods and AIS: application in continuous flow tubes of gaseous medium
    Outa, Roberto
    Chavarette, Fabio Roberto
    Goncalves, Aparecido Carlos
    da Silva, Sidney Leal
    Mishra, Vishnu Narayan
    Panosso, Alan Rodrigo
    Mishra, Lakshmi Narayan
    ACTA SCIENTIARUM-TECHNOLOGY, 2021, 43
  • [33] STATISTICAL MATCHING AND SUBCLASSIFICATION WITH A CONTINUOUS DOSE: CHARACTERIZATION, ALGORITHM, AND APPLICATION TO A HEALTH OUTCOMES STUDY
    Zhang, Bo
    Mackay, Emily J.
    Baiocchi, Mike
    ANNALS OF APPLIED STATISTICS, 2023, 17 (01) : 454 - 475
  • [34] Analysis of Neurosurgery Data Using Statistical and Data Mining Methods
    Berka, Petr
    Jablonsky, Josef
    Marek, Lubos
    Vrabec, Michal
    ADVANCES IN ARTIFICIAL INTELLIGENCE AND ITS APPLICATIONS, MICAI 2015, PT II, 2015, 9414 : 310 - 321
  • [35] Fault detection in continuous processes using multivariate statistical methods
    Goulding, PR
    Lennox, B
    Sandoz, DJ
    Smith, KJ
    Marjanovic, O
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2000, 31 (11) : 1459 - 1471
  • [36] Bootstrap Methods for Statistical Inference. Part I: Comparative Forecast Verification for Continuous Variables
    Gilleland, Eric
    JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY, 2020, 37 (11) : 2117 - 2134
  • [37] A Systematic Review of Statistical Methods Used to Test for Reliability of Medical Instruments Measuring Continuous Variables
    Zaki, Rafdzah
    Bulgiba, Awang
    Nordin, Noorhaire
    Ismail, Noor Azina
    IRANIAN JOURNAL OF BASIC MEDICAL SCIENCES, 2013, 16 (06) : 803 - 807
  • [38] Estimation of missing data using latent variable methods with auxiliary information
    Muteki, K
    MacGregor, JF
    Ueda, T
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2005, 78 (1-2) : 41 - 50
  • [39] Application of Statistical Methods for Analysis of Agricultural Runoff Monitoring Data
    Lagzdins, Ainis
    Jansons, Viesturs
    ENVIRONMENTAL AND CLIMATE TECHNOLOGIES, 2010, 5 (01) : 65 - 71
  • [40] AN APPLICATION OF STATISTICAL METHODS TO CORE ANALYSIS DATA OF DOLOMITIC LIMESTONE
    BULNES, AC
    TRANSACTIONS OF THE AMERICAN INSTITUTE OF MINING AND METALLURGICAL ENGINEERS, 1946, 165 : 223 - 240