QR prediction for statistical data integration

Cited by: 0
Authors
Medous, Estelle [1, 2]
Goga, Camelia [3]
Ruiz-Gazen, Anne [4]
Beaumont, Jean-Francois [5]
Dessertaine, Alain [6]
Puech, Pauline [6]
Affiliations
[1] Univ Toulouse Capitole, Toulouse Sch Econ, 1 Esplanade Univ, F-31000 Toulouse, France
[2] Univ Franche Comte, Lab Math Besancon & Poste, 3 Rue Jean Richepin, F-93192 Noisy Le Grand, France
[3] Univ Franche Comte, Lab Math Besancon, Besancon, France
[4] Univ Toulouse Capitole, Toulouse Sch Econ, 1 Esplanade Univ, F-31000 Toulouse, France
[5] Stat Canada, 100 Tunneys Pasture Driveway, Ottawa, ON, Canada
[6] La Poste, 3 Rue Jean Richepin, F-44038 Noisy Le Grand, France
Keywords
Cosmetic estimator; Dual frame; GREG estimator; Non-probability sample; Probability sample; Variance estimator; ESTIMATORS;
DOI
Not available
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03; 0303; 0701; 070101
Abstract
In this paper, we investigate how a big non-probability database can be used to improve estimates of finite population totals from a small probability sample through data integration techniques. In the situation where the study variable is observed in both data sources, Kim and Tam (2021) proposed two design-consistent estimators that can be justified through dual frame survey theory. First, we provide conditions ensuring that these estimators are more efficient than the Horvitz-Thompson estimator when the probability sample is selected using either Poisson sampling or simple random sampling without replacement. Then, we study the class of QR predictors, introduced by Sarndal and Wright (1984), to handle the less common case where the non-probability database contains no study variable but auxiliary variables. We also require that the non-probability database is large and can be linked to the probability sample. We provide conditions ensuring that the QR predictor is asymptotically design-unbiased. We derive its asymptotic design variance and provide a consistent design-based variance estimator. We compare the design properties of different predictors, in the class of QR predictors, through a simulation study. This class includes a model-based predictor, a model-assisted estimator and a cosmetic estimator. In our simulation setups, the cosmetic estimator performed slightly better than the model-assisted estimator. These findings are confirmed by an application to La Poste data, which also illustrates that the properties of the cosmetic estimator are preserved irrespective of the observed non-probability sample.
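For readers unfamiliar with the estimators named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the textbook form of the Horvitz-Thompson estimator and a commonly stated form of the QR predictor (population sum of fitted values from a q-weighted least-squares fit, plus an r-weighted sum of sample residuals). The function names, the argument names (`X_U`, `X_s`, `q_s`, `r_s`) and the choice of weights in the usage note are assumptions made for illustration; the record does not give the paper's exact definitions.

```python
import numpy as np


def horvitz_thompson(y_s, pi_s):
    """Horvitz-Thompson estimate of a population total.

    y_s  : study-variable values for the sampled units
    pi_s : first-order inclusion probabilities of those units
    """
    return float(np.sum(np.asarray(y_s) / np.asarray(pi_s)))


def qr_predictor(X_U, X_s, y_s, q_s, r_s):
    """Sketch of a QR-type predictor of a population total (assumed form).

    X_U : (N, p) auxiliary variables for every population unit
    X_s : (n, p) auxiliary variables for the sampled units
    y_s : (n,)   study variable for the sampled units
    q_s : (n,)   weights used in the least-squares fit (assumption)
    r_s : (n,)   weights applied to the sample residuals (assumption)
    """
    X_s = np.asarray(X_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    # q-weighted least-squares coefficient: (X' Q X)^{-1} X' Q y
    XtQ = X_s.T * np.asarray(q_s, dtype=float)
    beta = np.linalg.solve(XtQ @ X_s, XtQ @ y_s)
    residuals = y_s - X_s @ beta
    # Sum of fitted values over the population plus weighted sample residuals
    fitted_total = np.asarray(X_U, dtype=float).sum(axis=0) @ beta
    return float(fitted_total + np.sum(np.asarray(r_s) * residuals))
```

Under these assumptions, setting `q_s` and `r_s` to the design weights `1 / pi_s` gives a GREG-type (model-assisted) estimator, while setting `r_s` to zero gives a purely model-based prediction; which specific choices define the cosmetic estimator studied in the paper is not stated in this record.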
Pages: 28
Related papers
50 in total
  • [41] A statistical framework of data fusion for spatial prediction of categorical variables
    Cao, Guofeng
    Yoo, Eun-hye
    Wang, Shaowen
    STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2014, 28 (07) : 1785 - 1799
  • [42] The elements of statistical learning: Data mining, inference, and prediction.
    Ramsay, J
    PSYCHOMETRIKA, 2003, 68 (04) : 611 - 612
  • [43] Statistical Analysis and Data Mining Combined Yoga Grade Prediction
    Yu, Lan
    ICEEM 2012: 2012 2ND INTERNATIONAL CONFERENCE ON ECONOMIC, EDUCATION AND MANAGEMENT, VOL 1, 2012, : 657 - 660
  • [44] The construction and assessment of a statistical model for the prediction of protein assay data
    Pittman, J
    Sacks, J
    Young, SS
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (03) : 729 - 741
  • [45] Econometric and statistical data mining, prediction and policy-making
    Zellner, A
    STATISTICAL DATA MINING AND KNOWLEDGE DISCOVERY, 2004, : 57 - 78
  • [46] Statistical analysis and reliability prediction with short fatigue crack data
    Wilson, SP
    Taylor, D
    FATIGUE & FRACTURE OF ENGINEERING MATERIALS & STRUCTURES, 1999, 22 (01) : 67 - 76
  • [47] A statistical framework of data fusion for spatial prediction of categorical variables
    Guofeng Cao
    Eun-hye Yoo
    Shaowen Wang
    Stochastic Environmental Research and Risk Assessment, 2014, 28 : 1785 - 1799
  • [48] Bioimpedance data statistical modelling for food quality classification and prediction
    Rivola, Maria
    Ibba, Pietro
    Lugli, Paolo
    Petti, Luisa
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [49] The elements of statistical learning: Data mining, inference and prediction.
    Marcoulides, GA
    STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2004, 11 (01) : 150 - 151
  • [50] QR Coded Field Data Acquisition
    Higashida, Mitsuhiro
    Matsushita, Yasushi
    Hayashi, Haruo
    Miyake, Kouichi
    Morikawa, Masayuki
    Yoshitomi, Nozomu
    JOURNAL OF DISASTER RESEARCH, 2010, 5 (01) : 66 - 73