A new approach for data editing and imputation

Authors
Sergio Delgado-Quintero
Juan-José Salazar-González
Affiliation
[1] Universidad de La Laguna,DEIOC
Keywords
Editing; Imputation; Error localization problem; Mathematical programming; Heuristics
Abstract
The editing-and-imputation problem concerns finding the errors in a record that does not satisfy a set of consistency rules. Once the potentially erroneous fields have been localized, new values must be imputed to them. The output dataset should consist of valid records and preserve statistical properties similar to those of the input dataset. Statistical agencies usually do most of this work manually, which consumes a great deal of human resources. This paper presents a mathematical programming model that solves the problem optimally on surveys with categorical values and particular edits. We also describe a heuristic approach for more complex surveys. The heuristic procedure combines the widely accepted hot-deck donor scheme with multivariate regression analysis. It has been implemented with a graphical user interface running on standard personal computers and has been tested on real-world surveys. This paper demonstrates the satisfactory performance of our automatic procedure.
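To make the error localization problem named in the abstract concrete, the following is a minimal sketch of it on a toy categorical record. It is not the mathematical programming model or the heuristic from the paper: it enumerates field subsets by brute force, which is only practical for tiny examples. The field names, domains, edits, and the functions `is_valid` and `localize_errors` are all hypothetical and chosen for illustration only.

```python
from itertools import combinations, product

# Hypothetical toy survey: fields, domains and edits are illustrative only.
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["no", "yes"],
}

# Each edit is a forbidden combination of values: a record fails the edit
# when every listed field takes one of the listed values.
EDITS = [
    {"age": {"child"}, "marital": {"married"}},
    {"age": {"child"}, "employed": {"yes"}},
]


def is_valid(record):
    """Return True if the record fails none of the edits."""
    return not any(
        all(record[field] in values for field, values in edit.items())
        for edit in EDITS
    )


def localize_errors(record):
    """Find a smallest set of fields whose values can be changed so that
    the record satisfies all edits.

    Returns (fields_to_change, repaired_record), or None if no repair exists.
    Brute-force enumeration, assumed here only for illustration.
    """
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = dict(record)
                candidate.update(zip(subset, values))
                if is_valid(candidate):
                    return set(subset), candidate
    return None


record = {"age": "child", "marital": "married", "employed": "yes"}
fields, repaired = localize_errors(record)
print(fields, repaired)
```

In this toy record both edits fail, and changing the single field `age` is enough to satisfy them, so the smallest repair touches one field instead of two. A real system would replace the enumeration with an optimization model or a heuristic, and would choose the imputed values so as to preserve the statistical properties of the dataset, for example by using a donor record.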