Methods for interpolating missing data in aerobiological databases

Cited by: 24
Authors
Picornell, A. [1]
Oteros, J. [2, 3]
Ruiz-Mata, R. [1]
Recio, M. [1]
Trigo, M. M. [1]
Martinez-Bracero, M. [2, 3, 4]
Lara, B. [5]
Serrano-Garcia, A. [5]
Galan, C. [2, 3]
Garcia-Mozo, H. [2, 3]
Alcazar, P. [2, 3]
Perez-Badia, R. [5]
Cabezudo, B. [1]
Romero-Morte, J. [5]
Rojo, J. [5, 6]
Affiliations
[1] Univ Malaga, Dept Bot & Plant Physiol, Campus Teatinos S-N, E-29071 Malaga, Spain
[2] Univ Cordoba, Dept Bot Ecol & Plant Physiol, Agrifood Campus Int Excellence CeiA3, Cordoba, Spain
[3] Univ Cordoba, Andalusian Interuniv Inst Earth Syst IISTA, Cordoba, Spain
[4] Technol Univ Dublin, Sch Chem & Pharmaceut Sci, Dublin, Ireland
[5] Univ Castilla La Mancha, Inst Environm Sci Bot, Toledo, Spain
[6] Univ Complutense Madrid, Dept Pharmacol Pharmacognosy & Bot, Madrid, Spain
Keywords
Missing data; Aerobiology; Time-series; Modelling; Interpolation; Environmental sampling; Bioaerosols; POACEAE POLLEN SEASON; AIRBORNE POLLEN; ALLERGENIC POLLEN; NATURAL PARK; IMPUTATION; START; PEAK; AIR; IDENTIFICATION; REQUIREMENTS
DOI
10.1016/j.envres.2021.111391
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Missing data are a common problem in scientific research. Obtaining extensive environmental time series is usually laborious and difficult, and unexpected sampling failures sometimes go undetected until the samples are processed. Consequently, environmental databases frequently contain gaps. Applying an interpolation method before the data analysis can be a good way to fill in this missing information. Nevertheless, several different approaches exist, and their accuracy should be assessed and compared. In this study, data from 6 aerobiological sampling stations were used as an example of environmental data series to assess the accuracy of different interpolation methods. To that end, observed daily pollen/spore concentrations were randomly removed, interpolated using different methods, and then compared with the observed data to measure the resulting errors. Different periods, gap sizes, interpolation methods and bioaerosols were considered in order to check their influence on interpolation accuracy. The moving mean interpolation method obtained the highest success rate on average. With this method, a success rate of 70% was obtained when the risk classes used in the alert systems of pollen information platforms were taken into account. In general, errors were greater when the concentrations of biotic particles oscillated strongly over consecutive days, which is why the pre-peak and peak periods showed the highest interpolation errors. Errors were also higher for gaps longer than 5 days, so for filling long periods of missing data it would be advisable to test other methodological approaches.
A new Variation Index, based on the behaviour of the pollen/spore season (a measure of the variability of the concentrations every 2 consecutive days), was developed; it allows the potential error to be estimated before the interpolation is applied.
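The evaluation procedure summarised in the abstract (remove observed values, interpolate, compare with the observations) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact moving mean definition, so the symmetric window, the function names (`remove_gap`, `moving_mean_fill`, `mean_absolute_error`) and the daily concentrations below are all assumptions made purely for illustration. The gap position is fixed here for reproducibility, whereas the study removed data at random.

```python
def remove_gap(series, start, size):
    """Return a copy of `series` with `size` consecutive values set to None."""
    gapped = list(series)
    for i in range(start, min(start + size, len(series))):
        gapped[i] = None
    return gapped


def moving_mean_fill(series, window=2):
    """Fill each None with the mean of observed values within `window` days
    on either side (assumed definition). If no observed value falls inside
    the window, the window is widened one day at a time."""
    n = len(series)
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        k = window
        while k <= n:
            neighbours = [v for v in series[max(0, i - k): i + k + 1]
                          if v is not None]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)
                break
            k += 1
    return filled


def mean_absolute_error(observed, estimated, indices):
    """Mean absolute difference between observed and estimated values
    at the positions that were removed."""
    return sum(abs(observed[i] - estimated[i]) for i in indices) / len(indices)


# Hypothetical daily concentrations with a peak on day 4 (not study data).
observed = [10, 12, 20, 35, 50, 42, 30, 18, 12, 8]

gapped = remove_gap(observed, start=4, size=1)   # remove the peak day
filled = moving_mean_fill(gapped, window=2)
error = mean_absolute_error(observed, filled, [4])
print(filled[4], error)
```

In this made-up series the removed peak day is estimated from its four neighbours and is underestimated, which is consistent with the abstract's observation that errors grow when concentrations oscillate strongly around the peak.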
Pages: 10