Methods for imputation of missing values in air quality data sets

Cited by: 412
Authors
Junninen, H
Niska, H
Tuppurainen, K
Ruuskanen, J
Kolehmainen, M
Affiliations
[1] Univ Kuopio, Dept Environm Sci, FIN-70211 Kuopio, Finland
[2] Commiss European Communities, Inst Environm & Sustainabil, I-21020 Ispra, Italy
[3] Univ Kuopio, Dept Chem, FIN-70211 Kuopio, Finland
Keywords
missing data; air quality; multivariate; imputing; neural networks
DOI
10.1016/j.atmosenv.2004.02.026
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Methods for data imputation applicable to air quality data sets were evaluated in the context of univariate (linear, spline and nearest neighbour interpolation), multivariate (regression-based imputation (REGEM), nearest neighbour (NN), self-organizing map (SOM), multi-layer perceptron (MLP)), and hybrid methods combining these, using simulated missing data patterns. Additionally, a multiple imputation procedure was considered in order to compare single and multiple imputation schemes. Four statistical criteria were adopted: the index of agreement, the squared correlation coefficient (R²), the root mean square error and the mean absolute error, with bootstrapped standard errors. The results showed that the performance of interpolation with respect to the length of gaps could be estimated separately for each air quality variable by calculating a gradient and an exponent alpha (Hurst exponent). This can be further utilised in a hybrid approach in which the imputation is performed either by interpolation or by a multivariate method, depending on the length of the gaps and the variable under study. Among the multivariate methods, SOM and MLP performed slightly better than the REGEM and NN methods. The advantage of SOM over the others was that it was less dependent on the actual location of the missing values. If priority is given to computational speed, however, NN can be recommended. The results in general showed that a slight improvement in the performance of multivariate methods can be achieved by hybridisation, and a more substantial one by multiple imputation, where the final estimate is composed of the outputs of several multivariate fill-in methods. © 2004 Elsevier Ltd. All rights reserved.
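The four evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the "index of agreement" is Willmott's d, that R² is the squared Pearson correlation, and that the bootstrap resamples observation/estimate pairs with replacement; the function names, the number of resamples and the example values are hypothetical.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def index_of_agreement(obs, pred):
    # Assumed form: Willmott's index of agreement d, in [0, 1].
    m = _mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - m) + abs(o - m)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def r_squared(obs, pred):
    # Squared Pearson correlation between observed and imputed values.
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def rmse(obs, pred):
    # Root mean square error.
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    # Mean absolute error.
    return _mean([abs(p - o) for o, p in zip(obs, pred)])


def bootstrap_se(metric, obs, pred, n_boot=500, seed=0):
    # Standard error of a metric, estimated by resampling
    # (observed, imputed) pairs with replacement.
    rng = random.Random(seed)
    n = len(obs)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    mean_v = _mean(values)
    return math.sqrt(sum((v - mean_v) ** 2 for v in values) / (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical values: true measurements that were masked, and their imputations.
    observed = [12.0, 15.5, 9.8, 20.1, 17.3, 11.2]
    imputed = [11.4, 16.0, 10.5, 18.9, 17.0, 12.1]
    for name, fn in [("d", index_of_agreement), ("R2", r_squared),
                     ("RMSE", rmse), ("MAE", mae)]:
        print(name, fn(observed, imputed))
    print("RMSE bootstrap SE", bootstrap_se(rmse, observed, imputed))
```

With very small samples, a bootstrap resample can consist of identical values, which makes the denominators of d and R² zero; a practical implementation would need to guard against that case.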
Pages: 2895-2907
Number of pages: 13
Related papers
50 in total
  • [21] A practical comparison of single and multiple imputation methods to handle complex missing data in air quality datasets
    Gomez-Carracedo, M. P.
    Andrade, J. M.
    Lopez-Mahia, P.
    Muniategui, S.
    Prada, D.
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2014, 134 : 23 - 33
  • [22] Imputation of continuous missing values in profile data
    Yang, Luo
    Wang, Kaibo
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2022, 38 (07) : 3644 - 3662
  • [23] Data variability in the imputation quality of missing data
    Stochero, Elisandra Lucia Moro
    Lucio, Alessandro Dal'Col
    Jacobi, Luciane Flores
    ACTA SCIENTIARUM-AGRONOMY, 2024, 46
  • [24] Experimental analysis of methods for imputation of missing values in databases
    Farhangfar, A
    Kurgan, L
    Pedrycz, W
    INTELLIGENT COMPUTING: THEORY AND APPLICATIONS II, 2004, 5421 : 172 - 182
  • [25] A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets
    Dabke, Kruttika
    Kreimer, Simion
    Jones, Michelle R.
    Parker, Sarah J.
    JOURNAL OF PROTEOME RESEARCH, 2021, 20 (06) : 3214 - 3229
  • [26] Chemometric treatment of missing elements in air quality data sets
    Smolinski, A.
    Hlawiczka, S.
    POLISH JOURNAL OF ENVIRONMENTAL STUDIES, 2007, 16 (04) : 613 - 622
  • [27] Neural Models for Imputation of Missing Ozone Data in Air-Quality Datasets
    Arroyo, Angel
    Herrero, Alvaro
    Tricio, Veronica
    Corchado, Emilio
    Wozniak, Michal
    COMPLEXITY, 2018
  • [28] Missing values in monotone data sets
    Popova, Viara
    ISDA 2006: Sixth International Conference on Intelligent Systems Design and Applications, Vol 1, 2006, : 627 - 632
  • [29] Robust imputation method for missing values in microarray data
    Yoon, Dankyu
    Lee, Eun-Kyung
    Park, Taesung
    BMC BIOINFORMATICS, 2007, 8 (Suppl 2)
  • [30] Treatment of missing values with imputation for the analysis of otologic data
    Laurikkala, J
    Kentala, E
    Juhola, M
    Pyykkö, I
    MEDICAL INFORMATICS EUROPE '99, 1999, 68 : 428 - 431