Methods for imputation of missing values in air quality data sets

Cited by: 412
Authors
Junninen, H
Niska, H
Tuppurainen, K
Ruuskanen, J
Kolehmainen, M
Affiliations
[1] Univ Kuopio, Dept Environm Sci, FIN-70211 Kuopio, Finland
[2] Commiss European Communities, Inst Environm & Sustainabil, I-21020 Ispra, Italy
[3] Univ Kuopio, Dept Chem, FIN-70211 Kuopio, Finland
Keywords
missing data; air quality; multivariate; imputing; neural networks
DOI
10.1016/j.atmosenv.2004.02.026
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Methods for imputing missing values in air quality data sets were evaluated using simulated missing data patterns. The methods comprised univariate methods (linear, spline and nearest neighbour interpolation), multivariate methods (regression-based imputation (REGEM), nearest neighbour (NN), self-organizing map (SOM) and multi-layer perceptron (MLP)), and hybrids of the two. Additionally, a multiple imputation procedure was considered in order to compare single and multiple imputation schemes. Four statistical criteria were adopted: the index of agreement, the squared correlation coefficient (R²), the root mean square error and the mean absolute error with bootstrapped standard errors. The results showed that the performance of interpolation with respect to gap length could be estimated separately for each air quality variable by calculating a gradient and an exponent alpha (Hurst exponent). This can be further utilised in a hybrid approach in which imputation is performed either by interpolation or by a multivariate method, depending on the gap length and the variable under study. Among the multivariate methods, SOM and MLP performed slightly better than REGEM and NN. The advantage of SOM over the others was that it was less dependent on the actual location of the missing values. If priority is given to computational speed, however, NN can be recommended. In general, the results showed that a slight improvement in the performance of multivariate methods can be achieved by hybridisation, and a more substantial one by multiple imputation, in which the final estimate is composed of the outputs of several multivariate fill-in methods. (C) 2004 Elsevier Ltd. All rights reserved.
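The hybrid idea and the evaluation criteria in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a fixed gap-length threshold (`max_interp_gap`, a hypothetical parameter, whereas the paper derives the switch point per variable from a gradient and the Hurst exponent), a plain 1-nearest-neighbour fill as the multivariate method, predictor variables with no missing values, and Willmott's form of the index of agreement.

```python
import numpy as np

def index_of_agreement(obs, pred):
    """Index of agreement d, assuming Willmott's form (1 = perfect match)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mean_obs = obs.mean()
    denom = np.sum((np.abs(pred - mean_obs) + np.abs(obs - mean_obs)) ** 2)
    return 1.0 - np.sum((pred - obs) ** 2) / denom

def r_squared(obs, pred):
    """Squared Pearson correlation coefficient."""
    return np.corrcoef(obs, pred)[0, 1] ** 2

def rmse(obs, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(pred, float) - np.asarray(obs, float)) ** 2)))

def mae(obs, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(obs, float))))

def gap_lengths(missing):
    """For each position, the length of the missing run it belongs to (0 if observed)."""
    lengths = np.zeros(missing.size, dtype=int)
    i, n = 0, missing.size
    while i < n:
        if missing[i]:
            j = i
            while j < n and missing[j]:
                j += 1
            lengths[i:j] = j - i
            i = j
        else:
            i += 1
    return lengths

def hybrid_impute(target, predictors, max_interp_gap=3):
    """Fill short gaps by linear interpolation, longer gaps by 1-NN on predictors.

    Assumptions (not from the paper): a single fixed threshold, and
    `predictors` of shape (n, p) with no missing values.
    """
    target = np.asarray(target, float)
    predictors = np.asarray(predictors, float)
    missing = np.isnan(target)
    filled = target.copy()
    idx = np.arange(target.size)

    # Univariate step: linear interpolation between observed points.
    interp = np.interp(idx, idx[~missing], target[~missing])
    short = missing & (gap_lengths(missing) <= max_interp_gap)
    filled[short] = interp[short]

    # Multivariate step: copy the target value of the closest observed row
    # in predictor space (squared Euclidean distance).
    obs_rows, obs_vals = predictors[~missing], target[~missing]
    for i in np.flatnonzero(missing & ~short):
        dist = np.sum((obs_rows - predictors[i]) ** 2, axis=1)
        filled[i] = obs_vals[np.argmin(dist)]
    return filled
```

In an evaluation of the kind the abstract describes, values would be removed artificially from a complete series, `hybrid_impute` would be applied, and the four criteria would be computed on the removed positions only.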
Pages: 2895-2907
Number of pages: 13