Methods for imputation of missing values in air quality data sets

Times cited: 413
Authors
Junninen, H
Niska, H
Tuppurainen, K
Ruuskanen, J
Kolehmainen, M
Affiliations
[1] Univ Kuopio, Dept Environm Sci, FIN-70211 Kuopio, Finland
[2] Commiss European Communities, Inst Environm & Sustainabil, I-21020 Ispra, Italy
[3] Univ Kuopio, Dept Chem, FIN-70211 Kuopio, Finland
Keywords
missing data; air quality; multivariate; imputing; neural networks
DOI
10.1016/j.atmosenv.2004.02.026
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification codes
08; 0830
Abstract
Methods for data imputation applicable to air quality data sets were evaluated using simulated missing-data patterns. The methods covered univariate interpolation (linear, spline and nearest neighbour), multivariate methods (regression-based imputation (REGEM), nearest neighbour (NN), self-organizing map (SOM) and multi-layer perceptron (MLP)), and hybrids of these. Additionally, a multiple imputation procedure was considered in order to compare single and multiple imputation schemes. Four statistical criteria were adopted: the index of agreement, the squared correlation coefficient (R²), the root mean square error and the mean absolute error, each with bootstrapped standard errors. The results showed that the performance of interpolation with respect to gap length could be estimated separately for each air quality variable by calculating a gradient and an exponent alpha (the Hurst exponent). This can be further utilised in a hybrid approach, in which each gap is imputed either by interpolation or by a multivariate method, depending on the length of the gap and the variable under study. Among the multivariate methods, SOM and MLP performed slightly better than REGEM and NN. The advantage of SOM over the others was that it was less dependent on the actual location of the missing values. If priority is given to computational speed, however, NN can be recommended. In general, the results showed that hybridisation gives a slight improvement in the performance of the multivariate methods, and that multiple imputation, in which the final estimate is composed of the outputs of several multivariate fill-in methods, gives a more substantial one. (C) 2004 Elsevier Ltd. All rights reserved.
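The four evaluation criteria named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Willmott's definition of the index of agreement d, takes R² as the squared Pearson correlation between observed and imputed values, and estimates standard errors with a plain nonparametric bootstrap over (observed, imputed) pairs. The function names, the default of 1000 bootstrap resamples, and the sample values are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def index_of_agreement(obs, pred):
    """Willmott-style index of agreement d in [0, 1]; 1 means perfect agreement."""
    o_bar = mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    # den == 0 only when every value equals the observed mean, so num is 0 too.
    return 1.0 - num / den if den > 0 else 1.0


def r_squared(obs, pred):
    """Squared Pearson correlation; NaN if either series is constant."""
    o_bar, p_bar = mean(obs), mean(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        return math.nan
    return cov * cov / (var_o * var_p)


def rmse(obs, pred):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean absolute error between observed and imputed values."""
    return mean(abs(p - o) for o, p in zip(obs, pred))


def bootstrap_se(metric, obs, pred, n_boot=1000, seed=0):
    """Bootstrap standard error: resample (obs, pred) pairs with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([obs[i] for i in idx], [pred[i] for i in idx])
        if math.isfinite(value):  # skip resamples where the metric is undefined
            estimates.append(value)
    return stdev(estimates)


# Hypothetical concentrations at positions that were artificially removed and then imputed.
observed = [12.0, 15.5, 14.2, 20.1, 18.3, 16.7, 11.9, 13.4]
imputed = [12.4, 15.0, 14.8, 19.2, 18.9, 16.1, 12.5, 13.0]
for name, fn in [("d", index_of_agreement), ("R2", r_squared), ("RMSE", rmse), ("MAE", mae)]:
    print(f"{name}: {fn(observed, imputed):.3f} +/- {bootstrap_se(fn, observed, imputed):.3f}")
```

Resampling whole (observed, imputed) pairs keeps each imputed value tied to the value it replaced, so the spread of the resampled metric reflects sampling variability of the evaluation set rather than of either series alone.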
Pages: 2895-2907
Number of pages: 13
Related papers
50 in total
  • [41] Imputation methods for missing data for polygenic models
    Fridley, B
    Rabe, K
    de Andrade, M
    BMC GENETICS, 2003, 4 (Suppl 1)
  • [42] A context-intensive approach to imputation of missing values in data sets from networks of environmental monitors
    Larsen, Lawrence C.
    Shah, Mena
    JOURNAL OF THE AIR & WASTE MANAGEMENT ASSOCIATION, 2016, 66 (01) : 38 - 52
  • [43] A Method of Pruning and Random Replacing of Known Values for Comparing Missing Data Imputation Models for Incomplete Air Quality Time Series
    Menendez Garcia, Luis Alfonso
    Fernandez, Marta Menendez
    Sokola-Szewiola, Violetta
    de Prado, Laura Alvarez
    Marques, Almudena Ortiz
    Lopez, David Fernandez
    Sanchez, Antonio Bernardo
    APPLIED SCIENCES-BASEL, 2022, 12 (13)
  • [44] Imputation methods for addressing missing data in short-term monitoring of air pollutants
    Hadeed, Steven J.
    O'Rourke, Mary Kay
    Burgess, Jefferey L.
    Harris, Robin B.
    Canales, Robert A.
    SCIENCE OF THE TOTAL ENVIRONMENT, 2020, 730 (730)
  • [45] OVERCOMING MISSING VALUES USING IMPUTATION METHODS IN THE CLASSIFICATION OF TUBERCULOSIS
    Rochman, Eka Mala Sari
    Miswanto
    Suprajitno, Herry
    COMMUNICATIONS IN MATHEMATICAL BIOLOGY AND NEUROSCIENCE, 2022
  • [46] Microarray Missing Values Imputation Methods: Critical Analysis Review
    Hourani, Mou'ath
    El Emary, Ibrahiem M. M.
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2009, 6 (02) : 165 - 190
  • [47] Advanced methods for missing values imputation based on similarity learning
    Fouad, Khaled M.
    Ismail, Mahmoud M.
    Azar, Ahmad Taher
    Arafa, Mona M.
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [48] The Effects of Methods of Imputation for Missing Values on the Validity and Reliability of Scales
    Cokluk, Omay
    Kayri, Murat
    KURAM VE UYGULAMADA EGITIM BILIMLERI, 2011, 11 (01) : 303 - 309
  • [50] Federated conditional generative adversarial nets imputation method for air quality missing data
    Zhou, Xu
    Liu, Xiaofeng
    Lan, Gongjin
    Wu, Jian
    KNOWLEDGE-BASED SYSTEMS, 2021, 228