Evaluating traditional versus ensemble machine learning methods for predicting missing data of daily PM10 concentration

Cited by: 4
Authors
Kalantari, Elham [1]
Gholami, Hamid [1]
Malakooti, Hossein [2]
Eftekhari, Mahdi [3]
Saneei, Poorya [3]
Esfandiarpour, Donya [3]
Moosavi, Vahid [4]
Nafarzadegan, Ali Reza [1]
Affiliations
[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran
[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran
[3] Shahid Bahonar Univ Kerman, Dept Comp Engn, Kerman, Iran
[4] Tarbiat Modares Univ, Dept Watershed Management Engn, Noor, Mazandaran, Iran
Keywords
Machine learning; PM10 prediction; XGBoost; Time series; Zabol; ARTIFICIAL NEURAL-NETWORKS; AIR; INTERPOLATION; EMISSIONS
DOI
10.1016/j.apr.2024.102063
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Subject classification codes
08; 0830
Abstract
This study aimed to predict missing daily PM10 data for the city of Zabol using traditional learning methods (both eager and lazy learners) and ensemble learning methods. Daily minimum, average, and maximum values of weather variables were collected, together with daily PM10 concentrations, from the Zabol airport weather station for the years 2013-2022. Model performance was compared using the coefficient of determination (R²), mean absolute error (MAE), and mean squared error (MSE). The reconstruction results show that ensemble learning models, especially XGBoost, can effectively predict missing PM10 data in time series. Among ensemble learning methods, boosting algorithms predicted missing PM10 data more accurately than bagging algorithms. Among traditional learning methods, lazy learning models performed better than eager learning models. Ranked by efficiency and accuracy in predicting missing PM10 data, the models are: XGBoost, random forest (RF), Extra Trees (ET), Light Gradient Boosting Machine (LightGBM), the Decision Tree regressor with the bagging method, gradient boosting (GB), AdaBoost, Weighted K-Nearest Neighbor (WKNN), K-Nearest Neighbor (KNN), the Decision Tree regressor with the pasting method, artificial neural network (ANN), Decision Tree (DT), and linear regression (LR). Given the high processing capability of ensemble learning methods, this approach can save the time, energy, and cost of collecting and measuring data, and it can fill gaps left by equipment malfunction or damage. It can also be used to predict pollutant concentrations in weather systems.
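
The evaluation criteria named in the abstract (R², MAE, MSE) and the idea of filling gaps with a lazy learner can be illustrated with a minimal sketch. This is not the authors' implementation: the records below are invented toy values, the feature choice (wind speed and temperature) is an assumption, the helper names are hypothetical, and the inverse-distance weighting is only one common way to define a weighted KNN.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def wknn_predict(train_X, train_y, x, k=3):
    """Inverse-distance-weighted k-nearest-neighbour estimate for one day."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))[:k]
    for d, y in nearest:
        if d == 0.0:          # identical weather record: reuse its value
            return y
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Invented toy data: (mean wind speed, mean temperature, PM10 or None if missing).
# A real application would use many more days and would scale the features.
records = [
    (2.0, 18.0, 80.0),
    (3.0, 20.0, 110.0),
    (5.0, 25.0, 190.0),
    (6.0, 27.0, 230.0),
    (8.0, 30.0, 310.0),
    (9.0, 32.0, 350.0),
    (4.0, 22.0, None),
    (7.0, 29.0, None),
]

observed = [r for r in records if r[2] is not None]
missing = [r for r in records if r[2] is None]
obs_X = [(w, t) for w, t, _ in observed]
obs_y = [pm for _, _, pm in observed]

# Leave-one-out check on the observed days: predict each day from the others.
loo_pred = []
for i in range(len(observed)):
    X_rest = obs_X[:i] + obs_X[i + 1:]
    y_rest = obs_y[:i] + obs_y[i + 1:]
    loo_pred.append(wknn_predict(X_rest, y_rest, obs_X[i]))

print("R2 :", round(r2(obs_y, loo_pred), 3))
print("MAE:", round(mae(obs_y, loo_pred), 3))
print("MSE:", round(mse(obs_y, loo_pred), 3))

# Fill the missing days using all observed days as the training set.
filled = [wknn_predict(obs_X, obs_y, (w, t)) for w, t, _ in missing]
print("Imputed PM10:", [round(v, 1) for v in filled])
```

The same three metric functions could score any of the models listed in the abstract; only the predictor would change.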
Pages: 11