Evaluating traditional versus ensemble machine learning methods for predicting missing data of daily PM10 concentration

Cited by: 4
Authors
Kalantari, Elham [1]
Gholami, Hamid [1]
Malakooti, Hossein [2]
Eftekhari, Mahdi [3]
Saneei, Poorya [3]
Esfandiarpour, Donya [3]
Moosavi, Vahid [4]
Nafarzadegan, Ali Reza [1]
Affiliations
[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran
[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran
[3] Shahid Bahonar Univ Kerman, Dept Comp Engn, Kerman, Iran
[4] Tarbiat Modares Univ, Dept Watershed Management Engn, Noor, Mazandaran, Iran
Keywords
Machine learning; PM10 prediction; XGBoost; Time series; Zabol; ARTIFICIAL NEURAL-NETWORKS; AIR; INTERPOLATION; EMISSIONS
DOI
10.1016/j.apr.2024.102063
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
The aim of this study was to predict missing PM10 data for the city of Zabol using traditional learning methods (both eager and lazy learning) and ensemble learning methods. Daily minimum, average, and maximum values of weather variables were collected, along with daily PM10 concentrations, from the Zabol airport weather station for the years 2013-2022. The predictive models were compared using the coefficient of determination (R²), mean absolute error (MAE), and mean squared error (MSE). The reconstruction results show that ensemble learning models, especially XGBoost, can be used effectively to predict missing PM10 data in time series. Among the ensemble learning methods, boosting algorithms predicted missing PM10 data more accurately than bagging algorithms. Among the traditional learning methods, lazy learning models performed better than eager learning models. Ranked from most to least accurate for predicting missing PM10 data, the models were XGBoost, random forest (RF), Extra Trees (ET), light gradient boosting machine (LightGBM), the decision tree regressor with the bagging method, gradient boosting (GB), AdaBoost, weighted K-nearest neighbor (WKNN), K-nearest neighbor (KNN), the decision tree regressor with the pasting method, artificial neural network (ANN), decision tree (DT), and linear regression (LR). Given the high processing capability and potential of ensemble learning methods for predicting missing PM10 data, this technique is considered a useful way to save the time, energy, and cost of collecting and measuring data. It can also fill in data lost to equipment malfunction or damage, and the approach can be used to predict pollutant concentrations in weather systems.
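The three evaluation criteria named in the abstract, and the difference between plain and distance-weighted K-nearest neighbor prediction, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the inverse-distance weighting scheme, and the toy feature and PM10 values are all assumptions made for illustration, and the numbers are invented rather than taken from the Zabol dataset.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, x, k=3, weighted=False):
    """Predict a value for x from its k nearest training rows.

    weighted=False averages the neighbors equally (KNN);
    weighted=True uses inverse-distance weights (one common WKNN variant,
    assumed here; the paper's exact weighting is not stated in the abstract).
    """
    neighbors = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    if not weighted:
        return sum(y for _, y in neighbors) / len(neighbors)
    weights = [1.0 / (d + 1e-9) for d, _ in neighbors]  # avoid division by zero
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


if __name__ == "__main__":
    # Invented toy data: (wind speed, mean temperature) -> PM10.
    train_X = [(2.0, 20.0), (4.0, 25.0), (6.0, 30.0), (8.0, 35.0), (10.0, 38.0)]
    train_y = [80.0, 120.0, 200.0, 310.0, 450.0]
    # Days whose PM10 value is treated as missing, with held-out true values.
    test_X = [(3.0, 22.0), (7.0, 33.0)]
    test_y = [100.0, 250.0]

    for weighted in (False, True):
        preds = [knn_predict(train_X, train_y, x, k=3, weighted=weighted)
                 for x in test_X]
        name = "WKNN" if weighted else "KNN"
        print(name, "MAE:", mae(test_y, preds),
              "MSE:", mse(test_y, preds), "R2:", r2(test_y, preds))
```

Lower MAE and MSE and a higher R² indicate a better reconstruction of the held-out values; the same three functions could score any of the models listed in the abstract.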
Pages: 11
Related papers
50 in total
  • [21] Research and application of a novel hybrid decomposition-ensemble learning paradigm with error correction for daily PM10 forecasting
    Luo, Hongyuan
    Wang, Deyun
    Yue, Chenqiang
    Liu, Yanling
    Guo, Haixiang
    ATMOSPHERIC RESEARCH, 2018, 201 : 34 - 45
  • [22] Evaluating the Performance of Ensemble Machine Learning Algorithms Over Traditional Machine Learning Algorithms for Predicting Fire Resistance in FRP Strengthened Concrete Beams
    Kumarawadu, H. R.
    Weerasinghe, T. G. P. L.
    Perera, Jude Shalitha
    ELECTRONIC JOURNAL OF STRUCTURAL ENGINEERING, 2024, 24 (03) : 46 - 52
  • [23] Comparing Methods to Impute Missing Daily Ground-Level PM10 Concentrations between 2010-2017 in South Africa
    Arowosegbe, Oluwaseyi Olalekan
    Roeoesli, Martin
    Kunzli, Nino
    Saucy, Apolline
    Adebayo-Ojo, Temitope Christina
    Jeebhay, Mohamed F.
    Dalvie, Mohammed Aqiel
    de Hoogh, Kees
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2021, 18 (07)
  • [24] Applying machine learning methods in managing urban concentrations of traffic-related particulate matter (PM10 and PM2.5)
    Suleiman, A.
    Tight, M. R.
    Quinn, A. D.
    ATMOSPHERIC POLLUTION RESEARCH, 2019, 10 (01) : 134 - 144
  • [25] A new optimized hybrid approach combining machine learning with WRF-CHIMERE model for PM10 concentration prediction
    Chelhaoui, Youssef
    El Ass, Khalid
    Lachatre, Mathieu
    Bouakline, Oumaima
    Khomsi, Kenza
    El Moussaoui, Tawfik
    Arrad, Mouad
    Eddaif, Abdelhamid
    Albergel, Armand
    MODELING EARTH SYSTEMS AND ENVIRONMENT, 2024, 10 (04) : 5687 - 5701
  • [26] Machine learning models to quantify the influence of PM10 aerosol concentration on global solar radiation prediction in South Africa
    Govindasamy, Tamara Rosemary
    Chetty, Naven
    CLEANER ENGINEERING AND TECHNOLOGY, 2021, 2
  • [27] A comparison of statistical and machine learning methods for creating national daily maps of ambient PM2.5 concentration
    Berrocal, Veronica J.
    Guan, Yawen
    Muyskens, Amanda
    Wang, Haoyu
    Reich, Brian J.
    Mulholland, James A.
    Chang, Howard H.
    ATMOSPHERIC ENVIRONMENT, 2020, 222
  • [28] A Novel Hybrid Model Combining the Support Vector Machine (SVM) and Boosted Regression Trees (BRT) Technique in Predicting PM10 Concentration
    Shaziayani, Wan Nur
    Ahmat, Hasfazilah
    Razak, Tajul Rosli
    Zainan Abidin, Aida Wati
    Warris, Saiful Nizam
    Asmat, Arnis
    Noor, Norazian Mohamed
    Ul-Saufie, Ahmad Zia
    ATMOSPHERE, 2022, 13 (12)
  • [29] Application of imputation methods for missing values of PM10 and O3 data: Interpolation, moving average and K-nearest neighbor methods
    Saeipourdizaj, Parisa
    Sarbakhsh, Parvin
    Gholampour, Akbar
    ENVIRONMENTAL HEALTH ENGINEERING AND MANAGEMENT JOURNAL, 2021, 8 (03) : 215 - 226
  • [30] Usage of output-dependent data scaling in modeling and prediction of air pollution daily concentration values (PM10) in the city of Konya
    Polat, Kemal
    Durduran, S. Savaş
    NEURAL COMPUTING AND APPLICATIONS, 2012, 21 : 2153 - 2162