Application of machine learning and deep learning methods for hydrated electron rate constant prediction

Cited by: 8
Authors
Zheng, Shanshan [1]
Guo, Wanqian [1]
Li, Chao [2]
Sun, Yongbin [3]
Zhao, Qi [1]
Lu, Hao [1]
Si, Qishi [1]
Wang, Huazhe [1]
Affiliations
[1] Harbin Inst Technol, State Key Lab Urban Water Resource & Environm, Harbin 150090, Peoples R China
[2] Northeast Normal Univ, Sch Environm, State Environm Protect Key Lab Wetland Ecol & Vege, 2555 Jingyue St, Changchun 130117, Jilin, Peoples R China
[3] Shandong First Med Univ & Shandong Acad Med Sci, Sch Chem & Pharmaceut Engn, Tai An 271016, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rate constant prediction; Machine learning; Deep learning; Hydrated electron (e_aq⁻); SHAP; Grad-CAM; REDUCTIVE DEFLUORINATION; PERFLUOROOCTANOIC ACID; MODELS; DEGRADATION
DOI
10.1016/j.envres.2023.115996
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Accurately determining the second-order rate constants of organic compounds (OCs) with e_aq⁻ (k_eaq⁻) is crucial for e_aq⁻-induced advanced reduction processes (ARPs). In this study, we collected 867 k_eaq⁻ values measured at different pH values from peer-reviewed publications and applied a machine learning (ML) algorithm, XGBoost, and a deep learning (DL) algorithm, a convolutional neural network (CNN), to predict k_eaq⁻. Our results demonstrated that the CNN model with transfer learning and data augmentation (CNN-TL&DA) greatly improved prediction performance and overcame overfitting. Furthermore, comparing the ML/DL modeling approaches, we found that CNN-TL&DA using molecular images (MI) achieved the best overall performance (R²_test = 0.896, RMSE_test = 0.362, MAE_test = 0.261), compared with XGBoost combined with Mordred descriptors (MD) (R²_test = 0.692, RMSE_test = 0.622, MAE_test = 0.399) and with Morgan fingerprints (MF) (R²_test = 0.512, RMSE_test = 0.783, MAE_test = 0.520). Moreover, interpreting the MD-XGBoost and MF-XGBoost models with the SHAP method revealed the importance of MDs (e.g., molecular size, branching, electron distribution, polarizability, and bond types), MFs (e.g., aromatic carbon, carbonyl oxygen, nitrogen, and halogen), and environmental conditions (e.g., pH) in k_eaq⁻ prediction. Interpreting the 2D molecular image CNN (MI-CNN) models with the Grad-CAM method showed that they correctly identified key functional groups, such as -CN, -NO2, and -X (halogen), that can increase k_eaq⁻ values. Additionally, the MI-CNN model highlighted almost all electron-withdrawing groups and a small fraction of electron-donating groups when estimating k_eaq⁻. Overall, our results suggest that the CNN approach yields smaller errors than the ML algorithms, making it a promising candidate for predicting other rate constants.
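The abstract compares models by R², RMSE, and MAE on a test set. As a minimal sketch (not the authors' code), the Python function below computes these three metrics for paired observed and predicted values, assuming R² is the coefficient of determination 1 − SS_res/SS_tot; the abstract does not state which R² definition or which scale of k_eaq⁻ (for example, log10) was used. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2 (coefficient of determination), RMSE and MAE for paired values.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,                     # 1 - SS_res / SS_tot
        "rmse": math.sqrt(ss_res / n),                   # root mean squared error
        "mae": sum(abs(r) for r in residuals) / n,       # mean absolute error
    }


# Hypothetical values (NOT data from the study), e.g. log10-scaled rate constants.
observed = [9.0, 9.5, 10.0, 10.5]
predicted = [9.1, 9.4, 10.2, 10.3]
print(regression_metrics(observed, predicted))
```

RMSE is always at least as large as MAE because squaring weights large residuals more heavily, so a wider gap between the two indicates that a few large errors contribute disproportionately to the total error.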
Pages: 8