XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications

Cited by: 10
Authors
Kartashov, Oleg O. [1]
Chernov, Andrey V. [1]
Polyanichenko, Dmitry S. [1]
Butakova, Maria A. [1]
Affiliation
[1] Southern Fed Univ, Smart Mat Res Inst, 178-24 Sladkova, Rostov Na Donu 344090, Russia
Keywords
functional materials; materials characterization; data preprocessing; X-ray absorption spectra; machine learning; SPECTROSCOPY; EXAFS; FLUORESCENCE; CATALYST; ARTEMIS; SPECTRA; ATHENA;
DOI
10.3390/ma14247884
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification code
070304; 081704
Abstract
Innovation in the energy and chemical industries depends largely on advances in the accelerated design and development of new functional materials. Success in nanocatalyst research relies in turn on modern techniques and approaches for precise characterization. Existing experimental characterization methods, which make it possible to assess whether these materials are suitable for specific chemical reactions or applications, generate large amounts of heterogeneous data. Accelerating the development of new functional materials, including nanocatalysts, therefore depends directly on the speed and quality with which hidden dependencies and knowledge are extracted from the experimental data. Such experiments usually combine several characterization techniques, including different types of X-ray absorption spectroscopy (XAS). Machine learning (ML) methods applied to XAS data make it possible to study and predict the atomic-scale structure and a range of other parameters of a nanocatalyst efficiently. However, before any ML model is used, the raw experimental XAS data must be properly preprocessed, cleaned, and prepared for ML application. The XAS preprocessing stage is usually described only vaguely in scientific studies, with most of the researchers' effort devoted to describing and implementing the ML stage. Yet the quality of the input data influences the quality of the ML analysis and of the predictions that are subsequently used. This paper fills the gap between obtaining XAS data at synchrotron facilities and using and customizing various ML analysis and prediction models. The study aims to develop automated tools for preprocessing and presenting data from physical experiments and for creating deposited datasets, using the example of palladium-based nanocatalysts studied at synchrotron radiation facilities.
The study considered methods for preprocessing XAS data, which can be conditionally divided into X-ray absorption near-edge structure (XANES) and extended X-ray absorption fine structure (EXAFS). This paper proposes a software toolkit that implements the data preprocessing scenarios as a single pipeline. The main preprocessing methods used are principal component analysis (PCA); z-score normalization; the interquartile method for eliminating outliers in the data; and the k-means machine learning method, which helps clarify the phase of the studied material sample by clustering the feature vectors of the experiments. A further result of the study is a set of deposited datasets from physical experiments on palladium-based nanocatalysts performed with synchrotron radiation. These datasets will support further high-quality data mining to extract new knowledge about materials using artificial intelligence methods and machine learning models, and will make it easier to disseminate the data to other researchers for reuse.
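The abstract does not show the toolkit's code, so the following is only a minimal sketch of three of the named steps (interquartile outlier removal, z-score normalization, and k-means clustering) chained into one pipeline. The function names, the step order, the 1.5 x IQR threshold, the deterministic k-means initialization, and the synthetic two-feature vectors are all assumptions made for illustration and are not taken from the paper. PCA is left out to keep the example free of third-party dependencies.

```python
import statistics

def iqr_filter(rows, k=1.5):
    """Drop rows in which any feature lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    bounds = []
    for col in zip(*rows):
        q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]

def zscore(rows):
    """Scale every feature column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeans(rows, k, n_iter=50):
    """Plain Lloyd's k-means with evenly spaced rows as initial centers."""
    centers = [list(rows[i * len(rows) // k]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centers[j])))
            for r in rows
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append([statistics.fmean(c) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Synthetic feature vectors standing in for per-spectrum descriptors
# (hypothetical values, not real XAS measurements).
raw = [
    [1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.2, 2.2],   # group A
    [3.0, 4.0], [3.1, 4.1], [2.9, 3.9], [3.2, 4.2],   # group B
    [50.0, 60.0],                                     # obvious outlier
]

clean = iqr_filter(raw)            # 1) remove outliers
scaled = zscore(clean)             # 2) normalize features
labels, centers = kmeans(scaled, k=2)  # 3) cluster the feature vectors
print(len(clean), labels)
```

Filtering before scaling is a deliberate choice in this sketch: an extreme value would otherwise inflate the standard deviation and compress the remaining points after normalization.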
Pages: 16