XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications

Cited by: 10
Authors
Kartashov, Oleg O. [1]
Chernov, Andrey V. [1]
Polyanichenko, Dmitry S. [1]
Butakova, Maria A. [1]
Affiliations
[1] Southern Fed Univ, Smart Mat Res Inst, 178-24 Sladkova, Rostov Na Donu 344090, Russia
Keywords
functional materials; materials characterization; data preprocessing; X-ray absorption spectra; machine learning; SPECTROSCOPY; EXAFS; FLUORESCENCE; CATALYST; ARTEMIS; SPECTRA; ATHENA;
DOI
10.3390/ma14247884
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification code
070304; 081704
Abstract
Innovation in the energy and chemical industries depends largely on accelerating the design and development of new functional materials. Progress in nanocatalyst research relies, in turn, on modern techniques for their precise characterization. Existing experimental characterization methods, which make it possible to assess whether these materials are suitable for specific chemical reactions or applications, generate large amounts of heterogeneous data. Accelerating the development of new functional materials, including nanocatalysts, depends directly on how quickly and how well hidden dependencies and knowledge can be extracted from the experimental data. Such experiments usually combine several characterization techniques, including different types of X-ray absorption spectroscopy (XAS). Machine learning (ML) methods applied to XAS data make it possible to study and predict the atomic-scale structure and a range of other nanocatalyst parameters efficiently. However, before any ML model is used, the raw experimental XAS data must be properly preprocessed, cleaned, and prepared for ML application. The XAS preprocessing stage is usually described only vaguely in scientific studies, with most of the researchers' effort devoted to describing and implementing the ML stage. Yet the quality of the input data influences the quality of the ML analysis and of the predictions that are later relied upon. This paper fills the gap between obtaining XAS data from synchrotron facilities and using and customizing various ML analysis and prediction models. The study aims to develop automated tools for preprocessing and presenting data from physical experiments and for creating deposited datasets, using the study of palladium-based nanocatalysts at synchrotron radiation facilities as an example.
During the study, preprocessing methods were considered for XAS data, which can be conventionally divided into X-ray absorption near edge structure (XANES) and extended X-ray absorption fine structure (EXAFS). This paper proposes a software toolkit that implements data preprocessing scenarios as a single pipeline. The main preprocessing methods used in this study are principal component analysis (PCA); z-score normalization; the interquartile method for eliminating outliers; and the k-means machine learning method, which makes it possible to clarify the phase of the studied material sample by clustering the feature vectors of experiments. The results of this study also include deposited datasets of physical experiments on palladium-based nanocatalysts performed with synchrotron radiation. These datasets will enable further high-quality data mining to extract new knowledge about materials using artificial intelligence methods and machine learning models, and will ensure their smooth dissemination to researchers and their reuse.
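The four preprocessing methods named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' toolkit: the function names (`zscore`, `pca_scores`, `iqr_inlier_mask`, `kmeans`, `preprocess`), the order of the steps, the 1.5 x IQR fence, and the choice to filter outliers on PCA scores rather than on raw spectra are all assumptions made for illustration, and each row of the input is assumed to be one spectrum sampled on a common energy grid.

```python
import numpy as np

def zscore(X):
    """Z-score normalization: zero mean, unit variance per feature (column)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / sd

def pca_scores(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def iqr_inlier_mask(X, k=1.5):
    """Interquartile rule: a row is kept only if every feature lies
    inside [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is the common default (assumed)."""
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)

def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def preprocess(spectra, n_components=2, n_clusters=2):
    """Hypothetical pipeline order: normalize -> reduce -> drop outliers -> cluster."""
    scores = pca_scores(zscore(spectra), n_components)
    keep = iqr_inlier_mask(scores)
    labels, _ = kmeans(scores[keep], n_clusters)
    return keep, labels

# Synthetic stand-in for two groups of spectra (not real XAS data).
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0.0, 0.05, (20, 6)), rng.normal(1.0, 0.05, (20, 6))])
keep, labels = preprocess(demo)
```

Here `keep` is a boolean mask over the input spectra and `labels` assigns a cluster index to each retained spectrum. In the setting the abstract describes, the cluster index would be interpreted as a candidate phase of the sample; the synthetic data above carries no such physical meaning.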
Pages: 16