An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors

Times cited: 3
Authors
Gallegos, Miguel [1]
Isamura, Bienfait Kabuyaya [2]
Popelier, Paul L. A. [2]
Pendas, Angel Martin [1]
Affiliations
[1] Univ Oviedo, Dept Analyt & Phys Chem, E-33006 Oviedo, Spain
[2] Univ Manchester, Dept Chem, Manchester M13 9PL, England
Funding
UK Research and Innovation; European Research Council
Keywords
MOLECULAR DESCRIPTORS; POTENTIALS; INSIGHTS; FEATURES; MODELS; QSAR
DOI
10.1021/acs.jcim.3c01906
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
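The core idea of the abstract (cluster interatomic distances with a GMM, then place one radial ACSF per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`fit_gmm_1d`, `cluster_to_acsf`, `split_tails`), the mapping from cluster mean and variance to the ACSF parameters (r_s = mean, eta = 1 / (2 * variance)), the one-standard-deviation shift used for the two-tail variant, and the synthetic distances are all assumptions made for illustration. The radial function and cosine cutoff follow the standard Behler-type form.

```python
import math
import random


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative only)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    centre = sum(xs) / n
    var0 = max(sum((x - centre) ** 2 for x in xs) / n / k, 1e-6)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)
    # Return (mean, variance, weight) tuples ordered by increasing mean.
    return sorted(zip(means, variances, weights))


def cutoff(r, r_c):
    """Cosine cutoff commonly used with ACSFs; zero at and beyond r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_acsf(distances, eta, r_s, r_c=6.0):
    """Radial symmetry function: cutoff-weighted sum of Gaussians centred at r_s."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c) for r in distances)


def cluster_to_acsf(mean, variance):
    """Assumed mapping: centre the function on the cluster and match its width."""
    return {"r_s": mean, "eta": 1.0 / (2.0 * variance)}


def split_tails(mean, variance):
    """Assumed variant: one function per tail, shifted by one standard deviation."""
    sigma = math.sqrt(variance)
    eta = 1.0 / (2.0 * variance)
    return [{"r_s": mean - sigma, "eta": eta}, {"r_s": mean + sigma, "eta": eta}]


# Synthetic neighbour distances (angstrom): two made-up clusters, not paper data.
rng = random.Random(0)
distances = ([rng.gauss(1.0, 0.05) for _ in range(200)]
             + [rng.gauss(1.5, 0.05) for _ in range(200)])

components = fit_gmm_1d(distances, k=2)
params = [cluster_to_acsf(m, v) for m, v, _ in components]
tail_params = [p for m, v, _ in components for p in split_tails(m, v)]

for p in params:
    print(f"r_s = {p['r_s']:.3f} A, eta = {p['eta']:.1f} A^-2")
```

Each fitted component yields one centred function (`params`) or two tail functions (`tail_params`). In practice a library mixture model with a model-selection criterion for the number of components would likely replace the hand-written EM loop, and angular clusters would need an analogous treatment.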
Pages: 3059-3079
Page count: 21