An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors

Cited by: 3
Authors
Gallegos, Miguel [1]
Isamura, Bienfait Kabuyaya [2]
Popelier, Paul L. A. [2]
Pendas, Angel Martin [1]
Affiliations
[1] Univ Oviedo, Dept Analyt & Phys Chem, E-33006 Oviedo, Spain
[2] Univ Manchester, Dept Chem, Manchester M13 9PL, England
Funding
UK Research and Innovation (UKRI); European Research Council (ERC)
Keywords
MOLECULAR DESCRIPTORS; POTENTIALS; INSIGHTS; FEATURES; MODELS; QSAR;
DOI
10.1021/acs.jcim.3c01906
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
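The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a one-dimensional Gaussian mixture to synthetic interatomic distances with a hand-written EM loop, then turns each cluster into one radial symmetry function of the standard Behler-Parrinello form G2 = sum_j exp(-eta (r_ij - R_s)^2) f_c(r_ij). The mapping R_s = cluster mean and eta = 1/(2 sigma^2), the function names, the synthetic data, and the 6.0 cutoff are all assumptions made for illustration.

```python
import math
import random


def fit_gmm_1d(samples, k, n_iter=100, var_floor=1e-6):
    """Fit a 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    data = sorted(samples)
    n = len(data)
    # Deterministic initialisation: split the sorted data into k contiguous chunks.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor)
    return weights, means, variances


def radial_acsf_params(means, variances):
    """Assumed mapping: one (eta, R_s) pair per cluster, eta = 1 / (2 sigma^2)."""
    return [(1.0 / (2.0 * v), m) for m, v in zip(means, variances)]


def cutoff(r, r_c):
    """Cosine cutoff function: smooth decay to zero at r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def g2(distances, eta, r_s, r_c):
    """Radial symmetry function for one atom, given its neighbour distances."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c) for r in distances)


# Synthetic distances (arbitrary units) drawn from two well-separated shells.
rng = random.Random(0)
distances = ([rng.gauss(1.0, 0.05) for _ in range(200)]
             + [rng.gauss(2.0, 0.10) for _ in range(200)])

weights, means, variances = fit_gmm_1d(distances, k=2)
params = radial_acsf_params(means, variances)
for eta, r_s in params:
    print(f"R_s = {r_s:.3f}, eta = {eta:.1f}")
```

In practice a library mixture model (for example scikit-learn's `GaussianMixture`) would replace the hand-written EM loop, and the number of components would be chosen by a model-selection criterion rather than fixed in advance. The abstract also reports that placing two functions on the lower and upper tails of each cluster works better than a single centred function; that variant is not shown here.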
Pages: 3059-3079
Page count: 21