Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules

Authors
Xiang, Yan [1]
Tang, Yu-Hang [2, 3]
Gong, Zheng [1]
Liu, Hongyi [1]
Wu, Liang [1]
Lin, Guang [4, 5]
Sun, Huai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Chem & Chem Engn, Shanghai 200240, Peoples R China
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[3] Nvidia Corp, Santa Clara, CA 95051 USA
[4] Purdue Univ, Dept Math, W Lafayette, IN 47907 USA
[5] Purdue Univ, Sch Mech Engn, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China
Keywords
ACCELERATED DISCOVERY; DENSITY
DOI
10.1021/acs.jcim.3c01430
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
We introduce an exploratory active learning (AL) algorithm that uses Gaussian process regression with a marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4 to 19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on the selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN), using the simulation data as the training set, to predict liquid densities, heat capacities, and vaporization enthalpies across the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends only on the molecular structures, which makes it compatible with high-throughput data generation.
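The point that the GPR uncertainty depends only on the molecular structures can be illustrated with a small sketch. This is not the authors' implementation. It assumes molecules are represented by hypothetical numeric feature vectors and uses an RBF kernel as a stand-in for the marginalized graph kernel. The function names (`rbf_kernel`, `posterior_variance`, `explore`) are illustrative only. The selection loop greedily picks the candidate with the largest predictive variance, and no property values are needed at any step.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Stand-in for the marginalized graph kernel (an assumption): any
    # positive-definite similarity between molecules could be used here.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def posterior_variance(K, selected, noise=1e-6):
    # GPR predictive variance of every candidate given the selected set.
    # No labels (simulated properties) enter this expression.
    K_ss = K[np.ix_(selected, selected)] + noise * np.eye(len(selected))
    K_xs = K[:, selected]
    solved = np.linalg.solve(K_ss, K_xs.T)  # shape (n_selected, n_candidates)
    return np.diag(K) - np.sum(K_xs.T * solved, axis=0)

def explore(features, n_select, seed_index=0):
    # Greedy exploratory selection: repeatedly add the most uncertain candidate.
    K = rbf_kernel(features, features)
    selected = [seed_index]
    while len(selected) < n_select:
        var = posterior_variance(K, selected)
        var[selected] = -np.inf  # never re-pick an already selected candidate
        selected.append(int(np.argmax(var)))
    return selected

# Hypothetical candidate pool: 50 "molecules" described by 3 made-up features.
pool = np.random.default_rng(0).normal(size=(50, 3))
chosen = explore(pool, n_select=8)
print(chosen)
```

Because the variance can be computed before any simulation is run, a whole batch of molecules could in principle be selected up front and simulated in parallel, which is consistent with the compatibility with high-throughput data generation noted in the abstract.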
Pages: 6515-6524
Number of pages: 10
Related papers
50 in total
  • [41] Machine learning methods for chemical properties and toxicity-based endpoints prediction using open source libraries
    Tkachenko, Valery
    Korotcov, Alexander
    Zakharov, Rick
    Sattarov, Boris
    Mitrofanov, Artem
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2018, 255
  • [42] Small Molecules' Multi-Metabolic Pathways Prediction Using Physico-Chemical Features and Multi-Task Learning Method
    Niu, Bing
    Gu, Lei
    Peng, Chunrong
    Ding, Juan
    Yuan, Xiaochen
    Lu, Wencong
    CURRENT BIOINFORMATICS, 2013, 8 (05) : 564 - 568
  • [43] Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design
    Muhammad Arslan Masood
    Samuel Kaski
    Tianyu Cui
    Journal of Cheminformatics, 17 (1)
  • [44] Deep Learning-Based Prediction of Material Properties Using Chemical Compositions and Diffraction Patterns as Experimentally Accessible Inputs
    Kim, Jeongrae
    Tiong, Leslie Ching Ow
    Kim, Donghun
    Han, Sang Soo
    JOURNAL OF PHYSICAL CHEMISTRY LETTERS, 2021, 12 (34) : 8376 - 8383
  • [45] Efficient and enhanced sampling of drug-like chemical space for virtual screening and molecular design using modern machine learning methods
    Goel, Manan
    Aggarwal, Rishal
    Sridharan, Bhuvanesh
    Pal, Pradeep Kumar
    Priyakumar, U. Deva
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 2023, 13 (02)
  • [46] Leveraging available data for efficient exploration of materials space using Machine Learning: A case study for identifying rare earth-free permanent magnets
    Mal, Sourav
    Sen, Prasenjit
    JOURNAL OF MAGNETISM AND MAGNETIC MATERIALS, 2024, 589
  • [47] Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration
    Krishnan, Keerthi
    Kassab, Ryan
    Agajanian, Steve
    Verkhivker, Gennady
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (19)
  • [48] Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using Complementary DFT and Machine Learning Approaches
    Tawfik, Sherif Abdulkader
    Isayev, Olexandr
    Stampfl, Catherine
    Shapter, Joe
    Winkler, David A.
    Ford, Michael J.
    ADVANCED THEORY AND SIMULATIONS, 2019, 2 (01)
  • [49] Efficient prediction of elastic properties of Ti0.5Al0.5N at elevated temperature using machine learning interatomic potential
    Tasnadi, Ferenc
    Bock, Florian
    Tidholm, Johan
    Shapeev, Alexander V.
    Abrikosov, Igor A.
    THIN SOLID FILMS, 2021, 737
  • [50] Spatial prediction of physical and chemical properties of soil using optical satellite imagery: a state-of-the-art hybridization of deep learning algorithm
    Hosseini, Fatemeh Sadat
    Razavi-Termeh, Seyed Vahid
    Sadeghi-Niaraki, Abolghasem
    Choi, Soo-Mi
    Jamshidi, Mohammad
    FRONTIERS IN ENVIRONMENTAL SCIENCE, 2023, 11