Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules

Cited by: 1
Authors
Xiang, Yan [1]
Tang, Yu-Hang [2, 3]
Gong, Zheng [1]
Liu, Hongyi [1]
Wu, Liang [1]
Lin, Guang [4, 5]
Sun, Huai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Chem & Chem Engn, Shanghai 200240, Peoples R China
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[3] Nvidia Corp, Santa Clara, CA 95051 USA
[4] Purdue Univ, Dept Math, W Lafayette, IN 47907 USA
[5] Purdue Univ, Sch Mech Engn, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China
Keywords
ACCELERATED DISCOVERY; DENSITY;
DOI
10.1021/acs.jcim.3c01430
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
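The abstract's closing point, that the GPR predictive uncertainty depends only on the inputs and not on any measured property, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an RBF kernel over hypothetical numeric feature vectors in place of the marginalized graph kernel on molecular graphs, and a fixed selection budget as the stopping rule. The names `rbf_kernel`, `posterior_variance`, and `select_by_uncertainty` are illustrative only.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel matrix; a stand-in (assumption) for a graph kernel."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def posterior_variance(x_pool, x_train, noise=1e-6):
    """GP posterior variance at each pool point.

    Only the inputs appear here: no target values are needed.
    """
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_pt = rbf_kernel(x_pool, x_train)
    solved = np.linalg.solve(k_tt, k_pt.T)  # shape (n_train, n_pool)
    # Prior variance of the RBF kernel is 1 on the diagonal.
    return 1.0 - np.einsum("ij,ji->i", k_pt, solved)


def select_by_uncertainty(x_pool, n_select, seed_index=0):
    """Greedily pick the pool point with the largest posterior variance."""
    selected = [seed_index]
    while len(selected) < n_select:
        var = posterior_variance(x_pool, x_pool[selected])
        var[selected] = -np.inf  # never pick the same point twice
        selected.append(int(np.argmax(var)))
    return selected


# Hypothetical pool: 200 random 4-dimensional descriptors, not real molecules.
rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 4))
chosen = select_by_uncertainty(pool, n_select=10)
```

Because no property values enter the selection loop, the whole training set can be chosen before any simulation is run, which is what makes this style of selection compatible with batch, high-throughput data generation.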
Pages: 6515-6524
Number of pages: 10