Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules

Cited by: 1
Authors
Xiang, Yan [1]
Tang, Yu-Hang [2, 3]
Gong, Zheng [1]
Liu, Hongyi [1]
Wu, Liang [1]
Lin, Guang [4, 5]
Sun, Huai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Chem & Chem Engn, Shanghai 200240, Peoples R China
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[3] Nvidia Corp, Santa Clara, CA 95051 USA
[4] Purdue Univ, Dept Math, W Lafayette, IN 47907 USA
[5] Purdue Univ, Sch Mech Engn, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China
Keywords
ACCELERATED DISCOVERY; DENSITY;
DOI
10.1021/acs.jcim.3c01430
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
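The closing point of the abstract, that the GPR predictive uncertainty depends only on the structures and not on any property labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel on random 2-D feature vectors is an assumed stand-in for the marginalized graph kernel on molecular graphs, and the function names, the variance threshold, and the greedy selection rule are hypothetical choices made for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Assumed stand-in similarity; the paper uses a marginalized graph kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def posterior_variance(x_pool, x_selected, noise=1e-6):
    """GP predictive variance at every pool point. No labels enter this formula."""
    k_ss = rbf_kernel(x_selected, x_selected) + noise * np.eye(len(x_selected))
    k_ps = rbf_kernel(x_pool, x_selected)
    solved = np.linalg.solve(k_ss, k_ps.T)  # shape (n_selected, n_pool)
    prior = np.ones(len(x_pool))            # k(x, x) = 1 for the RBF kernel
    return prior - np.einsum("ij,ji->i", k_ps, solved)

def select_by_uncertainty(x_pool, threshold=0.1, max_points=None):
    """Greedily add the most uncertain pool point until all variances fall below threshold."""
    selected = [0]  # arbitrary seed point (hypothetical choice)
    max_points = max_points or len(x_pool)
    while len(selected) < max_points:
        var = posterior_variance(x_pool, x_pool[selected])
        best = int(np.argmax(var))
        if var[best] < threshold:
            break
        selected.append(best)
    return selected

# Toy pool of 200 "structures" described by 2-D feature vectors (synthetic data).
rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 2))
chosen = select_by_uncertainty(pool, threshold=0.1)
print(f"selected {len(chosen)} of {len(pool)} pool points")
```

Because the selection loop never touches a property value, the whole training set can be chosen first and the expensive simulations can then be run on the selected points as one batch, which is the compatibility with high-throughput data generation that the abstract describes.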
Pages: 6515-6524
Page count: 10