Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset

Authors
Joung, Joonyoung F. [1]
Fong, Mun Hong [1]
Roh, Jihye [1]
Tu, Zhengkai [2]
Bradshaw, John [1]
Coley, Connor W. [1, 2]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
Machine learning; Reaction outcome prediction; Reaction mechanisms; Organic chemistry; HYPERSPHERE SEARCH METHOD; ELASTIC BAND METHOD; AUTOMATED DISCOVERY; CHEMICAL-REACTIONS; REACTION PATHWAYS; PREDICTION; EXPLORATION; GENERATION; CHEMISTRY; NETWORKS;
DOI
10.1002/anie.202411296
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation. Machine learning models trained on mechanistic datasets created using expert reaction templates demonstrate the ability to successfully predict known reaction mechanisms. This study illustrates how such mechanistic models can explain how reaction outcomes are produced, recapitulate the roles of catalysts and reagents, and suggest potential side products and impurities.
Pages: 10