Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset

Authors
Joung, Joonyoung F. [1]
Fong, Mun Hong [1]
Roh, Jihye [1]
Tu, Zhengkai [2]
Bradshaw, John [1]
Coley, Connor W. [1,2]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
Machine learning; Reaction outcome prediction; Reaction mechanisms; Organic chemistry; HYPERSPHERE SEARCH METHOD; ELASTIC BAND METHOD; AUTOMATED DISCOVERY; CHEMICAL-REACTIONS; REACTION PATHWAYS; PREDICTION; EXPLORATION; GENERATION; CHEMISTRY; NETWORKS;
DOI
10.1002/anie.202411296
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and, in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates, and we train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, which are often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.
Machine learning models trained on mechanistic datasets created using expert reaction templates demonstrate the ability to successfully predict known reaction mechanisms. This study illustrates how such mechanistic models can explain how reaction outcomes are produced, recapitulate the roles of catalysts and reagents, and suggest potential side products and impurities.
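The abstract names violations of atom conservation as one failure mode of predicted elementary steps. As a rough illustration only, and not the authors' method, the sketch below checks whether a single step balances by comparing element counts on each side. It assumes species are given as simple molecular formulas without parentheses or charges; the function names `parse_formula`, `side_counts`, and `conserves_atoms` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H4O2'.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(formulas) -> Counter:
    """Sum the atom counts over all species on one side of a step."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total


def conserves_atoms(reactants, products) -> bool:
    """Return True if both sides of the step contain the same atoms."""
    return side_counts(reactants) == side_counts(products)


# Esterification: acetic acid + methanol -> methyl acetate + water
balanced = conserves_atoms(["C2H4O2", "CH4O"], ["C3H6O2", "H2O"])
# Same step with the water by-product dropped from the prediction
unbalanced = conserves_atoms(["C2H4O2", "CH4O"], ["C3H6O2"])
```

A check of this kind could serve as a filter on model outputs, flagging predicted steps in which atoms appear or disappear; a real pipeline would more likely operate on atom-mapped structures than on bare formulas.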
Pages: 10