Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset

Cited by: 0
Authors
Joung, Joonyoung F. [1]
Fong, Mun Hong [1]
Roh, Jihye [1]
Tu, Zhengkai [2]
Bradshaw, John [1]
Coley, Connor W. [1,2]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation
Keywords
Machine learning; Reaction outcome prediction; Reaction mechanisms; Organic chemistry; HYPERSPHERE SEARCH METHOD; ELASTIC BAND METHOD; AUTOMATED DISCOVERY; CHEMICAL-REACTIONS; REACTION PATHWAYS; PREDICTION; EXPLORATION; GENERATION; CHEMISTRY; NETWORKS;
DOI
10.1002/anie.202411296
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and, in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation. Machine learning models trained on mechanistic datasets created using expert reaction templates demonstrate the ability to successfully predict known reaction mechanisms. This study illustrates how such mechanistic models can explain how reaction outcomes are produced, recapitulate the roles of catalysts and reagents, and suggest potential side products and impurities.
Pages: 10