EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions

Cited by: 7
Authors
Heid, Esther [1, 2]
Probst, Daniel [3]
Green, William H. [2]
Madsen, Georg K. H. [1]
Affiliations
[1] TU Wien, Inst Mat Chem, A-1060 Vienna, Austria
[2] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[3] IBM Res Europe, CH-8803 Ruschlikon, Switzerland
Funding
Austrian Science Fund
Keywords
BIOCATALYSIS; CASCADE; RETROSYNTHESIS; RESOURCE; OUTCOMES; DESIGN; TOOL
DOI
10.1039/d3sc02048g
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Enzymatic reactions are an ecofriendly, selective, and versatile addition to, and sometimes even an alternative to, organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models that predict the activity of enzymes on non-native substrates, perform retrosynthetic pathway searches, or predict the outcomes of reactions, including regio- and stereoselectivity, are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for reactions recorded in the literature, and we showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, which outperform previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy and is freely available online. A new curation and atom-mapping routine leading to a large database of enzymatic reactions boosts the performance of deep learning models.
Pages: 14229-14242
Number of pages: 14